GitHub - ai-bond/flash-attention-v100: Implementation of FlashAttention-2 for Nvidia Tesla V100

FlashAttention for the unsupported Tesla V100

This repository aims to implement FlashAttention and FlashAttention-2, following the official implementation in Tri Dao's repository, for the NVIDIA Tesla V100, which that repository does not support.

This repo is an attempt to build FlashAttention from scratch, without "Vibe Code", for self-education.

According to NVIDIA's Deprecated Architectures notice:

Architecture support for Volta is considered feature-complete. Offline compilation and library support for these architectures have been removed in CUDA Toolkit 13.0 major version release.

The last CUDA Toolkit release that still supports Volta:

# Download package
wget https://developer.download.nvidia.com/compute/cuda/12.9.1/local_installers/cuda_12.9.1_575.57.08_linux.run

# Install
sudo sh cuda_12.9.1_575.57.08_linux.run

# Export and apply
echo -e 'export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH\nexport PATH=/usr/local/cuda/bin:$PATH' >> ~/.bashrc
source ~/.bashrc

This CUDA package bundles NVIDIA driver version 575.57.08, so the toolkit and the driver can be installed together in one step.
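Because CUDA 13.0 removes offline compilation for Volta, it is worth confirming which toolkit nvcc resolves to after sourcing ~/.bashrc. The Python sketch below assumes nvcc prints its usual "release X.Y" line; parse_nvcc_release and toolkit_supports_volta are hypothetical helpers, not part of this repo.

```python
import re
import shutil
import subprocess


# Hypothetical helper (not part of this repo): parse `nvcc --version` output.
def parse_nvcc_release(output: str) -> tuple[int, int]:
    """Extract (major, minor) from text such as 'release 12.9, V12.9.86'."""
    match = re.search(r"release (\d+)\.(\d+)", output)
    if match is None:
        raise ValueError("could not find a 'release X.Y' line in nvcc output")
    return int(match.group(1)), int(match.group(2))


# Hypothetical helper: sm_70 arrived in CUDA 9.0 and was removed in CUDA 13.0.
def toolkit_supports_volta(major: int) -> bool:
    return 9 <= major < 13


if __name__ == "__main__":
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        print("nvcc not found on PATH; check the exports in ~/.bashrc")
    else:
        out = subprocess.run([nvcc, "--version"], capture_output=True,
                             text=True, check=True).stdout
        major, minor = parse_nvcc_release(out)
        verdict = "can" if toolkit_supports_volta(major) else "cannot"
        print(f"CUDA {major}.{minor} {verdict} compile for Volta (sm_70)")
```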

Deployment and compilation

# Create a new Python virtual env or use an existing one:
python -m venv env
source env/bin/activate

# Update pip
pip install --upgrade pip

# Clone the code:
git clone https://github.com/ai-bond/flash-attention-v100/
cd ./flash-attention-v100

# Install the required packages
pip install -r requirements.txt

Because NVIDIA removed Volta support starting with CUDA 13, PyTorch has also restricted and deprecated Volta support in newer releases:

PyTorch is dropping Volta support from its CUDA 12.8 binaries as of release 2.11.

PyTorch releases 2.8 and 2.9 removed support for the Maxwell, Pascal, and Volta architectures from their CUDA 12.8 and 12.9 builds.

# Install the last PyTorch release that supports Volta with CUDA 12.9
pip install torch==2.10.0+cu129 --index-url https://download.pytorch.org/whl/cu129

# Check that the GPU is detected and print its compute capability (SM)
python -c "import torch; p=torch.cuda.get_device_properties(0); print(f'{p.name} SM {p.major}.{p.minor} supported')"
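The one-liner above reports the GPU's name and compute capability, but it prints "supported" regardless of the result. A stricter sketch, assuming "supported" should mean that the installed wheel ships native sm_70 kernels, also consults torch.cuda.get_arch_list(). It deliberately ignores PTX (compute_XX) JIT fallback, and wheel_has_kernels_for is a hypothetical helper:

```python
# Hypothetical helper: True if the wheel has native SASS for this capability.
# Assumption: PTX (compute_XX) JIT fallback is not counted as support here.
def wheel_has_kernels_for(capability: tuple[int, int], arch_list: list[str]) -> bool:
    major, minor = capability
    return f"sm_{major}{minor}" in arch_list


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed in this environment")
    else:
        if not torch.cuda.is_available():
            print("No CUDA device visible to PyTorch")
        else:
            props = torch.cuda.get_device_properties(0)
            ok = wheel_has_kernels_for((props.major, props.minor),
                                       torch.cuda.get_arch_list())
            label = "supported" if ok else "NOT supported by this torch build"
            print(f"{props.name} SM {props.major}.{props.minor} {label} "
                  f"(torch {torch.__version__}, CUDA {torch.version.cuda})")
```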

If you see "Tesla V100-XXX-XXGB SM 7.0 supported", everything is ready and you can compile and install the project with:

./run.sh 

or 

pip install . --no-build-isolation -v
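With --no-build-isolation, pip builds the extension inside the current virtual env, so it compiles against the PyTorch installed above rather than one fetched into a temporary build environment. That makes it worth confirming that nvcc and the CUDA version PyTorch was built with line up before a long compile. torch.utils.cpp_extension runs its own version check during the build; the sketch below is only a hypothetical pre-flight check with illustrative helper names, assuming nvcc is on PATH:

```python
import re
import shutil
import subprocess


# Hypothetical helper: '12.9' or '12.9.86' -> (12, 9)
def major_minor(version: str) -> tuple[int, int]:
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


# Hypothetical helper: classify the nvcc vs. torch.version.cuda pair.
def compare_cuda(nvcc_version: str, torch_cuda: str) -> str:
    nvcc_mm, torch_mm = major_minor(nvcc_version), major_minor(torch_cuda)
    if nvcc_mm == torch_mm:
        return "ok"
    if nvcc_mm[0] == torch_mm[0]:
        return "minor-mismatch"
    return "major-mismatch"


def main() -> None:
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed in this environment")
        return
    nvcc = shutil.which("nvcc")
    if nvcc is None or torch.version.cuda is None:
        print("Need nvcc on PATH and a CUDA build of PyTorch")
        return
    out = subprocess.run([nvcc, "--version"], capture_output=True,
                         text=True, check=True).stdout
    match = re.search(r"release (\d+\.\d+)", out)
    if match is None:
        print("Could not parse nvcc --version output")
        return
    verdict = compare_cuda(match.group(1), torch.version.cuda)
    print(f"nvcc {match.group(1)} vs torch CUDA {torch.version.cuda}: {verdict}")


if __name__ == "__main__":
    main()
```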

Once pip reports:

Successfully built flash_attn_v100
Installing collected packages: flash_attn_v100
Successfully installed flash_attn_v100-XX.XX

# Check that flash_attn imports correctly

python -c 'import flash_attn; print(f"Version: {flash_attn.__doc__}")'
Expected output: Version: Flash Attention for Tesla V100 v2.8.3

and check the package metadata:

pip show flash_attn
Name: flash-attn
Version: 2.8.3
Summary: Flash Attention for Tesla V100
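The import check and pip show above can also be combined so that a disagreement between the two is reported explicitly. This sketch assumes, based on the sample output, that flash_attn.__doc__ ends in a "vX.Y.Z" tag and that the distribution is named flash-attn; version_from_doc is a hypothetical helper:

```python
import re
from importlib import metadata


# Hypothetical helper: 'Flash Attention for Tesla V100 v2.8.3' -> '2.8.3'
# Assumes the docstring ends in a lowercase 'v' version tag.
def version_from_doc(doc: str) -> str:
    match = re.search(r"v(\d+(?:\.\d+)*)\s*$", doc.strip())
    if match is None:
        raise ValueError(f"no trailing version tag in {doc!r}")
    return match.group(1)


def main() -> None:
    try:
        import flash_attn
        installed = metadata.version("flash-attn")  # assumed distribution name
    except (ImportError, metadata.PackageNotFoundError) as exc:
        print(f"flash_attn is not installed: {exc}")
        return
    try:
        from_doc = version_from_doc(flash_attn.__doc__ or "")
    except ValueError as exc:
        print(exc)
        return
    status = "match" if from_doc == installed else "MISMATCH"
    print(f"import reports {from_doc}, pip metadata reports {installed}: {status}")


if __name__ == "__main__":
    main()
```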

Good luck and have fun :)

