A collection of toy implementations of memory-efficient attention.

All NumPy implementations are in attn.py, and there is a CUDA implementation in attn_chunk_q_chunk_kv_cuda/attn_chunk_q_chunk_kv_kernel.cu.
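For orientation, below is a minimal NumPy sketch of the chunked-query / chunked-KV idea these implementations are built around: queries and keys/values are processed in blocks, and an online softmax (running max, running normalizer, running weighted sum) keeps memory proportional to the chunk size instead of the full score matrix. Function names, chunk sizes, and shapes here are illustrative assumptions, not the exact code in attn.py.

    import numpy as np

    def attention_reference(q, k, v):
        """Standard attention: materializes the full (n_q, n_kv) score matrix."""
        scores = q @ k.T / np.sqrt(q.shape[-1])
        scores -= scores.max(axis=-1, keepdims=True)      # numerical stability
        weights = np.exp(scores)
        weights /= weights.sum(axis=-1, keepdims=True)
        return weights @ v

    def attention_chunked(q, k, v, q_chunk=16, kv_chunk=16):
        """Memory-efficient attention: only chunk-sized score blocks are ever built."""
        d = q.shape[-1]
        out = np.zeros_like(q)
        for qs in range(0, q.shape[0], q_chunk):
            qc = q[qs:qs + q_chunk]
            m = np.full(qc.shape[0], -np.inf)           # running row max
            l = np.zeros(qc.shape[0])                   # running softmax denominator
            acc = np.zeros((qc.shape[0], v.shape[-1]))  # running weighted sum of values
            for ks in range(0, k.shape[0], kv_chunk):
                kc = k[ks:ks + kv_chunk]
                vc = v[ks:ks + kv_chunk]
                s = qc @ kc.T / np.sqrt(d)              # one block of scores
                m_new = np.maximum(m, s.max(axis=-1))
                p = np.exp(s - m_new[:, None])
                scale = np.exp(m - m_new)               # rescale previous partial sums
                l = l * scale + p.sum(axis=-1)
                acc = acc * scale[:, None] + p @ vc
                m = m_new
            out[qs:qs + q_chunk] = acc / l[:, None]
        return out

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        q = rng.standard_normal((64, 32))
        k = rng.standard_normal((80, 32))
        v = rng.standard_normal((80, 32))
        assert np.allclose(attention_chunked(q, k, v), attention_reference(q, k, v), atol=1e-6)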
To build the CUDA extension:

    cd attn_chunk_q_chunk_kv_cuda
    python setup.py install

There is no formal testing framework; just run

    python test.py

If no assertions are thrown, the tests pass :)