# AttnCache: Accelerating Self-Attention Inference for LLM Prefill via Attention Cache

*Figure: overview of the AttnCache scheme.*

## Usage

### Environment Setup

```bash
conda create -n AttnCache python=3.9
conda activate AttnCache

pip install torch torchvision torchaudio
pip install transformers==4.50.3 accelerate datasets scikit-learn scipy matplotlib faiss-cpu
pip install auto-gptq optimum bitsandbytes
```
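As a quick sanity check (not part of the repo), the following snippet confirms that the pinned `transformers` release and the other core dependencies import cleanly:

```python
# Hypothetical sanity check (not part of the repo): confirm core dependencies load.
import faiss
import torch
import transformers

print(f"transformers {transformers.__version__} (pinned: 4.50.3)")
print(f"torch {torch.__version__}, CUDA available: {torch.cuda.is_available()}")
print(f"faiss {faiss.__version__}")
```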

### Download the MMLU Dataset

```bash
wget https://people.eecs.berkeley.edu/~hendrycks/data.tar
tar -xf data.tar
```
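The tarball extracts to a `data/` directory with `dev/`, `val/`, and `test/` splits, each holding one headerless CSV per subject (question, four choices, answer letter). A minimal loader sketch, assuming that layout:

```python
# Loader sketch (assumed layout: data/{dev,val,test}/<subject>_<split>.csv, no header).
import csv
from pathlib import Path

def load_split(split: str):
    rows = []
    for path in sorted(Path("data", split).glob("*.csv")):
        with open(path, newline="") as f:
            for q, a, b, c, d, ans in csv.reader(f):
                rows.append({"question": q, "choices": [a, b, c, d], "answer": ans})
    return rows

dev = load_split("dev")
print(f"{len(dev)} dev questions; first: {dev[0]['question'][:60]}")
```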

## Quick Start

### Collect Hidden States and Attention Maps

```bash
python collect_hs_apms_llama.py --model-path meta-llama/Llama-3.2-3B-Instruct
```
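This step runs the model over prompts and saves the hidden states together with the attention maps they produce. A minimal sketch of what such a collection pass looks like with Hugging Face `transformers` (the actual script may store the data differently):

```python
# Collection sketch (assumed behavior; see collect_hs_apms_llama.py for the real logic).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-3.2-3B-Instruct"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
    attn_implementation="eager",  # eager attention so attention weights are returned
)

inputs = tok("The capital of France is", return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model(**inputs, output_hidden_states=True, output_attentions=True)

hidden_states = out.hidden_states  # (num_layers + 1) tensors of [batch, seq, hidden]
attn_maps = out.attentions         # num_layers tensors of [batch, heads, seq, seq]
torch.save({"hs": hidden_states, "apm": attn_maps}, "sample_0.pt")
```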

### Train Feature Projector and Build Index DB

```bash
python train_fp_and_build_db.py --epoch 3 --batchsize 32
```
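The feature projector maps a prompt's hidden states to a compact embedding, and a FAISS index over those embeddings acts as the attention-map database. The sketch below illustrates the indexing side; the placeholder MLP stands in for the trained projector, whose architecture and training objective may differ:

```python
# Illustrative sketch: project hidden-state features and index them with FAISS.
import faiss
import numpy as np
import torch
import torch.nn as nn

# Placeholder projector; 3072 is the hidden size of Llama-3.2-3B.
projector = nn.Sequential(nn.Linear(3072, 256), nn.ReLU(), nn.Linear(256, 128))

def embed(hidden_state: torch.Tensor) -> np.ndarray:
    """Mean-pool a [seq, hidden] state, project, and L2-normalize for cosine search."""
    with torch.no_grad():
        v = projector(hidden_state.float().mean(dim=0))
    v = v / v.norm()
    return v.numpy()[None, :]

index = faiss.IndexFlatIP(128)           # inner product == cosine on unit vectors
index.add(embed(torch.randn(32, 3072)))  # one entry per cached attention map
faiss.write_index(index, "attn_db.index")
```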

### Evaluation

```bash
python test_llama.py --threshold 0.995
```
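`--threshold` controls how similar a query must be to a cached entry before its attention map is reused; below the threshold, attention is computed normally. Continuing the indexing sketch above (`hidden_state`, `cached_attention_maps`, and `compute_attention` are hypothetical stand-ins):

```python
# Hypothetical reuse decision, continuing the FAISS sketch above.
scores, ids = index.search(embed(hidden_state), 1)
if scores[0, 0] >= 0.995:                        # --threshold
    attn_map = cached_attention_maps[ids[0, 0]]  # reuse the cached attention map
else:
    attn_map = compute_attention(hidden_state)   # fall back to standard self-attention
```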

## Citation

If you find AttnCache useful for your project or research, please cite our paper:

```bibtex
@article{song2025attncache,
  title   = {AttnCache: Accelerating Self-Attention Inference for LLM Prefill via Attention Cache},
  author  = {Song, Dinghong and Feng, Yuan and Wang, Yiwei and Chen, Shangye and Guyot, Cyril and Blagojevic, Filip and Jeon, Hyeran and Su, Pengfei and Li, Dong},
  journal = {arXiv},
  year    = {2025}
}
```
