FreqKV

📄 Paper:
FreqKV: Key-Value Compression in Frequency Domain for Context Window Extension

🏆 Status:
Accepted to ICLR 2026

FreqKV is an efficient context extension method that iteratively compresses key-value states in the frequency domain.
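The core idea can be illustrated with a minimal NumPy sketch. It assumes an orthonormal DCT along the sequence axis: it keeps only the lowest-frequency coefficients and maps them back to a shorter sequence. The helper names, the 50% retention ratio, the uncompressed "sink" tokens, and the amplitude rescaling are all illustrative assumptions, not the authors' settings. The actual implementation operates on per-head attention states in PyTorch.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II matrix: coeffs = C @ x, and C.T @ coeffs recovers x."""
    k = np.arange(n)[:, None]  # frequency index (rows)
    i = np.arange(n)[None, :]  # position index (columns)
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] /= np.sqrt(2.0)  # scale the DC row so that C is orthonormal
    return c


def compress_kv(kv: np.ndarray, keep: int) -> np.ndarray:
    """Compress a (seq_len, head_dim) key or value array to (keep, head_dim)."""
    n = kv.shape[0]
    coeffs = dct_matrix(n) @ kv  # DCT along the sequence axis, per channel
    low = coeffs[:keep]          # drop the high-frequency components
    # Inverse DCT at the shorter length; the sqrt(keep / n) factor keeps
    # signal amplitudes unchanged (an assumption of this sketch).
    return np.sqrt(keep / n) * (dct_matrix(keep).T @ low)


def append_and_compress(cache, new, max_len, ratio=0.5, n_sink=4):
    """Hypothetical iterative policy: whenever the cache exceeds max_len,
    compress everything after the first n_sink tokens to `ratio` of its length."""
    cache = np.concatenate([cache, new], axis=0)
    if cache.shape[0] > max_len:
        sink, rest = cache[:n_sink], cache[n_sink:]
        keep = max(1, int(rest.shape[0] * ratio))
        cache = np.concatenate([sink, compress_kv(rest, keep)], axis=0)
    return cache
```

Because the cache is recompressed every time it fills, older tokens pass through several rounds of compression, while recent tokens stay at full resolution until the next round.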

Setup

Install the dependencies:

pip install -r requirements.txt
pip install flash-attn --no-build-isolation
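After installation, a short Python check can confirm that the main packages are importable. The package list below is an assumption based on the tools this README mentions (PyTorch, Hugging Face Transformers, DeepSpeed, a LoRA library such as PEFT, and FlashAttention, which installs as the module flash_attn). requirements.txt remains the authoritative list.

```python
import importlib.util

# Assumed package list; adjust to match requirements.txt.
DEFAULT_PACKAGES = ("torch", "transformers", "deepspeed", "peft", "flash_attn")


def check_packages(names=DEFAULT_PACKAGES) -> dict:
    """Map each top-level module name to whether it can be imported."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, ok in check_packages().items():
        print(f"{name}: {'ok' if ok else 'missing'}")
```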

Training

Fine-tuning

bash train-flash.sh

Before running, replace path_to/Llama-2-7b-hf, path_to_saving_checkpoints, and path_to_cache with your own directories.
You can also set model_max_length to other values.
To use DeepSpeed ZeRO stage 3 instead of stage 2, replace ds_configs/stage2.json with ds_configs/stage3.json.
After training finishes, obtain the full model weights with:

cd path_to_saving_checkpoints && python zero_to_fp32.py . pytorch_model.bin

Depending on your DeepSpeed version, the directory in which to run this command may be the global_step subdirectory of path_to_saving_checkpoints rather than path_to_saving_checkpoints itself.
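To find the right directory automatically, a small helper can look for zero_to_fp32.py first in the checkpoint root, then in the subdirectory named by DeepSpeed's latest file, and finally in the newest global_step* directory. The function name find_zero_to_fp32_dir is hypothetical, and the search order is an assumption of this sketch.

```python
from pathlib import Path

SCRIPT = "zero_to_fp32.py"


def _step(path: Path) -> int:
    """Parse the step number from a 'global_stepN' directory name (-1 if malformed)."""
    suffix = path.name[len("global_step"):]
    return int(suffix) if suffix.isdigit() else -1


def find_zero_to_fp32_dir(ckpt_root: str) -> Path:
    """Return the directory in which to run zero_to_fp32.py (hypothetical helper)."""
    root = Path(ckpt_root)
    if (root / SCRIPT).exists():
        return root
    # DeepSpeed records the most recent checkpoint tag in a file named 'latest'.
    latest = root / "latest"
    if latest.exists():
        tag_dir = root / latest.read_text().strip()
        if (tag_dir / SCRIPT).exists():
            return tag_dir
    # Fallback: the global_step* directory with the highest step number.
    for d in sorted(root.glob("global_step*"), key=_step, reverse=True):
        if (d / SCRIPT).exists():
            return d
    raise FileNotFoundError(f"{SCRIPT} not found under {root}")
```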

Supervised Fine-tuning

bash sft.sh

Our long instruction-following data can be found in LongAlpaca-16k-length.json.

Merge LoRA Weight

In low-rank training, we also make the embedding and normalization layers trainable. Use get_trainable_weights.py to extract these trainable weights from pytorch_model.bin into trainable_params.bin.
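Conceptually, this extraction keeps only the state-dict entries whose parameter names belong to the trainable modules. Here is a minimal sketch, assuming substring matching on "embed" and "norm". The helper name extract_trainable is hypothetical, and the actual script's command-line arguments are not shown here.

```python
def extract_trainable(state_dict: dict, patterns=("embed", "norm")) -> dict:
    """Keep entries whose parameter name contains any of the given substrings."""
    return {
        name: tensor
        for name, tensor in state_dict.items()
        if any(p in name for p in patterns)
    }
```

With PyTorch, the input would come from torch.load("pytorch_model.bin", map_location="cpu"), and the result could be written with torch.save(subset, "trainable_params.bin").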

Then merge the LoRA weights in pytorch_model.bin with the trainable parameters in trainable_params.bin, and save the resulting model in Hugging Face format to your desired path:

bash merge.sh
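For intuition, consider what merging does to a single layer. A standard LoRA layer adds a low-rank update to a frozen weight, and merging folds that update in as W' = W + (alpha / r) B A. The NumPy sketch below uses PEFT-style shapes (A is r x in, B is out x r). It is illustrative only, and merge.sh handles the real model.

```python
import numpy as np


def merge_lora(w: np.ndarray, a: np.ndarray, b: np.ndarray, alpha: float, r: int) -> np.ndarray:
    """Fold a LoRA update into a frozen weight.

    w: (out, in) frozen weight; a: (r, in) down-projection; b: (out, r) up-projection.
    """
    scaling = alpha / r
    return w + scaling * (b @ a)
```

After merging, a single matrix multiply with W' reproduces the base output plus the LoRA branch, so inference needs no extra adapter computation.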

Evaluation

Perplexity Validation

bash test.sh

--seq_len sets the sequence length used for evaluation, and --context_size sets the context length the model was fine-tuned with.
We provide the validation and test splits of PG19 and the test split of proof-pile, tokenized with the LLaMA tokenizer, in data/pg19/validation.bin, data/pg19/test.bin, and data/proof-pile/test_sampled_data.bin. data/proof-pile/test_sampled_data.bin contains 128 documents randomly sampled from the full proof-pile test split, each with at least 32768 tokens. You can also use the sampled ids from LongLoRA.
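The snippet below shows one way to read a .bin file and compute perplexity. It assumes each file stores uint16 token ids back to back (the 32,000-token LLaMA vocabulary fits in uint16), and it treats the file as one flat stream from which it samples windows of --seq_len tokens. How eval.py actually splits documents may differ. Perplexity is the exponential of the mean per-token negative log-likelihood (NLL), and the NLLs themselves would come from the model. All helper names are hypothetical.

```python
import numpy as np


def load_tokens(path: str) -> np.ndarray:
    """Memory-map a flat file of uint16 token ids (assumed storage format)."""
    return np.memmap(path, dtype=np.uint16, mode="r")


def sample_windows(tokens, seq_len: int, num_windows: int, seed: int = 0):
    """Sample (input, target) windows; targets are inputs shifted by one token."""
    rng = np.random.default_rng(seed)
    # Upper bound is exclusive, so start + 1 + seq_len never exceeds len(tokens).
    starts = rng.integers(0, len(tokens) - seq_len, size=num_windows)
    x = np.stack([np.asarray(tokens[s:s + seq_len], dtype=np.int64) for s in starts])
    y = np.stack([np.asarray(tokens[s + 1:s + 1 + seq_len], dtype=np.int64) for s in starts])
    return x, y


def perplexity(token_nlls) -> float:
    """Perplexity = exp(mean negative log-likelihood per token)."""
    return float(np.exp(np.mean(token_nlls)))
```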

Downstream tasks

We follow the instructions of LongBench, OpenCompass, and KVCache-Factory to evaluate the performance of FreqKV on LongBench, RULER, and Needle-in-a-Haystack.

Citation

If you find this work useful, please cite:

@inproceedings{freqkv2026,
  title     = {FreqKV: Key-Value Compression in Frequency Domain for Context Window Extension},
  author    = {Jushi Kai and Yixuan Wang and Boyi Zeng and Haoli Bai and Bo Jiang and Ziwei He and Zhouhan Lin},
  booktitle = {International Conference on Learning Representations (ICLR)},
  year      = {2026}
}
