Automatic routing between attention backends for LLMs/VLMs.
```bash
pip install -e .
```

```python
import torch
from autoattn import AutoAttention

attn = AutoAttention(d_model=256, num_heads=8, causal=True)

q = torch.randn(2, 128, 256)
k = torch.randn(2, 128, 256)
v = torch.randn(2, 128, 256)
out = attn(q, k, v)  # automatically picks the best backend
```

AutoAttention routes each call to one of three backends:

| Backend | When Used | Memory | Exact? |
|---|---|---|---|
| `dense` | CPU, fallback | O(N²) | ✅ |
| `flash` | GPU, seq ≤ 2048 | O(N) | ✅ |
| `local` | GPU, seq > 4096, memory mode | O(N·W) | ❌ |

N is the sequence length and W is the local attention window size.
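As a rough sketch of what this routing boils down to, based only on the table above: the `choose_backend` helper below is a hypothetical illustration, not autoattn's actual implementation.

```python
import torch

def choose_backend(device: torch.device, seq_len: int, mode: str = "auto") -> str:
    """Hypothetical routing rule mirroring the table above (illustration only)."""
    if device.type != "cuda":
        return "dense"   # CPU always takes the exact dense path
    if mode == "memory" and seq_len > 4096:
        return "local"   # windowed attention: O(N·W) memory, approximate
    if seq_len <= 2048 or mode == "performance":
        return "flash"   # exact and memory-efficient on GPU
    return "dense"       # exact fallback for everything in between

print(choose_backend(torch.device("cpu"), 512))              # dense
print(choose_backend(torch.device("cuda"), 1024))            # flash
print(choose_backend(torch.device("cuda"), 8192, "memory"))  # local
```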
The routing mode can be set explicitly:

```python
# Auto (default) - picks based on device/seq length
AutoAttention(d_model=256, num_heads=8, mode="auto")

# Performance - prefer flash on GPU
AutoAttention(d_model=256, num_heads=8, mode="performance")

# Memory - prefer local/sparse
AutoAttention(d_model=256, num_heads=8, mode="memory")
```
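For example, memory mode with a long sequence. This sketch assumes AutoAttention is a standard `torch.nn.Module` (so `.to(device)` works) and uses the same call signature as the quick start above; on a CPU-only machine it simply falls back to the dense backend per the table.

```python
import torch
from autoattn import AutoAttention

device = "cuda" if torch.cuda.is_available() else "cpu"

# Memory mode: long sequences on GPU should be routed to the local (windowed) backend.
attn = AutoAttention(d_model=256, num_heads=8, mode="memory").to(device)

x = torch.randn(1, 8192, 256, device=device)  # batch 1, 8192 tokens, d_model 256
out = attn(x, x, x)                           # self-attention: q = k = v
print(out.shape)                              # expected: (1, 8192, 256)
```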
Requirements:

- Python ≥ 3.9
- PyTorch ≥ 2.0