Prompt Cache: Modular Attention Reuse for Low-Latency Inference #384

@pentium3

Description
