Releases: AmpereComputingAI/llama.cpp

v3.4.2

30 Jan 14:22
343626c

Based on ggml-org/llama.cpp b7772 (https://github.com/ggml-org/llama.cpp/releases/tag/b7772)

Also available at: DockerHub

v3.4.1

22 Dec 23:08

Based on ggml-org/llama.cpp b7286 (https://github.com/ggml-org/llama.cpp/releases/tag/b7286)

  • Automatic Flash Attention selection (-fa auto); see ampere.md for more details and the usage sketch below.

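A minimal usage sketch for the automatic selection mode, assuming the standard llama-cli binary from this build and a hypothetical local model path (model.gguf):

    # let the runtime decide whether to enable Flash Attention for the given model and context
    ./llama-cli -m model.gguf -p "Hello from Ampere" -fa auto
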
Also available at: DockerHub

v3.4.0

26 Nov 12:51
9af0385

Based on ggml-org/llama.cpp b6735 (https://github.com/ggml-org/llama.cpp/releases/tag/b6735)

  • Fixed Flash Attention for SWA models
  • New Flash Attention algorithm, optimized for long contexts (above 1024). See the
    "Flash Attention algorithm selection" section for details on how to select the
    attention algorithm manually; a long-context benchmarking sketch follows below.

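A minimal benchmarking sketch for the long-context case, assuming the standard llama-bench tool shipped with llama.cpp and a hypothetical model path; the -fa 0/1 syntax may differ between builds:

    # measure prompt processing at a length above the 1024-token threshold targeted by the new algorithm
    ./llama-bench -m model.gguf -p 2048 -n 64 -fa 1
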
Also available at: DockerHub

v3.3.1

15 Oct 16:32
6219c16

Also available at: DockerHub

v3.3.0

09 Oct 12:54
6219c16

Also available at: DockerHub

v3.2.1

03 Sep 10:24
ecbcf6e

Also available at: DockerHub

v3.2.0

06 Aug 21:39
ecbcf6e

Also available at: DockerHub

v3.1.2

07 Jul 12:40
aa0a5d7

Also available at: DockerHub

v3.1.0

11 Jun 21:21
aa0a5d7

Also available at: DockerHub

v2.2.1

03 Jun 15:44
aa0a5d7

  • Update benchmark.py