Releases: Nexesenex/croco.cpp
Croco.Cpp_v1.99003_b6387-75_IKLpr642_RMv1.17.291m
WIP - I'm trying to update Croco.Cpp (the version supporting the second, third, and trellis gen of IQ_K quants).
A lot of LCPP commits were lost in the process, but well, I can't do better than that in a reasonable amount of time.
I had to ditch a lot of Cuda, KV cache, and graph updates, as well as GLM 4.5, GPT-OSS, and Nemotron v2, to move forward. Maybe I'll get GPT-OSS back from IK_Llama. Maybe.
For a more reliable and up-to-date fork of KoboldCpp, very close to Esobold but with some of Croco's perks, use EsoCroK (available in the GitHub Actions), though you lose the aforementioned IK quants.
Full Changelog: v1.97060_b6110_IKLpr642_RMv1.14.9m...CCPP_v1.99003_b6387-75_IKLpr642_RMv1.17.291m
EsoCroK v1.99420_b6636-6_Q6-IQ23456K_RMv1.17.99m
EsoCroK v1.99410_b6609-6_Q6-IQ23456K_RMv1.17.99m
Nothing special here, just an updated version of the previous release.
Cuda 12.9 (Ampere tested, also compiled for Maxwell, Pascal and Turing).
JG's recent work on Cuda FA is not included, to retain compatibility with Q6_0 and the IQ_K quants, until I eventually find the missing bit of code to make everything work as it should.
Linux : https://github.com/Nexesenex/croco.cpp/actions/runs/18084083530/artifacts/4128237663
EsoCroK v1.98035_b6178_RMb1.15.91m
Beyond Concedo's work and the new LCPP commits up to b6178:
- A fix for the SWA algo I borked, I don't know how.
- More FA KV cache submodes formally activated in the Cuda FA files.
- A little clean-up of my added code (in koboldcpp.py and the Makefile).
- PDF features activated because they can be compiled in the workflows (not on my machine).
Download directly from the workflows: https://github.com/Nexesenex/croco.cpp/actions
The Windows build should work.
The Linux ones are untested; I don't run that system.
EsoCroK v1.98020_b6150_RMb1.15.91m
Nothing new except Concedo's and Jaxxks's latest work, and LCPP b6150.
Builds:
Without extra KV cache:
Windows (old PC): https://github.com/Nexesenex/croco.cpp/actions/runs/16952122435/artifacts/3760814737
Linux (old PC): https://github.com/Nexesenex/croco.cpp/actions/runs/16952128325/artifacts/3760665227
As well as the Linux build below.
With extra KV cache (below):
CPU build (includes Vulkan)
Cuda build.
EsoCroK v1.98000_b6123_RMb1.15.9m
Nothing new compared to the last release, beyond Concedo's additional work.
EsoCroK v1.97300_b6123_RMb1.15.9m
Added:
- Support for the IQ_K quants IQ2_K, IQ3_K, IQ4_K, IQ5_K, and IQ6_K on CPU and CUDA, including the Cuda MMQ kernels (a quantization sketch follows below).
The CPU release also contains the CLBlast and Vulkan backends, which are not updated with Q6_0 and the IQ_K quants.
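For reference, here's a minimal sketch of what producing such a quant boils down to through the llama.cpp C API, assuming the fork exposes an ftype named LLAMA_FTYPE_MOSTLY_IQ4_K (that enum name is my assumption, it is not in mainline llama.cpp); in practice you'd just run the bundled quantize tool.

```cpp
// Minimal sketch, assuming the fork defines LLAMA_FTYPE_MOSTLY_IQ4_K
// (not present in mainline llama.cpp). The two API calls below exist in
// llama.h; the bundled quantize tool does the same thing under the hood.
#include "llama.h"
#include <cstdio>

int main() {
    llama_model_quantize_params qparams = llama_model_quantize_default_params();
    qparams.ftype   = LLAMA_FTYPE_MOSTLY_IQ4_K; // fork-specific quant target (assumption)
    qparams.nthread = 8;                        // quantization threads

    // Returns 0 on success, like the command-line quantize tool.
    if (llama_model_quantize("model-f16.gguf", "model-iq4_k.gguf", &qparams) != 0) {
        fprintf(stderr, "quantization failed\n");
        return 1;
    }
    return 0;
}
```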
EsoCroK v1.97200_b6119_RMv1.14.9m
Initial release of EsoCroK.
A rebase of Croco onto the latest Esobold version (up to 20250807), with the basics of Croco:
- IQ4_NL activated for KV Cache.
- IK's Q6_0 integrated, and adapted as best I could for the CUDA backend (I think it works, lol).
- 20 or so KV modes, including those reliant on IQ4_NL and Q6_0, both for the main model and a draft model (see the sketch after this list).
- A few optimisations (mostly IK's).
- A vast range of context steps in the GUI.
- Loosened GGUF restrictions.
- Some half-baked additional chat templates and prompts.
- And most importantly, plain compatibility with GLM 4.5 Air and OpenAI GPT-OSS.
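To make the KV modes concrete, here's a minimal sketch of what one mixed mode boils down to at the llama.cpp C API level, assuming GGML_TYPE_Q6_0 is the enum this fork uses for IK's Q6_0 (mainline ggml doesn't define it), while GGML_TYPE_IQ4_NL is a mainline type; Croco's launcher picks these pairs for you, this only shows the mapping.

```cpp
// Minimal sketch of a mixed KV-cache mode through the llama.cpp C API.
// GGML_TYPE_Q6_0 is assumed to be this fork's enum for IK's Q6_0 quant
// (mainline ggml does not define it); GGML_TYPE_IQ4_NL is a mainline type.
#include "llama.h"

llama_context_params make_kv_params() {
    llama_context_params cparams = llama_context_default_params();
    cparams.n_ctx  = 32768;             // context size
    cparams.type_k = GGML_TYPE_Q6_0;    // K cache quant (fork-specific, assumption)
    cparams.type_v = GGML_TYPE_IQ4_NL;  // V cache quant (mainline type)
    return cparams;                     // pass to llama_init_from_model(model, cparams)
}
```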
Reasons for this alternative Croco:
- Too much of a mess in my previous merges to get back in line and offer compatibility with GLM 4.5 and OpenAI GPT-OSS in a reasonable amount of time.
- Q6_0 is the most important quant missing from mainline llama.cpp, to quantize the ffn_down tensors of Qwen and GLM 4.5 with a high quality/size ratio, and the various KV cache quants are the most interesting feature of my fork after compatibility with the IK quants.
- Too many bugs for a sane user to handle; the code needed a purge.
- To recover the long-lost compatibility with the different backends, including Makefile builds, HIP, and Vulkan.
What's next?
- The first gen of IQ_K quants: their template is similar to the mainline quants, and the main job is to properly factor the CUDA MMQ kernel beyond the shuffling of the files (Croco is still a viable base for that part). Can do, maybe will.
- The second gen of IQ_K quants and the trellis quants have a slightly modified template, and I might need help from a dev familiar with Johannes Gaessler's or IK's work to port ONE 2nd-gen IQ_K quant (preferably IQ4_KS) and ONE trellis quant (preferably IQ2_KT) to the current llama.cpp mainline, so I'm able to reproduce the port on the others.
OR.
- Keep reworking my Croco despite the growing delta with both mainline and IK_Llama, the fate of a hybrid.
I haven't decided yet.
Anyway, enjoy EsoCroK!
Croco.Cpp v1.97060_b6110_IKLpr642_RMv1.14.9m
WIP, as usual.
- GLM 4.5 Air (and probably the 355B) works, at least with the mainline llama.cpp quants. The IK quants do not yet work properly on this model; I'll need to sort that out later.
- No GPT-OSS yet; I need to merge it manually due to the divergences between my fork and mainline llama.cpp in the CPU and CUDA backends. For later, then.
Note: apparently, even the mainline GLM quants are borked on Croco at high context. To be investigated.
Croco.Cpp v1.97020_b6014_IKLpr624_RMv1.14.9m