Open
Commits
345 commits
d9904a3
Update to cudarc 0.14 (breaking change). (#2858)
LaurentMazare Apr 3, 2025
648596c
Added readmes to examples (#2835)
greenrazer Apr 3, 2025
9d31361
Fix for clippy 1.86. (#2864)
LaurentMazare Apr 3, 2025
cf9d7bf
Add the CSM model. (#2862)
LaurentMazare Apr 4, 2025
bc33df7
Add the missing voices for CSM. (#2867)
LaurentMazare Apr 5, 2025
338f6a1
Clippy 1.86 fixes for cuda. (#2868)
LaurentMazare Apr 5, 2025
e3370c6
Add the SNAC audio tokenizer. (#2869)
LaurentMazare Apr 6, 2025
2f3bf42
Support more snac variants. (#2871)
LaurentMazare Apr 7, 2025
d339b01
Fix hardcoded f32 dtype for attention_mask. Use the model dtype for c…
msminhas93 Apr 8, 2025
eb478ec
Implementing DistilBertForMaskedLM. (#2866)
greenrazer Apr 11, 2025
acc5bd3
Cuda cleanup. (#2880)
LaurentMazare Apr 11, 2025
19fb6da
Bump the crate version. (#2881)
LaurentMazare Apr 11, 2025
d7b7ce1
Upgrade ug. (#2882)
LaurentMazare Apr 12, 2025
34505fd
Avoid using batched-matmul in nn::Linear. (#2883)
LaurentMazare Apr 12, 2025
15ed0b1
Optimize the batched matmul for the cpu backend. (#2884)
LaurentMazare Apr 12, 2025
d9198de
Im2col cuda optimization. (#2885)
LaurentMazare Apr 13, 2025
b44d38d
Add the Orpheus TTS. (#2886)
LaurentMazare Apr 13, 2025
f3a73f8
Support for cudnn conv1d. (#2888)
LaurentMazare Apr 13, 2025
2f9606b
Exclude candle-book to avoid some CI failures. (#2889)
LaurentMazare Apr 13, 2025
fb660b8
Add a cudnn feature to candle-nn/candle-transformers. (#2890)
LaurentMazare Apr 13, 2025
a52b76a
Expose the cudnn algo in the conv ops. (#2892)
LaurentMazare Apr 14, 2025
2653002
Gumbel-Softmax sampling. (#2894)
LaurentMazare Apr 14, 2025
1d1d6d4
Bump the crate version. (#2895)
LaurentMazare Apr 14, 2025
b01ebba
Use cudarc 0.15.2. (#2896)
LaurentMazare Apr 14, 2025
e4e7b0b
Use cudarc 0.16. (#2900)
LaurentMazare Apr 15, 2025
76e565c
Updated candle-book: Introduction, Installation, MNIST guide, and add…
greenrazer Apr 15, 2025
7f0f83a
Rotating kv cache positions (#2901)
LaurentMazare Apr 15, 2025
9954981
Allow from_vec/from_slice to use a ShapeWithOneHole as shape. (#2905)
LaurentMazare Apr 17, 2025
ce5f8dd
Check the bounds in the cuda indexing kernels. (#2908)
LaurentMazare Apr 18, 2025
9dbaf95
Add an enum for scalar values. (#2909)
LaurentMazare Apr 18, 2025
21055b5
Add PRelu operation (#2904)
A2va Apr 19, 2025
b2904a8
implemented quantized-gemma3 (#2902)
greenrazer Apr 19, 2025
a4c56a9
Add the const-set op. (#2910)
LaurentMazare Apr 19, 2025
99bd69f
fixed quantized-gemma example (#2914)
greenrazer Apr 23, 2025
82def7a
Cudarc update. (#2915)
LaurentMazare Apr 23, 2025
6ff0a69
Fixed Gemma3 model and example (#2917)
greenrazer Apr 25, 2025
3aeb957
Fixed Quantized Gemma3 Model and example (#2918)
greenrazer Apr 25, 2025
3827685
Add the scatter op. (#2921)
LaurentMazare Apr 25, 2025
a2e9254
Add the scatter in place ops. (#2923)
LaurentMazare Apr 26, 2025
fbaf0b0
Bump the crate version to 0.9.0. (#2924)
LaurentMazare Apr 26, 2025
6e0646c
Remove redundant mlx gemm dtype check (#2925)
ivarflakstad Apr 27, 2025
e3db300
Support for "unbatched" rope. (#2926)
LaurentMazare Apr 27, 2025
e98754f
Optimize Tensor::new when called on nested Vec<..>. (#2927)
LaurentMazare Apr 28, 2025
d4bac37
Fix the gumbel softmax by casting to f32. (#2928)
LaurentMazare Apr 28, 2025
de23d34
Switch Tensor::full to return a contiguous tensor. (#2929)
LaurentMazare Apr 28, 2025
5029ac5
Added tracing page to the candle book. (#2922)
greenrazer Apr 29, 2025
38fc866
Add support for Helium-v1. (#2932)
LaurentMazare Apr 30, 2025
8a19bb7
Bump the candle version to 0.9.1. (#2935)
LaurentMazare May 1, 2025
cd96fa8
Add a scattered kv cache. (#2936)
LaurentMazare May 1, 2025
66be13b
fixed quantized_phi3 implementation
ljt019 May 1, 2025
1fdfb58
Updating `Add qwen3` (PR 2903) to use HF weights (#2930)
greenrazer May 2, 2025
e27b470
Indexing with max-value results in zero/no-op. (#2940)
LaurentMazare May 3, 2025
637473c
Bump cudarc to 0.16.3. (#2942)
LaurentMazare May 4, 2025
3d05f5c
Qwen3 quantized implementation (#2939)
ljt019 May 8, 2025
36508a2
Add Resize to onnx ops (#2946)
greenrazer May 10, 2025
485ddf2
Fixed Quantized Qwen3 Model (#2951)
nosnakeob May 13, 2025
6bd6172
Make tensor contiguous before the repeat_kv calls to avoid strided co…
b0r3k May 14, 2025
450a49e
Olmo 2 model (#2954)
janimo May 14, 2025
9ce4fe6
Fix docs quantized qwen3 (#2955)
maximizemaxwell May 15, 2025
92106c8
Fixes for clippy 1.87. (#2956)
LaurentMazare May 15, 2025
9a62c91
Proper support for phi-4 (#2960)
LaurentMazare May 21, 2025
61ddb95
Use a tanh activation in the xlm-roberta classification head. (#2968)
LaurentMazare May 26, 2025
cac51fe
(hotfix) fix the doc test for indexer (#2970)
klion26 May 28, 2025
1a183c9
Add fine-tuned text classifier to xlm roberta example (#2969)
jpe90 May 28, 2025
5aed817
feat: enhance linear algebra operations (#2972)
ssfdust May 29, 2025
cd7b877
candle-onnx: Implement Trilu and ScatterND ops (#2952)
greenrazer May 30, 2025
0224a74
Add Qwen3 MoE (#2934)
greenrazer May 31, 2025
17313a4
Fix cuda memory error for Qwen3 non-quantized (#2987)
akshayballal95 Jun 7, 2025
407c667
candle-onnx: Implement RNN operator (#2964)
BrunoSienkiewicz Jun 24, 2025
23968db
Fix typos (#2958)
omahs Jun 24, 2025
2e5dbc7
candle-onnx: Implement Hard Swish operator (#2980)
Michall00 Jun 24, 2025
a6e8aae
fixed errors with hardswish merge (#3006)
greenrazer Jun 26, 2025
0cd4fc4
Fixed Failing CI (#3007)
greenrazer Jun 26, 2025
ab14581
Qwen3: fix quality loss due to rope freq precision (#3005)
zackangelo Jun 26, 2025
d0a3b33
fixed ring mac error (#3008)
greenrazer Jun 27, 2025
317a3ae
Support new arch of GLM4 models (#2991)
guoqingbao Jul 7, 2025
be411aa
candle-onnx: Implement One Hot operator (#2979)
Michall00 Jul 7, 2025
9c8a02f
fix (candle-datasets): re-export FileReader and simplify from_hub ite…
xavierforge Jul 16, 2025
16b7b77
candle-datasets: add fashion-mnist (#3021)
slckl Jul 16, 2025
1f07074
candle-onnx: Implement Selu operator (#2978)
Michall00 Jul 16, 2025
6c95317
fix: DAC model prefix (#3020)
piedshag Jul 17, 2025
1ef1341
*Major T/s improvement* Use the Metal qmatmul MM kernels (#2615)
EricLBuehler Jul 18, 2025
42bd33e
Fix discord badge (#3033)
strickvl Jul 23, 2025
da5498c
Added GradStore::insert_id(id, grad)
NoodlesOfWrath Jul 29, 2025
26a3222
Support building on CPUs with AVX but not AVX2 (#3040)
jncraton Jul 31, 2025
21032cb
[FEAT] Voxtral Support (#3036)
jorge-menjivar Aug 4, 2025
96415a4
ignored url that was interpreted as a secret by trufflehog (#3046)
greenrazer Aug 4, 2025
af5a69e
fp8 support (#2989)
zackangelo Aug 4, 2025
86bcf1e
Load safetensors i8 (#3042)
chadvoegele Aug 5, 2025
1829812
Fix sort kernel launch bug when nrows exceed gridDim.y limit (65535) …
guoqingbao Aug 11, 2025
be4f920
clippy fixes (#3053)
greenrazer Aug 12, 2025
d7c5c8a
Add timestamp rules and constraints to decoder in Whisper example (#3…
rsb-tbg Aug 18, 2025
f1286e6
Fix wasm build by enabling getrandom wasm_js backend (#3055)
lucky-bai Aug 18, 2025
16e1d73
pick seed <= u32::MAX when using metal (#3045)
rgbkrk Aug 20, 2025
730fa9c
Fix broken slice_scatter example in basics.rs
davenpi Aug 21, 2025
5d6407f
Run cargo fmt on basics.rs
davenpi Aug 22, 2025
98c64c0
Metal device.set_seed full u64 support (#3067)
ivarflakstad Aug 25, 2025
03e9ce0
disable affine fp8 bench on metal as it is not supported yet (#3065)
ivarflakstad Aug 25, 2025
02cf3eb
Bench using chosen device only (#3066)
ivarflakstad Aug 26, 2025
fd350c4
Fixes metal randn determinism. Ensure we use the 2 atomic_uints buffe…
ivarflakstad Aug 27, 2025
bf82629
build: Make build.rs candle-kernels compatible with Nix and sandboxed…
joeldsouzax Aug 28, 2025
06387ae
[Metal] update to objc2_metal (#3064)
ivarflakstad Aug 29, 2025
d4a9179
Fused CPU attention kernels (~4x performance increase) (#2973)
EricLBuehler Aug 29, 2025
41b1e95
Fix typos
szepeviktor Aug 30, 2025
93845ed
Merge pull request #3072 from szepeviktor/typos
ivarflakstad Aug 30, 2025
390b87a
Fix iOS app store validation issues (#3071)
greenrazer Sep 3, 2025
402782c
Merge pull request #3038 from NoodlesOfWrath/gradstore_insert_id
ivarflakstad Sep 6, 2025
f62e725
clean candle-core typos.
zhanluxianshen Sep 7, 2025
0bbf9c7
Ensure metal tensors are send/sync via thread isolated command buffer…
ivarflakstad Sep 8, 2025
3b35cfc
Update kv_cache.rs (#3035)
jhqxxx Sep 8, 2025
0cf516d
[Metal] Refactor (#3070)
ivarflakstad Sep 8, 2025
87fadf6
Merge pull request #3077 from zhanluxianshen/typo-candle-core
ivarflakstad Sep 8, 2025
0950959
Fix metal exports (#3081)
ivarflakstad Sep 8, 2025
a7fbc63
Merge branch 'main' into metal-tensor-fix-send-sync
ivarflakstad Sep 9, 2025
65055f6
Merge pull request #3079 from huggingface/metal-tensor-fix-send-sync
ivarflakstad Sep 9, 2025
b1dbce0
Merge pull request #3062 from davenpi/fix/core-basics-example
ivarflakstad Sep 9, 2025
8045af9
Add CUDA 13 support (#3078)
jfernandez Sep 9, 2025
97594d2
Fix indentation
ivarflakstad Sep 9, 2025
038e28b
Fix indentation (ok but for real)
ivarflakstad Sep 9, 2025
372c9cf
Merge pull request #2937 from ljt019/fix-phi3-kv-cache-reset
ivarflakstad Sep 9, 2025
41a674c
add impl for mish activation function (#3051)
oa-root Sep 12, 2025
dd12467
Upgrade ug dep for CUDA 13 support
grahamking Sep 18, 2025
1a699fb
Merge pull request #3089 from grahamking/main
ivarflakstad Sep 20, 2025
ec3d92e
Various minor improvements, some suggested by clippy
ivarflakstad Sep 22, 2025
f583891
Merge pull request #3023 from xavierforge/bug/metadata-method-not-found
ivarflakstad Sep 22, 2025
944947a
Add command buffer thread map. Remove unecessary failure points
ivarflakstad Sep 30, 2025
b06d2fd
Merge pull request #3092 from huggingface/metal-clippy-fixes
ivarflakstad Sep 30, 2025
bc13c4b
Merge branch 'main' into improve-metal-command-buffer-map
ivarflakstad Sep 30, 2025
d205fb4
Fix multiple clippy warnings (#3101)
ivarflakstad Sep 30, 2025
d16eaf5
Merge branch 'main' into improve-metal-command-buffer-map
ivarflakstad Oct 1, 2025
7bfc5af
Wait until completed on command buffer status: scheduled as well
ivarflakstad Oct 1, 2025
df50343
Add metal conv for more dtypes
ivarflakstad Oct 2, 2025
c16785b
Allow based to run with bf16 on metal
ivarflakstad Oct 2, 2025
26c7868
Add backtracing to metal kernel errors for clarity
ivarflakstad Oct 2, 2025
7c5a8f2
Merge pull request #3103 from huggingface/metal-fix-conv
ivarflakstad Oct 2, 2025
e3fd0da
bump gemm dependency to 0.18.2 to match ug
slckl Oct 2, 2025
0ad167d
Merge pull request #3100 from huggingface/improve-metal-command-buffe…
ivarflakstad Oct 3, 2025
58811e8
Merge pull request #3105 from slckl/gemm-bump
ivarflakstad Oct 3, 2025
e677576
[Metal] Buffer improvements (#3093)
ivarflakstad Oct 3, 2025
a708b7a
Various quantization improvements. Direct copy. Verified block sizes.…
ivarflakstad Oct 3, 2025
742dfef
make cuda benches run again (#3111)
slckl Oct 4, 2025
9b476b2
Capture command buffer errors if they exist (#3106)
ivarflakstad Oct 4, 2025
716e126
[Metal] Improve wait_for_completed command buffers locking (#3107)
ivarflakstad Oct 4, 2025
671de1d
Skip unsupported quantized matmul tests for metal (#3115)
ivarflakstad Oct 5, 2025
bcc34bc
Fix beit on metal by adding additional affine implementations (#3116)
ivarflakstad Oct 6, 2025
a1350d6
Rough example of inlining model files into binary (#3104)
matthewhaynesonline Oct 7, 2025
ca35cf9
Where cond get_strided_index conditionally based on function constant…
ivarflakstad Oct 7, 2025
0374ff3
feat(stable-diffusion): add build_unet_sharded method (#3118)
hoodiecollin Oct 8, 2025
ad1da34
Fix metal get_function error (#3114)
ivarflakstad Oct 8, 2025
256c4e2
Quantization use debug_assert in hot paths (#3109)
ivarflakstad Oct 8, 2025
6fb56c3
Adding inference for GraniteMoeHybrid models from IBM (#3117)
atilag Oct 8, 2025
7b8f2b4
Fix failing `cuda` build (#3121)
LLukas22 Oct 9, 2025
cc967fc
feat: add metal_if_available method for graceful Metal fallback (#3041)
xavierforge Oct 9, 2025
bffa5e1
Fix metal quantized to_float calls (#3123)
ivarflakstad Oct 9, 2025
41fa5f1
Add more conv2d bench cases to candle-nn benches (#3131)
slckl Oct 13, 2025
9fe6232
Fix single file binary builder to only run when env var is set (#3126)
ivarflakstad Oct 13, 2025
f601fd8
Update modernbert.rs (#3010)
whitebox2 Oct 16, 2025
701205a
Update dependencies (#3135)
ivarflakstad Oct 16, 2025
1febb7b
Ensure output of Transpose is contiguous to prevent downstream MatMul…
kshitijl Oct 17, 2025
2bce4e5
In the BERT example: apply the attention mask from tokenization durin…
kshitijl Oct 18, 2025
a52f22f
Skip q8k and q8_1 tests on cuda (#3140)
ivarflakstad Oct 20, 2025
36b7517
Implement qwen3 vl
EricLBuehler Oct 23, 2025
fd379c5
Clippy
EricLBuehler Oct 23, 2025
59aeed4
Bump candle version to 0.9.2-alpha.1 (#3146)
ivarflakstad Oct 23, 2025
5b7858c
Remove unused
EricLBuehler Oct 23, 2025
e3228c1
Add Qwen 3 VL to candle-transformers
EricLBuehler Oct 23, 2025
d312da2
Improve candle example buildtime downloader (#3147)
ivarflakstad Oct 23, 2025
a23a48f
CPU Conv2d: separate module, tiled im2col, specialization (#3136)
slckl Oct 25, 2025
31d6698
rust-ci: add --benches to clippy, fix warnings (#3148)
slckl Oct 25, 2025
df618f8
candle-core: add `broadcast_add` benches (#3149)
slckl Oct 25, 2025
fab0c45
fix: build errors for compute cap 7.5 (#3142)
neksodebe Oct 28, 2025
a05b549
Update cargo build instructions to use double colon syntax (#3132)
matthewhaynesonline Oct 28, 2025
8f27f5c
Add flash attn v3: `candle-flash-attn-v3` (#3152)
EricLBuehler Oct 28, 2025
7669ed1
Add nccl feature to candle-core (#3155)
EricLBuehler Oct 30, 2025
3c7a63d
clippy default fixes (#3160)
ivarflakstad Oct 31, 2025
b8c2ee8
Fix Metal matmul failure in `ModernBertHead::forward` by ensuring con…
whitebox2 Oct 31, 2025
ca3aee8
Add varbuilder get_unchecked methods (#3157)
EricLBuehler Oct 31, 2025
d4545eb
Add unsafe from_storage apis (#3156)
EricLBuehler Nov 1, 2025
b06a02c
[Metal] Ensure metal backend is send/sync via status semaphore (#3164)
ivarflakstad Nov 6, 2025
ade0918
Add sqrt2 as constant for gelu_erf and use `libm` erf (#3168)
vrdn-23 Nov 7, 2025
4ff99ba
candle-core: strided-index inline next + size_hint + exact size itera…
slckl Nov 8, 2025
836540f
Fix DINOv2 no-interpolation shortcut (#3172)
pcuenca Nov 8, 2025
bf3d3f2
Use Tensor::argmax instead of manual cpu impl (#3173)
ivarflakstad Nov 9, 2025
87653ca
Fix argmax. Higher index should also be taken into account (#3179)
ivarflakstad Nov 11, 2025
db08cc0
Add command buffer pool for improved multi-threaded Metal performance…
anonenity Nov 11, 2025
60252cc
feat(candle-nn) ConcatKvCache for 2-5x GPU speedup on autoregressive …
DrJesseGlass Nov 14, 2025
8ebfc22
Add `cublas_handle` api, update safetensors (#3192)
EricLBuehler Nov 17, 2025
ab56dfe
Update CI (#3194)
ivarflakstad Nov 17, 2025
549eacb
Add initial support for imatrix quantization (#3193)
EricLBuehler Nov 18, 2025
eb651c8
add clear kv cache to quantized qwen3 weights (#3189)
anonenity Nov 18, 2025
3390caa
fix typo preventing usage on mac (#3201)
amritsingh183 Nov 20, 2025
27cd43c
CUDA: Fix integer reductions by removing +/-INF initialization (#3200)
TimmyOVO Nov 20, 2025
9ca71de
fix for https://github.com/huggingface/candle/issues/3203 (#3204)
amritsingh183 Nov 20, 2025
b801ef6
Add lld installation and test steps for Linux (#3213)
haricot Nov 25, 2025
01bea21
Add dummy dtypes (#3195)
EricLBuehler Nov 25, 2025
95ea453
Add more misc. changes from candle fork (#3196)
EricLBuehler Nov 25, 2025
2ac3fe0
.gitignore: add .zed to ignored editor configs (#3218)
slckl Nov 30, 2025
c39d5f0
chore(dep): bump cudarc to 0.18.1 (#3219)
mayocream Dec 2, 2025
08d7b64
Hotfix: Bump float8 to 0.5.0 (#3223)
EricLBuehler Dec 3, 2025
2664a21
[Metal] Make fast math mode optional (#3205)
ivarflakstad Dec 4, 2025
9ede204
Update pyo3 (#3202)
ivarflakstad Dec 4, 2025
3d3cc49
[Metal] unary and affine improvements (#3230)
ivarflakstad Dec 6, 2025
72238a7
[Metal] binary improvements (#3231)
ivarflakstad Dec 8, 2025
d91be02
fix(metal): add missing softcapping field to AttnParams struct (#3233)
amritsingh183 Dec 8, 2025
2a797ea
Format sdpa (#3235)
EricLBuehler Dec 8, 2025
d23664f
Fix metal argmax (#3238)
EricLBuehler Dec 9, 2025
73fd9c3
[Metal] further improve unary and binary (#3239)
ivarflakstad Dec 10, 2025
e33d776
[Metal] cast improvements (#3241)
ivarflakstad Dec 10, 2025
4b46187
[Metal] Improve ternary further (#3242)
ivarflakstad Dec 14, 2025
8839457
Bump candle version to 0.9.2-alpha.2 (#3248)
ivarflakstad Dec 16, 2025
689d255
add candle flash attention 3 copyright markers (#3256)
michaelfeil Dec 21, 2025
ab6d97e
fix: replace deprecated cudarc memcpy methods (#3228)
DrJesseGlass Dec 23, 2025
f2d5aab
Support Fused MoE & Qwen3 GGUF MoE models (#3221)
guoqingbao Dec 23, 2025
049c06d
Upgrade GitHub Actions for Node 24 compatibility (#3255)
salmanmkc Dec 24, 2025
0e4dc02
Adds onnx ops to support debertav3/piiranha (#3260)
skeet70 Dec 26, 2025
5498dff
Add bilinear interpolation support (upsample_bilinear2d) (#3237)
SpenserCai Dec 26, 2025
63437a4
Fix remnant memcpy_stod call (#3267)
ivarflakstad Dec 27, 2025
f2bd79e
Sort on cuda fails when tensor size exceeds 1024 (#3271)
slckl Dec 30, 2025
e717779
make candle ops public (#3226)
zackangelo Dec 30, 2025
4ea88fa
fix(quantized_gemma3): auto-detect GGUF metadata prefix for gemma-emb…
clocksmith Dec 30, 2025
5de3d0f
add HuberLoss (#3252)
donjuanplatinum Dec 31, 2025
d8fb848
feat!: Make `ug` dependency optional (#3268)
DanikVitek Dec 31, 2025
3a0d1cb
Add Z-Image Text-to-Image Generation Support (#3261)
SpenserCai Jan 2, 2026
43be23c
fix(candle-kernels): conditionally link stdc++ for non-MSVC targets (…
Elvis339 Jan 3, 2026
a4ad7c7
replace cutlass submodule references with explicit build step (#3234)
jacobgorm Jan 4, 2026
fd8448d
Rename compute capability defines in CUDA kernels (#3275)
FerrisMind Jan 6, 2026
c3ed240
Fix MoE WMMA kernel on V100 (#3282)
guoqingbao Jan 6, 2026
db3d5d9
[Metal] improve normalization (#3283)
ivarflakstad Jan 6, 2026
54131f1
Fix BF16 conv_transpose2d using wrong kernel on Metal (#3279)
amritsingh183 Jan 6, 2026
42a4edc
Mamba2 implementation (#3264)
Anri-Lombard Jan 7, 2026
f526033
feat: paddleocr-vl model and example (#3273)
danielclough Jan 7, 2026
f2d30fb
chore(dep): bump cudarc to 0.18.2 (#3293)
staceymelville Jan 14, 2026
dbb8c2d
Add SmolLM3: Full and Quantized Implementation (#3180)
DrJesseGlass Jan 14, 2026
a2029da
example: add quantized qwen3 wasm with SIMD optimizations (#3159)
DrJesseGlass Jan 14, 2026
aaf5c86
Hotfix: Remove fastmath from candle-kernels (#3309)
EricLBuehler Jan 17, 2026
261f727
feat: add quantized lfm2 model support (#3244)
fffonion Jan 17, 2026
23182cf
Support new arch of GLM4 GGUF models (#2992)
guoqingbao Jan 17, 2026
cc8ec5e
feat: simplify metal reduce kernels and standardize on u32 indexing (…
drbh Jan 18, 2026
a3969ed
rms/layer norm accumulate in f32 for improved precision (#3315)
ivarflakstad Jan 19, 2026
0f6b303
Remove `test.onnx` (uploaded by mistake?) (#3316)
alvarobartt Jan 19, 2026
06cb713
Metal GEMM Dynamic Tile Selection and Batch Collapse Optimization (#3…
SpenserCai Jan 21, 2026
8d5873b
Update deps (#3320)
ivarflakstad Jan 22, 2026
88ed791
CUDA Tensor::toDevice copies directly via memcopy_dtod (#3312)
krampenschiesser Jan 23, 2026
f041b87
[Cuda] Use upstream bindgen_cuda crate (#3328)
ivarflakstad Jan 24, 2026
e53310d
Bump candle version to 0.9.2 (#3329)
ivarflakstad Jan 24, 2026
3b39794
Add dep versioning for candle-flash-attn-build (#3330)
ivarflakstad Jan 24, 2026
061c392
Bump float8 to 0.7.0, cudarc to 0.19.1 (#3360)
EricLBuehler Feb 4, 2026
971e7ed
Bump float8 to 0.7.0, cudarc to 0.19.1 (#3360) (#3361)
EricLBuehler Feb 4, 2026
c3bb5bf
Use cudaforge for kernel build (#3346)
guoqingbao Feb 10, 2026
f2cb5b4
Add candle-video library for text-to-video generation in README.md (#…
FerrisMind Feb 10, 2026
2 changes: 1 addition & 1 deletion .cargo/config.toml
@@ -2,7 +2,7 @@
 rustflags = ["-C", "target-cpu=native"]

 [target.wasm32-unknown-unknown]
-rustflags = ["-C", "target-feature=+simd128"]
+rustflags = ["-C", "target-feature=+simd128", "--cfg", 'getrandom_backend="wasm_js"']

 [target.x86_64-apple-darwin]
 rustflags = ["-C", "target-feature=-avx,-avx2"]
40 changes: 0 additions & 40 deletions .github/workflows/book-cd.yml

This file was deleted.

29 changes: 0 additions & 29 deletions .github/workflows/book.yml

This file was deleted.

13 changes: 7 additions & 6 deletions .github/workflows/ci_cuda.yaml
@@ -10,10 +10,9 @@ jobs:
 group: ${{ github.workflow }}-${{ github.job }}-${{ github.head_ref || github.run_id }}
 cancel-in-progress: true
 runs-on:
-group: aws-g4dn-2xlarge
+group: aws-g5-4xlarge-cache
 container:
-image: nvidia/cuda:12.3.1-devel-ubuntu22.04
-options: --gpus 0
+image: nvidia/cuda:13.0.2-cudnn-devel-ubuntu24.04
 if: ${{ github.event.pull_request.head.repo.full_name == github.event.pull_request.base.repo.full_name }}
 permissions:
 contents: write
@@ -22,13 +21,15 @@ jobs:
 # with sigstore/fulcio when running outside of PRs.
 id-token: write
 security-events: write
+env:
+CUDA_COMPUTE_CAP: 86
 steps:
 - name: Checkout repository
-uses: actions/checkout@v3
+uses: actions/checkout@v6
 - name: Install dependencies
-run: apt-get update && apt install curl build-essential libssl-dev protobuf-compiler pkg-config -y
+run: apt update && apt install curl build-essential libssl-dev protobuf-compiler pkg-config -y
 - name: Install Rust Stable
-uses: actions-rust-lang/setup-rust-toolchain@v1
+uses: dtolnay/rust-toolchain@stable
 - uses: Swatinem/rust-cache@v2
 - name: Test (cuda)
 run: cargo test --features cuda
Binary file modified .github/workflows/maturin.yml
Binary file not shown.
16 changes: 7 additions & 9 deletions .github/workflows/python.yml
@@ -20,30 +20,28 @@ jobs:
 os: [ubuntu-latest] # For now, only test on Linux
 steps:
 - name: Checkout repository
-uses: actions/checkout@v4
+uses: actions/checkout@v6

 - name: Install Rust
-uses: actions-rs/toolchain@v1
-with:
-toolchain: stable
+uses: dtolnay/rust-toolchain@stable

 - name: Install Python
-uses: actions/setup-python@v4
+uses: actions/setup-python@v6
 with:
-python-version: 3.11
+python-version: 3.13
 architecture: "x64"

 - name: Cache Cargo Registry
-uses: actions/cache@v1
+uses: actions/cache@v5
 with:
 path: ~/.cargo/registry
 key: ${{ runner.os }}-cargo-registry-${{ hashFiles('**/Cargo.lock') }}

 - name: Install Protoc
 uses: arduino/setup-protoc@v2
 with:
-version: "25.0"
-repo-token: ${{ secrets.GITHUB_TOKEN }}
+version: "25.0"
+repo-token: ${{ secrets.GITHUB_TOKEN }}

 - name: Install
 working-directory: ./candle-pyo3
105 changes: 64 additions & 41 deletions .github/workflows/rust-ci.yml
@@ -11,68 +11,91 @@ jobs:
 name: Check
 runs-on: ${{ matrix.os }}
 strategy:
+fail-fast: false
 matrix:
-os: [ubuntu-latest, windows-latest, macOS-latest]
-rust: [stable]
+os: [ubuntu-latest, ubuntu-24.04, windows-latest, macOS-latest, ubuntu-24.04-arm]
 steps:
-- uses: actions/checkout@v4
-- uses: actions-rs/toolchain@v1
-with:
-profile: minimal
-toolchain: ${{ matrix.rust }}
-override: true
-- uses: actions-rs/cargo@v1
+- uses: actions/checkout@v6
+- uses: actions/setup-python@v6
 with:
-command: check
-args: --workspace
+python-version: "3.13"
+- name: Remove cargo config (macOS ring crate fix)
+if: runner.os == 'macOS'
+run: rm -f .cargo/config.toml
+- uses: dtolnay/rust-toolchain@stable
+
+- name: Run macos with metal
+if: matrix.os == 'macOS-latest'
+run: cargo check --workspace --features metal
+
+- name: Run normal cpu
+if: matrix.os == 'ubuntu-latest' || matrix.os == 'windows-latest'
+run: cargo check --workspace
+
+- name: Run with avx2
+if: matrix.os == 'ubuntu-24.04'
+run: |
+export RUSTFLAGS="-C target-feature=avx2"
+cargo check --workspace
+
+- name: Run with arm neon
+if: matrix.os == 'ubuntu-24.04-arm'
+run: |
+export RUSTFLAGS="-C target-feature=neon"
+cargo check --workspace
+
 test:
 name: Test Suite
 runs-on: ${{ matrix.os }}
 strategy:
 matrix:
 os: [ubuntu-latest, windows-latest, macOS-latest]
-rust: [stable]
 steps:
-- uses: actions/checkout@v4
-- uses: actions-rs/toolchain@v1
-with:
-profile: minimal
-toolchain: ${{ matrix.rust }}
-override: true
-- uses: actions-rs/cargo@v1
+- name: Free disk space (Linux)
+if: runner.os == 'Linux'
+run: |
+sudo rm -rf /opt/hostedtoolcache
+sudo rm -rf /usr/share/dotnet
+sudo rm -rf /usr/local/lib/android
+sudo rm -rf /opt/ghc
+df -h
+- uses: actions/checkout@v6
+- uses: actions/setup-python@v6
 with:
-command: test
-args: --workspace
+python-version: "3.13"
+- name: Remove cargo config (macOS ring crate fix)
+if: runner.os == 'macOS'
+run: rm -f .cargo/config.toml
+- uses: dtolnay/rust-toolchain@stable
+- name: Install lld (Linux only)
+if: runner.os == 'Linux'
+run: sudo apt-get update && sudo apt-get install -y lld
+- name: Run tests (with lld on Linux)
+if: runner.os == 'Linux'
+env:
+RUSTFLAGS: "-C link-arg=-fuse-ld=lld"
+run: cargo test --workspace
+- name: Run tests (Windows & macOS)
+if: runner.os != 'Linux'
+run: cargo test --workspace

 fmt:
 name: Rustfmt
 runs-on: ubuntu-latest
 steps:
-- uses: actions/checkout@v4
-- uses: actions-rs/toolchain@v1
+- uses: actions/checkout@v6
+- uses: dtolnay/rust-toolchain@stable
 with:
-profile: minimal
-toolchain: stable
-override: true
-- run: rustup component add rustfmt
-- uses: actions-rs/cargo@v1
-with:
-command: fmt
-args: --all -- --check
+components: rustfmt
+- run: cargo fmt --all -- --check

 clippy:
 name: Clippy
 runs-on: ubuntu-latest
 steps:
-- uses: actions/checkout@v4
-- uses: actions-rs/toolchain@v1
-with:
-profile: minimal
-toolchain: stable
-override: true
-- run: rustup component add clippy
-- uses: actions-rs/cargo@v1
+- uses: actions/checkout@v6
+- uses: dtolnay/rust-toolchain@stable
 with:
-command: clippy
-args: --workspace --tests --examples -- -D warnings
+components: clippy
+- run: cargo clippy --workspace --tests --examples --benches -- -D warnings

12 changes: 6 additions & 6 deletions .github/workflows/trufflehog.yml
@@ -7,9 +7,9 @@ jobs:
 trufflehog:
 runs-on: ubuntu-latest
 steps:
-- name: Checkout code
-uses: actions/checkout@v4
-with:
-fetch-depth: 0
-- name: Secret Scanning
-uses: trufflesecurity/trufflehog@main
+- name: Checkout code
+uses: actions/checkout@v6
+with:
+fetch-depth: 0
+- name: Secret Scanning
+uses: trufflesecurity/trufflehog@main
1 change: 1 addition & 0 deletions .gitignore
@@ -12,6 +12,7 @@ Cargo.lock
 # editor config
 .helix
 .vscode
+.zed

 # These are backup files generated by rustfmt
 **/*.rs.bk
3 changes: 0 additions & 3 deletions .gitmodules

This file was deleted.

11 changes: 0 additions & 11 deletions .vscode/settings.json

This file was deleted.
