[pull] main from huggingface:main#41

Open
pull[bot] wants to merge 345 commits into EricLBuehler:main from huggingface:main

Conversation


@pull pull bot commented Nov 19, 2024

See Commits and Changes for more details.


Created by pull[bot] (v2.0.0-alpha.4)

Can you help keep this open source service alive? 💖 Please sponsor : )

LaurentMazare and others added 27 commits April 3, 2025 09:12
* Start updating to cudarc 0.14.

* Adapt a couple more things.

* And a couple more fixes.

* More tweaks.

* And a couple more fixes.

* Bump the major version number.

* Proper module system for the cuda kernels.

* Proper ptx loading.

* Launch the sort kernel.

* Custom op.

* Start using the builder pattern.

* More builder.

* More builder.

* Get candle-core to compile.

* Get the tests to pass.

* Get candle-nn to work too.

* Support for custom cuda functions.

* cudnn fixes.

* Get flash attn to run.

* Switch the crate versions to be alpha.

* Bump the ug dependency.
* added chatGLM readme

* changed wording in readme

* added readme for chinese-clip

* added readme for convmixer

* added readme for custom ops

* added readme for efficientnet

* added readme for llama

* added readme to mnist-training

* added readme to musicgen

* added readme to quantized-phi

* added readme to starcoder2

* added readme to whisper-microphone

* added readme to yi

* added readme to yolo-v3

* added readme to whisper-microphone

* added space to example in glm4 readme

* fixed mamba example readme to run mamba instead of mamba-minimal

* removed slash escape character

* changed moondream image to yolo-v8 example image

* added procedure for making the reinforcement-learning example work with a virtual environment on my machine

* added simple one line summaries to the example readmes without

* changed non-existent image to yolo example's bike.jpg

* added backslash to sam command

* removed trailing - from siglip

* added SoX to silero-vad example readme

* replaced procedure for uv on mac with warning that uv isn't currently compatible with pyo3

* added example to falcon readme

* added --which arg to stella-en-v5 readme

* fixed image path in vgg readme

* fixed the image path in the vit readme

* Update README.md

* Update README.md

* Update README.md

---------

Co-authored-by: Laurent Mazare <laurent.mazare@gmail.com>
* Fix for clippy 1.86.

* More clippy fixes.

* More fixes.
* Add the CSM model.

* Add some code to load the model.

* Load the text tokenizer.

* Add frame generation.

* Get the sampling to work.

* Rope fix.

* Autoregressive generation.

* Generate some audio file.

* Use the actual prompt.

* Support multiple turns.

* Add a very barebone readme.

* Move some of the shared bits to the model.
* Add the SNAC audio tokenizer.

* More snac.

* Again more snac.

* Add some example code for snac.

* Get the weights to load.

* Add to the snac model.

* Fixes.

* Get round-tripping to work.

* Save/load code files.

* Clippy fix.

* Fmt fix.
* Initial commit: model weights working, prediction incorrect

* moved distilbertformaskedlm into distilbert modeling file

* made maskedLM like bert example, still incorrect predictions

* finally not getting NaNs, fixed attention mask

* getting correct output sentences

* get top k predictions

* fixed output formatting slightly

* added default arg for model_id

* lint

* moved masked token example code from distilbertformaskedlm example to distilbert example

* lint

* removed distilbertformaskedlm example

* cleanup

* clippy

* removed embedding normalization from example

* made output and model dependent on args instead of prompt

* lint

* replaced or_ok anyhow error with anyhow context

* changed error message for mask token not found
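The NaN fix above involved the attention mask. A common cause of NaNs in masked attention (offered here as a plausible illustration, not a claim about this exact bug) is masking with -inf, which makes softmax divide by zero on fully-masked rows. A minimal std-only Rust sketch of the usual remedy, an additive mask with a large finite negative offset; all names are illustrative, not candle's API:

```rust
/// Softmax with an additive attention mask: masked positions receive a
/// large finite negative offset instead of -inf, so even a fully-masked
/// row keeps a positive normalizer and never yields NaN.
fn masked_softmax(scores: &[f64], keep: &[bool]) -> Vec<f64> {
    const NEG: f64 = -1e9;
    let shifted: Vec<f64> = scores
        .iter()
        .zip(keep)
        .map(|(&s, &k)| if k { s } else { s + NEG })
        .collect();
    // Subtract the row max for numerical stability before exponentiating.
    let max = shifted.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    let exps: Vec<f64> = shifted.iter().map(|&s| (s - max).exp()).collect();
    let sum: f64 = exps.iter().sum();
    exps.iter().map(|&e| e / sum).collect()
}

fn main() {
    // The masked position gets essentially zero probability...
    let p = masked_softmax(&[1.0, 2.0], &[true, false]);
    assert!((p[0] - 1.0).abs() < 1e-6);
    // ...and even a fully-masked row stays finite (no NaN).
    let q = masked_softmax(&[1.0, 2.0], &[false, false]);
    assert!(q.iter().all(|v| v.is_finite()));
    println!("ok");
}
```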
* Cuda cleanup.

* More fixes.
* Avoid using batched-matmul in nn::Linear.

* Also avoid batched matmul in conv1d.

* Also tweak the conv2d.

* Batched tests.

* Also cover conv2d.
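The optimization the commits above describe rests on a simple identity: a linear layer's weight is shared across the batch, so a batched (b, m, k) x (k, n) product equals a single flat (b*m, k) x (k, n) matmul over the reshaped input. A std-only Rust sketch checking that equivalence with a naive row-major matmul (illustrative code, not candle's implementation):

```rust
/// Naive row-major matmul: (m x k) * (k x n) -> (m x n).
fn matmul(a: &[f64], b: &[f64], m: usize, k: usize, n: usize) -> Vec<f64> {
    let mut out = vec![0.0; m * n];
    for i in 0..m {
        for j in 0..n {
            let mut s = 0.0;
            for p in 0..k {
                s += a[i * k + p] * b[p * n + j];
            }
            out[i * n + j] = s;
        }
    }
    out
}

fn main() {
    let (b, m, k, n) = (2, 2, 3, 2);
    let x: Vec<f64> = (0..b * m * k).map(|v| v as f64).collect();
    let w: Vec<f64> = (0..k * n).map(|v| (v as f64) * 0.5).collect();

    // Batched: one (m x k) * (k x n) matmul per batch element.
    let batched: Vec<f64> = (0..b)
        .flat_map(|bi| matmul(&x[bi * m * k..(bi + 1) * m * k], &w, m, k, n))
        .collect();
    // Flattened: a single (b*m x k) * (k x n) matmul, same weight.
    let flat = matmul(&x, &w, b * m, k, n);
    assert_eq!(batched, flat);
    println!("ok");
}
```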
* Add the Orpheus TTS.

* Add a small readme.

* Token fix.

* Support more voices.

* Clippy fixes.
* Support for cudnn conv1d.

* More conv1d work.

* Get the conv1d to work with cudnn.

* Cleanup.
* Exclude candle-book to avoid some CI failures.

* Remove the book CIs.
* Set the algo.

* Expose the cudnn preferred algo for conv ops.
* Gumbel-Softmax sampling.

* Add a sampling test.

* Share the gumbel-softmax bits.
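Gumbel-softmax sampling builds on the Gumbel-max trick: adding independent noise g = -ln(-ln(u)) to the logits and taking the argmax draws an index with probability softmax(logits). A minimal std-only Rust sketch; the function name and the caller-supplied uniforms are illustrative, not candle's API:

```rust
/// Gumbel-max sampling: perturb each logit with Gumbel noise
/// g = -ln(-ln(u)) and take the argmax. `uniforms` are open-interval
/// (0, 1) samples, one per logit, supplied by the caller so the
/// example stays deterministic.
fn gumbel_max_sample(logits: &[f64], uniforms: &[f64]) -> usize {
    logits
        .iter()
        .zip(uniforms)
        .map(|(&l, &u)| l - (-u.ln()).ln())
        .enumerate()
        .max_by(|a, b| a.1.partial_cmp(&b.1).unwrap())
        .map(|(i, _)| i)
        .unwrap()
}

fn main() {
    let logits = [2.0, 0.5, -1.0];
    // With identical noise on every logit, the largest logit wins.
    assert_eq!(gumbel_max_sample(&logits, &[0.5, 0.5, 0.5]), 0);
    // Extreme noise can flip the draw to a low-probability index.
    assert_eq!(gumbel_max_sample(&logits, &[0.01, 0.5, 0.999]), 2);
    println!("ok");
}
```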
* Use cudarc 0.16.

* Allow for disabling event tracking.

* Tweaks.

* Bump the ug version.

* And bump the candle version too.
…ed CONTRIBUTING.md (#2897)

* added CONTRIBUTING.md to candle-book

* added description to candle-book introduction

* Updated formatting and added different features to candle-book installation

* mnist guide first draft candle-book

* updated mnist guide syntax and grammar for candle-book

* changed HelloWorld - Mnist to Tutorial - Mnist in SUMMARY.md

* updated intro to mnist guide in candle-book
* Retrieve the current positions for rotating KV caches.

* Add the function to the kv cache too.

* More testing.
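To make "current positions" concrete, here is a hypothetical std-only sketch of a rotating (ring-buffer) KV cache: once more tokens have been appended than the cache holds, new entries overwrite the oldest slot, and each slot must report which token position it now contains. The struct and method names are invented for illustration and are not candle's kv_cache API:

```rust
/// Hypothetical rotating KV cache tracking only bookkeeping state.
struct RotatingCache {
    max_len: usize,
    seen: usize, // total tokens appended so far
}

impl RotatingCache {
    fn append(&mut self, n: usize) {
        self.seen += n;
    }

    /// Token position stored in each cache slot, in slot order.
    fn current_positions(&self) -> Vec<usize> {
        let len = self.seen.min(self.max_len);
        (0..len)
            .map(|slot| {
                if self.seen <= self.max_len {
                    slot
                } else {
                    // Slots before the write head hold the newest tokens;
                    // slots at or after it still hold older ones.
                    let head = self.seen % self.max_len;
                    if slot < head {
                        self.seen - head + slot
                    } else {
                        self.seen - head - self.max_len + slot
                    }
                }
            })
            .collect()
    }
}

fn main() {
    // Under capacity: slots simply hold positions 0..seen.
    let mut c = RotatingCache { max_len: 4, seen: 0 };
    c.append(3);
    assert_eq!(c.current_positions(), vec![0, 1, 2]);
    // Over capacity: tokens 4 and 5 overwrote slots 0 and 1.
    c.append(3); // seen = 6
    assert_eq!(c.current_positions(), vec![4, 5, 2, 3]);
    println!("ok");
}
```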
donjuanplatinum and others added 30 commits December 31, 2025 12:14
* add HuberLoss and add Loss Trait

* 1. remove the LaTeX comment in loss.rs
2. add huberloss test

* change the huberloss Loss trait into the same approach as the other loss functions in this file

* cargo fmt

---------

Co-authored-by: ivarflakstad <69173633+ivarflakstad@users.noreply.github.com>
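For context on the HuberLoss addition: the Huber loss is quadratic for residuals within delta and linear beyond it, which keeps gradients bounded on outliers. A std-only Rust sketch of the standard formula (illustrative, not the candle-nn implementation):

```rust
/// Huber loss for a single residual `d` with threshold `delta`:
/// 0.5*d^2 when |d| <= delta, otherwise delta*(|d| - 0.5*delta).
fn huber(d: f64, delta: f64) -> f64 {
    let a = d.abs();
    if a <= delta {
        0.5 * d * d
    } else {
        delta * (a - 0.5 * delta)
    }
}

/// Mean Huber loss over predictions vs. targets.
fn huber_loss(pred: &[f64], target: &[f64], delta: f64) -> f64 {
    let n = pred.len() as f64;
    pred.iter()
        .zip(target)
        .map(|(p, t)| huber(p - t, delta))
        .sum::<f64>()
        / n
}

fn main() {
    // Small residual: quadratic branch, 0.5 * 0.5^2 = 0.125.
    assert!((huber(0.5, 1.0) - 0.125).abs() < 1e-12);
    // Large residual: linear branch, 1.0 * (2.0 - 0.5) = 1.5.
    assert!((huber(2.0, 1.0) - 1.5).abs() < 1e-12);
    // Mean over a batch.
    assert!((huber_loss(&[0.5, 2.0], &[0.0, 0.0], 1.0) - 0.8125).abs() < 1e-12);
    println!("ok");
}
```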
* feat!: Make `ug` dep optional

* fix(example/mnist-training): Run all epochs

* doc(`candle-ug`): Crate documentation

* fix: feature-gate the `ComputePipeline` import

---------

Co-authored-by: ivarflakstad <69173633+ivarflakstad@users.noreply.github.com>
* init z-image

* fixed patchify, unpatchify and latent

* update z_image examples readme

* fixed clippy and rustfmt

* fixed z_image example readme links

* support sdpa and flash-attn in Z-Image and fixed sdpa clippy warning

* fix some readme

* Update candle-transformers/src/models/z_image/transformer.rs

Co-authored-by: ivarflakstad <69173633+ivarflakstad@users.noreply.github.com>

* support --model in example

---------

Co-authored-by: ivarflakstad <69173633+ivarflakstad@users.noreply.github.com>
* replace cutlass submodule references with explicit build step

* address review comment: use Box leak trick also in attn-v3

* address review comment:

use "git clone --depth 1" instead of sparse checkout

for compatibility with older git versions

* correct version in candle-flash-attn-build/Cargo.toml

* add top-level candle-flash-attn-build crate

* rustfmt

---------

Co-authored-by: Jacob Gorm Hansen <jhansen@dropbox.com>
Co-authored-by: ivarflakstad <69173633+ivarflakstad@users.noreply.github.com>
* Changed CONVT1D_OP to CONVT2D_OP for conv_transpose2d_bf16

* removed the test, as the cause of the bug is apparent.
* quantized and full SmolLM3

* include chrono for prompt

* resolve pub consist and unused var

* formatted

* last spacing in format

* add credits

* chat template

* integrate new chat template for smollm3 example

* fmt and clippy

* improve documentation / correct chat API in doc

* skip compile on documentation

---------

Co-authored-by: ivarflakstad <69173633+ivarflakstad@users.noreply.github.com>
* minimal example

* toyed with cache

* serve cli

* cleaned up index

* readme

* cleaner interface

* improved formatting

* serve format

* index header unsloth

* demonstrate reasoning and format prompt

* no delay on button disable

* thinking based prompt; boolean switch; handle thinking reentry UI

* remove clear memory profile

* quant-qwen3 wasm using chat template

* thinking and no thinking tweakable mid conversation

* add discussion

* refactored example to use chat template; removed deprecated fns

* clean up inputs and fmt

* remove unused logging import

---------

Co-authored-by: ivarflakstad <69173633+ivarflakstad@users.noreply.github.com>
* feat: add quantized lfm2 model support

* fmt

---------

Co-authored-by: ivarflakstad <69173633+ivarflakstad@users.noreply.github.com>
Co-authored-by: ivarflakstad <69173633+ivarflakstad@users.noreply.github.com>
…3285)

* feat: reduce prefers min dtype based on stride size

* fix: cargo format fix

* Add temporary large/small reduce benchmarks

* Improved / simplified strided indexing

* Begin untangling pow2 concept from indexer

* remove unused get_strided_index_u64

* Make indexer_t handle both cont and strided. indexer.last_dim is a constant

* Remove Pow2Meta

* Remove redundant contiguous_indexer

* Remove redundant strided index impl

* Remove pow2 from kernel call/signature

* Some size_t -> ulong changes

* Use indexer_t for all reduce based kernels

* contiguous indexer last_dim default is 0

* u16 indexing does not provide speedup, and is an unlikely use case

* Let indexer_t dictate indexing dtype

* Introduce finalize concept. Tidy up redundant fns/macros

* Tidying up

* Use store concept instead of finalize

* Simplify arg reduce macros

* Tidying up

* Add reduce kernels for large tensors. Use existing reduce macro to add implementations

Remove large arg reduce kernels as we currently only support writing uint arg reduce results.

* Remove u64 indexed reduce kernels (u32 should suffice). Also max block_dim is 1024, so removing 2048 case from reduce kernels

* Remove suffix from reduce macro

* Remove IDX from reduce macros

* Tidy up reduce kernel call code

* Explicit u32 in reduce call code. Remove IDX from arg reduce macros

* Remove small reduce benchmark

---------

Co-authored-by: Ivar Flakstad <69173633+ivarflakstad@users.noreply.github.com>
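The indexer work above centers on one operation: translating a linear element index into a memory offset through dims and strides. A std-only Rust sketch of that translation (a generic formulation, not the actual kernel code):

```rust
/// Map a linear element index (row-major traversal of `dims`) to a memory
/// offset using per-dimension `strides`. For contiguous strides this is
/// the identity; for transposed layouts it walks the actual memory.
fn strided_index(mut idx: usize, dims: &[usize], strides: &[usize]) -> usize {
    let mut offset = 0;
    // Peel off coordinates from the innermost (fastest-varying) dim outward.
    for (&d, &s) in dims.iter().zip(strides).rev() {
        offset += (idx % d) * s;
        idx /= d;
    }
    offset
}

fn main() {
    // Contiguous 2x3 tensor: strides [3, 1] make this the identity map.
    assert_eq!(strided_index(4, &[2, 3], &[3, 1]), 4);
    // Transposed view (strides [1, 2]): element (1, 1) lives at offset 3.
    assert_eq!(strided_index(4, &[2, 3], &[1, 2]), 3);
    println!("ok");
}
```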

* mlx gemm opt init

* fix bug

* update

* opt

* opt

* update more shape to matmul benchmark

* remove metal_matmul_benchmark

---------

Co-authored-by: ivarflakstad <69173633+ivarflakstad@users.noreply.github.com>
* Update deps

* add imageproc text feature

* Fix compilation

---------

Co-authored-by: Eric Buehler <ericlbuehler@gmail.com>
* direct transfer for cuda

* allocate some random int data

* implement dummy device, test same-device logic

* update to latest cudarc version

* add cuda and metal checks

* change dependencies to use newer cudarc version

* reduce test to check for different devices

* Fix deps

* Clippy, fix workflow

* Format

* Fix handling for cuda

---------

Co-authored-by: Eric Buehler <ericlbuehler@gmail.com>
* Use upstream bindgen_cuda crate. Use separate builders for each step. Remove some cargo build warning messages

* bump bindgen_cuda

* fix

* Fix transfer_cuda_to_device test
* Use cudaforge for kernel build

* Fix clippy

* Update cudaforge to v0.1.2

* Fix build candle-examples

---------

Co-authored-by: Eric Buehler <ericlbuehler@gmail.com>
Labels

⤵️ pull merge-conflict Resolve conflicts manually
