[pull] main from huggingface:main (#41)
Open
pull[bot] wants to merge 345 commits into EricLBuehler:main from huggingface:main
Conversation
* Start updating to cudarc 0.14.
* Adapt a couple more things.
* And a couple more fixes.
* More tweaks.
* And a couple more fixes.
* Bump the major version number.
* Proper module system for the cuda kernels.
* Proper ptx loading.
* Launch the sort kernel.
* Custom op.
* Start using the builder pattern.
* More builder.
* More builder.
* Get candle-core to compile.
* Get the tests to pass.
* Get candle-nn to work too.
* Support for custom cuda functions.
* cudnn fixes.
* Get flash attn to run.
* Switch the crate versions to be alpha.
* Bump the ug dependency.
* added chatGLM readme
* changed wording in readme
* added readme for chinese-clip
* added readme for convmixer
* added readme for custom ops
* added readme for efficientnet
* added readme for llama
* added readme to mnist-training
* added readme to musicgen
* added readme to quantized-phi
* added readme to starcoder2
* added readme to whisper-microphone
* added readme to yi
* added readme to yolo-v3
* added readme to whisper-microphone
* added space to example in glm4 readme
* fixed mamba example readme to run mamba instead of mamba-minimal
* removed slash escape character
* changed moondream image to yolo-v8 example image
* added procedure for making the reinforcement-learning example work with a virtual environment on my machine
* added simple one-line summaries to the example readmes without
* changed non-existent image to yolo example's bike.jpg
* added backslash to sam command
* removed trailing - from siglip
* added SoX to silero-vad example readme
* replaced procedure for uv on mac with warning that uv isn't currently compatible with pyo3
* added example to falcon readme
* added --which arg to stella-en-v5 readme
* fixed image path in vgg readme
* fixed the image path in the vit readme
* Update README.md
* Update README.md
* Update README.md

Co-authored-by: Laurent Mazare <laurent.mazare@gmail.com>
* Fix for clippy 1.86. * More clippy fixes. * More fixes.
* Add the CSM model.
* Add some code to load the model.
* Load the text tokenizer.
* Add frame generation.
* Get the sampling to work.
* Rope fix.
* Autoregressive generation.
* Generate some audio file.
* Use the actual prompt.
* Support multiple turns.
* Add a very barebone readme.
* Move some of the shared bits to the model.
* Add the SNAC audio tokenizer.
* More snac.
* Again more snac.
* Add some example code for snac.
* Get the weights to load.
* Add to the snac model.
* Fixes.
* Get round-tripping to work.
* Save/load code files.
* Clippy fix.
* Fmt fix.
* Initial commit: model weights working, prediction incorrect
* moved distilbertformaskedlm into distilbert modeling file
* made maskedLM like bert example, still incorrect predictions
* finally not getting NaNs, fixed attention mask
* getting correct output sentences
* get top k predictions
* fixed output formatting slightly
* added default arg for model_id
* lint
* moved masked token example code from distilbertformaskedlm example to distilbert example
* lint
* removed distilbertformaskedlm example
* cleanup
* clippy
* removed embedding normalization from example
* made output and model dependent on args instead of prompt
* lint
* replaced or_ok anyhow error with anyhow context
* changed error message for mask token not found
* Cuda cleanup. * More fixes.
* Avoid using batched-matmul in nn::Linear.
* Also avoid batched matmul in conv1d.
* Also tweak the conv2d.
* Batched tests.
* Also cover conv2d.
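The batched-matmul avoidance above relies on the fact that a linear layer applied to a `(b, m, k)` input can be viewed as a single `(b*m, k) x (k, n)` 2-D matmul. The following is a minimal plain-`Vec` sketch of that equivalence, not candle's actual implementation:

```rust
// Sketch: a batched linear layer y = x @ w can flatten its leading
// dimensions, turning a batched matmul into one 2-D matmul.
// Illustrative only; candle's nn::Linear works on Tensors.

/// Naive row-major 2-D matmul: (m, k) x (k, n) -> (m, n).
fn matmul_2d(a: &[f32], b: &[f32], m: usize, k: usize, n: usize) -> Vec<f32> {
    let mut out = vec![0.0; m * n];
    for i in 0..m {
        for j in 0..n {
            for p in 0..k {
                out[i * n + j] += a[i * k + p] * b[p * n + j];
            }
        }
    }
    out
}

/// Batched linear written as one flattened 2-D matmul: the (b, m, k)
/// input is viewed as (b*m, k); the result is (b, m, n) flattened.
fn linear_flattened(x: &[f32], w: &[f32], b: usize, m: usize, k: usize, n: usize) -> Vec<f32> {
    matmul_2d(x, w, b * m, k, n)
}

fn main() {
    // Batch of 2, each a (1, 2) row, multiplied by the 2x2 identity.
    let x = [1.0, 2.0, 3.0, 4.0];
    let w = [1.0, 0.0, 0.0, 1.0];
    let y = linear_flattened(&x, &w, 2, 1, 2, 2);
    assert_eq!(y, vec![1.0, 2.0, 3.0, 4.0]);
    println!("{:?}", y);
}
```

Since the weight matrix is shared across the batch, the flattened form computes exactly the same values while letting the backend dispatch one large GEMM instead of `b` small ones.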
* Add the Orpheus TTS.
* Add a small readme.
* Token fix.
* Support more voices.
* Clippy fixes.
* Support for cudnn conv1d. * More conv1d work. * Get the conv1d to work with cudnn. * Cleanup.
* Exclude candle-book to avoid some CI failures. * Remove the book CIs.
* Set the algo. * Expose the cudnn preferred algo for conv ops.
* Gumbel-Softmax sampling. * Add a sampling test. * Share the gumbel-softmax bits.
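The Gumbel-softmax sampling added above reduces, in the hard limit, to the Gumbel-max trick: add `g = -ln(-ln(u))` noise (with `u ~ Uniform(0,1)`) to each logit and take the argmax. A minimal deterministic sketch, with the uniform draws passed in rather than generated on-device as candle does:

```rust
// Sketch of Gumbel-max sampling (the hard limit of Gumbel-softmax).
// Uniform samples are injected so the example stays deterministic;
// this is not candle's actual sampling API.

fn gumbel_argmax(logits: &[f64], uniforms: &[f64]) -> usize {
    assert_eq!(logits.len(), uniforms.len());
    let mut best = 0;
    let mut best_v = f64::NEG_INFINITY;
    for (i, (&l, &u)) in logits.iter().zip(uniforms.iter()).enumerate() {
        let g = -(-u.ln()).ln(); // Gumbel(0, 1) noise
        let v = l + g;
        if v > best_v {
            best_v = v;
            best = i;
        }
    }
    best
}

fn main() {
    // With identical uniforms the noise is constant across positions,
    // so the plain argmax of the logits wins.
    let idx = gumbel_argmax(&[0.1, 2.0, 0.5], &[0.5, 0.5, 0.5]);
    assert_eq!(idx, 1);
    println!("sampled index: {idx}");
}
```

The appeal of this formulation is that argmax over perturbed logits samples exactly from the softmax distribution, without ever normalizing it.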
* Use cudarc 0.16.
* Allow for disabling event tracking.
* Tweaks.
* Bump the ug version.
* And bump the candle version too.
…ed CONTRIBUTING.md (#2897)
* added CONTRIBUTING.md to candle-book
* added description to candle-book introduction
* Updated formatting and added different features to candle-book installation
* mnist guide first draft candle-book
* updated mnist guide syntax and grammar for candle-book
* changed HelloWorld - Mnist to Tutorial - Mnist in SUMMARY.md
* updated intro to mnist guide in candle-book
* Retrieve the current positions for rotating KV caches. * Add the function to the kv cache too. * More testing.
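A rotating KV cache overwrites its oldest entries once capacity is reached, so recovering which logical positions each slot currently holds takes a little modular arithmetic. A sketch under the assumption of a simple ring-buffer layout (slot = position mod capacity); the function name and layout are illustrative, not candle's actual API:

```rust
// Sketch: for a ring-buffer KV cache of capacity `max_len` that has
// seen `total` tokens, return the logical position stored in each
// slot, in slot order. Assumed layout: position p lives in slot
// p % max_len. Hypothetical helper, not candle's implementation.

fn current_positions(total: usize, max_len: usize) -> Vec<usize> {
    if total <= max_len {
        // Cache not yet full: slots 0..total hold positions 0..total.
        (0..total).collect()
    } else {
        (0..max_len)
            .map(|slot| {
                let base = total - max_len; // oldest position still stored
                // Smallest p >= base with p % max_len == slot.
                base + (slot + max_len - base % max_len) % max_len
            })
            .collect()
    }
}

fn main() {
    // Capacity 4, 5 tokens seen: position 4 has overwritten position 0.
    assert_eq!(current_positions(5, 4), vec![4, 1, 2, 3]);
    println!("{:?}", current_positions(5, 4));
}
```

Exposing the positions this way lets rotary embeddings and attention masks be computed against the true logical positions rather than the physical slot indices.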
* add HuberLoss and add Loss trait
* remove the LaTeX comment in loss.rs
* add huberloss test
* change the huberloss Loss trait into the same approach as the other loss functions in this file
* cargo fmt

Co-authored-by: ivarflakstad <69173633+ivarflakstad@users.noreply.github.com>
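The Huber loss added above is quadratic for small residuals and linear beyond a threshold `delta`, which makes it less sensitive to outliers than MSE. A standalone scalar sketch of the standard definition; candle's version operates on tensors:

```rust
// Sketch of the Huber loss: 0.5 * r^2 for |r| <= delta, otherwise
// delta * (|r| - 0.5 * delta). Scalar f64 illustration, not the
// tensor-based implementation in candle-nn.

fn huber(pred: f64, target: f64, delta: f64) -> f64 {
    let r = (pred - target).abs();
    if r <= delta {
        0.5 * r * r // quadratic near zero, like MSE
    } else {
        delta * (r - 0.5 * delta) // linear tail, like MAE
    }
}

fn main() {
    assert_eq!(huber(0.5, 0.0, 1.0), 0.125); // quadratic branch
    assert_eq!(huber(2.0, 0.0, 1.0), 1.5);   // linear branch
    println!("ok");
}
```

The two branches meet with matching value and slope at `|r| = delta`, which is what keeps the loss smooth enough for gradient descent.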
* feat!: Make `ug` dep optional
* fix(example/mnist-training): Run all epochs
* doc(`candle-ug`): Crate documentation
* fix: feature-gate the `ComputePipeline` import

Co-authored-by: ivarflakstad <69173633+ivarflakstad@users.noreply.github.com>
* init z-image
* fixed patchify, unpatchify and latent
* update z_image examples readme
* fixed clippy and rustfmt
* fixed z_image example readme links
* support sdpa and flash-attn in Z-Image and fixed sdpa clippy warning
* fix some readme
* Update candle-transformers/src/models/z_image/transformer.rs
* support --model in example

Co-authored-by: ivarflakstad <69173633+ivarflakstad@users.noreply.github.com>
* replace cutlass submodule references with explicit build step
* address review comment: use Box leak trick also in attn-v3
* address review comment: use "git clone --depth 1" instead of sparse checkout for compatibility with older git versions
* correct version in candle-flash-attn-build/Cargo.toml
* add top-level candle-flash-attn-build crate
* rustfmt

Co-authored-by: Jacob Gorm Hansen <jhansen@dropbox.com>
Co-authored-by: ivarflakstad <69173633+ivarflakstad@users.noreply.github.com>
* Changed CONVT1D_OP to CONVT2D_OP for conv_transpose2d_bf16
* Removed the test, since the cause of the bug is apparent.
* quantized and full SmolLM3
* include chrono for prompt
* resolve pub consist and unused var
* formatted
* last spacing in format
* add credits
* chat template
* integrate new chat template for smollm3 example
* fmt and clippy
* improve documentation / correct chat API in doc
* skip compile on documentation

Co-authored-by: ivarflakstad <69173633+ivarflakstad@users.noreply.github.com>
* minimal example
* toyed with cache
* serve cli
* cleaned up index
* readme
* cleaner interface
* improved formatting
* serve format
* index header unsloth
* demonstrate reasoning and format prompt
* no delay on button disable
* thinking based prompt; boolean switch; handle thinking reentry UI
* remove clear memory profile
* quant-qwen3 wasm using chat template
* thinking and no thinking tweakable mid conversation
* add discussion
* refactored example to use chat template; removed deprecated fns
* clean up inputs and fmt
* remove unused logging import

Co-authored-by: ivarflakstad <69173633+ivarflakstad@users.noreply.github.com>
* feat: add quantized lfm2 model support * fmt --------- Co-authored-by: ivarflakstad <69173633+ivarflakstad@users.noreply.github.com>
Co-authored-by: ivarflakstad <69173633+ivarflakstad@users.noreply.github.com>
…3285)
* feat: reduce prefers min dtype based on stride size
* fix: cargo format fix
* Add temporary large/small reduce benchmarks
* Improved / simplified strided indexing
* Begin untangling pow2 concept from indexer
* remove unused get_strided_index_u64
* Make indexer_t handle both cont and strided. indexer.last_dim is a constant
* Remove Pow2Meta
* Remove redundant contiguous_indexer
* Remove redundant strided index impl
* Remove pow2 from kernel call/signature
* Some size_t -> ulong changes
* Use indexer_t for all reduce based kernels
* contiguous indexer last_dim default is 0
* u16 indexing does not provide speedup, and is an unlikely use case
* Let indexer_t dictate indexing dtype
* Introduce finalize concept. Tidy up redundant fns/macros
* Tidying up
* Use store concept instead of finalize
* Simplify arg reduce macros
* Tidying up
* Add reduce kernels for large tensors, using the existing reduce macro to add implementations. Remove large arg reduce kernels as we currently only support writing uint arg reduce results.
* Remove u64 indexed reduce kernels (u32 should suffice). Also max block_dim is 1024, so removing the 2048 case from reduce kernels
* Remove suffix from reduce macro
* Remove IDX from reduce macros
* Tidy up reduce kernel call code
* Explicit u32 in reduce call code. Remove IDX from arg reduce macros
* Remove small reduce benchmark

Co-authored-by: Ivar Flakstad <69173633+ivarflakstad@users.noreply.github.com>
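The strided-indexing work above centers on the standard mapping from a linear traversal index to a storage offset via per-dimension strides. A host-side Rust sketch of that mapping; the actual kernels compute this per-thread on the GPU:

```rust
// Sketch of strided indexing: map the i-th element of a contiguous
// traversal of `dims` to a storage offset using `strides`. The
// function name is illustrative, not the kernels' actual signature.

fn strided_index(mut linear: usize, dims: &[usize], strides: &[usize]) -> usize {
    let mut offset = 0;
    // Peel dimensions from innermost to outermost.
    for (&d, &s) in dims.iter().zip(strides.iter()).rev() {
        offset += (linear % d) * s;
        linear /= d;
    }
    offset
}

fn main() {
    // A 2x3 tensor stored contiguously, then viewed transposed as 3x2:
    // dims = [3, 2], strides = [1, 3].
    let offsets: Vec<usize> = (0..6)
        .map(|i| strided_index(i, &[3, 2], &[1, 3]))
        .collect();
    assert_eq!(offsets, vec![0, 3, 1, 4, 2, 5]);
    println!("{:?}", offsets);
}
```

For contiguous layouts the offset equals the linear index, which is why a unified `indexer_t` that handles both cases can skip the division loop entirely on the fast path.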
* Update deps * add imageproc text feature * Fix compilation --------- Co-authored-by: Eric Buehler <ericlbuehler@gmail.com>
* direct transfer for cuda
* allocate some random int data
* implement dummy device, test same device logic
* update to latest cudarc version
* add cuda and metal checks
* change dependencies to use newer cudarc version
* reduce test to check for different devices
* Fix deps
* Clippy, fix workflow
* Format
* Fix handling for cuda

Co-authored-by: Eric Buehler <ericlbuehler@gmail.com>
* Use upstream bindgen_cuda crate. Use separate builders for each step. Remove some cargo build warning messages
* bump bindgen_cuda
* fix
* Fix transfer_cuda_to_device test
* Use cudaforge for kernel build
* Fix clippy
* Update cudaforge to v0.1.2
* Fix build candle-examples

Co-authored-by: Eric Buehler <ericlbuehler@gmail.com>
See Commits and Changes for more details.
Created by
pull[bot] (v2.0.0-alpha.4)
Can you help keep this open source service alive? 💖 Please sponsor : )