Status: Open

Conversation

Owner: @KimHenrikOtte thanks for the PR. This is super exciting, please let me know when you are ready for review.

Owner: @KimHenrikOtte is this ready for an initial review?
* tracing page
* warned about asynchronous execution
* cleanup
* added Nsight Systems recommendation
* Add a scattered kv cache.
* Update some comments.
* add Qwen3.rs
* fixed compile error
* attempting to get PR 2903 working with qwen weights
* different qwen variants working
* added moe model
* clippy
* added additional eos token
* translated Korean comments to English as well as I can
* removed specialized Qwen3RmsNorm and replaced with generic Candle RmsNorm
* replaced custom repeat_kv implementation with candle's repeat_kv implementation
* replace linear with linear_b in attention initialization
* replaced custom kv_cache implementation with candle kv_cache
* style
* replaced explicit broadcast add with normal add in decoder layer
* removed keeping the Rotary embedding layer in the model struct
* used tie_word_embeddings bool from config instead of relying on existence of weights for lm head in CausalLM
* removed duplicate code from qwen3_moe
* removed sliding window from qwen3 attention
* removed MoE code
* removed unused option
* Fixed typo
* fixed tie word embeddings to use the correct embedding weights instead of the opposite

Co-authored-by: Max <naturale@hufs.ac.kr>
Co-authored-by: Laurent Mazare <laurent.mazare@gmail.com>
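The `repeat_kv` commit above swaps a hand-rolled implementation for candle's built-in one. The idea behind `repeat_kv` in grouped-query attention is that each key/value head is shared by several query heads, so the KV heads are repeated `n_rep` times to line up with the query heads. The sketch below illustrates only that semantics on nested `Vec`s; candle's actual `repeat_kv` operates on `Tensor`s with different shapes and signatures.

```rust
// Illustrative sketch (not candle's API): repeat each of the KV heads
// `n_rep` times so a model with `num_kv_heads * n_rep` query heads can
// attend against them. Each inner Vec stands in for one head's flattened
// (seq_len * head_dim) buffer.
fn repeat_kv(kv: Vec<Vec<f32>>, n_rep: usize) -> Vec<Vec<f32>> {
    let mut out = Vec::with_capacity(kv.len() * n_rep);
    for head in kv {
        for _ in 0..n_rep {
            out.push(head.clone());
        }
    }
    out
}

fn main() {
    // Two KV heads; with n_rep = 2 we get four heads, pairwise identical.
    let kv = vec![vec![1.0, 2.0], vec![3.0, 4.0]];
    let repeated = repeat_kv(kv, 2);
    assert_eq!(repeated.len(), 4);
    assert_eq!(repeated[0], repeated[1]); // head 0 repeated
    assert_eq!(repeated[2], repeated[3]); // head 1 repeated
    println!("{:?}", repeated);
}
```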
* Indexing with max-value results in zero/no-op.
* Add some testing.
* Also adapt the metal kernels.
* Another test.
* Fix.
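The "indexing with max-value results in zero/no-op" convention means that a gather which receives the maximum index value is treated as a sentinel producing zero rather than an out-of-bounds access. A minimal scalar sketch of that rule, assuming `u32::MAX` as the sentinel (the real change lives inside candle's CPU/CUDA/Metal kernels):

```rust
// Illustrative sketch: gather values from `src` by index, where the
// sentinel index u32::MAX contributes 0.0 instead of reading memory.
fn gather_with_sentinel(src: &[f32], indices: &[u32]) -> Vec<f32> {
    indices
        .iter()
        .map(|&i| {
            if i == u32::MAX {
                0.0 // sentinel slot: no-op, yields zero
            } else {
                src[i as usize]
            }
        })
        .collect()
}

fn main() {
    let src = [10.0, 20.0, 30.0];
    let out = gather_with_sentinel(&src, &[2, u32::MAX, 0]);
    assert_eq!(out, vec![30.0, 0.0, 10.0]);
    println!("{:?}", out);
}
```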
* fixed quantized_phi3 implementation
* quantized_qwen3 implementation
* Update quantized_phi3.rs
* Update quantized_phi3.rs
* add quantized_qwen3 example
* Clippy fixes.
* Cleanup.

Co-authored-by: Laurent <laurent.mazare@gmail.com>
* added resize to candle-onnx, not currently working
* changed unreachable to bail, and bailed when both scales and sizes are set
* cleanup and added other unused options for this op
* cleanup
* fixed image loading to make output work
* cleanup and removed unused variables
* removed path creation code, and changed unwrap to ?
* optimize KV cache to reduce GPU memory usage
* revert to using candle_nn::kv_cache::KvCache with initial capacity of 512
* OLMo 2 model
* Update olmo-2 to example
* Clippy fix.

Co-authored-by: laurent <laurent.mazare@gmail.com>
* fixed docs quantized-qwen3 README
* fixed docs quantized-qwen2-instruct README
* Add phi-4 support.
* Long-rope support.
* Get clippy to be happy.
- Add `dot()` for vector/matrix products
- Implement the `Frobenius` norm
- Add `mv()` for matrix-vector multiply
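The three helpers above have standard linear-algebra definitions. A sketch of the intended semantics on plain slices, not candle `Tensor`s (the real methods dispatch through candle's matmul kernels):

```rust
// Illustrative definitions of dot, matrix-vector multiply, and the
// Frobenius norm on plain Rust slices.
fn dot(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

// Matrix-vector product: each output element is a row dotted with `v`.
fn mv(m: &[Vec<f64>], v: &[f64]) -> Vec<f64> {
    m.iter().map(|row| dot(row, v)).collect()
}

// Frobenius norm: square root of the sum of squared entries.
fn frobenius(m: &[Vec<f64>]) -> f64 {
    m.iter().flatten().map(|x| x * x).sum::<f64>().sqrt()
}

fn main() {
    let m = vec![vec![3.0, 0.0], vec![0.0, 4.0]];
    assert_eq!(dot(&[1.0, 2.0], &[3.0, 4.0]), 11.0);
    assert_eq!(mv(&m, &[1.0, 1.0]), vec![3.0, 4.0]);
    assert_eq!(frobenius(&m), 5.0); // sqrt(9 + 16)
    println!("ok");
}
```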
* onnx attention
* setup an example, adding and fixing onnx ops bit by bit
* model working, output is garbage data
* trilu working
* close but not quite, issues still with scatterND
* closer but the outputs are still slightly wrong
* added tests for trilu and scatterND
* lint
* readme
* clippy
* removed unnecessary comments
* changed device selection, took hyperparameters from model config
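The `trilu` commits above implement the ONNX `Trilu` operator, which keeps the lower or upper triangular part of a matrix relative to the k-th diagonal and zeroes the rest. A sketch of the indexing rule on nested `Vec`s (the candle-onnx version works on `Tensor`s):

```rust
// Illustrative sketch of ONNX Trilu semantics: keep entries on or below
// the k-th diagonal when `lower` is true (j <= i + k), or on or above it
// when `lower` is false (j >= i + k); zero everything else.
fn trilu(m: &[Vec<f64>], k: i64, lower: bool) -> Vec<Vec<f64>> {
    m.iter()
        .enumerate()
        .map(|(i, row)| {
            row.iter()
                .enumerate()
                .map(|(j, &x)| {
                    let keep = if lower {
                        (j as i64) <= (i as i64) + k
                    } else {
                        (j as i64) >= (i as i64) + k
                    };
                    if keep { x } else { 0.0 }
                })
                .collect()
        })
        .collect()
}

fn main() {
    let m = vec![vec![1.0, 2.0], vec![3.0, 4.0]];
    // Lower-triangular with k = 0 zeroes the strictly-upper entries.
    assert_eq!(trilu(&m, 0, true), vec![vec![1.0, 0.0], vec![3.0, 4.0]]);
    println!("ok");
}
```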
* qwen-moe rebase
* lint
* fixed rebase error
* swapped normal MoE model with CausalMoE Model in example, and swapped the tie word embeddings if statement
* updated readme
* Update KvCache initialization in Qwen3 model to use a fixed max position embedding value of 512
* add doc
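The KV-cache commits above (initial capacity of 512, fixed max position value) share one idea: preallocate the cache buffer once and append each decoding step's keys/values into it, instead of reallocating per token. A minimal sketch of that pattern, assuming a made-up flat-buffer layout; candle's `candle_nn::kv_cache::KvCache` has a different, `Tensor`-based interface:

```rust
// Illustrative fixed-capacity KV cache: one preallocated flat buffer of
// (max_len * dim) floats, filled one position per decoding step.
struct KvCache {
    data: Vec<f32>, // preallocated buffer, never reallocated
    dim: usize,
    len: usize,     // number of positions filled so far
    max_len: usize,
}

impl KvCache {
    fn new(max_len: usize, dim: usize) -> Self {
        Self { data: vec![0.0; max_len * dim], dim, len: 0, max_len }
    }

    // Append one position worth of keys/values; error past capacity.
    fn append(&mut self, kv: &[f32]) -> Result<(), String> {
        if self.len >= self.max_len {
            return Err("kv cache capacity exceeded".to_string());
        }
        let start = self.len * self.dim;
        self.data[start..start + self.dim].copy_from_slice(kv);
        self.len += 1;
        Ok(())
    }

    // View of everything cached so far.
    fn current(&self) -> &[f32] {
        &self.data[..self.len * self.dim]
    }
}

fn main() {
    let mut cache = KvCache::new(512, 2);
    cache.append(&[1.0, 2.0]).unwrap();
    cache.append(&[3.0, 4.0]).unwrap();
    assert_eq!(cache.current(), &[1.0, 2.0, 3.0, 4.0][..]);
    println!("cached {} positions", cache.len);
}
```

Capping the capacity (here 512) trades maximum sequence length for predictable GPU memory usage, which matches the "reduce GPU memory usage" motivation in the commits.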
* add: wip RNN parameters
* fix: corrected access to tensor dim in rnn
* add: rnn function call
* merged files
* added parameter parsing
* update: rnn parameter parsing
* remove: ONNX descriptions
* update: implemented basic operations
* update: removed comment
* add: RNN test
* update: prepared test values
* fix: operations on tensors
* update: passing tests
* add: test gen script
* changed error message

Co-authored-by: misadowsk <michalsad.protondynamic@gmail.com>
* feat: added Elu operator
* feat: added hard swish
* added more tests for hard swish
* cleaned up

Co-authored-by: misadowsk <michalsad.protondynamic@gmail.com>
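Both activations added above have standard closed-form definitions: ELU is `x` for positive inputs and `alpha * (exp(x) - 1)` otherwise, and hard-swish (as defined in ONNX/MobileNetV3) is `x * clamp(x/6 + 0.5, 0, 1)`. A scalar sketch of those formulas, not candle's tensor kernels:

```rust
// Illustrative scalar definitions of the two activations.
fn elu(x: f64, alpha: f64) -> f64 {
    if x > 0.0 { x } else { alpha * (x.exp() - 1.0) }
}

// Hard-swish: x * clamp(x/6 + 0.5, 0, 1).
fn hard_swish(x: f64) -> f64 {
    x * (x / 6.0 + 0.5).clamp(0.0, 1.0)
}

fn main() {
    assert_eq!(elu(2.0, 1.0), 2.0);   // identity for positive inputs
    assert!(elu(-1.0, 1.0) < 0.0);    // saturates toward -alpha below zero
    assert_eq!(hard_swish(6.0), 6.0); // linear region above x = 3
    assert_eq!(hard_swish(-3.0), 0.0); // zero region below x = -3
    println!("ok");
}
```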
…ggingface#3313)

* mlx gemm opt init
* fix bug
* update
* opt
* opt
* update more shape to matmul benchmark
* remove metal_matmul_benchmark

Co-authored-by: ivarflakstad <69173633+ivarflakstad@users.noreply.github.com>
* Update deps
* add imageproc text feature
* Fix compilation

Co-authored-by: Eric Buehler <ericlbuehler@gmail.com>
* direct transfer for cuda
* allocate some random int data
* implement dummy device, test same device logic
* update to latest cudarc version
* add cuda and metal checks
* change dependencies to use newer cudarc version
* reduce test to check for different devices
* Fix deps
* Clippy, fix workflow
* Format
* Fix handling for cuda

Co-authored-by: Eric Buehler <ericlbuehler@gmail.com>
* Use upstream bindgen_cuda crate. Use separate builders for each step. Remove some cargo build warning messages
* bump bindgen_cuda
* fix
* Fix transfer_cuda_to_device test
- fixed `MutexGuard`-held-across-await-point clippy warnings for the native build (not yet solved for wasm)
- add copy to staging_buffer at the end of flush_gpu_command
# Conflicts:
# Cargo.toml
- fixed ternary_op_wgpu test (allow u8 buffer creation)
- fix warning in candle-wasm-tests
- removed wgpu feature from candle-test example
- improved doc comments
- restructured public API
- ShaderLoader::load now returns a Cow (a shader loader might return a static shader string, or may generate a shader in place)
- Simplified quantized shader loading
- Fixed `MutexGuard` across async methods; on WASM, the locks will be dropped before the await point
- Added multi-thread test
# Conflicts:
# candle-core/Cargo.toml