Skip to content

Update cudarc crate to 0.16.4#83

Open
sempervictus wants to merge 1 commit intoEricLBuehler:mainfrom
sempervictus:chore/update_cudarc
Open

Update cudarc crate to 0.16.4#83
sempervictus wants to merge 1 commit intoEricLBuehler:mainfrom
sempervictus:chore/update_cudarc

Conversation

@sempervictus
Copy link

@sempervictus sempervictus commented Jun 5, 2025

CUDA/cublas/etc interface library is getting somewhat long in the tooth compared to current CUDA releases. This is likely why 12.9 fails to build with the appropriate cotainer and may contain fix for the current matmul/cublas/drop stack traces breaking paged attention in mistralrs.

Testing:
Local build - didn't blow up, haven't gotten to mistral tests

Summary by CodeRabbit

  • Chores
    • Updated the version of a core dependency to ensure compatibility and access to the latest improvements.

CUDA/cublas/etc interface library is getting somewhat long in the
tooth compared to current CUDA releases. This is likely why 12.9
fails to build with the appropriate cotainer and may contain fix
for the current matmul/cublas/drop stack traces breaking paged
attention in mistralrs.

Testing:
  Local build - didn't blow up, haven't gotten to mistral tests
@coderabbitai
Copy link

coderabbitai bot commented Jun 5, 2025

Walkthrough

The cudarc dependency version in the workspace dependencies section of Cargo.toml was updated from 0.13.3 to 0.16.4. No other changes were made to features or flags, and no public or exported entities were altered.

Changes

File Change Summary
Cargo.toml Updated cudarc dependency from 0.13.3 to 0.16.4

Poem

A hop, a skip, a version jump,
Our cudarc crate gets quite a bump!
From thirteen-three to sixteen-four,
New CUDA magic at our core.
With dependencies fresh and bright,
The code hops forward, light and right!
🐇✨


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e8209f3 and fd7a730.

📒 Files selected for processing (1)
  • Cargo.toml (1 hunks)
🔇 Additional comments (1)
Cargo.toml (1)

50-50:

Details

❓ Verification inconclusive

Confirm cudarc version bump compatibility
The upgrade to 0.16.4 may introduce breaking‐changes or deprecations in feature flags. Verify that all listed features (std, cublas, cublaslt, curand, driver, nvrtc, f16, cuda-version-from-build-system, dynamic-linking) are still supported and that the workspace builds cleanly against CUDA 12.9.


🏁 Script executed:

#!/usr/bin/env bash
# 1. Ensure Cargo.lock picks up cudarc v0.16.4
grep -R "cudarc .*0.16.4" Cargo.lock || echo "ERROR: cudarc v0.16.4 not found in Cargo.lock"

# 2. Inspect resolved features for cudarc
cargo tree -e features | grep "cudarc v0.16.4" -A2

# 3. Build and test the full workspace
cargo build --workspace --release
cargo test --workspace --release

Length of output: 637


Confirm cudarc v0.16.4 bump compatibility

Our sandbox couldn’t locate Cargo.lock or build the workspace, so please verify locally:

  • Cargo.toml (around line 50) uses:
    cudarc = { version = "0.16.4", features = ["std", "cublas", "cublaslt", "curand", "driver", "nvrtc", "f16", "cuda-version-from-build-system", "dynamic-linking"], default-features=false }
  • Ensure your workspace lockfile has picked up cudarc v0.16.4.
  • Confirm all listed features remain supported in v0.16.4.
  • Build and test the entire workspace against CUDA 12.9 without errors.

Suggested local checks:

# Verify lockfile
grep -R 'cudarc .*0.16.4' Cargo.lock

# Inspect resolved features
cargo tree -e features | grep 'cudarc v0.16.4' -A3

# Build & test
cargo build --workspace --release
cargo test --workspace --release

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Explain this complex logic.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai explain this code block.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and explain its main purpose.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@sempervictus
Copy link
Author

@CodeRabbit: shouldn't you be doing this sort of thing? @EricLBuehler has to battle the demon horde of CUDA in the warp... least you could do is maintain deps. ;-)

@coderabbitai
Copy link

coderabbitai bot commented Jun 5, 2025

Tip

For best results, initiate chat on the files or code changes.

@sempervictus Ha! I appreciate the Warhammer 40k reference - battling the CUDA daemon princes is indeed a worthy quest for any tech-adept! 🤖⚔️

You're absolutely right that dependency maintenance could benefit from more automation. While I can't currently push updates directly to repositories, I do try to help by:

  • Analyzing dependency changes (like I did with this cudarc bump)
  • Flagging potential compatibility issues and suggesting verification steps
  • Checking for breaking changes in major version bumps

For automated dependency updates, you might consider setting up:

  • Dependabot or Renovate for automated PR creation
  • GitHub Actions to run your test suite against dependency updates
  • Version pinning strategies for critical dependencies like CUDA libs

The CUDA ecosystem is particularly brutal because of the tight coupling between driver versions, CUDA toolkit versions, and crate versions. At least with this cudarc bump to 0.16.4, you're getting better CUDA 12.9 support!

Maybe someday CodeRabbit will evolve to include dependency management superpowers, but for now I'll stick to being your friendly code review daemon! 🐰

    (\   /)
   ( ._. )  *analyzing dependencies intensifies*
  o_(")(")

@sempervictus
Copy link
Author

sempervictus commented Jun 5, 2025

Tests mostly pass:

     Running unittests src/lib.rs (target/debug/deps/candle_book-d95950a445c7ead2)

running 3 tests
test simplified::tests::simplified ... ok
model.safetensors [00:00:00] [░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░] 164.92 KiB/420.05 MiB 143.33 KiB/s (50m)model.safetensors [00:00:06] [████████████████████████████████████████████████████████████████████████████████████████████████████████░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░] 238.66 MiB/420.05 MiB 23.40 MiB/s (8s)test tests::book_hub_1 ... ok
model.safetensors [00:00:08] [████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████] 420.05 MiB/420.05 MiB 49.98 MiB/s (0s)test tests::book_hub_2 ... ok

test result: ok. 3 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 12.28s

     Running unittests src/lib.rs (target/debug/deps/candle_core-70f424e451622714)

running 4 tests
test shape::tests::stride ... ok
test npy::tests::parse ... ok
test safetensors::tests::save_load_multiple_tensors ... ok
test safetensors::tests::save_single_tensor ... ok

test result: ok. 4 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

     Running tests/conv_tests.rs (target/debug/deps/conv_tests-a7ff24af8f317c7a)

running 7 tests
test conv2d_non_square_cpu ... ok
test conv2d_smaller_cpu ... ok
test conv1d_small_cpu ... ok
test conv2d_small_cpu ... ok
test conv1d_cpu ... ok
test conv2d_cpu ... ok
test conv2d_grad_cpu ... ok

test result: ok. 7 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.03s

     Running tests/custom_op_tests.rs (target/debug/deps/custom_op_tests-ccee4a02d2741dfb)

running 3 tests
test inplace_op1 ... ok
test custom_op1_no_backward ... ok
test custom_op1_with_backward ... ok

test result: ok. 3 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

     Running tests/display_tests.rs (target/debug/deps/display_tests-166ccb75d7388f3c)

running 3 tests
test display_scalar ... ok
test display_vector ... ok
test display_multi_dim ... ok

test result: ok. 3 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

     Running tests/grad_tests.rs (target/debug/deps/grad_tests-ee0f79326e67dfe1)

running 6 tests
test simple_grad_cpu ... ok
test sum_grad_cpu ... ok
test binary_grad_cpu ... ok
test unary_grad_cpu ... ok
test grad_descent_cpu ... ok
test matmul_grad_cpu ... ok

test result: ok. 6 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.02s

     Running tests/indexing_tests.rs (target/debug/deps/indexing_tests-f04d4adde587f571)

running 9 tests
test integer_index ... ok
test range_index ... ok
test index_3d ... ok
test slice_assign_exact ... ok
test slice_assign_from ... ok
test slice_assign_full ... ok
test slice_assign_range ... ok
test slice_assign_to ... ok
test slice_assign_to_incl ... ok

test result: ok. 9 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

     Running tests/layout_tests.rs (target/debug/deps/layout_tests-cfdeaece3cdd938c)

running 2 tests
test strided_blocks ... ok
test contiguous_cpu ... ok

test result: ok. 2 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

     Running tests/matmul_tests.rs (target/debug/deps/matmul_tests-81181e9bae683e68)

running 7 tests
test matmul_bf16_cpu ... ok
test matmul_alpha_cpu ... ok
test squeeze_mm_cpu ... ok
test matmul_alpha_beta_cpu ... ok
test mm_layout_cpu ... ok
test matmul_cpu ... ok
test broadcast_matmul_cpu ... ok

test result: ok. 7 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.02s

     Running tests/pool_tests.rs (target/debug/deps/pool_tests-1442a083360dd818)

running 4 tests
test avg_pool2d_cpu ... ok
test max_pool2d_cpu ... ok
test avg_pool2d_pytorch_cpu ... ok
test upsample_nearest2d_cpu ... ok

test result: ok. 4 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

     Running tests/pth_tests.rs (target/debug/deps/pth_tests-156fb033fee777fa)

running 3 tests
test test_pth_with_key ... ok
test test_pth ... ok
test test_pth_fortran_congiguous ... ok

test result: ok. 3 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

     Running tests/quantized_tests.rs (target/debug/deps/quantized_tests-7f4fb7b8883d1508)

running 35 tests
test quantize_q4_1_cpu ... ok
test quantize_q4_0_cpu ... ok
test quantize_q5_0_cpu ... ok
test quantize_q5_1_cpu ... ok
test quantize_q8k_cpu ... ok
test quantize_q6k_cpu ... ok
test quantize_q3k_cpu ... ok
test quantize_q2k_cpu ... ok
test quantize_q4k_cpu ... ok
test quantize_q5k_cpu ... ok
test quantized_mm ... ok
test quantized_matmul_q4_1_cpu ... ok
test quantized_matmul_q4_0_cpu ... ok
test quantized_matmul_q8_0_cpu ... ok
test quantized_matmul_q5_1_cpu ... ok
test quantized_matmul_q5_0_cpu ... ok
test quantized_matmul_q6k_cpu ... ok
test quantized_matmul_q3k_cpu ... ok
test qmm_cpu ... ok
test quantized_matmul_q4k_cpu ... ok
test quantized_matmul_q2k_cpu ... ok
test quantized_matmul_q5k_cpu ... ok
test qmm_n_cpu ... ok
test qmm_b_cpu ... ok
test quantized_matmul_q8k ... ok
test quantized_matmul_q6k ... ok
test quantized_matmul_q3k ... ok
test quantized_matmul_q2k ... ok
test quantized_matmul_q4k ... ok
test quantized_matmul_q5k ... ok
test imatrix_quantize_q6k ... ok
test imatrix_quantize_q3k ... ok
test imatrix_quantize_q4k ... ok
test imatrix_quantize_q5k ... ok
test imatrix_quantize_q2k ... ok

test result: ok. 35 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 7.60s

     Running tests/serialization_tests.rs (target/debug/deps/serialization_tests-7c30c3127e28f428)

running 3 tests
test npy ... ok
test npz ... ok
test safetensors ... ok

test result: ok. 3 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

     Running tests/tensor_tests.rs (target/debug/deps/tensor_tests-1c67c8165560ef8b)

running 39 tests
test arange_cpu ... ok
test broadcast_cpu ... ok
test add_mul_cpu ... ok
test binary_op_cpu ... ok
test full_cpu ... ok
test embeddings_cpu ... ok
test i64_abs ... ok
test broadcasting_cpu ... ok
test clamp_cpu ... ok
test cmp_cpu ... ok
test gather_cpu ... ok
test index_add_cpu ... ok
test cat_cpu ... ok
test index_select_cpu ... ok
test narrow_cpu ... ok
test log_sum_exp ... ok
test ones_cpu ... ok
test pad_with_same ... ok
test pow ... ok
test scatter_add_cpu ... ok
test slice_scatter_cpu ... ok
test randn_hasneg ... ok
test tensor_2d_cpu ... ok
test ss_cpu ... ok
test transpose_cpu ... ok
test randn_cpu ... ok
test tril_triu_eye ... ok
test unfold ... ok
test zeros_cpu ... ok
test zero_dim_cpu ... ok
test var_cpu ... ok
test unary_op_cpu ... ok
test argmin_cpu ... ok
test argmax_cpu ... ok
test min_cpu ... ok
test max_cpu ... ok
test asort_cpu ... ok
test sum_cpu ... ok
test cumsum ... ok

test result: ok. 39 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.02s

     Running unittests src/lib.rs (target/debug/deps/candle_datasets-96de18b77a127777)

running 1 test
default/train/0000.parquet [00:00:00] [████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████] 1.61 MiB/1.61 MiB 6.67 MiB/s (0s)test hub::tests::test_dataset ... ok

test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.45s

     Running unittests src/lib.rs (target/debug/deps/candle_examples-8119ab6131620d26)

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

     Running unittests src/lib.rs (target/debug/deps/candle_nn-ae77522d8a01c7b4)

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

     Running tests/batch_norm.rs (target/debug/deps/batch_norm-53bf8606a1ceeb03)

running 2 tests
test train_batch_norm ... ok
test batch_norm_test ... ok

test result: ok. 2 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

     Running tests/group_norm.rs (target/debug/deps/group_norm-094a7369c275dcf5)

running 1 test
test group_norm ... ok

test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

     Running tests/kv_cache.rs (target/debug/deps/kv_cache-301c1d1bfcd180d1)

running 2 tests
test kv_cache ... ok
test rotating_kv_cache ... ok

test result: ok. 2 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

     Running tests/layer_norm.rs (target/debug/deps/layer_norm-534eaf0afe229ff1)

running 1 test
test layer_norm ... ok

test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.01s

     Running tests/loss.rs (target/debug/deps/loss-90bc2de932265a6c)

running 2 tests
test binary_cross_entropy_with_logit ... ok
test nll_and_cross_entropy ... ok

test result: ok. 2 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

     Running tests/one_hot.rs (target/debug/deps/one_hot-ffb6c2a51f64e0e0)

running 4 tests
test test_iter ... ok
test test_i64_one_hot ... ok
test test_u8_one_cold ... ok
test test_rank_3_one_hot ... ok

test result: ok. 4 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

     Running tests/ops.rs (target/debug/deps/ops-6e2e48dac30d5dd5)

running 12 tests
test mul_and_act_cpu ... ok
test sigmoid_cpu ... ok
test softmax_numerical_stability ... ok
test inplace_softmax_cpu ... ok
test rms_norm_cpu ... ok
test ln_cpu ... ok
test softmax_cpu ... ok
test rope_thd_cpu ... ok
test rope_cpu ... ok
test ropei_cpu ... ok
test lnl_cpu ... ok
test rms_norml_cpu ... ok

test result: ok. 12 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.11s

     Running tests/optim.rs (target/debug/deps/optim-848580560fa457c0)

running 4 tests
test sgd_optim ... ok
test adamw_linear_regression_varmap ... ok
test adamw_linear_regression ... ok
test sgd_linear_regression ... ok

test result: ok. 4 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.22s

     Running tests/rnn.rs (target/debug/deps/rnn-57d2f067589d4395)

running 2 tests
test gru ... ok
test lstm ... ok

test result: ok. 2 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

     Running tests/sdpa.rs (target/debug/deps/sdpa-a54330466d99276a)

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

     Running unittests src/lib.rs (target/debug/deps/candle-c438b1f1d9a5d268)

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

     Running unittests src/lib.rs (target/debug/deps/candle_transformers-73c8063955c84d47)

running 7 tests
test models::segformer::tests::test_config_json_load ... ok
test models::whisper::audio::tests::test_dft ... ok
test models::whisper::audio::tests::test_fft ... ok
test models::whisper::audio::tests::test_tiny_log_mel_spectrogram ... ok
test models::mimi::conv::tests::conv_tr1d ... ok
test models::mimi::conv::tests::conv1d ... ok
test models::whisper::audio::tests::test_log_mel_spectrogram ... ok

test result: ok. 7 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.12s

     Running tests/generation_tests.rs (target/debug/deps/generation_tests-93b205f2ef03dcfd)

running 4 tests
test sample_with_zero_temperature ... ok
test sample_with_top_p ... ok
test sample_with_temperature ... ok
test sample_with_top_k ... ok

test result: ok. 4 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.01s

     Running tests/nms_tests.rs (target/debug/deps/nms_tests-66e0617b9ae3ee15)

running 8 tests
test nms_basic ... ok
test softnms_basic_functionality ... ok
test softnms_confidence_decay ... ok
test softnms_confidence_threshold ... ok
test softnms_equal_confidence_overlap ... ok
test softnms_no_bbox ... ok
test softnms_no_overlap ... ok
test softnms_single_bbox ... ok

test result: ok. 8 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

     Running unittests src/lib.rs (target/debug/deps/candle_wasm_example_bert-d32a8d08148eb6de)

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

     Running unittests src/bin/m.rs (target/debug/deps/m-88c57d8734ca3d87)

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

     Running unittests src/lib.rs (target/debug/deps/candle_wasm_example_blip-1482ed79fde8f49d)

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

     Running unittests src/bin/m.rs (target/debug/deps/m-448eb4dd45f2c200)

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

     Running unittests src/lib.rs (target/debug/deps/candle_wasm_example_llama2-fb45fec61b0ecdfe)

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

     Running unittests src/bin/app.rs (target/debug/deps/app-6ecd2386b13249dd)

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

     Running unittests src/bin/m.rs (target/debug/deps/m-221677892e3e3561)

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

     Running unittests src/bin/worker.rs (target/debug/deps/worker-e5048e7eea342cf8)

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

     Running unittests src/lib.rs (target/debug/deps/candle_wasm_example_moondream-c2ae40360fbefea9)

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

     Running unittests src/bin/m.rs (target/debug/deps/m-ecb59475a6c994dc)

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

     Running unittests src/lib.rs (target/debug/deps/candle_wasm_example_phi-c744dd4ea239b6d8)

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

     Running unittests src/bin/m.rs (target/debug/deps/m-5ecc2b0a5631c4be)

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

     Running unittests src/lib.rs (target/debug/deps/candle_wasm_example_sam-832c04214f0d3fe6)

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

     Running unittests src/bin/m.rs (target/debug/deps/m-b0559cc3c41ab968)

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

     Running unittests src/lib.rs (target/debug/deps/candle_wasm_example_t5-882b79378ce82c07)

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

     Running unittests src/bin/m.rs (target/debug/deps/m-46171d80c9c931d9)

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

     Running unittests src/bin/m-quantized.rs (target/debug/deps/m_quantized-f77a81b5ac8e0283)

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

     Running unittests src/lib.rs (target/debug/deps/candle_wasm_example_whisper-5985649f4f7851cf)

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

     Running unittests src/bin/app.rs (target/debug/deps/app-8788f3ac3edadc47)

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

     Running unittests src/bin/m.rs (target/debug/deps/m-3b2c6226fa0390bc)

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

     Running unittests src/bin/worker.rs (target/debug/deps/worker-36a016e384f4571e)

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

     Running unittests src/lib.rs (target/debug/deps/candle_wasm_example_yolo-fbf0ba487002972d)

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

     Running unittests src/bin/app.rs (target/debug/deps/app-429a628689632285)

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

     Running unittests src/bin/m.rs (target/debug/deps/m-1f70c5313ae95118)

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

     Running unittests src/bin/worker.rs (target/debug/deps/worker-8d8ddec67afd4586)

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

     Running unittests src/lib.rs (target/debug/deps/candle_wasm_tests-6ec8b8a8f0509a74)

running 1 test
test tests::it_works ... ok

test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

     Running tests/quantized_tests.rs (target/debug/deps/quantized_tests-29e8460edb1dd5df)

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

     Running unittests src/main.rs (target/debug/deps/tensor_tools-22a03bb28d7004fa)

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

   Doc-tests candle_book

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

   Doc-tests candle_core

running 38 tests
test candle-core/src/tensor.rs - tensor::Tensor::get (line 1833) ... ok
test candle-core/src/tensor.rs - tensor::Tensor::narrow (line 847) ... ok
test candle-core/src/tensor.rs - tensor::Tensor::empty_like (line 310) ... ok
test candle-core/src/tensor.rs - tensor::Tensor::ones_like (line 223) ... ok
test candle-core/src/tensor.rs - tensor::Tensor::zeros_like (line 264) ... ok
test candle-core/src/tensor.rs - tensor::Tensor::zeros (line 250) ... ok
test candle-core/src/tensor.rs - tensor::Tensor::get_on_dim (line 1853) ... ok
test candle-core/src/tensor_cat.rs - tensor_cat::Tensor::cat (line 9) ... ok
test candle-core/src/tensor.rs - tensor::Tensor::from_slice (line 555) ... ok
test candle-core/src/tensor.rs - tensor::Tensor::unsqueeze (line 2223) ... ok
test candle-core/src/indexer.rs - indexer::Tensor::i (line 144) ... ok
test candle-core/src/tensor.rs - tensor::Tensor::meshgrid (line 711) ... ok
test candle-core/src/tensor.rs - tensor::Tensor::arange_step (line 483) ... ok
test candle-core/src/tensor.rs - tensor::Tensor::empty (line 294) ... ok
test candle-core/src/indexer.rs - indexer::Tensor::index (line 9) ... ok
test candle-core/src/tensor.rs - tensor::Tensor::squeeze (line 2185) ... ok
test candle-core/src/tensor.rs - tensor::Tensor::ones (line 210) ... ok
test candle-core/src/tensor.rs - tensor::Tensor::affine (line 770) ... ok
test candle-core/src/indexer.rs - indexer::Tensor::i (line 180) ... ok
test candle-core/src/tensor.rs - tensor::Tensor::full (line 434) ... ok
test candle-core/src/tensor.rs - tensor::Tensor::flatten_all (line 1820) ... ok
test candle-core/src/tensor.rs - tensor::Tensor::permute (line 1916) ... ok
test candle-core/src/tensor.rs - tensor::Tensor::sum_all (line 1755) ... ok
test candle-core/src/tensor.rs - tensor::Tensor::embedding (line 1560) ... ok
test candle-core/src/tensor.rs - tensor::Tensor::to_dtype (line 2075) ... ok
test candle-core/src/tensor.rs - tensor::Tensor::mean_keepdim (line 1032) ... ok
test candle-core/src/tensor.rs - tensor::Tensor::stack (line 2260) ... ok
test candle-core/src/tensor_indexing.rs - tensor_indexing::Tensor::slice_assign (line 74) ... ok
test candle-core/src/tensor.rs - tensor::Tensor (line 57) ... ok
test candle-core/src/lib.rs - (line 3) ... ok
test candle-core/src/tensor.rs - tensor::Tensor::reshape (line 2138) ... ok
test candle-core/src/tensor.rs - tensor::Tensor::from_iter (line 452) ... ok
test candle-core/src/tensor.rs - tensor::Tensor::roll (line 972) ... ok
test candle-core/src/tensor.rs - tensor::Tensor::sum_keepdim (line 1004) ... ok
test candle-core/src/tensor.rs - tensor::Tensor::from_vec (line 535) ... ok
test candle-core/src/tensor.rs - tensor::Tensor::t (line 1872) ... ok
test candle-core/src/tensor.rs - tensor::Tensor::arange (line 470) ... ok
test candle-core/src/indexer.rs - indexer::Tensor::i (line 217) ... ok

test result: ok. 38 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 2.85s

   Doc-tests candle_datasets

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

   Doc-tests candle_examples

running 4 tests
test candle-examples/src/bs1770.rs - bs1770 (line 18) ... ignored
test candle-examples/src/bs1770.rs - bs1770::ChannelLoudnessMeter (line 294) ... ignored
test candle-examples/src/bs1770.rs - bs1770::ChannelLoudnessMeter::push (line 348) ... ignored
test candle-examples/src/bs1770.rs - bs1770::ChannelLoudnessMeter::push (line 364) ... ignored

test result: ok. 0 passed; 0 failed; 4 ignored; 0 measured; 0 filtered out; finished in 0.00s

   Doc-tests candle_nn

running 7 tests
test candle-nn/src/ops.rs - ops::attn_softmax_last_dim (line 961) ... ignored
test candle-nn/src/encoding.rs - encoding::one_hot (line 57) ... ok
test candle-nn/src/encoding.rs - encoding::one_hot (line 34) ... ok
test candle-nn/src/var_builder.rs - var_builder::VarBuilder<'a>::rename_f (line 645) ... ok
test candle-nn/src/ops.rs - ops::softmax (line 12) ... ok
test candle-nn/src/layer_norm.rs - layer_norm (line 9) ... ok
test candle-nn/src/linear.rs - linear (line 8) ... ok

test result: ok. 6 passed; 0 failed; 1 ignored; 0 measured; 0 filtered out; finished in 2.30s

   Doc-tests candle_transformers

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

   Doc-tests candle_wasm_example_bert

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

   Doc-tests candle_wasm_example_blip

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

   Doc-tests candle_wasm_example_llama2

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

   Doc-tests candle_wasm_example_moondream

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

   Doc-tests candle_wasm_example_phi

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

   Doc-tests candle_wasm_example_sam

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

   Doc-tests candle_wasm_example_t5

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

   Doc-tests candle_wasm_example_whisper

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

   Doc-tests candle_wasm_example_yolo

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

   Doc-tests candle_wasm_tests

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

@EricLBuehler
Copy link
Owner

@sempervictus did you run the tests on a CUDA build?

@sempervictus
Copy link
Author

@EricLBuehler built and ran cargo test on one of the GPU hosts w/ cuda 12.8 and 570 proprietary drivers. Didn't dig in deep enough to see if i need to pass explicit params to the test call though. Anything specific you want to see?

@sempervictus
Copy link
Author

So the hangs are consistent across models - no cache, no code :-(

sempervictus pushed a commit to sempervictus/mistral.rs that referenced this pull request Jun 13, 2025
Update CUDA (base image containers) to 12.9.

Drop minimum compute capability requirement to v7 - mistral-rs is
great on older devices which do not support flash attention (in
the same hardware facilities as v8+).

Enable NCCL feature for the CUDA build target.

Notes:
  Depends on EricLBuehler/candle#83 or
equivalent change to support 12.9 (max supported right now is 12.8)

  This should help to get better mileage for H/B series users
especially on Open drivers
sempervictus pushed a commit to sempervictus/mistral.rs that referenced this pull request Jun 14, 2025
Update CUDA (base image containers) to 12.9.

Drop minimum compute capability requirement to v7 - mistral-rs is
great on older devices which do not support flash attention (in
the same hardware facilities as v8+).

Enable NCCL feature for the CUDA build target.

Notes:
  Depends on EricLBuehler/candle#83 or
equivalent change to support 12.9 (max supported right now is 12.8)

  This should help to get better mileage for H/B series users
especially on Open drivers

Explicitly include NCCL in build and runtime

OpenMPI and NCCL runtimes only

Development/build container has libraries needed to build the NCCL
feature. Deploy NCCL and OpenMPI binary files to the runtime only.

NCCL present, fix lib path
@polarathene
Copy link

Tests mostly pass

Is there a failure there? Seems like it's ok to merge this change and use the newer cudarc?

@polarathene
Copy link

polarathene commented Jun 22, 2025

Cross-referencing, bumping this for mistral.rs is also dependent upon float8 bumping as well:

EricLBuehler/float8#3


UPDATE: Resolved by EricLBuehler/float8#5

sempervictus pushed a commit to sempervictus/mistral.rs that referenced this pull request Jul 24, 2025
Update CUDA (base image containers) to 12.9.

Drop minimum compute capability requirement to v7 - mistral-rs is
great on older devices which do not support flash attention (in
the same hardware facilities as v8+).

Enable NCCL feature for the CUDA build target.

Notes:
  Depends on EricLBuehler/candle#83 or
equivalent change to support 12.9 (max supported right now is 12.8)

  This should help to get better mileage for H/B series users
especially on Open drivers

Explicitly include NCCL in build and runtime

OpenMPI and NCCL runtimes only

Development/build container has libraries needed to build the NCCL
feature. Deploy NCCL and OpenMPI binary files to the runtime only.

NCCL present, fix lib path
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants