Conversation
The CUDA/cublas/etc interface library is getting somewhat long in the tooth compared to current CUDA releases. This is likely why 12.9 fails to build with the appropriate container, and the update may contain a fix for the current matmul/cublas/drop stack traces breaking paged attention in mistralrs. Testing: Local build - didn't blow up, haven't gotten to mistral tests
|
@CodeRabbit: shouldn't you be doing this sort of thing? @EricLBuehler has to battle the demon horde of CUDA in the warp... least you could do is maintain deps. ;-) |
|
@sempervictus Ha! I appreciate the Warhammer 40k reference - battling the CUDA daemon princes is indeed a worthy quest for any tech-adept! 🤖⚔️ You're absolutely right that dependency maintenance could benefit from more automation. While I can't currently push updates directly to repositories, I do try to help by:
For automated dependency updates, you might consider setting up:
The CUDA ecosystem is particularly brutal because of the tight coupling between driver versions, CUDA toolkit versions, and crate versions. At least with this cudarc bump to 0.16.4, you're getting better CUDA 12.9 support! Maybe someday CodeRabbit will evolve to include dependency management superpowers, but for now I'll stick to being your friendly code review daemon! 🐰 |
|
Tests mostly pass:
Running unittests src/lib.rs (target/debug/deps/candle_book-d95950a445c7ead2)
running 3 tests
test simplified::tests::simplified ... ok
test tests::book_hub_1 ... ok
test tests::book_hub_2 ... ok
test result: ok. 3 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 12.28s
Running unittests src/lib.rs (target/debug/deps/candle_core-70f424e451622714)
running 4 tests
test shape::tests::stride ... ok
test npy::tests::parse ... ok
test safetensors::tests::save_load_multiple_tensors ... ok
test safetensors::tests::save_single_tensor ... ok
test result: ok. 4 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
Running tests/conv_tests.rs (target/debug/deps/conv_tests-a7ff24af8f317c7a)
running 7 tests
test conv2d_non_square_cpu ... ok
test conv2d_smaller_cpu ... ok
test conv1d_small_cpu ... ok
test conv2d_small_cpu ... ok
test conv1d_cpu ... ok
test conv2d_cpu ... ok
test conv2d_grad_cpu ... ok
test result: ok. 7 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.03s
Running tests/custom_op_tests.rs (target/debug/deps/custom_op_tests-ccee4a02d2741dfb)
running 3 tests
test inplace_op1 ... ok
test custom_op1_no_backward ... ok
test custom_op1_with_backward ... ok
test result: ok. 3 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
Running tests/display_tests.rs (target/debug/deps/display_tests-166ccb75d7388f3c)
running 3 tests
test display_scalar ... ok
test display_vector ... ok
test display_multi_dim ... ok
test result: ok. 3 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
Running tests/grad_tests.rs (target/debug/deps/grad_tests-ee0f79326e67dfe1)
running 6 tests
test simple_grad_cpu ... ok
test sum_grad_cpu ... ok
test binary_grad_cpu ... ok
test unary_grad_cpu ... ok
test grad_descent_cpu ... ok
test matmul_grad_cpu ... ok
test result: ok. 6 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.02s
Running tests/indexing_tests.rs (target/debug/deps/indexing_tests-f04d4adde587f571)
running 9 tests
test integer_index ... ok
test range_index ... ok
test index_3d ... ok
test slice_assign_exact ... ok
test slice_assign_from ... ok
test slice_assign_full ... ok
test slice_assign_range ... ok
test slice_assign_to ... ok
test slice_assign_to_incl ... ok
test result: ok. 9 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
Running tests/layout_tests.rs (target/debug/deps/layout_tests-cfdeaece3cdd938c)
running 2 tests
test strided_blocks ... ok
test contiguous_cpu ... ok
test result: ok. 2 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
Running tests/matmul_tests.rs (target/debug/deps/matmul_tests-81181e9bae683e68)
running 7 tests
test matmul_bf16_cpu ... ok
test matmul_alpha_cpu ... ok
test squeeze_mm_cpu ... ok
test matmul_alpha_beta_cpu ... ok
test mm_layout_cpu ... ok
test matmul_cpu ... ok
test broadcast_matmul_cpu ... ok
test result: ok. 7 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.02s
Running tests/pool_tests.rs (target/debug/deps/pool_tests-1442a083360dd818)
running 4 tests
test avg_pool2d_cpu ... ok
test max_pool2d_cpu ... ok
test avg_pool2d_pytorch_cpu ... ok
test upsample_nearest2d_cpu ... ok
test result: ok. 4 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
Running tests/pth_tests.rs (target/debug/deps/pth_tests-156fb033fee777fa)
running 3 tests
test test_pth_with_key ... ok
test test_pth ... ok
test test_pth_fortran_congiguous ... ok
test result: ok. 3 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
Running tests/quantized_tests.rs (target/debug/deps/quantized_tests-7f4fb7b8883d1508)
running 35 tests
test quantize_q4_1_cpu ... ok
test quantize_q4_0_cpu ... ok
test quantize_q5_0_cpu ... ok
test quantize_q5_1_cpu ... ok
test quantize_q8k_cpu ... ok
test quantize_q6k_cpu ... ok
test quantize_q3k_cpu ... ok
test quantize_q2k_cpu ... ok
test quantize_q4k_cpu ... ok
test quantize_q5k_cpu ... ok
test quantized_mm ... ok
test quantized_matmul_q4_1_cpu ... ok
test quantized_matmul_q4_0_cpu ... ok
test quantized_matmul_q8_0_cpu ... ok
test quantized_matmul_q5_1_cpu ... ok
test quantized_matmul_q5_0_cpu ... ok
test quantized_matmul_q6k_cpu ... ok
test quantized_matmul_q3k_cpu ... ok
test qmm_cpu ... ok
test quantized_matmul_q4k_cpu ... ok
test quantized_matmul_q2k_cpu ... ok
test quantized_matmul_q5k_cpu ... ok
test qmm_n_cpu ... ok
test qmm_b_cpu ... ok
test quantized_matmul_q8k ... ok
test quantized_matmul_q6k ... ok
test quantized_matmul_q3k ... ok
test quantized_matmul_q2k ... ok
test quantized_matmul_q4k ... ok
test quantized_matmul_q5k ... ok
test imatrix_quantize_q6k ... ok
test imatrix_quantize_q3k ... ok
test imatrix_quantize_q4k ... ok
test imatrix_quantize_q5k ... ok
test imatrix_quantize_q2k ... ok
test result: ok. 35 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 7.60s
Running tests/serialization_tests.rs (target/debug/deps/serialization_tests-7c30c3127e28f428)
running 3 tests
test npy ... ok
test npz ... ok
test safetensors ... ok
test result: ok. 3 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
Running tests/tensor_tests.rs (target/debug/deps/tensor_tests-1c67c8165560ef8b)
running 39 tests
test arange_cpu ... ok
test broadcast_cpu ... ok
test add_mul_cpu ... ok
test binary_op_cpu ... ok
test full_cpu ... ok
test embeddings_cpu ... ok
test i64_abs ... ok
test broadcasting_cpu ... ok
test clamp_cpu ... ok
test cmp_cpu ... ok
test gather_cpu ... ok
test index_add_cpu ... ok
test cat_cpu ... ok
test index_select_cpu ... ok
test narrow_cpu ... ok
test log_sum_exp ... ok
test ones_cpu ... ok
test pad_with_same ... ok
test pow ... ok
test scatter_add_cpu ... ok
test slice_scatter_cpu ... ok
test randn_hasneg ... ok
test tensor_2d_cpu ... ok
test ss_cpu ... ok
test transpose_cpu ... ok
test randn_cpu ... ok
test tril_triu_eye ... ok
test unfold ... ok
test zeros_cpu ... ok
test zero_dim_cpu ... ok
test var_cpu ... ok
test unary_op_cpu ... ok
test argmin_cpu ... ok
test argmax_cpu ... ok
test min_cpu ... ok
test max_cpu ... ok
test asort_cpu ... ok
test sum_cpu ... ok
test cumsum ... ok
test result: ok. 39 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.02s
Running unittests src/lib.rs (target/debug/deps/candle_datasets-96de18b77a127777)
running 1 test
test hub::tests::test_dataset ... ok
test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.45s
Running unittests src/lib.rs (target/debug/deps/candle_examples-8119ab6131620d26)
running 0 tests
test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
Running unittests src/lib.rs (target/debug/deps/candle_nn-ae77522d8a01c7b4)
running 0 tests
test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
Running tests/batch_norm.rs (target/debug/deps/batch_norm-53bf8606a1ceeb03)
running 2 tests
test train_batch_norm ... ok
test batch_norm_test ... ok
test result: ok. 2 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
Running tests/group_norm.rs (target/debug/deps/group_norm-094a7369c275dcf5)
running 1 test
test group_norm ... ok
test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
Running tests/kv_cache.rs (target/debug/deps/kv_cache-301c1d1bfcd180d1)
running 2 tests
test kv_cache ... ok
test rotating_kv_cache ... ok
test result: ok. 2 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
Running tests/layer_norm.rs (target/debug/deps/layer_norm-534eaf0afe229ff1)
running 1 test
test layer_norm ... ok
test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.01s
Running tests/loss.rs (target/debug/deps/loss-90bc2de932265a6c)
running 2 tests
test binary_cross_entropy_with_logit ... ok
test nll_and_cross_entropy ... ok
test result: ok. 2 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
Running tests/one_hot.rs (target/debug/deps/one_hot-ffb6c2a51f64e0e0)
running 4 tests
test test_iter ... ok
test test_i64_one_hot ... ok
test test_u8_one_cold ... ok
test test_rank_3_one_hot ... ok
test result: ok. 4 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
Running tests/ops.rs (target/debug/deps/ops-6e2e48dac30d5dd5)
running 12 tests
test mul_and_act_cpu ... ok
test sigmoid_cpu ... ok
test softmax_numerical_stability ... ok
test inplace_softmax_cpu ... ok
test rms_norm_cpu ... ok
test ln_cpu ... ok
test softmax_cpu ... ok
test rope_thd_cpu ... ok
test rope_cpu ... ok
test ropei_cpu ... ok
test lnl_cpu ... ok
test rms_norml_cpu ... ok
test result: ok. 12 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.11s
Running tests/optim.rs (target/debug/deps/optim-848580560fa457c0)
running 4 tests
test sgd_optim ... ok
test adamw_linear_regression_varmap ... ok
test adamw_linear_regression ... ok
test sgd_linear_regression ... ok
test result: ok. 4 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.22s
Running tests/rnn.rs (target/debug/deps/rnn-57d2f067589d4395)
running 2 tests
test gru ... ok
test lstm ... ok
test result: ok. 2 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
Running tests/sdpa.rs (target/debug/deps/sdpa-a54330466d99276a)
running 0 tests
test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
Running unittests src/lib.rs (target/debug/deps/candle-c438b1f1d9a5d268)
running 0 tests
test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
Running unittests src/lib.rs (target/debug/deps/candle_transformers-73c8063955c84d47)
running 7 tests
test models::segformer::tests::test_config_json_load ... ok
test models::whisper::audio::tests::test_dft ... ok
test models::whisper::audio::tests::test_fft ... ok
test models::whisper::audio::tests::test_tiny_log_mel_spectrogram ... ok
test models::mimi::conv::tests::conv_tr1d ... ok
test models::mimi::conv::tests::conv1d ... ok
test models::whisper::audio::tests::test_log_mel_spectrogram ... ok
test result: ok. 7 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.12s
Running tests/generation_tests.rs (target/debug/deps/generation_tests-93b205f2ef03dcfd)
running 4 tests
test sample_with_zero_temperature ... ok
test sample_with_top_p ... ok
test sample_with_temperature ... ok
test sample_with_top_k ... ok
test result: ok. 4 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.01s
Running tests/nms_tests.rs (target/debug/deps/nms_tests-66e0617b9ae3ee15)
running 8 tests
test nms_basic ... ok
test softnms_basic_functionality ... ok
test softnms_confidence_decay ... ok
test softnms_confidence_threshold ... ok
test softnms_equal_confidence_overlap ... ok
test softnms_no_bbox ... ok
test softnms_no_overlap ... ok
test softnms_single_bbox ... ok
test result: ok. 8 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
Running unittests src/lib.rs (target/debug/deps/candle_wasm_example_bert-d32a8d08148eb6de)
running 0 tests
test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
Running unittests src/bin/m.rs (target/debug/deps/m-88c57d8734ca3d87)
running 0 tests
test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
Running unittests src/lib.rs (target/debug/deps/candle_wasm_example_blip-1482ed79fde8f49d)
running 0 tests
test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
Running unittests src/bin/m.rs (target/debug/deps/m-448eb4dd45f2c200)
running 0 tests
test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
Running unittests src/lib.rs (target/debug/deps/candle_wasm_example_llama2-fb45fec61b0ecdfe)
running 0 tests
test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
Running unittests src/bin/app.rs (target/debug/deps/app-6ecd2386b13249dd)
running 0 tests
test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
Running unittests src/bin/m.rs (target/debug/deps/m-221677892e3e3561)
running 0 tests
test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
Running unittests src/bin/worker.rs (target/debug/deps/worker-e5048e7eea342cf8)
running 0 tests
test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
Running unittests src/lib.rs (target/debug/deps/candle_wasm_example_moondream-c2ae40360fbefea9)
running 0 tests
test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
Running unittests src/bin/m.rs (target/debug/deps/m-ecb59475a6c994dc)
running 0 tests
test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
Running unittests src/lib.rs (target/debug/deps/candle_wasm_example_phi-c744dd4ea239b6d8)
running 0 tests
test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
Running unittests src/bin/m.rs (target/debug/deps/m-5ecc2b0a5631c4be)
running 0 tests
test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
Running unittests src/lib.rs (target/debug/deps/candle_wasm_example_sam-832c04214f0d3fe6)
running 0 tests
test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
Running unittests src/bin/m.rs (target/debug/deps/m-b0559cc3c41ab968)
running 0 tests
test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
Running unittests src/lib.rs (target/debug/deps/candle_wasm_example_t5-882b79378ce82c07)
running 0 tests
test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
Running unittests src/bin/m.rs (target/debug/deps/m-46171d80c9c931d9)
running 0 tests
test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
Running unittests src/bin/m-quantized.rs (target/debug/deps/m_quantized-f77a81b5ac8e0283)
running 0 tests
test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
Running unittests src/lib.rs (target/debug/deps/candle_wasm_example_whisper-5985649f4f7851cf)
running 0 tests
test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
Running unittests src/bin/app.rs (target/debug/deps/app-8788f3ac3edadc47)
running 0 tests
test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
Running unittests src/bin/m.rs (target/debug/deps/m-3b2c6226fa0390bc)
running 0 tests
test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
Running unittests src/bin/worker.rs (target/debug/deps/worker-36a016e384f4571e)
running 0 tests
test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
Running unittests src/lib.rs (target/debug/deps/candle_wasm_example_yolo-fbf0ba487002972d)
running 0 tests
test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
Running unittests src/bin/app.rs (target/debug/deps/app-429a628689632285)
running 0 tests
test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
Running unittests src/bin/m.rs (target/debug/deps/m-1f70c5313ae95118)
running 0 tests
test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
Running unittests src/bin/worker.rs (target/debug/deps/worker-8d8ddec67afd4586)
running 0 tests
test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
Running unittests src/lib.rs (target/debug/deps/candle_wasm_tests-6ec8b8a8f0509a74)
running 1 test
test tests::it_works ... ok
test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
Running tests/quantized_tests.rs (target/debug/deps/quantized_tests-29e8460edb1dd5df)
running 0 tests
test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
Running unittests src/main.rs (target/debug/deps/tensor_tools-22a03bb28d7004fa)
running 0 tests
test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
Doc-tests candle_book
running 0 tests
test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
Doc-tests candle_core
running 38 tests
test candle-core/src/tensor.rs - tensor::Tensor::get (line 1833) ... ok
test candle-core/src/tensor.rs - tensor::Tensor::narrow (line 847) ... ok
test candle-core/src/tensor.rs - tensor::Tensor::empty_like (line 310) ... ok
test candle-core/src/tensor.rs - tensor::Tensor::ones_like (line 223) ... ok
test candle-core/src/tensor.rs - tensor::Tensor::zeros_like (line 264) ... ok
test candle-core/src/tensor.rs - tensor::Tensor::zeros (line 250) ... ok
test candle-core/src/tensor.rs - tensor::Tensor::get_on_dim (line 1853) ... ok
test candle-core/src/tensor_cat.rs - tensor_cat::Tensor::cat (line 9) ... ok
test candle-core/src/tensor.rs - tensor::Tensor::from_slice (line 555) ... ok
test candle-core/src/tensor.rs - tensor::Tensor::unsqueeze (line 2223) ... ok
test candle-core/src/indexer.rs - indexer::Tensor::i (line 144) ... ok
test candle-core/src/tensor.rs - tensor::Tensor::meshgrid (line 711) ... ok
test candle-core/src/tensor.rs - tensor::Tensor::arange_step (line 483) ... ok
test candle-core/src/tensor.rs - tensor::Tensor::empty (line 294) ... ok
test candle-core/src/indexer.rs - indexer::Tensor::index (line 9) ... ok
test candle-core/src/tensor.rs - tensor::Tensor::squeeze (line 2185) ... ok
test candle-core/src/tensor.rs - tensor::Tensor::ones (line 210) ... ok
test candle-core/src/tensor.rs - tensor::Tensor::affine (line 770) ... ok
test candle-core/src/indexer.rs - indexer::Tensor::i (line 180) ... ok
test candle-core/src/tensor.rs - tensor::Tensor::full (line 434) ... ok
test candle-core/src/tensor.rs - tensor::Tensor::flatten_all (line 1820) ... ok
test candle-core/src/tensor.rs - tensor::Tensor::permute (line 1916) ... ok
test candle-core/src/tensor.rs - tensor::Tensor::sum_all (line 1755) ... ok
test candle-core/src/tensor.rs - tensor::Tensor::embedding (line 1560) ... ok
test candle-core/src/tensor.rs - tensor::Tensor::to_dtype (line 2075) ... ok
test candle-core/src/tensor.rs - tensor::Tensor::mean_keepdim (line 1032) ... ok
test candle-core/src/tensor.rs - tensor::Tensor::stack (line 2260) ... ok
test candle-core/src/tensor_indexing.rs - tensor_indexing::Tensor::slice_assign (line 74) ... ok
test candle-core/src/tensor.rs - tensor::Tensor (line 57) ... ok
test candle-core/src/lib.rs - (line 3) ... ok
test candle-core/src/tensor.rs - tensor::Tensor::reshape (line 2138) ... ok
test candle-core/src/tensor.rs - tensor::Tensor::from_iter (line 452) ... ok
test candle-core/src/tensor.rs - tensor::Tensor::roll (line 972) ... ok
test candle-core/src/tensor.rs - tensor::Tensor::sum_keepdim (line 1004) ... ok
test candle-core/src/tensor.rs - tensor::Tensor::from_vec (line 535) ... ok
test candle-core/src/tensor.rs - tensor::Tensor::t (line 1872) ... ok
test candle-core/src/tensor.rs - tensor::Tensor::arange (line 470) ... ok
test candle-core/src/indexer.rs - indexer::Tensor::i (line 217) ... ok
test result: ok. 38 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 2.85s
Doc-tests candle_datasets
running 0 tests
test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
Doc-tests candle_examples
running 4 tests
test candle-examples/src/bs1770.rs - bs1770 (line 18) ... ignored
test candle-examples/src/bs1770.rs - bs1770::ChannelLoudnessMeter (line 294) ... ignored
test candle-examples/src/bs1770.rs - bs1770::ChannelLoudnessMeter::push (line 348) ... ignored
test candle-examples/src/bs1770.rs - bs1770::ChannelLoudnessMeter::push (line 364) ... ignored
test result: ok. 0 passed; 0 failed; 4 ignored; 0 measured; 0 filtered out; finished in 0.00s
Doc-tests candle_nn
running 7 tests
test candle-nn/src/ops.rs - ops::attn_softmax_last_dim (line 961) ... ignored
test candle-nn/src/encoding.rs - encoding::one_hot (line 57) ... ok
test candle-nn/src/encoding.rs - encoding::one_hot (line 34) ... ok
test candle-nn/src/var_builder.rs - var_builder::VarBuilder<'a>::rename_f (line 645) ... ok
test candle-nn/src/ops.rs - ops::softmax (line 12) ... ok
test candle-nn/src/layer_norm.rs - layer_norm (line 9) ... ok
test candle-nn/src/linear.rs - linear (line 8) ... ok
test result: ok. 6 passed; 0 failed; 1 ignored; 0 measured; 0 filtered out; finished in 2.30s
Doc-tests candle_transformers
running 0 tests
test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
Doc-tests candle_wasm_example_bert
running 0 tests
test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
Doc-tests candle_wasm_example_blip
running 0 tests
test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
Doc-tests candle_wasm_example_llama2
running 0 tests
test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
Doc-tests candle_wasm_example_moondream
running 0 tests
test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
Doc-tests candle_wasm_example_phi
running 0 tests
test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
Doc-tests candle_wasm_example_sam
running 0 tests
test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
Doc-tests candle_wasm_example_t5
running 0 tests
test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
Doc-tests candle_wasm_example_whisper
running 0 tests
test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
Doc-tests candle_wasm_example_yolo
running 0 tests
test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
Doc-tests candle_wasm_tests
running 0 tests
test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s |
|
@sempervictus did you run the tests on a CUDA build? |
|
@EricLBuehler built and ran |
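For context on the question above: every device-specific test in the pasted log carries a `_cpu` suffix, which is what one would expect if GPU variants are compiled in only when a CUDA feature is enabled. The following is a minimal sketch of that pattern, not candle's actual test helper; the feature name `cuda` and the function names are assumptions made for illustration.

```rust
// Hypothetical sketch: names are illustrative, not candle's real test utilities.

/// Backends a test suite would exercise in the current build.
/// "cuda" is added only when compiled with a Cargo feature named `cuda`
/// (the feature name is an assumption).
fn enabled_backends() -> Vec<&'static str> {
    let mut backends = vec!["cpu"];
    if cfg!(feature = "cuda") {
        backends.push("cuda");
    }
    backends
}

/// Builds the per-backend test names a runner would report for one logical test.
fn test_names(base: &str) -> Vec<String> {
    enabled_backends()
        .iter()
        .map(|backend| format!("{}_{}", base, backend))
        .collect()
}

fn main() {
    // Without the feature enabled, only the CPU variant is listed.
    for name in test_names("matmul") {
        println!("test {} ... would run", name);
    }
}
```

Under this assumption, a run without the feature flag reports only CPU variants, so a passing log of that shape says nothing about the CUDA code path.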
|
So the hangs are consistent across models - no cache, no code :-( |
Update CUDA (base image containers) to 12.9. Drop minimum compute capability requirement to v7 - mistral-rs is great on older devices which do not support flash attention (in the same hardware facilities as v8+). Enable NCCL feature for the CUDA build target. Notes: Depends on EricLBuehler/candle#83 or equivalent change to support 12.9 (max supported right now is 12.8) This should help to get better mileage for H/B series users especially on Open drivers
Explicitly include NCCL in build and runtime
OpenMPI and NCCL runtimes only
Development/build container has libraries needed to build the NCCL feature. Deploy NCCL and OpenMPI binary files to the runtime only.
NCCL present, fix lib path
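The two constraints in these notes (a toolkit newer than the newest supported version, and flash attention being limited to compute capability v8+) can be sketched as simple checks. This is an illustration under stated assumptions, not code from cudarc, candle, or mistral.rs: all function names are hypothetical, and the capability threshold of 80 is taken from the "v8+" wording above.

```rust
// Illustrative sketch only: these helpers are hypothetical and are not part of
// cudarc, candle, or mistral.rs.

/// Parses a "major.minor" toolkit version such as "12.9" into (major, minor).
/// Anything else (e.g. "12.9.1" or "abc") yields None.
fn parse_version(s: &str) -> Option<(u32, u32)> {
    let (major, minor) = s.trim().split_once('.')?;
    Some((major.parse().ok()?, minor.parse().ok()?))
}

/// True when `toolkit` is no newer than the newest version the bindings support.
fn toolkit_supported(toolkit: &str, max_supported: &str) -> bool {
    match (parse_version(toolkit), parse_version(max_supported)) {
        // Tuples compare lexicographically: major first, then minor.
        (Some(t), Some(m)) => t <= m,
        _ => false,
    }
}

/// Flash attention is only considered for compute capability 8.0 and above
/// (assumption from the PR text); capability is encoded as major * 10 + minor.
fn flash_attn_usable(compute_cap: u32) -> bool {
    compute_cap >= 80
}

fn main() {
    println!("12.9 with max 12.8: {}", toolkit_supported("12.9", "12.8"));
    println!("12.9 with max 12.9: {}", toolkit_supported("12.9", "12.9"));
    println!("cap 70 flash-attn: {}", flash_attn_usable(70));
    println!("cap 80 flash-attn: {}", flash_attn_usable(80));
}
```

Comparing parsed integer pairs rather than strings keeps a version like 12.10 ordered after 12.9.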
Is there a failure there? Seems like it's ok to merge this change and use the newer cudarc? |
|
Cross-referencing, bumping this for UPDATE: Resolved by EricLBuehler/float8#5 |