Open
Labels: bug (Something isn't working)
Description
Describe the bug
I'm trying to install microbenchmarks via headless mode, but they don't seem to be recognized. Because of #26, I'm using v25.10 for now, as it can install other workloads, but it fails to do so for microbenchmarks:
=== HEADLESS INSTALLATION MODE ===
✓ Configuration loaded from: /cm-tests/.../dgx-automation/config/dgx-headless-config-gb300.yaml
Environment type: uv
Install path: /cm-tests/dgxc-benchmarking/gb300/workloads
GPU type: gb300
Node architecture: aarch64
Install method: local
Selected workloads: pretrain_nemotron4-15b, pretrain_nemotron4-340b, pretrain_llama3.1, pretrain_deepseek-v3, pretrain_grok1, pretrain_nemotron-h, microbenchmark_cpu_overhead, microbenchmark_nccl
Development mode: Using repository at /cm-tests/dgxc-benchmarking/gb300
Error: Selected workloads not found: ['microbenchmark_cpu_overhead', 'microbenchmark_nccl']
Custom script failed for run gb300, version v25.10
Preparation step failed with code 1.
I also tried different names, such as nccl and microbenchmark-nccl, but none of them worked.
Steps/Code to reproduce bug
Here is my headless play file:
venv_type: uv
install_path: /cm-tests/dgxc-benchmarking/gb300/workloads
slurm_info:
slurm:
account: root
gpu_partition: main
cpu_partition: main
gpu_partition_gres: 8
cpu_partition_gres: null
node_architecture: aarch64
gpu_type: gb300
node_architecture: aarch64
install_method: local
selected_workloads:
- pretrain_nemotron4-15b
- pretrain_nemotron4-340b
- pretrain_llama3.1
- pretrain_deepseek-v3
- pretrain_grok1
- pretrain_nemotron-h
- microbenchmark_cpu_overhead
- microbenchmark_nccl
env_vars:
  HF_TOKEN: hf_
And I use this command to run it: ./install.sh --play config.yaml -v -d.
Expected behavior
The installation should succeed. If it fails, the error should clearly indicate the cause, for example by listing the workload names that are valid for the selected version.
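As a rough illustration of the kind of error reporting that would have helped here, the sketch below checks selected workload names against a list of available ones and suggests near matches. It is not the installer's actual code: the function name `check_selected_workloads` and the `available` list are hypothetical, and the real installer may discover workloads differently.

```python
from difflib import get_close_matches


def check_selected_workloads(selected, available):
    """Return human-readable problems; the list is empty if every name is known."""
    known = set(available)
    problems = []
    for name in selected:
        if name in known:
            continue
        # Suggest near matches so a naming mismatch is easy to spot.
        hints = get_close_matches(name, sorted(known), n=3, cutoff=0.6)
        if hints:
            problems.append(
                f"Unknown workload '{name}'. Did you mean: {', '.join(hints)}?"
            )
        else:
            problems.append(
                f"Unknown workload '{name}'. No similar names are available."
            )
    return problems


if __name__ == "__main__":
    # Hypothetical set of discovered workloads; not taken from the repository.
    available = ["pretrain_grok1", "pretrain_llama3.1", "pretrain_nemotron-h"]
    selected = ["pretrain_grok1", "microbenchmark_nccl"]
    for problem in check_selected_workloads(selected, available):
        print(problem)
```

Printing the full list of available names alongside such a message would also show whether microbenchmarks are visible to the installer at all in a given version.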
Environment details (please complete the following information):
Environment location: Cloud(Nebius)
Method of DGXC Benchmarking install: From source with UV