
[BUG] microbenchmarks can't be installed in headless mode #27

@keyvaann

Describe the bug
I'm trying to install the microbenchmarks in headless mode, but they don't seem to be recognized. Because of #26, I'm using v25.10 for now, since it can install the other workloads, but it fails for the microbenchmarks:

=== HEADLESS INSTALLATION MODE ===
✓ Configuration loaded from: /cm-tests/.../dgx-automation/config/dgx-headless-config-gb300.yaml
Environment type: uv
Install path: /cm-tests/dgxc-benchmarking/gb300/workloads
GPU type: gb300
Node architecture: aarch64
Install method: local
Selected workloads: pretrain_nemotron4-15b, pretrain_nemotron4-340b, pretrain_llama3.1, pretrain_deepseek-v3, pretrain_grok1, pretrain_nemotron-h, microbenchmark_cpu_overhead, microbenchmark_nccl

Development mode: Using repository at /cm-tests/dgxc-benchmarking/gb300
Error: Selected workloads not found: ['microbenchmark_cpu_overhead', 'microbenchmark_nccl']
Custom script failed for run gb300, version v25.10
Preparation step failed with code 1.

I also tried other names, such as nccl and microbenchmark-nccl, but none of them worked.

Steps/Code to reproduce bug
Here is my headless play file:

venv_type: uv
install_path: /cm-tests/dgxc-benchmarking/gb300/workloads
slurm_info:
  slurm:
    account: root
    gpu_partition: main
    cpu_partition: main
    gpu_partition_gres: 8
    cpu_partition_gres: null
    node_architecture: aarch64
gpu_type: gb300
node_architecture: aarch64
install_method: local
selected_workloads:
  - pretrain_nemotron4-15b
  - pretrain_nemotron4-340b
  - pretrain_llama3.1
  - pretrain_deepseek-v3
  - pretrain_grok1
  - pretrain_nemotron-h
  - microbenchmark_cpu_overhead
  - microbenchmark_nccl
env_vars:
  HF_TOKEN: hf_

I run it with this command: ./install.sh --play config.yaml -v -d

Expected behavior
The installation should succeed, and if there are errors, the cause should be clearly indicated.
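
To illustrate the kind of error reporting meant here, below is a minimal sketch of a check that reports each unrecognized workload together with the names that are available. It is not the installer's actual code: the function name `check_selected_workloads` and the example workload names in `available` are hypothetical, and the real installer may discover workloads differently.

```python
import difflib


def check_selected_workloads(selected, available):
    """Return human-readable error messages for workloads that are not available.

    Hypothetical helper: `selected` is the list from the play file and
    `available` is whatever list of names the installer has discovered.
    """
    messages = []
    for name in selected:
        if name in available:
            continue
        # Suggest near matches so a naming mismatch is easy to spot.
        close = difflib.get_close_matches(name, available, n=3, cutoff=0.5)
        hint = f" Did you mean: {', '.join(close)}?" if close else ""
        messages.append(f"Workload '{name}' not found.{hint}")
    if messages:
        # Always list what can be selected, so the user does not have to guess.
        messages.append("Available workloads: " + ", ".join(sorted(available)))
    return messages


if __name__ == "__main__":
    # Made-up names for illustration only; not the real workload identifiers.
    available = ["pretrain_llama3.1", "pretrain_grok1", "example_nccl_bench"]
    selected = ["pretrain_grok1", "microbenchmark_nccl"]
    for line in check_selected_workloads(selected, available):
        print(line)
```

With output like this, a user who guesses a name such as nccl or microbenchmark-nccl would see the accepted identifiers immediately instead of only a "not found" message.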

Environment details (please complete the following information):

Environment location: Cloud(Nebius)
Method of DGXC Benchmarking install: From source with UV
Run print_env.sh from the project root and paste the results here


Labels: bug
