Add distributed launcher support for linex and metrix #87

Open
mawad-amd wants to merge 8 commits into main from muhaawad/distributed-launchers

@mawad-amd
Member

Summary

  • Add distributed profiling support to both linex and metrix with an explicit launcher parameter that ensures the correct command order (`launcher rocprofv3 ... -- app` instead of `rocprofv3 ... -- launcher app`)
  • Detect rank metadata (global_rank, local_rank, world_size, hostname, launcher) from environment variables set by torchrun, mpirun, srun, and horovodrun
  • Linex: rank-scoped output directories (rank0000/, rank0001/, ...), per-rank RankProfile objects, MCP per-rank hotspots
  • Metrix: rank metadata in ProfileResult/KernelResults/ProfilingResults, rank-suffixed output files (results.rank0003.json), --launcher CLI flag, rank columns in CSV/JSON/text output
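
The rank detection described above can be sketched as a table lookup over environment variables. This is a minimal illustration, not the PR's code: the function name `detect_distributed_context`, the field defaults, and the lookup order are assumptions. The variable names are the ones commonly exported by torchrun, Open MPI's mpirun, Slurm's srun, and Horovod; other MPI implementations use different names.

```python
import os
import socket
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class DistributedContext:
    """Rank metadata for the current process (field names follow the PR summary)."""

    global_rank: int = 0
    local_rank: int = 0
    world_size: int = 1
    hostname: str = ""
    launcher: Optional[str] = None  # None means no launcher was detected


# Assumed lookup table: (launcher, global-rank var, local-rank var, world-size var).
_ENV_TABLE = (
    ("torchrun", "RANK", "LOCAL_RANK", "WORLD_SIZE"),
    ("mpirun", "OMPI_COMM_WORLD_RANK", "OMPI_COMM_WORLD_LOCAL_RANK", "OMPI_COMM_WORLD_SIZE"),
    ("srun", "SLURM_PROCID", "SLURM_LOCALID", "SLURM_NTASKS"),
    ("horovodrun", "HOROVOD_RANK", "HOROVOD_LOCAL_RANK", "HOROVOD_SIZE"),
)


def detect_distributed_context(env: Optional[Mapping[str, str]] = None) -> DistributedContext:
    """Return rank metadata from the first launcher whose rank variable is set."""
    env = os.environ if env is None else env
    host = socket.gethostname()
    for name, rank_var, local_var, size_var in _ENV_TABLE:
        if rank_var in env:
            return DistributedContext(
                global_rank=int(env[rank_var]),
                local_rank=int(env.get(local_var, "0")),
                world_size=int(env.get(size_var, "1")),
                hostname=host,
                launcher=name,
            )
    # No launcher variables found: treat the run as a single process.
    return DistributedContext(hostname=host)
```

Passing the environment as a mapping keeps the function easy to unit test without modifying `os.environ`.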

API:

# Linex
profiler.profile(command="train.py", launcher="torchrun --nproc_per_node=8")

# Metrix Python
profiler.profile(command="train.py", launcher="mpirun -np 4")

# Metrix CLI
metrix profile --launcher "torchrun --nproc_per_node=8" -- train.py

Test plan

  • Unit tests for DistributedContext env detection (torchrun, mpirun, srun)
  • Unit tests for normalize_command_argv (string and sequence input)
  • Unit tests for apply_rank_suffix (with/without extension, single process)
  • Unit tests verifying correct command order with launcher (linex + metrix)
  • Unit tests verifying plain rocprofv3 without launcher
  • Unit tests for shlex parsing and rank field propagation in rocprof_wrapper
  • End-to-end smoke test with real GPUs and torchrun
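
For the `apply_rank_suffix` cases listed in the test plan, a possible shape of the helper is sketched below. The signature is an assumption (the PR does not show it); the output naming follows the `results.rank0003.json` example from the summary.

```python
from pathlib import Path


def apply_rank_suffix(path: str, global_rank: int, world_size: int) -> str:
    """Insert a zero-padded rank tag before the file extension.

    Single-process runs (world_size <= 1) keep the original path so that
    non-distributed output names do not change.
    """
    if world_size <= 1:
        return path
    p = Path(path)
    tag = f"rank{global_rank:04d}"
    # p.suffix is only the final extension, so "a.tar.gz" becomes "a.tar.rank0003.gz".
    return str(p.with_name(f"{p.stem}.{tag}{p.suffix}"))


print(apply_rank_suffix("results.json", 3, 8))  # results.rank0003.json
print(apply_rank_suffix("results", 3, 8))       # results.rank0003
print(apply_rank_suffix("results.json", 0, 1))  # results.json
```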

🤖 Generated with Claude Code

mawad-amd and others added 3 commits March 22, 2026 19:23
- Add DistributedContext dataclass and env detection for torchrun, mpirun, srun, horovodrun
- Linex: rank-scoped output dirs, RankProfile objects, MCP per-rank hotspots
- Metrix: rank metadata in ProfileResult/KernelResults/ProfilingResults, rank-suffixed output files
- CLI: argparse.REMAINDER for `-- launcher ...` syntax
- Both: normalize_command_argv with shlex, accept str | Sequence[str]
- Tests for distributed helpers, shlex parsing, rank field propagation
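
As an illustration of the `normalize_command_argv` item above, the sketch below accepts either a shell-style string or a sequence and returns an argv list. Only the function name and the use of `shlex` come from the commit message; the empty-command check and the exact signature are assumptions.

```python
import shlex
from typing import List, Sequence, Union


def normalize_command_argv(command: Union[str, Sequence[str]]) -> List[str]:
    """Return the command as a list of arguments.

    Strings are split with shell-like quoting rules; sequences are copied
    element by element so the caller's list is never mutated.
    """
    if isinstance(command, str):
        argv = shlex.split(command)
    else:
        argv = [str(part) for part in command]
    if not argv:
        # Assumed behavior: an empty command is treated as a caller error.
        raise ValueError("command must not be empty")
    return argv


print(normalize_command_argv('python train.py --name "my run"'))
# ['python', 'train.py', '--name', 'my run']
```

Using `shlex.split` instead of `str.split` keeps quoted arguments that contain spaces intact.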

Note: command construction is still rocprofv3-wraps-launcher (wrong order).
Next step: fix to launcher-wraps-rocprofv3 for correct distributed profiling.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Previously, distributed commands like `torchrun --nproc_per_node=8 train.py`
produced `rocprofv3 ... -- torchrun --nproc_per_node=8 train.py` which is wrong.
rocprofv3 would profile the launcher process, not the per-rank GPU work.

Now we split the command into launcher args and app args, producing:
`torchrun --nproc_per_node=8 rocprofv3 ... -- train.py`

The launcher spawns N processes, each running rocprofv3 around the app.
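
The ordering described above can be sketched as a small argv builder. `build_profile_argv` and `profiler_args` are hypothetical names used for illustration; the literal `"..."` stands in for whatever rocprofv3 options the tool passes, as in the commit message.

```python
import shlex
from typing import List, Optional, Sequence, Union

Command = Union[str, Sequence[str]]


def _argv(value: Command) -> List[str]:
    """Split a shell-style string, or copy an existing argument sequence."""
    return shlex.split(value) if isinstance(value, str) else list(value)


def build_profile_argv(
    command: Command,
    profiler_args: Sequence[str],
    launcher: Optional[Command] = None,
) -> List[str]:
    """Place the launcher first so each spawned rank runs rocprofv3 around the app."""
    argv = _argv(launcher) if launcher else []
    argv += ["rocprofv3", *profiler_args, "--", *_argv(command)]
    return argv


print(shlex.join(build_profile_argv("train.py", ["..."], launcher="torchrun --nproc_per_node=8")))
# torchrun --nproc_per_node=8 rocprofv3 ... -- train.py
print(shlex.join(build_profile_argv("train.py", ["..."])))
# rocprofv3 ... -- train.py
```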

Changes:
- Add split_launcher_command() to both distributed.py modules
- Handles torchrun, python -m torch.distributed.*, mpirun/mpiexec, srun, horovodrun
- Update linex/api.py and metrix/rocprof_wrapper.py to use launcher wrapping
- Add tests verifying correct command ordering for all launcher types

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Instead of trying to parse launcher flags from a combined command string
(fragile, requires hardcoded flag sets per launcher), let the user provide
the launcher and app commands separately:

  # Python API
  profiler.profile(command="train.py", launcher="torchrun --nproc_per_node=8")

  # Metrix CLI
  metrix profile --launcher "torchrun --nproc_per_node=8" -- train.py

This is unambiguous, works with any launcher (including custom ones), and
requires no flag-parsing maintenance.

- Remove split_launcher_command() and all _split_* helpers
- Add launcher parameter to Linex.profile(), Metrix.profile(),
  ROCProfV3Wrapper.profile(), CounterBackend.profile(), all backend
  implementations, CLI (--launcher flag), and MCP tools
- Update tests and READMEs

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings March 23, 2026 00:36
Contributor

Copilot AI left a comment

Pull request overview

Adds first-class “distributed launcher” support to Linex and Metrix so profiling commands can be invoked as `launcher rocprofv3 ... -- app`, and introduces rank metadata propagation/suffixing utilities to avoid output clobbering.

Changes:

  • Add distributed context detection + argv normalization helpers (shlex-based) for both Metrix and Linex.
  • Extend Metrix/Linex APIs, CLI, and MCP tools with an explicit launcher parameter and propagate rank metadata into result objects/output.
  • Add unit tests covering env detection, argv normalization, rank suffixing, and launcher command ordering.

Reviewed changes

Copilot reviewed 19 out of 19 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| `metrix/tests/unit/test_rocprof_wrapper.py` | Adds tests for shlex parsing, rank metadata propagation, and launcher command ordering. |
| `metrix/tests/unit/test_distributed.py` | New unit tests for distributed helpers (env detection, argv normalization, rank suffixing). |
| `metrix/src/metrix/utils/distributed.py` | New distributed helper module (context detection, shlex argv normalization, rank suffixing). |
| `metrix/src/metrix/profiler/rocprof_wrapper.py` | Accepts launcher/env, uses shlex parsing, and annotates ProfileResult with distributed metadata. |
| `metrix/src/metrix/mcp/server.py` | Adds launcher param to MCP tool and includes rank metadata in the response. |
| `metrix/src/metrix/cli/profile_cmd.py` | Adds --launcher plumbing, remainder target parsing normalization, and rank-aware output formatting/suffixing. |
| `metrix/src/metrix/cli/main.py` | Adds --launcher flag and switches target parsing to argparse.REMAINDER. |
| `metrix/src/metrix/backends/gfx942.py` | Updates backend signatures to accept launcher (but currently not forwarded). |
| `metrix/src/metrix/backends/gfx90a.py` | Updates backend signatures to accept launcher (but currently not forwarded). |
| `metrix/src/metrix/backends/gfx1201.py` | Updates backend signatures to accept launcher (but currently not forwarded). |
| `metrix/src/metrix/backends/base.py` | Adds rank fields to ProfileResult, adds launcher to API, and adds rank-prefixed aggregation keys. |
| `metrix/src/metrix/api.py` | Adds launcher support and rank metadata to ProfilingResults/KernelResults. |
| `metrix/README.md` | Documents distributed launcher usage and rank-suffixed outputs. |
| `linex/tests/test_distributed_api.py` | New tests for distributed helpers, rank-scoped output, deterministic ui dir choice, and launcher ordering. |
| `linex/src/linex/mcp/server.py` | Adds launcher plumbing + per-rank outputs to MCP responses (currently contains a syntax error). |
| `linex/src/linex/distributed.py` | New distributed helper module for Linex (context detection + argv normalization). |
| `linex/src/linex/api.py` | Adds distributed context tracking, rank-scoped output dirs, launcher support, and RankProfile aggregation. |
| `linex/src/linex/__init__.py` | Exports RankProfile in the public API. |
| `linex/README.md` | Documents distributed launcher usage and new distributed properties. |

Comments suppressed due to low confidence (3)

metrix/src/metrix/backends/gfx942.py:107

  • _run_rocprof now takes launcher, but the implementation ignores it when calling ROCProfV3Wrapper.profile(...). This makes launcher a no-op for this backend. Pass launcher=launcher through to the wrapper call (and ensure callers forward it).
    def _run_rocprof(
        self,
        command: str | Sequence[str],
        counters: List[str],
        kernel_filter: Optional[str] = None,
        cwd: Optional[str] = None,
        launcher: Optional[str | Sequence[str]] = None,
        timeout_seconds: Optional[int] = 0,
    ) -> List[ProfileResult]:
        """Run rocprofv3 and return results (single pass only - base class handles multi-pass)"""
        wrapper = ROCProfV3Wrapper(timeout_seconds=timeout_seconds)
        return wrapper.profile(command, counters, kernel_filter=kernel_filter, cwd=cwd)

metrix/src/metrix/backends/gfx1201.py:74

  • _run_rocprof accepts launcher but does not forward it to ROCProfV3Wrapper.profile(...), so --launcher has no effect for gfx1201. Pass launcher=launcher to the wrapper call (and ensure the base class forwards it when invoking _run_rocprof).
    def _run_rocprof(
        self,
        command: str | Sequence[str],
        counters: List[str],
        kernel_filter: Optional[str] = None,
        cwd: Optional[str] = None,
        launcher: Optional[str | Sequence[str]] = None,
        timeout_seconds: Optional[int] = 0,
        kernel_iteration_range: Optional[str] = None,
    ) -> List[ProfileResult]:
        wrapper = ROCProfV3Wrapper(timeout_seconds=timeout_seconds)
        extra_counters_path = Path(__file__).parent / "counter_defs.yaml"

        return wrapper.profile(
            command=command,
            counters=counters,
            kernel_filter=kernel_filter,
            cwd=cwd,
            kernel_iteration_range=kernel_iteration_range,
            extra_counters_path=extra_counters_path if extra_counters_path.exists() else None,
            arch=self.device_specs.arch,
        )

metrix/src/metrix/backends/gfx90a.py:107

  • _run_rocprof now accepts a launcher parameter, but it isn’t passed down to ROCProfV3Wrapper.profile(...), so the launcher never affects the actual subprocess command. Pass launcher=launcher through to the wrapper call (and ensure the base class forwards it when invoking _run_rocprof).
    def _run_rocprof(
        self,
        command: str | Sequence[str],
        counters: List[str],
        kernel_filter: Optional[str] = None,
        cwd: Optional[str] = None,
        launcher: Optional[str | Sequence[str]] = None,
        timeout_seconds: Optional[int] = 0,
    ) -> List[ProfileResult]:
        """Run rocprofv3 and return results (single pass only - base class handles multi-pass)"""
        wrapper = ROCProfV3Wrapper(timeout_seconds=timeout_seconds)
        return wrapper.profile(command, counters, kernel_filter=kernel_filter, cwd=cwd)
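
The three suppressed comments describe the same gap: `launcher` is accepted but never reaches the wrapper. One way to guard against that kind of regression is a test with a recording stand-in, sketched below. `RecordingWrapper` and `run_rocprof` are hypothetical and only mirror the shape of the quoted code; they are not the project's classes.

```python
from typing import Any, Dict, List, Optional, Sequence, Union

Command = Union[str, Sequence[str]]


class RecordingWrapper:
    """Hypothetical stand-in for ROCProfV3Wrapper that records what it was called with."""

    def __init__(self) -> None:
        self.calls: List[Dict[str, Any]] = []

    def profile(
        self,
        command: Command,
        counters: List[str],
        kernel_filter: Optional[str] = None,
        cwd: Optional[str] = None,
        launcher: Optional[Command] = None,
    ) -> list:
        self.calls.append(
            {"command": command, "counters": counters, "cwd": cwd, "launcher": launcher}
        )
        return []


def run_rocprof(
    wrapper: RecordingWrapper,
    command: Command,
    counters: List[str],
    kernel_filter: Optional[str] = None,
    cwd: Optional[str] = None,
    launcher: Optional[Command] = None,
) -> list:
    # The fix suggested in the review: forward launcher explicitly by keyword.
    return wrapper.profile(
        command, counters, kernel_filter=kernel_filter, cwd=cwd, launcher=launcher
    )


wrapper = RecordingWrapper()
run_rocprof(wrapper, "train.py", ["SQ_WAVES"], launcher="mpirun -np 4")
assert wrapper.calls[0]["launcher"] == "mpirun -np 4"
```

A check of this form fails as soon as a backend drops the keyword, which is the situation the quoted snippets show.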


mawad-amd and others added 5 commits March 23, 2026 01:03
… Python 3.8

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… batch path, clarify launcher semantics

- Move profile_command docstring above code (was displaced)
- Forward launcher param in CounterBackend recursive batch call
- Add Note sections to Linex.profile() and Metrix.profile() explaining
  that launcher is for mpirun-style use, and for torchrun the correct
  pattern is running metrix/linex under torchrun (not the reverse)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>