Add distributed launcher support for linex and metrix by mawad-amd · Pull Request #87 · AMDResearch/intellikit

mawad-amd · 2026-03-23T00:36:06Z

Summary

Add distributed profiling support to both linex and metrix with an explicit launcher parameter that ensures the correct command order (launcher rocprofv3 ... -- app instead of rocprofv3 ... -- launcher app)
Detect rank metadata (global_rank, local_rank, world_size, hostname, launcher) from environment variables set by torchrun, mpirun, srun, and horovodrun
Linex: rank-scoped output directories (rank0000/, rank0001/, ...), per-rank RankProfile objects, MCP per-rank hotspots
Metrix: rank metadata in ProfileResult/KernelResults/ProfilingResults, rank-suffixed output files (results.rank0003.json), --launcher CLI flag, rank columns in CSV/JSON/text output
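
The rank detection described above could look roughly like the following. This is a minimal sketch, not the PR's code: `RankInfo` and `detect_rank_info` are hypothetical stand-ins for `DistributedContext` and its detection helper, and the `mpirun` entry assumes Open MPI's variable names (other MPI implementations export different ones).

```python
import os
import socket
from dataclasses import dataclass
from typing import Mapping, Optional

# (global rank, local rank, world size) environment variables per launcher.
# The "mpirun" row assumes Open MPI; this mapping is illustrative only.
_LAUNCHER_ENV = {
    "torchrun": ("RANK", "LOCAL_RANK", "WORLD_SIZE"),
    "mpirun": (
        "OMPI_COMM_WORLD_RANK",
        "OMPI_COMM_WORLD_LOCAL_RANK",
        "OMPI_COMM_WORLD_SIZE",
    ),
    "srun": ("SLURM_PROCID", "SLURM_LOCALID", "SLURM_NTASKS"),
    "horovodrun": ("HOROVOD_RANK", "HOROVOD_LOCAL_RANK", "HOROVOD_SIZE"),
}


@dataclass(frozen=True)
class RankInfo:
    """Hypothetical stand-in for the PR's DistributedContext."""

    global_rank: int = 0
    local_rank: int = 0
    world_size: int = 1
    hostname: str = ""
    launcher: Optional[str] = None


def detect_rank_info(env: Optional[Mapping[str, str]] = None) -> RankInfo:
    """Return rank metadata from the first launcher whose rank variable is set."""
    env = os.environ if env is None else env
    hostname = socket.gethostname()
    for launcher, (rank_var, local_var, size_var) in _LAUNCHER_ENV.items():
        if rank_var in env:
            return RankInfo(
                global_rank=int(env[rank_var]),
                local_rank=int(env.get(local_var, 0)),
                world_size=int(env.get(size_var, 1)),
                hostname=hostname,
                launcher=launcher,
            )
    # No launcher variables found: treat as a single-process run.
    return RankInfo(hostname=hostname)
```

Passing the environment as an argument, rather than always reading `os.environ`, keeps the detection easy to unit test with plain dictionaries.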

API:

# Linex
profiler.profile(command="train.py", launcher="torchrun --nproc_per_node=8")

# Metrix Python
profiler.profile(command="train.py", launcher="mpirun -np 4")

# Metrix CLI
metrix profile --launcher "torchrun --nproc_per_node=8" -- train.py

Test plan

Unit tests for DistributedContext env detection (torchrun, mpirun, srun)
Unit tests for normalize_command_argv (string and sequence input)
Unit tests for apply_rank_suffix (with/without extension, single process)
Unit tests verifying correct command order with launcher (linex + metrix)
Unit tests verifying plain rocprofv3 without launcher
Unit tests for shlex parsing and rank field propagation in rocprof_wrapper
End-to-end smoke test with real GPUs and torchrun
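
For orientation, the rank-suffixing behaviour exercised by these tests might be sketched as below. The function names are hypothetical (the PR's helper is `apply_rank_suffix`, whose exact signature is not shown here), and the handling of paths without an extension is an assumption.

```python
from pathlib import Path


def rank_suffixed_path(path: str, rank: int, world_size: int) -> str:
    """Insert a zero-padded rank tag before the extension (hypothetical helper)."""
    if world_size <= 1:
        # Single-process runs keep the original file name.
        return path
    p = Path(path)
    tag = f"rank{rank:04d}"
    if p.suffix:
        return str(p.with_name(f"{p.stem}.{tag}{p.suffix}"))
    # Assumption: with no extension, the tag is simply appended.
    return str(p.with_name(f"{p.name}.{tag}"))


def rank_output_dir(base_dir: str, rank: int) -> str:
    """Return a rank-scoped directory such as <base_dir>/rank0001 (hypothetical helper)."""
    return str(Path(base_dir) / f"rank{rank:04d}")
```

With these definitions, `rank_suffixed_path("results.json", 3, 8)` gives `results.rank0003.json`, matching the file naming in the summary.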

🤖 Generated with Claude Code

- Add DistributedContext dataclass and env detection for torchrun, mpirun, srun, horovodrun
- Linex: rank-scoped output dirs, RankProfile objects, MCP per-rank hotspots
- Metrix: rank metadata in ProfileResult/KernelResults/ProfilingResults, rank-suffixed output files
- CLI: argparse.REMAINDER for `-- launcher ...` syntax
- Both: normalize_command_argv with shlex, accept str | Sequence[str]
- Tests for distributed helpers, shlex parsing, rank field propagation

Note: command construction is still rocprofv3-wraps-launcher (wrong order). Next step: fix to launcher-wraps-rocprofv3 for correct distributed profiling.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Previously, distributed commands like `torchrun --nproc_per_node=8 train.py` produced `rocprofv3 ... -- torchrun --nproc_per_node=8 train.py`, which is wrong: rocprofv3 would profile the launcher process, not the per-rank GPU work. Now we split the command into launcher args and app args, producing:

`torchrun --nproc_per_node=8 rocprofv3 ... -- train.py`

The launcher spawns N processes, each running rocprofv3 around the app.

Changes:
- Add split_launcher_command() to both distributed.py modules
- Handles torchrun, python -m torch.distributed.*, mpirun/mpiexec, srun, horovodrun
- Update linex/api.py and metrix/rocprof_wrapper.py to use launcher wrapping
- Add tests verifying correct command ordering for all launcher types

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
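
The intended ordering (launcher first, then rocprofv3, then the app) can be illustrated with a small sketch. `build_profile_command` is a hypothetical name, not a function from the PR, and `--kernel-trace` merely stands in for whatever rocprofv3 options the tools actually pass.

```python
import shlex
from typing import List, Optional, Sequence, Union

Argv = Union[str, Sequence[str]]


def _to_argv(value: Argv) -> List[str]:
    """Split strings shell-style; copy sequences unchanged."""
    return shlex.split(value) if isinstance(value, str) else list(value)


def build_profile_command(
    app: Argv,
    profiler_args: Sequence[str],
    launcher: Optional[Argv] = None,
) -> List[str]:
    """Build `launcher rocprofv3 ... -- app` (hypothetical helper).

    Without a launcher, the result is plain `rocprofv3 ... -- app`.
    """
    profiler = ["rocprofv3", *profiler_args, "--", *_to_argv(app)]
    if launcher is None:
        return profiler
    # The launcher goes first so that each spawned rank runs its own rocprofv3.
    return [*_to_argv(launcher), *profiler]


cmd = build_profile_command("train.py", ["--kernel-trace"], launcher="mpirun -np 4")
print(" ".join(cmd))
```

Putting the launcher in front means every rank gets its own profiler process, instead of one profiler observing only the launcher.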

Instead of trying to parse launcher flags from a combined command string (fragile, requires hardcoded flag sets per launcher), let the user provide the launcher and app commands separately:

# Python API
profiler.profile(command="train.py", launcher="torchrun --nproc_per_node=8")

# Metrix CLI
metrix profile --launcher "torchrun --nproc_per_node=8" -- train.py

This is unambiguous, works with any launcher (including custom ones), and requires no flag-parsing maintenance.

- Remove split_launcher_command() and all _split_* helpers
- Add launcher parameter to Linex.profile(), Metrix.profile(), ROCProfV3Wrapper.profile(), CounterBackend.profile(), all backend implementations, CLI (--launcher flag), and MCP tools
- Update tests and READMEs

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
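
A CLI accepting `--launcher "..." -- app args` can be approximated with `argparse.REMAINDER`, as sketched below. This is an illustration under assumptions, not the Metrix CLI itself: the parser has no `profile` subcommand, and `parse_cli` is a hypothetical name. Because `REMAINDER` may keep the leading `--` in the captured list, the sketch strips it explicitly.

```python
import argparse
from typing import List, Optional, Tuple


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="metrix-sketch")
    parser.add_argument(
        "--launcher",
        default=None,
        help="Distributed launcher command, e.g. 'mpirun -np 4'",
    )
    # REMAINDER collects everything that follows, including app flags.
    parser.add_argument("target", nargs=argparse.REMAINDER, help="App command")
    return parser


def parse_cli(argv: List[str]) -> Tuple[Optional[str], List[str]]:
    """Return (launcher, target argv) with any leading `--` removed."""
    args = build_parser().parse_args(argv)
    target = list(args.target)
    if target and target[0] == "--":
        target = target[1:]
    if not target:
        raise SystemExit("no target command given")
    return args.launcher, target


launcher, target = parse_cli(
    ["--launcher", "torchrun --nproc_per_node=8", "--", "train.py", "--epochs", "1"]
)
print(launcher, target)
```

Keeping the launcher as one quoted string and the app after `--` avoids any guessing about where launcher flags end and app flags begin.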

Copilot

Pull request overview

Adds first-class “distributed launcher” support to Linex and Metrix so profiling commands can be invoked as launcher rocprofv3 ... -- app, and introduces rank metadata propagation/suffixing utilities to avoid output clobbering.

Changes:

Add distributed context detection + argv normalization helpers (shlex-based) for both Metrix and Linex.
Extend Metrix/Linex APIs, CLI, and MCP tools with an explicit launcher parameter and propagate rank metadata into result objects/output.
Add unit tests covering env detection, argv normalization, rank suffixing, and launcher command ordering.
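
As a rough illustration of the shlex-based argv normalization mentioned above, a helper that accepts either a string or a sequence might look like this. `normalize_argv` is a hypothetical stand-in for the PR's `normalize_command_argv`, and rejecting empty commands is an assumption.

```python
import shlex
from typing import List, Sequence, Union


def normalize_argv(command: Union[str, Sequence[str]]) -> List[str]:
    """Turn a command string or sequence into a list of argv strings."""
    if isinstance(command, str):
        # shlex honours shell-style quoting, so quoted arguments stay intact.
        argv = shlex.split(command)
    else:
        argv = [str(part) for part in command]
    if not argv:
        raise ValueError("command must not be empty")
    return argv


print(normalize_argv('python train.py --name "my run"'))
print(normalize_argv(["python", "train.py", "--name", "my run"]))
```

Both calls produce the same list, which is the point of normalizing: downstream code can build the subprocess command without caring how the caller spelled it.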

Reviewed changes

Copilot reviewed 19 out of 19 changed files in this pull request and generated 6 comments.

File	Description
metrix/tests/unit/test_rocprof_wrapper.py	Adds tests for shlex parsing, rank metadata propagation, and launcher command ordering.
metrix/tests/unit/test_distributed.py	New unit tests for distributed helpers (env detection, argv normalization, rank suffixing).
metrix/src/metrix/utils/distributed.py	New distributed helper module (context detection, shlex argv normalization, rank suffixing).
metrix/src/metrix/profiler/rocprof_wrapper.py	Accepts `launcher`/`env`, uses shlex parsing, and annotates `ProfileResult` with distributed metadata.
metrix/src/metrix/mcp/server.py	Adds `launcher` param to MCP tool and includes rank metadata in the response.
metrix/src/metrix/cli/profile_cmd.py	Adds `--launcher` plumbing, remainder target parsing normalization, and rank-aware output formatting/suffixing.
metrix/src/metrix/cli/main.py	Adds `--launcher` flag and switches target parsing to `argparse.REMAINDER`.
metrix/src/metrix/backends/gfx942.py	Updates backend signatures to accept `launcher` (but currently not forwarded).
metrix/src/metrix/backends/gfx90a.py	Updates backend signatures to accept `launcher` (but currently not forwarded).
metrix/src/metrix/backends/gfx1201.py	Updates backend signatures to accept `launcher` (but currently not forwarded).
metrix/src/metrix/backends/base.py	Adds rank fields to `ProfileResult`, adds `launcher` to API, and adds rank-prefixed aggregation keys.
metrix/src/metrix/api.py	Adds `launcher` support and rank metadata to `ProfilingResults`/`KernelResults`.
metrix/README.md	Documents distributed launcher usage and rank-suffixed outputs.
linex/tests/test_distributed_api.py	New tests for distributed helpers, rank-scoped output, deterministic ui dir choice, and launcher ordering.
linex/src/linex/mcp/server.py	Adds `launcher` plumbing + per-rank outputs to MCP responses (currently contains a syntax error).
linex/src/linex/distributed.py	New distributed helper module for Linex (context detection + argv normalization).
linex/src/linex/api.py	Adds distributed context tracking, rank-scoped output dirs, launcher support, and `RankProfile` aggregation.
linex/src/linex/__init__.py	Exports `RankProfile` in the public API.
linex/README.md	Documents distributed launcher usage and new distributed properties.

Comments suppressed due to low confidence (3)

metrix/src/metrix/backends/gfx942.py:107

_run_rocprof now takes launcher, but the implementation ignores it when calling ROCProfV3Wrapper.profile(...). This makes launcher a no-op for this backend. Pass launcher=launcher through to the wrapper call (and ensure callers forward it).

    def _run_rocprof(
        self,
        command: str | Sequence[str],
        counters: List[str],
        kernel_filter: Optional[str] = None,
        cwd: Optional[str] = None,
        launcher: Optional[str | Sequence[str]] = None,
        timeout_seconds: Optional[int] = 0,
    ) -> List[ProfileResult]:
        """Run rocprofv3 and return results (single pass only - base class handles multi-pass)"""
        wrapper = ROCProfV3Wrapper(timeout_seconds=timeout_seconds)
        return wrapper.profile(command, counters, kernel_filter=kernel_filter, cwd=cwd)

metrix/src/metrix/backends/gfx1201.py:74

_run_rocprof accepts launcher but does not forward it to ROCProfV3Wrapper.profile(...), so --launcher has no effect for gfx1201. Pass launcher=launcher to the wrapper call (and ensure the base class forwards it when invoking _run_rocprof).

    def _run_rocprof(
        self,
        command: str | Sequence[str],
        counters: List[str],
        kernel_filter: Optional[str] = None,
        cwd: Optional[str] = None,
        launcher: Optional[str | Sequence[str]] = None,
        timeout_seconds: Optional[int] = 0,
        kernel_iteration_range: Optional[str] = None,
    ) -> List[ProfileResult]:
        wrapper = ROCProfV3Wrapper(timeout_seconds=timeout_seconds)
        extra_counters_path = Path(__file__).parent / "counter_defs.yaml"

        return wrapper.profile(
            command=command,
            counters=counters,
            kernel_filter=kernel_filter,
            cwd=cwd,
            kernel_iteration_range=kernel_iteration_range,
            extra_counters_path=extra_counters_path if extra_counters_path.exists() else None,
            arch=self.device_specs.arch,
        )

metrix/src/metrix/backends/gfx90a.py:107

_run_rocprof now accepts a launcher parameter, but it isn’t passed down to ROCProfV3Wrapper.profile(...), so the launcher never affects the actual subprocess command. Pass launcher=launcher through to the wrapper call (and ensure the base class forwards it when invoking _run_rocprof).

    def _run_rocprof(
        self,
        command: str | Sequence[str],
        counters: List[str],
        kernel_filter: Optional[str] = None,
        cwd: Optional[str] = None,
        launcher: Optional[str | Sequence[str]] = None,
        timeout_seconds: Optional[int] = 0,
    ) -> List[ProfileResult]:
        """Run rocprofv3 and return results (single pass only - base class handles multi-pass)"""
        wrapper = ROCProfV3Wrapper(timeout_seconds=timeout_seconds)
        return wrapper.profile(command, counters, kernel_filter=kernel_filter, cwd=cwd)
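
The fix suggested in the three comments above amounts to forwarding the keyword argument. The sketch below shows the pattern with hypothetical stand-in classes (`RecordingWrapper`, `SketchBackend`); it does not use the real `ROCProfV3Wrapper` and only demonstrates that `launcher` reaches the wrapper call.

```python
from typing import Any, Dict, List, Optional, Sequence, Union

Argv = Union[str, Sequence[str]]


class RecordingWrapper:
    """Hypothetical stand-in for ROCProfV3Wrapper; records the arguments it receives."""

    def __init__(self) -> None:
        self.calls: List[Dict[str, Any]] = []

    def profile(
        self,
        command: Argv,
        counters: List[str],
        kernel_filter: Optional[str] = None,
        cwd: Optional[str] = None,
        launcher: Optional[Argv] = None,
    ) -> List[Any]:
        self.calls.append({"command": command, "launcher": launcher})
        return []


class SketchBackend:
    """Hypothetical backend illustrating the forwarding fix."""

    def __init__(self, wrapper: RecordingWrapper) -> None:
        self.wrapper = wrapper

    def _run_rocprof(
        self,
        command: Argv,
        counters: List[str],
        kernel_filter: Optional[str] = None,
        cwd: Optional[str] = None,
        launcher: Optional[Argv] = None,
    ) -> List[Any]:
        # Forward launcher explicitly; dropping it here makes the flag a no-op.
        return self.wrapper.profile(
            command,
            counters,
            kernel_filter=kernel_filter,
            cwd=cwd,
            launcher=launcher,
        )


wrapper = RecordingWrapper()
SketchBackend(wrapper)._run_rocprof("train.py", ["counter_a"], launcher="mpirun -np 4")
print(wrapper.calls[0]["launcher"])
```

A recording test double like this is one way to write the "launcher is forwarded" unit test without needing a GPU or rocprofv3 installed.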

linex/src/linex/api.py

linex/src/linex/mcp/server.py

metrix/src/metrix/backends/base.py

metrix/src/metrix/cli/profile_cmd.py

linex/src/linex/api.py

metrix/src/metrix/cli/profile_cmd.py

… batch path, clarify launcher semantics

- Move profile_command docstring above code (was displaced)
- Forward launcher param in CounterBackend recursive batch call
- Add Note sections to Linex.profile() and Metrix.profile() explaining that launcher is for mpirun-style use, and that for torchrun the correct pattern is running metrix/linex under torchrun (not the reverse)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

mawad-amd and others added 3 commits March 22, 2026 19:23

Copilot AI review requested due to automatic review settings March 23, 2026 00:36

Copilot started reviewing on behalf of mawad-amd March 23, 2026 00:37

Copilot AI reviewed Mar 23, 2026

mawad-amd and others added 5 commits March 23, 2026 01:03

Fix lint: remove duplicate launcher kwarg, fix parenthesized with for…

cb1ee05

… Python 3.8 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Fix lint: add missing launcher param to analyze_instruction_hotspots

9b07645

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Fix ruff formatting

22f8fad

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Fix launcher forwarding in CounterBackend batch path

06ba4de

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

mawad-amd wants to merge 8 commits into main from muhaawad/distributed-launchers
