IntelliKit

Profiling and analysis for AMD software — GPU, host CPU, and LLM workflows

IntelliKit is a set of Python tools for AMD-focused performance and validation. Most of the stack targets GPUs through ROCm, turning hardware counters, traces, and dispatch data into clear APIs you can use from Python. uprof_mcp adds AMD uProf for host-side CPU hotspot analysis in the same toolbox. For LLM-style workflows you also get Model Context Protocol (MCP) servers (profiling, HIP compile, HIP docs, rocminfo, …) and agent skills — installable SKILL.md playbooks for Kerncap, Metrix, Linex, Nexus, and Accordo (install/skills/install.sh). Use the stack from a notebook, a script, an MCP client, or Cursor / Claude / Codex.

What’s in the box

Rough workflow: isolate a kernel → profile it (counters and/or source lines) → lean on other helpers (what actually ran, MCP, skills, CPU profiling) → validate that changes are still correct.

Tool	What it’s for	Docs
Kerncap	Isolate — capture dispatches and build standalone reproducers (HIP, Triton).	README · examples
Metrix	Profile — human-readable metrics from hardware counters (bandwidth, cache, etc.).	README · examples
Linex	Profile — source-line timing and stalls (compile with `-g` for file:line mapping).	README · examples
Nexus	Inspect — from HSA packets, see what ran: source and assembly.	README · examples
rocm_mcp	MCP — HIP compile, HIP docs, rocminfo, and related servers for agents.	README · examples
uprof_mcp	CPU — MCP bridge to AMD uProf for host-side hotspots.	README · examples
Accordo	Validate — prove an optimized kernel still matches a reference.	README · examples

Idea in one line: pull a kernel out with Kerncap, understand it with Metrix and Linex, dig into execution with Nexus, wire agents with MCP and skills, add uProf when you care about the host, then lock in correctness with Accordo.

Quick start

Tools — every package from Git via pip (install/tools/install.sh; no metapackage at the repo root):

curl -sSL https://raw.githubusercontent.com/AMDResearch/intellikit/main/install/tools/install.sh | bash

Skills — agent skill files for Kerncap, Metrix, Linex, Nexus, and Accordo (install/skills/install.sh):

curl -sSL https://raw.githubusercontent.com/AMDResearch/intellikit/main/install/skills/install.sh | bash

From a clone, run ./install/tools/install.sh and ./install/skills/install.sh directly. When piping from curl, put flags after bash -s -- (example: … | bash -s -- --tools metrix,linex). Run either script with --help to list the remaining options.

Requirements

Requirement	Notes
Python	3.10 or newer
ROCm	6.0+ for GPU packages (use 7.0+ for Linex); skip if you only use host-side tools like `uprof_mcp`
GPU	MI300+ for the full GPU experience; some pieces vary by tool — see each package’s README
uProf	AMD uProf on x86 for `uprof_mcp` only — see that README
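The requirements above can be checked roughly before installing. This is a minimal sketch and not part of IntelliKit: `check_environment` is a hypothetical helper, and treating `rocminfo` on your PATH as a sign that ROCm is installed is an assumption (it does not verify the ROCm version or the GPU model).

```python
import shutil
import sys

MIN_PYTHON = (3, 10)  # from the requirements table


def check_environment(version_info=None, which=shutil.which):
    """Return coarse pre-install checks as a dict (hypothetical helper)."""
    if version_info is None:
        version_info = sys.version_info
    return {
        "python_ok": tuple(version_info[:2]) >= MIN_PYTHON,
        # Assumption: rocminfo on PATH suggests a ROCm install is present.
        "rocminfo_found": which("rocminfo") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'yes' if ok else 'no'}")
```

If you only use host-side tools such as `uprof_mcp`, a missing `rocminfo` is expected and can be ignored.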

For development on a subset of packages only, use editable installs (nothing to install at the monorepo root):

pip install -e metrix/
pip install -e linex/

Try it

Try Metrix on your app (see Metrix docs and examples):

from metrix import Metrix

profiler = Metrix()
results = profiler.profile("./your_app", metrics=["memory.hbm_bandwidth_utilization"])

for kernel in results.kernels:
    print(f"{kernel.name}: {kernel.duration_us.avg:.2f} μs")
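A common next step is to rank kernels by time. The sketch below relies only on the attributes used above (`results.kernels`, `kernel.name`, `kernel.duration_us.avg`); `slowest_kernels` is a hypothetical helper, and the stand-in objects exist only so the snippet runs without a GPU or Metrix installed.

```python
from types import SimpleNamespace


def slowest_kernels(kernels, top=3):
    """Return up to `top` kernels sorted by average duration, slowest first."""
    return sorted(kernels, key=lambda k: k.duration_us.avg, reverse=True)[:top]


# Stand-in data shaped like the objects in the example above (not real output).
fake_kernels = [
    SimpleNamespace(name="copy", duration_us=SimpleNamespace(avg=12.5)),
    SimpleNamespace(name="gemm", duration_us=SimpleNamespace(avg=480.0)),
    SimpleNamespace(name="reduce", duration_us=SimpleNamespace(avg=35.0)),
]

for kernel in slowest_kernels(fake_kernels, top=2):
    print(f"{kernel.name}: {kernel.duration_us.avg:.2f} us")
```

With real results you would pass `results.kernels` in place of `fake_kernels`.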

MCP quick config

With uv and a clone of this repo, you can point an MCP client at each package directory (adjust /path/to/intellikit/...):

{
  "mcpServers": {
    "metrix-mcp": {
      "command": "uv",
      "args": ["run", "--directory", "/path/to/intellikit/metrix", "metrix-mcp"]
    },
    "kerncap-mcp": {
      "command": "uv",
      "args": ["run", "--directory", "/path/to/intellikit/kerncap", "kerncap-mcp"]
    },
    "hip-compiler-mcp": {
      "command": "uv",
      "args": ["run", "--directory", "/path/to/intellikit/rocm_mcp", "hip-compiler-mcp"]
    }
  }
}

If you installed with pip / install.sh instead, use the console script names (metrix-mcp, …) on your PATH, or the full path under your venv.
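If you keep several clones or machines in sync, the uv-based config can be generated rather than hand-edited. A minimal sketch: `build_mcp_config` is a hypothetical helper, the server names and package directories are the three shown above, and where your MCP client expects the resulting JSON file is not covered here.

```python
import json
from pathlib import PurePosixPath

# Server name -> package directory, as listed in the config above.
SERVERS = {
    "metrix-mcp": "metrix",
    "kerncap-mcp": "kerncap",
    "hip-compiler-mcp": "rocm_mcp",
}


def build_mcp_config(repo_root):
    """Build the mcpServers mapping for a clone at `repo_root`."""
    # PurePosixPath keeps forward slashes regardless of the host OS.
    root = PurePosixPath(repo_root)
    return {
        "mcpServers": {
            name: {
                "command": "uv",
                "args": ["run", "--directory", str(root / package), name],
            }
            for name, package in SERVERS.items()
        }
    }


print(json.dumps(build_mcp_config("/path/to/intellikit"), indent=2))
```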

More servers and examples: rocm_mcp/README.md, AGENTS.md (contributor-oriented, includes all entry point names).

More install options (pip command, branch, dry-run, per-package Git URLs)

Tools script (install/tools/install.sh)

Defaults to pip3; the script checks that pip’s Python is 3.10+ before installing (override with --pip-cmd if needed).
Subset only: --tools metrix,linex,nexus
Custom pip: curl -sSL .../install/tools/install.sh | bash -s -- --pip-cmd pip3.12 (or --pip-cmd "python3.12 -m pip")
Branch/tag: --ref my-branch
Different repo: --repo-url https://github.com/you/fork.git
Preview: --dry-run
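The flags above all go after `bash -s --` when piping from curl. Here is a small sketch that composes such a command line. `tools_install_command` is a hypothetical helper, it only builds the string and does not run anything, and it covers just the flags listed here.

```python
import shlex

TOOLS_INSTALL_URL = (
    "https://raw.githubusercontent.com/AMDResearch/intellikit/main/"
    "install/tools/install.sh"
)


def tools_install_command(tools=None, pip_cmd=None, ref=None, dry_run=False):
    """Compose the curl-pipe command with flags placed after `bash -s --`."""
    flags = []
    if tools:
        flags += ["--tools", ",".join(tools)]
    if pip_cmd:
        flags += ["--pip-cmd", pip_cmd]
    if ref:
        flags += ["--ref", ref]
    if dry_run:
        flags.append("--dry-run")
    command = f"curl -sSL {TOOLS_INSTALL_URL} | bash"
    if flags:
        # shlex.join quotes values with spaces, e.g. "python3.12 -m pip".
        command += " -s -- " + shlex.join(flags)
    return command


print(tools_install_command(tools=["metrix", "linex"], dry_run=True))
```

Adding `--dry-run` while experimenting lets you preview what the script would do before it installs anything.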

Skills script (install/skills/install.sh)

--target cursor | claude | codex | agents — where skills are written
--global — e.g. ~/.cursor/skills/ for Cursor
--dry-run

Individual packages from Git

pip install "git+https://github.com/AMDResearch/intellikit.git#subdirectory=metrix"
# accordo, kerncap, linex, nexus, rocm_mcp, uprof_mcp — same pattern
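The "same pattern" can be spelled out for every package. A minimal sketch: `pip_requirement` is a hypothetical helper, the package list comes from the table near the top of this README, and the optional `@ref` suffix is pip's general syntax for pinning a Git branch or tag rather than something specific to this repo.

```python
REPO_URL = "https://github.com/AMDResearch/intellikit.git"
PACKAGES = ["accordo", "kerncap", "linex", "metrix", "nexus", "rocm_mcp", "uprof_mcp"]


def pip_requirement(package, ref=None, repo_url=REPO_URL):
    """Return a pip VCS requirement string for one package subdirectory."""
    if package not in PACKAGES:
        raise ValueError(f"unknown package: {package}")
    at_ref = f"@{ref}" if ref else ""
    return f"git+{repo_url}{at_ref}#subdirectory={package}"


for name in PACKAGES:
    print(f'pip install "{pip_requirement(name)}"')
```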

Editable from clone

git clone https://github.com/AMDResearch/intellikit.git
cd intellikit
pip install -e ./accordo
pip install -e ./kerncap
# …any subset

Example: profile → inspect → validate

from metrix import Metrix
from nexus import Nexus
from accordo import Accordo

# 1) Baseline metrics
profiler = Metrix()
baseline = profiler.profile(
    "./app_baseline",
    metrics=["memory.hbm_bandwidth_utilization"],
)
baseline_bw = baseline.kernels[0].metrics["memory.hbm_bandwidth_utilization"].avg

# 2) See what ran on the GPU
trace = Nexus().run(["./app_baseline"])
for kernel in trace:
    print(kernel.name, len(kernel.assembly), "instructions")

# 3) After you optimize — check correctness
validator = Accordo(binary="./app_baseline", kernel_name="my_kernel")
ref = validator.capture_snapshot(binary="./app_baseline")
opt = validator.capture_snapshot(binary="./app_opt")
result = validator.compare_snapshots(ref, opt, tolerance=1e-6)

if result.is_valid:
    opt_results = profiler.profile(
        "./app_opt",
        metrics=["memory.hbm_bandwidth_utilization"],
    )
    opt_bw = opt_results.kernels[0].metrics["memory.hbm_bandwidth_utilization"].avg
    # Utilization is a percentage, so the difference is in percentage points.
    print(f"PASS - {result.num_arrays_validated} arrays matched; BW delta {opt_bw - baseline_bw:+.1f} percentage points")
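To build intuition for what a `tolerance=1e-6` check could mean, here is a standalone sketch of an element-wise comparison. It is not Accordo's implementation: this README does not say whether the tolerance is absolute or relative, so the absolute-error rule below is an assumption, and `max_abs_error` and `within_tolerance` are hypothetical names.

```python
def max_abs_error(reference, candidate):
    """Largest absolute element-wise difference between two equal-length sequences."""
    if len(reference) != len(candidate):
        raise ValueError("arrays differ in length")
    return max((abs(r - c) for r, c in zip(reference, candidate)), default=0.0)


def within_tolerance(reference, candidate, tolerance=1e-6):
    """Assumed rule: pass when no element differs by more than `tolerance`."""
    return max_abs_error(reference, candidate) <= tolerance


ref_out = [1.0, 2.0, 3.0]
print(within_tolerance(ref_out, [1.0, 2.0000004, 3.0]))  # tiny floating-point drift
print(within_tolerance(ref_out, [1.0, 2.1, 3.0]))        # a real mismatch
```

An optimized kernel that reorders floating-point operations often drifts by amounts like the first case, which is why an exact equality check is usually too strict.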

Contributing & support

We welcome issues and pull requests on GitHub.

Bugs / ideas: Issues

Made for the next generation of GPU development — with or without an LLM in the loop.
