AMDGCN Assembly Editing Tools

A comprehensive toolkit for parsing, analyzing, and optimizing AMDGCN assembly files. This package provides tools for Control Flow Graph (CFG) generation, Data Dependency Graph (DDG) analysis, instruction scheduling optimization, and register management.

Features

CFG Generation: Parse AMDGCN assembly files and generate Control Flow Graphs
DDG Analysis: Build Data Dependency Graphs with RAW/WAR dependency tracking
Instruction Scheduling: Move and distribute instructions with dependency verification
Register Replacement: Replace registers within instruction ranges with alignment support
Register Slice Analysis: Extract instruction subgraphs related to specific registers
Visualization: Generate DOT/SVG visualizations of CFG and DDG structures

Installation

Ensure you have Python 3.6+ and optionally Graphviz for SVG generation:

# Install graphviz for visualization (optional)
apt-get install graphviz  # Debian/Ubuntu
yum install graphviz      # RHEL/CentOS

Quick Start

1. Parse Assembly and Generate DDG

# Parse .amdgcn file and generate DDG analysis
python3 amdgcn_ddg.py kernel.amdgcn --output-dir ./ddg_output

# Generate JSON analysis only (skip SVG)
python3 amdgcn_ddg.py kernel.amdgcn --output-dir ./ddg_output --json-only

# Print statistics
python3 amdgcn_ddg.py kernel.amdgcn --stats

2. Load Analysis from JSON

# Load previously saved analysis
python3 amdgcn_ddg.py --load-json ./ddg_output/analysis.json --stats

3. Regenerate Assembly File

# Regenerate .amdgcn from JSON (useful after transformations)
python3 amdgcn_ddg.py --load-json ./ddg_output/analysis.json --regenerate output.amdgcn

Transformation Passes

Move Instruction Pass

Move an instruction up or down within a basic block while respecting data dependencies.

# Move instruction at index 10 in .LBB0_2 up by 5 cycles
python3 amdgcn_ddg.py --load-json analysis.json --move .LBB0_2 10 5

# Move instruction down by 3 cycles (use negative value)
python3 amdgcn_ddg.py --load-json analysis.json --move .LBB0_2 15 -3

Distribute Instruction Pass

Evenly distribute instructions with a specific opcode throughout a basic block.

# Distribute global_load_dwordx4 instructions in .LBB0_2 with K=8
python3 amdgcn_ddg.py --load-json analysis.json --distribute .LBB0_2 global_load_dwordx4 8

# Distribute v_mfma instructions
python3 amdgcn_ddg.py --load-json analysis.json --distribute .LBB0_2 v_mfma_f32_32x32x16_fp8_fp8 16

Register Replacement Pass

Replace registers within a specified instruction range with new registers from the free register pool.

# Replace v[40:45] in instructions 267-1000 with alignment 2
python3 amdgcn_ddg.py --load-json analysis.json --replace-regs 267 1000 v[40:45] 2

# Replace multiple register segments
python3 amdgcn_ddg.py --load-json analysis.json --replace-regs 267 1000 v[40:45] 2 s[37:40] 1

Transform Pass Pipeline (JSON)

Define multiple passes in a JSON file for batch execution:

[
  {
    "type": "replace_registers",
    "range_start": 267,
    "range_end": 1000,
    "registers": ["v[40:45]", "s[37:40]"],
    "alignments": [2, 1]
  },
  {
    "type": "distribute",
    "block": ".LBB0_2",
    "opcode": "global_load_dwordx4",
    "k": 16
  },
  {
    "type": "move",
    "block": ".LBB0_2",
    "index": 10,
    "cycles": 5
  }
]

Execute the pipeline:

python3 amdgcn_ddg.py --load-json analysis.json --transform-json transform_passes.json

Register Slice Analysis

Extract all instructions related to specific registers and visualize their dependencies.

# Analyze instructions using v40-v45
python3 amdgcn_register_slice.py kernel.amdgcn --registers v40,v41,v42,v43,v44,v45 --output-dir ./slice_output

# Use register range notation
python3 amdgcn_register_slice.py kernel.amdgcn --registers "v[40:45]" --output-dir ./slice_output

# Load from JSON instead of parsing .amdgcn
python3 amdgcn_register_slice.py --load-json analysis.json --registers v40,v41 --output-dir ./slice_output

# Analyze special registers
python3 amdgcn_register_slice.py kernel.amdgcn --registers exec,vcc,scc --output-dir ./slice_output

Output files:

slice.json: Structured analysis data
slice.dot: GraphViz DOT format
slice.svg: Visual dependency graph
slice.txt: Human-readable summary

Python API

Basic Usage

from amdgcn_cfg import AMDGCNParser, CFG
from amdgcn_ddg import (
    generate_all_ddgs,
    compute_inter_block_deps,
    save_analysis_to_json,
    load_analysis_from_json,
    AnalysisResult,
)

# Parse assembly file
parser = AMDGCNParser()
cfg = parser.parse_file("kernel.amdgcn")

# Generate DDGs
ddgs, waitcnt_deps = generate_all_ddgs(cfg)
inter_deps = compute_inter_block_deps(cfg, ddgs)

# Create analysis result
result = AnalysisResult(
    cfg=cfg,
    ddgs=ddgs,
    inter_block_deps=inter_deps,
    waitcnt_deps=waitcnt_deps
)

# Save to JSON
save_analysis_to_json(cfg, ddgs, inter_deps, waitcnt_deps, "analysis.json")

# Load from JSON
result = load_analysis_from_json("analysis.json")

Using Transformation Passes

from amdgcn_passes import (
    PassManager,
    MoveInstructionPass,
    DistributeInstructionPass,
    RegisterReplacePass,
    move_instruction,
    distribute_instructions,
    replace_registers,
)

# Method 1: Use convenience functions
move_result = move_instruction(result, ".LBB0_2", 10, 5, verbose=True)
if move_result.success:
    print(f"Moved: {move_result.message}")

# Method 2: Use PassManager
pm = PassManager()
pm.add_pass(RegisterReplacePass(
    range_start=267,
    range_end=1000,
    registers_to_replace=["v[40:45]"],
    alignments=[2],
    verbose=True
))
pm.add_pass(DistributeInstructionPass(
    block_label=".LBB0_2",
    target_opcode="global_load_dwordx4",
    k=16,
    verbose=True
))
pm.run(result)

# Regenerate assembly with updated metadata
result.to_amdgcn("output.amdgcn", update_metadata=True)

Register Statistics and Free GPR Analysis

from amdgcn_ddg import (
    compute_register_statistics,
    compute_fgpr,
    compute_register_metadata,
    RegisterMetadata,
)

# Compute register usage
stats = compute_register_statistics(ddgs)
print(f"VGPR max index: {stats.vgpr_max_index}")
print(f"AGPR max index: {stats.agpr_max_index}")
print(f"SGPR max index: {stats.sgpr_max_index}")
print(f"Uses VCC: {stats.uses_vcc}")

# Compute free GPRs
fgpr = compute_fgpr(stats)
print(f"Free VGPRs: {len(fgpr.fgpr_v)}")
print(f"Free SGPRs: {len(fgpr.fgpr_s)}")

# Compute metadata for .amdgcn regeneration
metadata = compute_register_metadata(stats)
print(f"next_free_vgpr: {metadata.next_free_vgpr}")
print(f"accum_offset: {metadata.accum_offset}")
print(f"total_num_sgprs: {metadata.total_num_sgprs}")

Register Slice Analysis

from amdgcn_register_slice import (
    build_register_slice,
    generate_slice_dot,
    generate_slice_json,
    save_slice_outputs,
)

# Build slice for specific registers
target_registers = {"v40", "v41", "v42"}
slice_result = build_register_slice(cfg, ddgs, target_registers)

# Generate outputs
save_slice_outputs(slice_result, "./slice_output", generate_svg=True)

File Structure

amdgcn_edit/
├── amdgcn_cfg.py           # CFG parser and data structures
├── amdgcn_ddg.py           # DDG generation and main CLI
├── amdgcn_passes.py        # Instruction scheduling passes
├── amdgcn_register_slice.py # Register slice analyzer
├── amdgcn_latency.py       # Hardware latency analysis
├── amdgcn_verify.py        # Scheduling verification
├── gfx942_hardware_info.json # GFX942/GFX950 hardware info
├── __init__.py             # Package exports
└── test_*.py               # Unit tests

Hardware Support

Currently supports AMD GFX942/GFX950 (MI300) architecture with:

VGPR: v0-v255 (256 registers)
AGPR: a0-a255 (256 accumulator registers)
SGPR: s0-s103 (104 scalar registers)
Special registers: exec, vcc, scc, m0

Hardware information is stored in gfx942_hardware_info.json including:

MFMA instruction latencies
Register limits
Instruction cycle costs

Running Tests

# Run all tests
python3 -m pytest test_*.py -v

# Run specific test file
python3 test_amdgcn_ddg.py
python3 test_amdgcn_passes.py
python3 test_amdgcn_register_slice.py

Name		Name	Last commit message	Last commit date
Latest commit History 25 Commits
.gitignore		.gitignore
README.md		README.md
__init__.py		__init__.py
amdgcn_cfg.py		amdgcn_cfg.py
amdgcn_ddg.py		amdgcn_ddg.py
amdgcn_latency.py		amdgcn_latency.py
amdgcn_passes.py		amdgcn_passes.py
amdgcn_register_slice.py		amdgcn_register_slice.py
amdgcn_verify.py		amdgcn_verify.py
cross_block_reg_flow.py		cross_block_reg_flow.py
debug_distribute_pass.py		debug_distribute_pass.py
find_related_instructions.py		find_related_instructions.py
gfx942_hardware_info.json		gfx942_hardware_info.json
pa_dot_kernel.bf16.kv_blk_64.kv_cmput_blk_256.ps_256.amdgcn		pa_dot_kernel.bf16.kv_blk_64.kv_cmput_blk_256.ps_256.amdgcn
pa_dot_kernel.hand_opt.amdgcn		pa_dot_kernel.hand_opt.amdgcn
pa_dot_kernel.no_iglp.bf16.kv_blk_64.kv_cmput_blk_128.amdgcn		pa_dot_kernel.no_iglp.bf16.kv_blk_64.kv_cmput_blk_128.amdgcn
pa_dot_kernel.no_iglp.bf16.kv_blk_64.kv_cmput_blk_128.ps_256.amdgcn		pa_dot_kernel.no_iglp.bf16.kv_blk_64.kv_cmput_blk_128.ps_256.amdgcn
pa_dot_kernel.no_iglp.bf16.kv_blk_64.kv_cmput_blk_128.with_branch.amdgcn		pa_dot_kernel.no_iglp.bf16.kv_blk_64.kv_cmput_blk_128.with_branch.amdgcn
pa_dot_kernel.no_iglp.bf16.kv_blk_64.kv_cmput_blk_256.amdgcn		pa_dot_kernel.no_iglp.bf16.kv_blk_64.kv_cmput_blk_256.amdgcn
pa_dot_kernel.opt_mtp_DDD_opt_mtp_tt.bf16.kv_blk_64.kv_cmput_blk_256.amdgcn		pa_dot_kernel.opt_mtp_DDD_opt_mtp_tt.bf16.kv_blk_64.kv_cmput_blk_256.amdgcn
pa_dot_kernel.v1.amdgcn		pa_dot_kernel.v1.amdgcn
pa_dot_kernel.v2.amdgcn		pa_dot_kernel.v2.amdgcn
test_amdgcn_cfg.py		test_amdgcn_cfg.py
test_amdgcn_ddg.py		test_amdgcn_ddg.py
test_amdgcn_latency.py		test_amdgcn_latency.py
test_amdgcn_passes.py		test_amdgcn_passes.py
test_amdgcn_register_slice.py		test_amdgcn_register_slice.py
test_amdgcn_verify.py		test_amdgcn_verify.py
test_list.sh		test_list.sh
trans_pass_list.json		trans_pass_list.json
trans_pass_list.ps_256.json		trans_pass_list.ps_256.json

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

AMDGCN Assembly Editing Tools

Features

Installation

Quick Start

1. Parse Assembly and Generate DDG

2. Load Analysis from JSON

3. Regenerate Assembly File

Transformation Passes

Move Instruction Pass

Distribute Instruction Pass

Register Replacement Pass

Transform Pass Pipeline (JSON)

Register Slice Analysis

Python API

Basic Usage

Using Transformation Passes

Register Statistics and Free GPR Analysis

Register Slice Analysis

File Structure

Hardware Support

Running Tests

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

AMDGCN Assembly Editing Tools

Features

Installation

Quick Start

1. Parse Assembly and Generate DDG

2. Load Analysis from JSON

3. Regenerate Assembly File

Transformation Passes

Move Instruction Pass

Distribute Instruction Pass

Register Replacement Pass

Transform Pass Pipeline (JSON)

Register Slice Analysis

Python API

Basic Usage

Using Transformation Passes

Register Statistics and Free GPR Analysis

Register Slice Analysis

File Structure

Hardware Support

Running Tests

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages