2 changes: 2 additions & 0 deletions .gitignore
@@ -74,6 +74,8 @@ cover/
*.csv
*.ncu-rep
local_settings.py
*.local
*.local.*
db.sqlite3
db.sqlite3-journal

13 changes: 10 additions & 3 deletions modeling/transformers/Dockerfile
@@ -21,16 +21,23 @@ ARG TORCH_VERSION=2.9.1

ENV PYTHONUNBUFFERED=1

-# Install system dependencies
-RUN apt-get update && apt-get install -y --no-install-recommends \
+# Install system dependencies and NSight Systems CLI.
+RUN \
+    apt-get update && \
+    apt-get install -y --no-install-recommends gnupg && \
+    echo "deb https://developer.download.nvidia.com/devtools/repos/ubuntu2204/amd64/ /" | tee /etc/apt/sources.list.d/nvidia-devtools.list && \
+    apt-key adv --fetch-keys https://developer.download.nvidia.com/devtools/repos/ubuntu2204/amd64/nvidia.pub && \
+    apt-get update && \
+    apt-get install -y --no-install-recommends \
     python3-pip \
     python3-dev \
     python-is-python3 \
     git \
     wget \
     curl \
     build-essential \
-    && rm -rf /var/lib/apt/lists/*
+    nsight-systems-cli && \
+    rm -rf /var/lib/apt/lists/*

# Upgrade pip
RUN python -m pip install --upgrade pip setuptools wheel
33 changes: 33 additions & 0 deletions modeling/transformers/README.md
@@ -88,6 +88,38 @@ python infer.py \
--num_runs 5
```

### Kernel Coverage Report

Report the fraction of GPU time and kernel launches covered by TileGym cuTile kernels. This mode runs the model under Nsight Systems (`nsys profile`) and analyzes the resulting trace automatically.

```bash
python infer.py \
--model_id meta-llama/Meta-Llama-3.1-8B \
--use_tilegym \
--use_cutile \
--use_attn \
--report_kernel_coverage \
--sentence_file sample_inputs/input_prompt_32K.txt \
--output_length 100
```

Example output:
```text
===== NSYS KERNEL GPU TIME ANALYSIS =====

Kernel Name                                                   # Calls GPU Time (ms) % of Total
------------------------------------------------------------ -------- ------------- ----------
fmha_kernel                                                       ...        54.507      10.5%
rms_norm_kernel_gather                                            ...         9.788       1.9%
...
------------------------------------------------------------ -------- ------------- ----------
TileGym Total                                                    9676        95.147      18.3%
All Kernels Total                                              104858       520.725     100.0%

>>> cuTile Kernel Coverage (GPU Time): 18.3% <<<
>>> cuTile Kernel Coverage (# Launches): 9.2% <<<
```
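
The two coverage figures in the example output are ratios of the TileGym totals to the all-kernel totals. The sketch below shows one way such ratios could be computed from a per-kernel summary in CSV form. It is not the implementation in `infer.py`, which this diff does not show. The CSV column names, the `kernel_coverage` helper, the two aggregated sample rows, and the `tilegym_` name prefix used to identify covered kernels are all assumptions made for illustration; the sample numbers are chosen only so that the totals match the example output above.

```python
import csv
import io

# Hypothetical per-kernel summary. The column names are assumptions, and the
# two rows are made-up aggregates whose sums match the totals in the example
# output (9676 of 104858 launches, 95.147 ms of 520.725 ms).
SAMPLE_CSV = (
    "Name,Instances,Total Time (ns)\n"
    "tilegym_kernels_combined,9676,95147000\n"
    "other_kernels_combined,95182,425578000\n"
)


def kernel_coverage(csv_text, is_covered):
    """Return (GPU time %, launch %) for kernels accepted by is_covered."""
    total_ns = covered_ns = 0
    total_calls = covered_calls = 0
    for row in csv.DictReader(io.StringIO(csv_text)):
        ns = int(row["Total Time (ns)"])
        calls = int(row["Instances"])
        total_ns += ns
        total_calls += calls
        if is_covered(row["Name"]):
            covered_ns += ns
            covered_calls += calls
    if total_ns == 0 or total_calls == 0:
        # No kernels recorded: report zero coverage instead of dividing by zero.
        return 0.0, 0.0
    return 100.0 * covered_ns / total_ns, 100.0 * covered_calls / total_calls


# Hypothetical predicate: treat names starting with "tilegym_" as covered.
time_pct, launch_pct = kernel_coverage(SAMPLE_CSV, lambda name: name.startswith("tilegym_"))
print(f"cuTile Kernel Coverage (GPU Time): {time_pct:.1f}%")      # 18.3%
print(f"cuTile Kernel Coverage (# Launches): {launch_pct:.1f}%")  # 9.2%
```

Time is accumulated as integer nanoseconds, so the totals are exact and only the final percentage is a floating-point value.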

## Performance Benchmark

Benchmark TileGym's CUTILE-optimized kernels against the standard PyTorch implementation. The `--profile` flag enables detailed performance metrics, including throughput (tokens/sec) and generation latency.
@@ -241,6 +273,7 @@ python infer.py \
| `--num_runs` | Benchmark iterations | `5` |
| `--warmup_runs` | Warmup iterations | `2` |
| `--profile` | Enable profiling | `False` |
| `--report_kernel_coverage` | Report cuTile kernel GPU time and launch count coverage via nsys | `False` |
| `--show_outputs` | Print generated text | `False` |
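
The `False` defaults in the table rows above suggest on/off switches. As a minimal sketch, such options could be declared with `argparse` as shown below. The actual parser in `infer.py` is not part of this diff, so `build_parser` and the use of `action="store_true"` are assumptions; only the flag names, descriptions, and defaults are taken from the table.

```python
import argparse


def build_parser():
    # Hypothetical subset of the infer.py options listed in the table above.
    parser = argparse.ArgumentParser(description="Sketch of selected infer.py options")
    parser.add_argument("--num_runs", type=int, default=5, help="Benchmark iterations")
    parser.add_argument("--warmup_runs", type=int, default=2, help="Warmup iterations")
    parser.add_argument("--profile", action="store_true", help="Enable profiling")
    parser.add_argument(
        "--report_kernel_coverage",
        action="store_true",  # assumed: a switch that defaults to False
        help="Report cuTile kernel GPU time and launch count coverage via nsys",
    )
    parser.add_argument("--show_outputs", action="store_true", help="Print generated text")
    return parser


args = build_parser().parse_args(["--report_kernel_coverage"])
print(args.report_kernel_coverage, args.profile, args.num_runs)  # True False 5
```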


14 changes: 14 additions & 0 deletions modeling/transformers/bench_deepseek.sh
@@ -55,3 +55,17 @@ else
echo "Summary file not found."
fi
echo "========================================"

echo ""
echo "========================================"
echo " TileGym Kernel Coverage"
echo "========================================"
python infer.py \
--model_id ${MODEL_ID} \
--use_tilegym \
--use_cutile \
--use_attn \
--report_kernel_coverage \
--sentence_file ${INPUT_FILE} \
--output_length ${OUTPUT_LENGTH}
echo "========================================"
14 changes: 14 additions & 0 deletions modeling/transformers/bench_gemma3.sh
@@ -55,3 +55,17 @@ else
echo "Summary file not found."
fi
echo "========================================"

echo ""
echo "========================================"
echo " TileGym Kernel Coverage"
echo "========================================"
python infer.py \
--model_id ${MODEL_ID} \
--use_tilegym \
--use_cutile \
--use_attn \
--report_kernel_coverage \
--sentence_file ${INPUT_FILE} \
--output_length ${OUTPUT_LENGTH}
echo "========================================"
14 changes: 14 additions & 0 deletions modeling/transformers/bench_gpt_oss.sh
@@ -55,3 +55,17 @@ else
echo "Summary file not found."
fi
echo "========================================"

echo ""
echo "========================================"
echo " TileGym Kernel Coverage"
echo "========================================"
python infer.py \
--model_id ${MODEL_ID} \
--use_tilegym \
--use_cutile \
--use_attn \
--report_kernel_coverage \
--sentence_file ${INPUT_FILE} \
--output_length ${OUTPUT_LENGTH}
echo "========================================"
14 changes: 14 additions & 0 deletions modeling/transformers/bench_llama.sh
@@ -55,3 +55,17 @@ else
echo "Summary file not found."
fi
echo "========================================"

echo ""
echo "========================================"
echo " TileGym Kernel Coverage"
echo "========================================"
python infer.py \
--model_id ${MODEL_ID} \
--use_tilegym \
--use_cutile \
--use_attn \
--report_kernel_coverage \
--sentence_file ${INPUT_FILE} \
--output_length ${OUTPUT_LENGTH}
echo "========================================"
14 changes: 14 additions & 0 deletions modeling/transformers/bench_mistral.sh
@@ -55,3 +55,17 @@ else
echo "Summary file not found."
fi
echo "========================================"

echo ""
echo "========================================"
echo " TileGym Kernel Coverage"
echo "========================================"
python infer.py \
--model_id ${MODEL_ID} \
--use_tilegym \
--use_cutile \
--use_attn \
--report_kernel_coverage \
--sentence_file ${INPUT_FILE} \
--output_length ${OUTPUT_LENGTH}
echo "========================================"
14 changes: 14 additions & 0 deletions modeling/transformers/bench_phi3.sh
@@ -55,3 +55,17 @@ else
echo "Summary file not found."
fi
echo "========================================"

echo ""
echo "========================================"
echo " TileGym Kernel Coverage"
echo "========================================"
python infer.py \
--model_id ${MODEL_ID} \
--use_tilegym \
--use_cutile \
--use_attn \
--report_kernel_coverage \
--sentence_file ${INPUT_FILE} \
--output_length ${OUTPUT_LENGTH}
echo "========================================"
15 changes: 15 additions & 0 deletions modeling/transformers/bench_qwen.sh
@@ -59,3 +59,18 @@ else
echo "Summary file not found."
fi
echo "========================================"

echo ""
echo "========================================"
echo " TileGym Kernel Coverage"
echo "========================================"
python infer.py \
--model_id ${MODEL_ID} \
--use_tilegym \
--use_cutile \
--use_attn \
--report_kernel_coverage \
--sentence_file ${INPUT_FILE} \
--batch_size ${BATCH_SIZE} \
--output_length ${OUTPUT_LENGTH}
echo "========================================"