Welcome to Triton-Viz, a visualization and profiling toolkit for deep learning applications, built to make kernel programming in tile-based DSLs like Triton more intuitive.
Visit our site to see our tool in action!
Triton-Viz is a visualization and analysis toolkit specifically designed to complement the development and optimization of applications written in OpenAI's Triton, an open-source programming language aimed at simplifying the task of coding for accelerators such as GPUs. Triton-Viz offers a suite of features to enhance the debugging, performance analysis, and understanding of Triton code.
Because Triton lets developers program at a higher level while still targeting low-level accelerator devices, managing and optimizing resources such as memory becomes a crucial part of development. Triton-Viz addresses this by providing real-time visualization of tensor operations and their memory usage. Although the tool focuses on visualizing GPU operations, you do not need a GPU to run the examples on your system.
- Python 3.10 or later (the latest available version is preferred).
Most users can install directly from PyPI:
pip install triton-viz
If you want to run examples from this repo, contribute, or build the frontend, install from source instead:
git clone https://github.com/Deep-Learning-Profiling-Tools/triton-viz.git
cd triton-viz
uv sync # or "uv sync --extra test" if you're running tests
If you want to run tests, run uv sync --extra test instead of uv sync. Otherwise you're all set!
The PyPI package ships with prebuilt frontend assets in triton_viz/static, so
you do not need npm to run the visualizer. If you want to modify the frontend,
rebuild the TS sources:
npm install
npm run build:frontend
For PyPI installs, install with the nki extra and AWS Neuron repository:
pip install triton-viz[nki] --extra-index-url https://pip.repos.neuron.amazonaws.com
For source installs, if you want to exercise the Neuron Kernel Interface (NKI) interpreter or run the NKI-specific tests:
uv sync --extra nki # or "uv sync --extra nki --extra test" if also running tests
Note that you need to specify all features that you want in one statement when using uv sync, i.e. if you want both NKI and testing support, you must run uv sync --extra nki --extra test. The below statements are wrong and will remove the NKI install when installing test packages:
uv sync --extra nki
uv sync --extra test
- To run core Triton-Viz tests, run: pytest tests/
- (If NKI is installed) To run NKI-specific tests, run: pytest tests/ -m nki
- To run all tests (Triton + NKI), run: pytest tests/ -m ""
- To run visualizer frontend tests, run: npm run test:frontend
Examples live in this repo. Clone it first if you installed via pip.
cd examples
python <file_name>.py
- Triton is best supported today; Amazon NKI DSL support is in active development.
- The web visualizer requires a browser with WebGL/OpenGL enabled (standard in modern browsers).
Analyze kernels across visualization, profiling, and sanitization with a single line of code.
- Visualizer: currently supports load, store, and matmul operations for 1/2/3D tensors (more operations and dimensions coming soon).
- Profiler: flags non-unrolled loops, inefficient mask usage, and missing buffer_load optimizations while tracking load/store byte counts with low-overhead sampling.
- Sanitizer: symbolically checks tensor memory accesses for out-of-bounds errors and emits reports with tensor metadata, call stack, and expression trees; optional fake-memory backend avoids real reads.
- 3D View: inspect tensor layouts and memory access patterns from any perspective.
- Program IDs: examine op inputs/outputs at specific PIDs and see per-program load/store footprints.
- Code Mapping: map visual ops back to source lines for debugging.
- Heatmaps: spot outliers, zeros, or saturation with value color gradients.
- Histograms: review value distributions to guide quantization decisions.
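To make the sanitizer and per-program footprint ideas above concrete, here is a minimal pure-Python sketch. It is not Triton-Viz's implementation (the sanitizer checks accesses symbolically rather than by enumeration); it simply lists the offsets each program ID would touch in a hypothetical 1D kernel that loads BLOCK-sized tiles, and reports any that fall past the end of the tensor. The function names `program_offsets` and `find_out_of_bounds`, and the kernel shape itself, are assumptions made for illustration.

```python
def program_offsets(pid, block_size):
    """Offsets a hypothetical 1D kernel touches for one program ID."""
    start = pid * block_size
    return list(range(start, start + block_size))


def find_out_of_bounds(n_elements, block_size, use_mask):
    """Return {pid: [bad offsets]} for accesses past the end of the tensor.

    With use_mask=True the kernel is assumed to guard each access with
    `offset < n_elements`, the way a masked tl.load would.
    """
    num_programs = -(-n_elements // block_size)  # ceiling division
    report = {}
    for pid in range(num_programs):
        offsets = program_offsets(pid, block_size)
        if use_mask:
            # A masked access never touches the guarded-out offsets.
            offsets = [o for o in offsets if o < n_elements]
        bad = [o for o in offsets if o >= n_elements]
        if bad:
            report[pid] = bad
    return report


if __name__ == "__main__":
    # 10 elements in tiles of 4: the last program overruns without a mask.
    print(find_out_of_bounds(10, 4, use_mask=False))  # {2: [10, 11]}
    print(find_out_of_bounds(10, 4, use_mask=True))   # {}
```

The unmasked run shows why the last program ID is the usual culprit: its tile extends past the tensor whenever the element count is not a multiple of the block size.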
Triton-Viz uses a small set of environment variables to configure runtime behavior. Unless noted, boolean flags are enabled only when set to 1.
- TRITON_VIZ_VERBOSE (default: 0): enable verbose logging and extra debug output.
- TRITON_VIZ_NUM_SMS (default: 1): number of concurrent SMs to emulate for the CPU interpreter (min 1).
- TRITON_VIZ_PORT (default: 8000 with share=True, 5001 with share=False): port for the Flask server.
- ENABLE_SANITIZER (default: 1): enable the sanitizer pipeline that checks memory accesses.
- ENABLE_PROFILER (default: 1): enable the profiler pipeline that collects performance data.
- ENABLE_TIMING (default: 0): collect timing data during execution.
- REPORT_GRID_EXECUTION_PROGRESS (default: 0): report per-program block execution progress in the interpreter.
- SANITIZER_ENABLE_FAKE_TENSOR (default: 0): use a fake tensor backend for sanitizer runs to avoid real memory reads.
- PROFILER_ENABLE_LOAD_STORE_SKIPPING (default: 1): skip redundant load/store checks to reduce profiling overhead.
- PROFILER_ENABLE_BLOCK_SAMPLING (default: 1): sample a subset of blocks to reduce profiling overhead.
- PROFILER_DISABLE_BUFFER_LOAD_CHECK (default: 0): disable buffer load checks in the profiler.
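These variables can be exported in the shell or set at the top of a script. The sketch below sets a few of them from Python and shows one way to read the "enabled only when set to 1" rule and the port defaults described above. The helpers `flag_enabled` and `resolve_port` are hypothetical and not part of Triton-Viz, and when Triton-Viz actually reads the environment is an assumption here, so setting the variables before importing the package is the conservative choice.

```python
import os

# Set these before importing triton_viz (assumption: it reads them at
# import time or later, never earlier).
os.environ["TRITON_VIZ_VERBOSE"] = "1"
os.environ["SANITIZER_ENABLE_FAKE_TENSOR"] = "1"
os.environ["ENABLE_TIMING"] = "true"  # not "1", so treated as disabled below


def flag_enabled(name, default="0"):
    """Hypothetical helper: a boolean flag counts as on only when it is "1"."""
    return os.environ.get(name, default) == "1"


def resolve_port(share):
    """Hypothetical helper mirroring the documented TRITON_VIZ_PORT defaults."""
    override = os.environ.get("TRITON_VIZ_PORT")
    if override is not None:
        return int(override)
    return 8000 if share else 5001


if __name__ == "__main__":
    print(flag_enabled("TRITON_VIZ_VERBOSE"))
    print(flag_enabled("ENABLE_TIMING"))
    print(resolve_port(share=False))
```

Note that a value such as "true" does not enable a flag under this reading; use 1 unless a variable's description says otherwise.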
If you're interested in fun puzzles to work with in Triton, do check out: Triton Puzzles
Triton-Viz is licensed under the MIT License. See the LICENSE for details.
If you find this repo useful for your research, please cite our paper:
@inproceedings{ramesh2025tritonviz,
author={Ramesh, Tejas and Rush, Alexander and Liu, Xu and Yin, Binqian and Zhou, Keren and Jiao, Shuyin},
title={Triton-Viz: Visualizing GPU Programming in AI Courses},
booktitle = {Proceedings of the 56th ACM Technical Symposium on Computer Science Education (SIGCSE TS '25)},
numpages = {7},
location = {Pittsburgh, Pennsylvania, United States},
series = {SIGCSE TS '25}
}
@inproceedings{wu2026tritonsanitizer,
author = {Wu, Hao and Zhao, Qidong and Chen, Songqing and Chen, Yang and Hao, Yueming and Liu, Tony C. W. and Chen, Sijia and Aziz, Adnan and Zhou, Keren},
title = {Triton-Sanitizer: A Fast and Device-Agnostic Memory Sanitizer for Triton with Rich Diagnostic Context},
year = {2026},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
location = {Pittsburgh, PA, USA},
booktitle = {Proceedings of the 31st ACM International Conference on Architectural Support for Programming Languages and Operating Systems},
series = {ASPLOS '26},
keywords = {GPU, Debugging, Symbolic Execution, Memory Safety, Triton, Memory Access Errors}
}
