A learning repository following the book "Programming Massively Parallel Processors: A Hands-on Approach" by Wen-mei W. Hwu, David B. Kirk, and Izzat El Hajj. This project explores parallel programming through both CUDA C/C++ and Python/Triton implementations.
This repository documents my journey learning GPU programming and parallel computing. I'm experimenting with:
- CUDA C/C++ for low-level GPU programming
- Triton for high-level, Pythonic GPU kernels
- CMake for C/C++ build management
- uv for Python dependency management
- Doxygen for C/C++ code documentation
- jj (Jujutsu) for version control
Current Progress: Chapter 3
Note: This is an experimental learning repository. Code may not be production-ready and is intended for educational purposes.
pmpp/
├── src/
│   ├── cuda/              # CUDA C/C++ implementations
│   │   └── vector_add/    # Chapter 2-3: Vector addition example
│   └── triton/            # Python/Triton implementations (coming soon)
├── notes/                 # Chapter summaries and learning notes
├── html/                  # Doxygen-generated documentation
├── CMakeLists.txt         # CMake configuration (if using top-level build)
├── pyproject.toml         # Python project configuration
├── uv.lock                # Locked Python dependencies
├── Doxyfile               # Doxygen configuration
└── README.md              # This file
- src/cuda/: Contains CUDA C/C++ kernel implementations organized by chapter/topic
- src/triton/: Will contain Python/Triton kernel implementations for comparison
- notes/: Personal notes, chapter summaries, and key concepts
- html/: Auto-generated API documentation (gitignored, generated locally)
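For readers new to the vector addition example from Chapter 2, the core idea is that each GPU thread computes one output element from its block and thread indices. The sketch below emulates that indexing on the CPU in plain Python. It is an illustration only, not the code in src/cuda/vector_add/; the names `vec_add_kernel` and `launch_vec_add` and the default block size of 256 are assumptions made for this example.

```python
import math


def vec_add_kernel(block_idx, block_dim, thread_idx, a, b, c, n):
    """Work done by one simulated thread (hypothetical helper)."""
    # Global element index, as in blockIdx.x * blockDim.x + threadIdx.x.
    i = block_idx * block_dim + thread_idx
    # Boundary guard: the last block may contain threads past the end.
    if i < n:
        c[i] = a[i] + b[i]


def launch_vec_add(a, b, block_dim=256):
    """Emulate a 1D kernel launch by looping over blocks and threads."""
    if len(a) != len(b):
        raise ValueError("input vectors must have the same length")
    n = len(a)
    c = [0.0] * n
    # Enough blocks to cover all n elements (ceiling division).
    grid_dim = math.ceil(n / block_dim)
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            vec_add_kernel(block_idx, block_dim, thread_idx, a, b, c, n)
    return c


if __name__ == "__main__":
    a = [float(i) for i in range(1000)]
    b = [2.0 * x for x in a]
    c = launch_vec_add(a, b)
    print(c[:3], len(c))
```

On a real GPU the two loops disappear: all threads run the kernel body in parallel, which is why the `i < n` guard matters whenever `n` is not a multiple of the block size.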
- NVIDIA GPU with CUDA support (Compute Capability 3.5+)
- Check your GPU:
nvidia-smi
- CUDA Toolkit (≥11.0 recommended) - Installation Guide
- CMake (≥3.18) - For building C/C++ projects
- Python (≥3.11) - For Triton implementations
- uv - Modern Python package manager (Installation)
- Doxygen (optional) - For generating C/C++ documentation
- jj (Jujutsu) (optional) - Version control (Installation)
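The prerequisites above can be checked in one pass with a small script. This is a minimal sketch that only tests whether each command is on PATH; it does not verify versions. The command names (`nvidia-smi`, `nvcc`, `cmake`, `python3`, `uv`, `doxygen`, `jj`) are assumptions about how these tools are installed on your system.

```python
import shutil

# Assumed command names for the tools listed in the prerequisites.
REQUIRED = ["nvidia-smi", "nvcc", "cmake", "python3", "uv"]
OPTIONAL = ["doxygen", "jj"]


def find_tools(names):
    """Map each command name to its resolved path, or None if not on PATH."""
    return {name: shutil.which(name) for name in names}


def missing(names):
    """Return the command names that could not be found on PATH."""
    return [name for name, path in find_tools(names).items() if path is None]


if __name__ == "__main__":
    for label, names in (("required", REQUIRED), ("optional", OPTIONAL)):
        for name, path in find_tools(names).items():
            print(f"{label:8} {name:10} {path or 'NOT FOUND'}")
```

Save it under any name (for example `check_tools.py`, a hypothetical filename) and run it with `python3 check_tools.py`.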
nvcc --version
nvidia-smi
Using jj:
jj git clone <repository-url>
cd pmpp
Or with git:
git clone <repository-url>
cd pmpp
# Install dependencies (Triton ≥3.5.0)
uv sync
# Verify installation
uv run python -c "import triton; print(triton.__version__)"
Each CUDA project has its own CMakeLists.txt. Navigate to the project directory:
# Example: Building vector_add
cd src/cuda/vector_add
cmake -B build
cmake --build build
# Run the executable
./build/vector_add.out
Alternatively, you can build in-source (note that this writes build files next to the sources):
cd src/cuda/vector_add
cmake .
make
./vector_add.out
# Build and run a CUDA example
cd src/cuda/vector_add
cmake -B build && cmake --build build
./build/vector_add.out
# Run a Triton script
uv run python src/triton/example.py
# Generate HTML documentation
doxygen Doxyfile
# View in browser
firefox html/index.html
# or
xdg-open html/index.html
The Doxygen configuration parses inline comments in the CUDA source files and generates API documentation under html/.
This project uses jj (Jujutsu) instead of traditional git. Basic commands:
# Describe the current change
jj describe # Set the description of the working-copy change
# View history
jj log # View commit graph
# Create new change
jj commit # Finalize current change
# Sync with remote
jj git push # Push to git remote
jj git fetch # Fetch from git remote
New to jj? Check out the Jujutsu Tutorial
- Book: Programming Massively Parallel Processors (4th Edition recommended)
- NVIDIA CUDA Programming Guide
- CUDA Best Practices Guide
- Triton Documentation
- OpenAI Triton Tutorials
- CUDA by Example
- GPU Gems Series
- Chapter notes available in the notes/ directory
- ✅ Understand GPU architecture and memory hierarchies
- ✅ Master CUDA programming fundamentals (kernels, threads, blocks, grids)
- 🔄 Learn advanced optimization techniques (memory coalescing, shared memory)
- 📅 Explore Triton for high-level GPU programming
- 📅 Compare CUDA and Triton approaches
- 📅 Implement real-world parallel algorithms
Legend: ✅ Completed | 🔄 In Progress | 📅 Upcoming
| Chapter | Topic | CUDA C/C++ | Triton | Notes |
|---|---|---|---|---|
| 1 | Introduction | ✅ | - | ✅ |
| 2 | Heterogeneous Data Parallel Computing | ✅ | 📅 | ✅ |
| 3 | Multidimensional Grids and Data | 🔄 | 📅 | 🔄 |
| 4 | Compute Architecture and Scheduling | 📅 | 📅 | 📅 |
| ... | ... | ... | ... | ... |
This is a personal learning repository, but suggestions and corrections are welcome! Feel free to:
- Open issues for questions or clarifications
- Submit pull requests for bug fixes
- Share alternative implementations
This project is for educational purposes. Code implementations are based on exercises and examples from "Programming Massively Parallel Processors."
For academic use, please cite the original book:
Hwu, W., Kirk, D., & El Hajj, I. (2022).
Programming Massively Parallel Processors: A Hands-on Approach (4th ed.).
Morgan Kaufmann.
Built with: CUDA • Python • Triton • CMake • uv • Doxygen • jj
Happy parallel programming!