Parry-97/pmpp

PMPP: Programming Massively Parallel Processors


A learning repository following the book "Programming Massively Parallel Processors: A Hands-on Approach" by Wen-mei W. Hwu, David B. Kirk, and Izzat El Hajj. This project explores parallel programming through both CUDA C/C++ and Python/Triton implementations.

πŸ“š About

This repository documents my journey learning GPU programming and parallel computing. I'm experimenting with:

  • CUDA C/C++ for low-level GPU programming
  • Triton for high-level, Pythonic GPU kernels
  • CMake for C/C++ build management
  • uv for Python dependency management
  • Doxygen for C/C++ code documentation
  • jj (Jujutsu) for version control

Current Progress: Chapter 3

Note: This is an experimental learning repository. Code may not be production-ready and is intended for educational purposes.

πŸ“‚ Project Structure

pmpp/
β”œβ”€β”€ src/
β”‚   β”œβ”€β”€ cuda/              # CUDA C/C++ implementations
β”‚   β”‚   └── vector_add/    # Chapter 2-3: Vector addition example
β”‚   └── triton/            # Python/Triton implementations (coming soon)
β”œβ”€β”€ notes/                 # Chapter summaries and learning notes
β”œβ”€β”€ html/                  # Doxygen-generated documentation
β”œβ”€β”€ CMakeLists.txt         # CMake configuration (if using top-level build)
β”œβ”€β”€ pyproject.toml         # Python project configuration
β”œβ”€β”€ uv.lock                # Locked Python dependencies
β”œβ”€β”€ Doxyfile               # Doxygen configuration
└── README.md              # This file

Directory Purposes

  • src/cuda/: Contains CUDA C/C++ kernel implementations organized by chapter/topic
  • src/triton/: Will contain Python/Triton kernel implementations for comparison
  • notes/: Personal notes, chapter summaries, and key concepts
  • html/: Auto-generated API documentation (gitignored, generated locally)

πŸ”§ Prerequisites

Hardware

  • NVIDIA GPU with CUDA support (Compute Capability 3.5+)
  • Check your GPU: nvidia-smi

Software

  • CUDA Toolkit (β‰₯11.0 recommended) - Installation Guide
  • CMake (β‰₯3.18) - For building C/C++ projects
  • Python (β‰₯3.11) - For Triton implementations
  • uv - Modern Python package manager (Installation)
  • Doxygen (optional) - For generating C/C++ documentation
  • jj (Jujutsu) (optional) - Version control (Installation)

Verify CUDA Installation

nvcc --version
nvidia-smi

πŸš€ Getting Started

1. Clone the Repository

Using jj:

jj git clone <repository-url>
cd pmpp

Or with git:

git clone <repository-url>
cd pmpp

2. Python Setup with uv

# Install dependencies (Triton β‰₯3.5.0)
uv sync

# Verify installation
uv run python -c "import triton; print(triton.__version__)"

3. Building CUDA C/C++ Projects

Each CUDA project has its own CMakeLists.txt. Navigate to the project directory:

# Example: Building vector_add
cd src/cuda/vector_add
cmake -B build
cmake --build build

# Run the executable
./build/vector_add.out

Alternatively, you can do a quick in-source build (note that this writes CMake artifacts directly into the source directory):

cd src/cuda/vector_add
cmake .
make
./vector_add.out
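
For reference, a per-project CMakeLists.txt for an example like vector_add could be as small as the following sketch (hypothetical; the actual files in this repo may use different target names, architectures, or options):

```cmake
cmake_minimum_required(VERSION 3.18)
project(vector_add LANGUAGES CXX CUDA)

# Single-source CUDA executable; the .out suffix matches the run commands above.
add_executable(vector_add.out vector_add.cu)

# Target a specific GPU architecture (adjust to your card, e.g. 75 = Turing).
set_target_properties(vector_add.out PROPERTIES CUDA_ARCHITECTURES "75")
```

CMake ≥3.18 is required here because the CUDA_ARCHITECTURES target property was introduced in that release, which is consistent with the prerequisite listed above.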

πŸƒ Running Examples

CUDA C/C++ Examples

cd src/cuda/vector_add
cmake -B build && cmake --build build
./build/vector_add.out
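
When running these examples, the book's usual pattern is to verify the GPU output against a CPU reference. A minimal host-side sketch of that check (the helper names `vecAddHost` and `resultsMatch` are hypothetical, not code from this repo):

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// CPU reference implementation of element-wise vector addition.
std::vector<float> vecAddHost(const std::vector<float>& a,
                              const std::vector<float>& b) {
    std::vector<float> c(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) c[i] = a[i] + b[i];
    return c;
}

// Compare a GPU result against the CPU reference within a tolerance,
// since floating-point results may differ slightly between devices.
bool resultsMatch(const std::vector<float>& ref,
                  const std::vector<float>& gpu,
                  float tol = 1e-5f) {
    if (ref.size() != gpu.size()) return false;
    for (std::size_t i = 0; i < ref.size(); ++i)
        if (std::fabs(ref[i] - gpu[i]) > tol) return false;
    return true;
}
```

After copying the device result back with cudaMemcpy, passing both vectors through a check like `resultsMatch` catches indexing and launch-configuration bugs early.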

Python/Triton Examples (Coming Soon)

uv run python src/triton/example.py

πŸ“– Documentation

Generating C/C++ API Documentation

# Generate HTML documentation
doxygen Doxyfile

# View in browser
firefox html/index.html
# or
xdg-open html/index.html

The Doxygen configuration parses inline comments in CUDA source files to generate comprehensive API documentation.
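For example, a Doxygen comment block on a small helper might look like this (a hypothetical grid-size helper shown for illustration; the `@brief`/`@param`/`@return` tags are what appear in the generated HTML):

```cpp
#include <cstddef>

/**
 * @brief Number of thread blocks needed to cover n elements.
 *
 * Computes ceil(n / block_size) using integer arithmetic, a common
 * idiom when configuring a 1D kernel launch.
 *
 * @param n          Total number of elements to process.
 * @param block_size Number of threads per block (must be > 0).
 * @return The grid size for a 1D kernel launch.
 */
std::size_t gridSize(std::size_t n, std::size_t block_size) {
    return (n + block_size - 1) / block_size;
}
```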

🌿 Version Control with jj

This project uses jj (Jujutsu) instead of traditional git. Basic commands:

# Describe the current change
jj describe          # Add or edit the commit description

# View history
jj log               # Show the commit graph

# Finalize the current change and start a new one
jj commit

# Sync with the git remote
jj git push          # Push to the remote
jj git fetch         # Fetch from the remote

New to jj? Check out the Jujutsu Tutorial

πŸ“š Learning Resources

Primary Resource

Documentation

Supplementary Materials

🎯 Learning Goals

  • βœ… Understand GPU architecture and memory hierarchies
  • βœ… Master CUDA programming fundamentals (kernels, threads, blocks, grids)
  • πŸ”„ Learn advanced optimization techniques (memory coalescing, shared memory)
  • πŸ”„ Explore Triton for high-level GPU programming
  • πŸ”œ Compare CUDA and Triton approaches
  • πŸ”œ Implement real-world parallel algorithms

Legend: βœ… Completed | πŸ”„ In Progress | πŸ”œ Upcoming

πŸ—ΊοΈ Chapter Progress

| Chapter | Topic                                 | CUDA C/C++ | Triton | Notes |
|---------|---------------------------------------|------------|--------|-------|
| 1       | Introduction                          | ✅         | -      | ✅    |
| 2       | Heterogeneous Data Parallel Computing | ✅         | 🔜     | ✅    |
| 3       | Multidimensional Grids and Data       | 🔄         | 🔜     | 🔄    |
| 4       | Compute Architecture and Scheduling   | 🔜         | 🔜     | 🔜    |
| ...     | ...                                   | ...        | ...    | ...   |

🀝 Contributing

This is a personal learning repository, but suggestions and corrections are welcome! Feel free to:

  • Open issues for questions or clarifications
  • Submit pull requests for bug fixes
  • Share alternative implementations

πŸ“ License

This project is for educational purposes. Code implementations are based on exercises and examples from "Programming Massively Parallel Processors."

For academic use, please cite the original book:

Hwu, W., Kirk, D., & El Hajj, I. (2022).
Programming Massively Parallel Processors: A Hands-on Approach (4th ed.).
Morgan Kaufmann.

Built with: πŸš€ CUDA β€’ 🐍 Python β€’ ⚑ Triton β€’ πŸ› οΈ CMake β€’ πŸ“¦ uv β€’ πŸ“š Doxygen β€’ 🌿 jj

Happy parallel programming! πŸŽ‰
