Parallel Matrix Accelerator

Overview

This repository hosts parallel computing labs exploring progressively more powerful accelerators for dense matrix multiplication and convolutional kernels. Each lab scales the same GEMM baseline across OpenMP, MPI, CUDA, and FPGA/HLS targets.

Directory Map

Path	Description
`common/`	Common toolchains.
`lab1-openmp-gemm/`	Multi-core acceleration: C++17 + OpenMP GEMM with blocked/streamed kernels.
`lab2-mpi-gemm/`	Multi-CPU acceleration: Distributed GEMM that scatters tiles with MPI, supports blocking/buffered/non-blocking communication.
`lab3-cuda-cnn/`	GPU acceleration: CUDA implementation of convolution/GEMM hybrids using shared-memory tiling.
`lab4-fpga-cnn/`	FPGA acceleration: MerlinCC/HLS kernels for FPGA emulation and AWS F1 synthesis.

Build & Run

OpenMP: Run `cd lab1-openmp-gemm && make -j && make test` to benchmark both parallel kernels against the baseline library. The reports required by the course are regenerated via `make zip`.
MPI: Run `cd lab2-mpi-gemm && make test np=4` (override `np` as needed). Switch between communication APIs at the top of `mpi.cpp`.
CUDA: Run `cd lab3-cuda-cnn && make cnn && . ./params.sh && ./cnn` for the CNN benchmark, or `make vadd && . ./params.sh && ./vadd` for micro-validation. `make test-seq` compares against the sequential host reference.
FPGA/HLS: Run `cd lab4-fpga-cnn && make KERNEL=cnn test` to run the fast simulator and `make estimate` to pull cycle counts from `merlin.rpt`. Use `make KERNEL=dotprod` (or `KERNEL=vadd`) for the alternate kernels. Scripts prefixed with `setup` configure the Merlin or Docker toolchains.
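To make the effect of the `np` setting concrete, here is a minimal sketch of how the rows of a matrix could be divided among `np` processes before they are scattered. It is plain C++ with no MPI calls, and both `RowRange` and `rows_for_rank` are hypothetical names introduced for this example; the actual partitioning in `mpi.cpp` may use a different tile shape or distribution.

```cpp
#include <algorithm>

// Half-open range [begin, end) of matrix rows owned by one process.
struct RowRange {
    int begin;
    int end;
};

// Hypothetical helper (not from the repository): splits n rows across np
// ranks as evenly as possible. When n is not divisible by np, the first
// n % np ranks each receive one extra row.
RowRange rows_for_rank(int n, int np, int rank) {
    const int base = n / np;    // rows every rank receives
    const int extra = n % np;   // leftover rows, given to the lowest ranks
    const int begin = rank * base + std::min(rank, extra);
    const int count = base + (rank < extra ? 1 : 0);
    return {begin, begin + count};
}
```

With `n = 4096` and `np = 4`, each rank owns 1024 contiguous rows. With `n = 10` and `np = 4`, the ranges are [0, 3), [3, 6), [6, 8), and [8, 10).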

Performance & Profiling Notes

Tile sizes, unrolling factors, and kernel variants are documented in each lab's `lab*-report.md`.
Speedups are measured relative to the naive single-core routine in `lab1-openmp-gemm/lib/gemm.cpp`, typically using 4096 × 4096 matrices on Apple Silicon hosts and AWS F1 instances.
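The speedup figure is the baseline time divided by the accelerated kernel's time. The sketch below shows that arithmetic, plus a GFLOP/s conversion based on the standard 2n³ operation count for an n × n GEMM. The helper names (`time_seconds`, `speedup`, `gemm_gflops`) are assumptions for this example, and reporting GFLOP/s is an addition here rather than something the labs are stated to do.

```cpp
#include <chrono>
#include <utility>

// Hypothetical helper: wall-clock seconds taken by one call to fn().
template <typename Fn>
double time_seconds(Fn&& fn) {
    const auto start = std::chrono::steady_clock::now();
    std::forward<Fn>(fn)();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}

// Speedup of a kernel relative to the baseline (values above 1 mean faster).
double speedup(double baseline_seconds, double kernel_seconds) {
    return baseline_seconds / kernel_seconds;
}

// Throughput of an n x n GEMM: n^3 multiplies plus n^3 adds = 2 * n^3 flops.
double gemm_gflops(int n, double seconds) {
    const double flops = 2.0 * n * n * n;  // evaluated in double, no int overflow
    return flops / seconds / 1e9;
}
```

For n = 4096 the operation count is 2 · 4096³ ≈ 1.37 × 10¹¹, so a kernel that finishes in one second sustains roughly 137 GFLOP/s.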

Cleaning Up

Run `make clean` inside any lab directory to remove the built binaries.

ykozxy/parallel-matrix-multiplication
