Parallel-Matrix-Multiply

In this parallel strategy, the rows and columns of the 512 × 512 matrices are each partitioned into 4, so A and B are each divided into a 4 × 4 grid of block matrices. The tile width is 512/4 = 128. One block of A and one block of B are brought into shared memory at a time, and all threads compute on that data. 64 threads work on a block in parallel, and each thread handles 128/64 = 2 columns of the block of B.

The block sequence is as follows: first block of the first row of C = first block of the first row of C + (first block of the first row of A * first block of the first column of B) + (second block of the first row of A * second block of the first column of B) + … + (fourth block of the first row of A * fourth block of the first column of B).

This technique exploits both spatial and temporal locality: spatial, because adjacent elements within a block are accessed together, and temporal, because the same block is reused across many computations while it is resident in shared memory.
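The blocking scheme described above can be sketched in C++ with `std::thread`. This is a minimal illustration, not the code in `matrix_multiply.cc`: the function name `tiled_multiply`, its parameters, and the row-major `std::vector<double>` layout are all assumptions made for the example. It also does not stage blocks into a separate shared buffer. Each thread simply walks the same block order and updates its own columns of every tile, so no two threads write the same element of C.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper (not from the repository): computes C += A * B for
// n x n row-major matrices using a blocks x blocks grid of tiles.
// Each of the num_threads workers owns a fixed group of columns in every tile.
void tiled_multiply(const std::vector<double>& A, const std::vector<double>& B,
                    std::vector<double>& C, int n, int blocks, int num_threads) {
    assert(n % blocks == 0);
    const int tile = n / blocks;                     // 512 / 4 = 128 in the README
    assert(tile % num_threads == 0);
    const int cols_per_thread = tile / num_threads;  // 128 / 64 = 2 in the README

    auto worker = [&](int t) {
        const int col_begin = t * cols_per_thread;   // column offset inside a tile
        for (int bi = 0; bi < blocks; ++bi)          // block row of C
            for (int bj = 0; bj < blocks; ++bj)      // block column of C
                for (int bk = 0; bk < blocks; ++bk)  // C(bi,bj) += A(bi,bk) * B(bk,bj)
                    for (int i = bi * tile; i < (bi + 1) * tile; ++i)
                        for (int k = bk * tile; k < (bk + 1) * tile; ++k) {
                            const double a = A[static_cast<std::size_t>(i) * n + k];
                            for (int c = 0; c < cols_per_thread; ++c) {
                                const int j = bj * tile + col_begin + c;
                                C[static_cast<std::size_t>(i) * n + j] +=
                                    a * B[static_cast<std::size_t>(k) * n + j];
                            }
                        }
    };

    // Threads write disjoint columns of C and only read A and B,
    // so no locking is needed; join before returning.
    std::vector<std::thread> pool;
    for (int t = 0; t < num_threads; ++t) pool.emplace_back(worker, t);
    for (auto& th : pool) th.join();
}
```

Under these assumptions, the configuration in the text would correspond to a call such as `tiled_multiply(A, B, C, 512, 4, 64)`, with `C` initialised to zero beforehand.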

merinjo/Parallel-Matrix-Multiply