
Parallel-Matrix-Multiply

In this parallel strategy, the rows and columns are partitioned into 4 parts, so matrices A and B are each split into a 4×4 grid of blocks. The tile width is therefore 512/4 = 128. At any one time, one block of A and one block of B are brought into shared memory, and all threads in the thread block compute on that data. With 64 threads working on a block in parallel, each thread handles 128/64 = 2 columns of the block of B. The block sequence for the first block of C is: C(1,1) = C(1,1) + A(1,1)*B(1,1) + A(1,2)*B(2,1) + ... + A(1,4)*B(4,1), and likewise for the remaining blocks of C. This technique exploits both spatial and temporal locality, since adjacent data within a tile is reused and the same tile is reused by many threads.
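Below is a minimal CUDA sketch of this shared-memory tiling idea, assuming 512×512 float matrices. It uses one thread per output element and an illustrative 16-wide tile rather than the 128-wide tile with 64 threads per block described above (a smaller tile keeps two float tiles within typical shared-memory limits), so it shows the staging-and-reuse pattern rather than reproducing the repository's exact kernel.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Matrix dimension and an illustrative tile width. The scheme above uses
// 128-wide tiles with 64 threads per block; a 16-wide tile is used here so
// that two float tiles fit comfortably in shared memory on typical GPUs.
#define N 512
#define TILE_WIDTH 16

// Shared-memory tiled matrix multiply: each thread block walks along one
// row-band of A and one column-band of B, staging one tile of each in
// shared memory before accumulating the partial products.
__global__ void tiledMatMul(const float *A, const float *B, float *C) {
    __shared__ float As[TILE_WIDTH][TILE_WIDTH];
    __shared__ float Bs[TILE_WIDTH][TILE_WIDTH];

    int row = blockIdx.y * TILE_WIDTH + threadIdx.y;
    int col = blockIdx.x * TILE_WIDTH + threadIdx.x;
    float acc = 0.0f;

    // Sweep over the N/TILE_WIDTH tiles that contribute to C(row, col).
    for (int t = 0; t < N / TILE_WIDTH; ++t) {
        As[threadIdx.y][threadIdx.x] = A[row * N + t * TILE_WIDTH + threadIdx.x];
        Bs[threadIdx.y][threadIdx.x] = B[(t * TILE_WIDTH + threadIdx.y) * N + col];
        __syncthreads();                      // tiles fully loaded before use

        for (int k = 0; k < TILE_WIDTH; ++k)  // reuse the staged tiles
            acc += As[threadIdx.y][k] * Bs[k][threadIdx.x];
        __syncthreads();                      // finish before overwriting tiles
    }
    C[row * N + col] = acc;
}

int main() {
    size_t bytes = N * N * sizeof(float);
    float *A, *B, *C;
    cudaMallocManaged(&A, bytes);
    cudaMallocManaged(&B, bytes);
    cudaMallocManaged(&C, bytes);
    for (int i = 0; i < N * N; ++i) { A[i] = 1.0f; B[i] = 2.0f; }

    dim3 block(TILE_WIDTH, TILE_WIDTH);
    dim3 grid(N / TILE_WIDTH, N / TILE_WIDTH);
    tiledMatMul<<<grid, block>>>(A, B, C);
    cudaDeviceSynchronize();

    printf("C[0] = %f (expected %f)\n", C[0], 2.0f * N);
    cudaFree(A); cudaFree(B); cudaFree(C);
    return 0;
}
```

Launching with a grid of N/TILE_WIDTH × N/TILE_WIDTH blocks, as in main above, covers every 128- or 16-wide output tile exactly once; the inner loop over k is where the temporal reuse of the staged tiles happens.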
