In this parallel strategy, the rows and columns were each split into 4 parts, so matrices A and B were each partitioned into a 4×4 grid of blocks. With a matrix dimension of 512, the tile width would be 512/4 = 128. At each step, one block of A and one block of B were loaded into shared memory, and all the threads computed on that data. With 64 threads working on a block in parallel, each thread handled 128/64 = 2 columns of the current block of B. The blocks were combined in the following sequence: first block of the first row of C = first block of the first row of C + (first block of the first row of A × first block of the first column of B) + (second block of the first row of A × second block of the first column of B) + … + (fourth block of the first row of A × fourth block of the first column of B), i.e. C11 = C11 + A11·B11 + A12·B21 + A13·B31 + A14·B41. This technique exploits both spatial locality (adjacent elements of a block are accessed together) and temporal locality (data, once in shared memory, is reused many times before the next block is loaded).
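The block sequence can be sketched sequentially in Python. This is a minimal illustration under stated assumptions, not the repository's actual code: matrices are square n × n lists of lists, the helper names (`partition_sizes`, `load_tile`, `tiled_matmul`, `naive_matmul`) are hypothetical, `load_tile` only stands in for copying a block into shared memory, and the "threads" run one after another instead of in parallel.

```python
# Minimal sequential sketch of block-tiled matrix multiplication.
# Assumptions (hypothetical, not the repository's code): square n x n
# matrices as lists of lists; "shared memory" is modeled by copying a
# tile into a local list; simulated "threads" run one after another.


def partition_sizes(n, blocks, threads):
    """Return (tile_width, cols_per_thread) for a blocks x blocks partition."""
    if n % blocks != 0:
        raise ValueError("n must be divisible by blocks")
    tile = n // blocks
    if tile % threads != 0:
        raise ValueError("tile width must be divisible by threads")
    return tile, tile // threads


def load_tile(M, bi, bj, tile):
    """Copy tile (bi, bj) of M into a local buffer (stand-in for shared memory)."""
    return [row[bj * tile:(bj + 1) * tile] for row in M[bi * tile:(bi + 1) * tile]]


def tiled_matmul(A, B, n, blocks, threads):
    """Compute C = A * B one tile pair at a time."""
    tile, cols_per_thread = partition_sizes(n, blocks, threads)
    C = [[0.0] * n for _ in range(n)]
    for bi in range(blocks):          # block row of C
        for bj in range(blocks):      # block column of C
            # C[bi][bj] += A[bi][bk] * B[bk][bj] for bk = 0 .. blocks-1
            for bk in range(blocks):
                a_tile = load_tile(A, bi, bk, tile)
                b_tile = load_tile(B, bk, bj, tile)
                # Simulated thread t owns columns [t*cpt, (t+1)*cpt) of the tile.
                for t in range(threads):
                    for c in range(t * cols_per_thread, (t + 1) * cols_per_thread):
                        for r in range(tile):
                            acc = 0.0
                            for k in range(tile):
                                acc += a_tile[r][k] * b_tile[k][c]
                            C[bi * tile + r][bj * tile + c] += acc
    return C


def naive_matmul(A, B):
    """Reference triple-loop product used to check the tiled version."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


if __name__ == "__main__":
    # The setting from the text: n = 512, 4 x 4 blocks, 64 threads.
    print(partition_sizes(512, 4, 64))  # prints (128, 2)
    # Pure Python is too slow for n = 512, so check correctness on n = 16
    # with the same 4 x 4 block grid and 2 columns per simulated thread.
    n = 16
    A = [[(i * n + j) % 7 for j in range(n)] for i in range(n)]
    B = [[(i + 2 * j) % 5 for j in range(n)] for i in range(n)]
    print(tiled_matmul(A, B, n, blocks=4, threads=2) == naive_matmul(A, B))  # prints True
```

The loop over `bk` mirrors the accumulation C11 = C11 + A11·B11 + A12·B21 + A13·B31 + A14·B41, and the loop over `t` shows how a 128-wide tile divides into 2 columns per thread when 64 threads share it. On a GPU the `t` loop would be replaced by concurrent threads reading the same tiles from shared memory.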
merinjo/Parallel-Matrix-Multiply