Skip to content

Virus2466/matrix-traversal-benchmark

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

6 Commits
 
 
 
 
 
 
 
 

Repository files navigation

Matrix-traversal-benchmark ( Cache Locality Benchmark)

Environment

  • Tested on Debian via Windows Subsystem for Linux 2 (WSL2)
  • CPU: [Intel i7-11800H]
  • Compiled with g++ -O3 -ffast-math -march=native -g
  • Note: WSL2 introduces minimal virtualization overhead for CPU/cache benchmarks, but the relative difference between row-major and column-major traversal remains clearly visible (and often amplified due to higher memory latency on misses).

Results may vary slightly on native Linux, but the core lesson align traversal with storage order for cache efficiency holds everywhere.

A practical demonstration of how memory layout and traversal order affect performance in C++. I started with the beginner-friendly std::vector<std::vector>, saw the slowdowns, then optimized to a contiguous flat 1D vector — with benchmarks, perf stats, and explanations. Key Insight: Cache misses dominate aligning access with storage order can yield 20–50× speedups

Started With std::vector<std::vector>

Began with the common way to represent 2D Matrix using vectors of vectors. It's easy and intuitive , but each row has seperate allocation.

Results with vector<vector> ( n = 5000 x 5000 )

Traversal Time L1 Cache Miss Rate (approx) Notes
Row-major (fast) ~0.7–0.8s ~15–30% Better locality within rows
Column-major (slow) ~1–2s ~90–99% Huge strides + fragmentation

Row Major

1

Column Major

2

Code Example

std::vector<std::vector<int>> arr(n, std::vector<int>(n, 0));

// Row-major (better)
for (int i = 0; i < n; ++i)
    for (int j = 0; j < n; ++j)
        arr[i][j]++;

// Column-major (terrible here)
for (int j = 0; j < n; ++j)
    for (int i = 0; i < n; ++i)
        arr[i][j]++;

Part 2 : Switching to Flat 1D Vector

Use a single std::vector with manual indexing -> full line usage -> fast!.

Results with vector<vector> ( n = 10000 )

Traversal Time L1 Cache Miss Rate (approx) Notes
Row-major ~0.08s ~15–30% 20-30x Faster
Column-major ~0.7s ~90–99% Sill Slow due to Strides.
Screenshot 2025-12-30 152705

Code Example

#include<iostream>
#include<vector>
#include<chrono>



int main(){

    const int n = 5000;
    std::vector<int>arr(n*n,0);

    // Row Major Traversal.
    auto s1 = std::chrono::high_resolution_clock::now();
    for(int i = 0 ; i < n ; i++){
        for(int j = 0 ; j < n ; j++){
            arr[i * n + j]++;
        }
    }

    auto e1 = std::chrono::high_resolution_clock::now();
    std::chrono::duration<double> diff1 = e1 - s1;


    // Column Major Traversal.
    auto s2 = std::chrono::high_resolution_clock::now();
    for(int j = 0 ; j < n ; j++){
        for(int i = 0 ; i < n ; i++){
            arr[i * n + j]++;
        }
    }


    auto e2 = std::chrono::high_resolution_clock::now();
    std::chrono::duration<double> diff2 = e2 - s2;


    std::cout << "Flat Row Major - " << diff1.count() << "s\n";
    std::cout << "Flat Column Major - " << diff2.count() << "s\n";


    // to prevent optimization deletion.
    std::cout << "Verify - " << arr[0] << "and" << arr[n*n-1] << "\n";


    return 0;
}

Key Lesson Learned

  • vector is convenient but slow for large matrices.
  • Use flat 1D arrays for performance-critical code.
  • Always traverse in storage order (row-major in C++).
  • Cache misses are the real bottleneck — profile with perf!

Perf Insights [ n = 10000 ]

  • Perf flags : perf stat -r 5

Row-Major Traversal

Screenshot 2025-12-30 160642

Column Major Travesral

Screenshot 2025-12-30 160850

Flat 1D Vector

Screenshot 2025-12-30 161052

About

Practical demonstration of how memory layout and traversal order affect performance in C++

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages