- Tested on Debian via Windows Subsystem for Linux 2 (WSL2)
- CPU: [Intel i7-11800H]
- Compiled with:

```shell
g++ -O3 -ffast-math -march=native -g
```

- Note: WSL2 introduces minimal virtualization overhead for CPU/cache benchmarks, but the relative difference between row-major and column-major traversal remains clearly visible (and is often amplified by the higher memory latency on misses). Results may vary slightly on native Linux, but the core lesson holds everywhere: align traversal with storage order for cache efficiency.
A practical demonstration of how memory layout and traversal order affect performance in C++. I started with the beginner-friendly `std::vector<std::vector<int>>`, saw the slowdowns, then optimized to a contiguous flat 1D vector, with benchmarks, perf stats, and explanations.

Key insight: cache misses dominate; aligning access with storage order can yield 20–50× speedups.
I began with the common way to represent a 2D matrix: a vector of vectors. It's easy and intuitive, but each row gets a separate heap allocation.
| Traversal | Time | L1 Cache Miss Rate (approx) | Notes |
|---|---|---|---|
| Row-major (fast) | ~0.7–0.8s | ~15–30% | Better locality within rows |
| Column-major (slow) | ~1–2s | ~90–99% | Huge strides + fragmentation |
```cpp
std::vector<std::vector<int>> arr(n, std::vector<int>(n, 0));

// Row-major (better)
for (int i = 0; i < n; ++i)
    for (int j = 0; j < n; ++j)
        arr[i][j]++;

// Column-major (terrible here)
for (int j = 0; j < n; ++j)
    for (int i = 0; i < n; ++i)
        arr[i][j]++;
```

Use a single `std::vector` with manual indexing -> full cache-line usage -> fast!
| Traversal | Time | L1 Cache Miss Rate (approx) | Notes |
|---|---|---|---|
| Row-major | ~0.08s | ~15–30% | 20–30× faster |
| Column-major | ~0.7s | ~90–99% | Still slow due to large strides |
```cpp
#include <iostream>
#include <vector>
#include <chrono>

int main() {
    const int n = 5000;
    std::vector<int> arr(n * n, 0);

    // Row-major traversal.
    auto s1 = std::chrono::high_resolution_clock::now();
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            arr[i * n + j]++;
        }
    }
    auto e1 = std::chrono::high_resolution_clock::now();
    std::chrono::duration<double> diff1 = e1 - s1;

    // Column-major traversal.
    auto s2 = std::chrono::high_resolution_clock::now();
    for (int j = 0; j < n; j++) {
        for (int i = 0; i < n; i++) {
            arr[i * n + j]++;
        }
    }
    auto e2 = std::chrono::high_resolution_clock::now();
    std::chrono::duration<double> diff2 = e2 - s2;

    std::cout << "Flat Row Major - " << diff1.count() << "s\n";
    std::cout << "Flat Column Major - " << diff2.count() << "s\n";

    // Print values so the compiler cannot optimize the loops away.
    std::cout << "Verify - " << arr[0] << " and " << arr[n * n - 1] << "\n";
    return 0;
}
```
- `std::vector<std::vector<int>>` is convenient but slow for large matrices.
- Use flat 1D arrays for performance-critical code.
- Always traverse in storage order (row-major in C++).
- Cache misses are the real bottleneck — profile with perf!
- Perf flags:

```shell
perf stat -r 5
```
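Here `-r 5` repeats the run 5 times and reports the mean with variance. A fuller invocation might look like the sketch below; the file names are illustrative, the event names vary by CPU, and hardware counters are not always exposed under WSL2:

```shell
# Build with symbols, then gather L1 data-cache statistics averaged over 5 runs.
# "main.cpp" and "bench" are placeholder names for this repo's source and binary.
g++ -O3 -ffast-math -march=native -g main.cpp -o bench
perf stat -r 5 -e L1-dcache-loads,L1-dcache-load-misses ./bench
```

If the `L1-dcache-*` events are unavailable, `perf list` shows what your kernel/CPU combination supports.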