Commit aced1da

Add false-sharing.cpp
1 parent 859b8f5 commit aced1da

2 files changed

Lines changed: 125 additions & 0 deletions

content/cpp/false-sharing.md

Lines changed: 124 additions & 0 deletions
@@ -0,0 +1,124 @@
> When the algorithm cannot go any faster, you exploit the hardware

## Multiple threads

Consider this scenario: you have two variables (say, `int` or `long long`) and you run a long task on each of them. To speed things up you use two threads, one per variable, hoping the whole job takes half the time.

```cpp
#include <chrono>
#include <iostream>
#include <thread>

long long x = 0;
long long y = 0;

// Increment the referenced counter 100 million times.
void increment(long long& a) {
    for (int i = 0; i < 100'000'000; i++) {
        a++;
    }
}
```

Now measure the time taken when `increment` is invoked on `x` and `y` on separate threads.

```cpp
int main() {
    auto start = std::chrono::high_resolution_clock::now();

    // Each thread increments its own global counter.
    std::thread t1([&](){ increment(x); });
    std::thread t2([&](){ increment(y); });
    t1.join();
    t2.join();

    auto end = std::chrono::high_resolution_clock::now();
    std::cout << "time: " << std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count() << " ms\n";

    return 0;
}
```

Run the program and note the time taken; on my machine this came out close to `500ms`.
Another metric we can use here is IPC (instructions per cycle), which you can get using `perf`:
```shell
$ perf stat ./a.out
time: 503 ms

 Performance counter stats for './a.out':

            893.16 msec task-clock:u          #    1.760 CPUs utilized
                 0      context-switches:u    #    0.000 /sec
                 0      cpu-migrations:u      #    0.000 /sec
               139      page-faults:u         #  155.627 /sec
     1,602,565,788      instructions:u        #    0.67  insn per cycle
     2,385,620,324      cycles:u              #    2.671 GHz
       200,447,733      branches:u            #  224.424 M/sec
            14,482      branch-misses:u       #    0.01% of all branches
                        TopdownL1             #     68.2 %  tma_backend_bound
                                              #     11.9 %  tma_bad_speculation
                                              #      4.3 %  tma_frontend_bound
                                              #     15.6 %  tma_retiring

       0.507340864 seconds time elapsed

       0.892937000 seconds user
       0.000000000 seconds sys
```

The line to note is `0.67 insn per cycle`: on average the CPU retired fewer than one instruction per cycle. Keep that number in mind.

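One plausible reason for the low IPC, and the one the file name points to, is false sharing: `x` and `y` are two adjacent 8-byte globals, so they may well sit in the same cache line, and every write by one thread then invalidates the other core's copy of that line. The sketch below is a way to check this on your own build. The helper names and the 64-byte line size are assumptions for illustration, not part of the original code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumption: 64-byte cache lines, common on x86-64 but not universal.
constexpr std::size_t kAssumedCacheLine = 64;

// Hypothetical helper: index of the cache line an address falls in,
// under the assumed line size.
inline std::uintptr_t cache_line_index(const void* p,
                                       std::size_t line = kAssumedCacheLine) {
    assert(line != 0);
    return reinterpret_cast<std::uintptr_t>(p) / line;
}

// Hypothetical helper: true when both addresses fall in the same cache line.
inline bool same_cache_line(const void* a, const void* b,
                            std::size_t line = kAssumedCacheLine) {
    return cache_line_index(a, line) == cache_line_index(b, line);
}
```

Printing `same_cache_line(&x, &y)` from `main` shows whether the two counters share a line in your binary. The compiler and linker are free to place globals wherever they like, so the answer can differ between builds.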
## Struct instead of int

Now let us use this padded struct instead of the plain `long long`s we used earlier:

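```cpp
struct PaddedStruct {
    long long value;
    // Pad the struct to 64 bytes (assumed cache-line size) so that
    // two instances cannot have their `value` fields in the same line.
    char pad[64 - sizeof(long long)];
};

PaddedStruct pa = {};
PaddedStruct pb = {};
```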
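Manual padding fixes the size of the struct but not where it starts. A variation, sketched here under the same 64-byte assumption, is to let the compiler handle both with `alignas`. `AlignedCounter` and the helper functions are hypothetical names, not something the original code defines.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumption: 64-byte cache lines, common on x86-64 but not universal.
constexpr std::size_t kAssumedCacheLine = 64;

// Hypothetical alternative to PaddedStruct: alignas forces every instance
// to start on a line boundary, and sizeof is rounded up to the alignment.
struct alignas(kAssumedCacheLine) AlignedCounter {
    long long value = 0;
};

static_assert(sizeof(AlignedCounter) == kAssumedCacheLine,
              "each counter should occupy exactly one assumed cache line");
static_assert(alignof(AlignedCounter) == kAssumedCacheLine,
              "each counter should start on an assumed cache-line boundary");

// True when the counter's address is a multiple of the assumed line size.
inline bool is_line_aligned(const AlignedCounter& c) {
    return reinterpret_cast<std::uintptr_t>(&c) % kAssumedCacheLine == 0;
}

// Debug-build sanity check; compiled out when NDEBUG is defined.
inline void check_alignment(const AlignedCounter& c) {
    assert(is_line_aligned(c));
}
```

The `static_assert`s turn the layout assumption into a compile-time check, so a change to the struct that breaks the one-counter-per-line property fails the build instead of silently slowing the program down.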

Overload the earlier function to handle this struct as well:
```cpp
// Same loop as before, but on the padded struct's counter.
void increment(PaddedStruct& a) {
    for (int i = 0; i < 100'000'000; i++) {
        a.value++;
    }
}
```

Now invoke the function on two threads, as we did earlier:
```cpp
std::thread t1([&](){ increment(pa); });
std::thread t2([&](){ increment(pb); });
```
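Since both experiments share the same thread setup and timing code, it can be factored into one helper so the two variants are measured in exactly the same way. This is only a sketch: `time_on_two_threads` is a hypothetical name, and it uses `std::chrono::steady_clock` where the original uses `high_resolution_clock`.

```cpp
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical helper (not in the original post): runs `work` on `a` and `b`
// on two separate threads and returns the elapsed wall-clock time in ms.
template <typename T, typename Work>
long long time_on_two_threads(T& a, T& b, Work work) {
    // The two targets must be distinct objects; otherwise both threads
    // would write to the same object, which is a data race.
    assert(&a != &b);

    const auto start = std::chrono::steady_clock::now();
    std::thread t1([&] { work(a); });
    std::thread t2([&] { work(b); });
    t1.join();
    t2.join();
    const auto end = std::chrono::steady_clock::now();

    return std::chrono::duration_cast<std::chrono::milliseconds>(end - start)
        .count();
}
```

With the definitions above, a call could look like `time_on_two_threads(x, y, [](long long& v) { increment(v); })` for the plain globals and `time_on_two_threads(pa, pb, [](PaddedStruct& p) { increment(p); })` for the padded ones. The post does not state its compiler flags; the `perf` output shows roughly eight instructions per loop iteration, which suggests an unoptimized build, and an optimizing compiler may collapse the loop into a single addition and hide the effect.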

This time the run is noticeably faster: on my machine it took about `300ms`, roughly 60% of the earlier `500ms`.
Again, we can get the IPC using `perf`:
```shell
$ perf stat ./a.out
time: 297 ms

 Performance counter stats for './a.out':

            594.66 msec task-clock:u          #    1.975 CPUs utilized
                 0      context-switches:u    #    0.000 /sec
                 0      cpu-migrations:u      #    0.000 /sec
               138      page-faults:u         #  232.066 /sec
     1,602,565,643      instructions:u        #    1.06  insn per cycle
     1,508,069,432      cycles:u              #    2.536 GHz
       200,447,663      branches:u            #  337.080 M/sec
            14,506      branch-misses:u       #    0.01% of all branches
                        TopdownL1             #     71.2 %  tma_backend_bound
                                              #      1.5 %  tma_bad_speculation
                                              #      2.8 %  tma_frontend_bound
                                              #     24.6 %  tma_retiring

       0.301146813 seconds time elapsed

       0.594276000 seconds user
       0.000000000 seconds sys
```
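
We now see `1.06 insn per cycle`, about 1.6 times the `0.67` we saw with the plain `long long`s. The instruction count is essentially unchanged; only the cycle count dropped. The likely cause is that the padding keeps the two counters on separate cache lines, so the two cores no longer keep invalidating each other's copy of a shared line (false sharing).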
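The ratios can be recomputed directly from the `perf` counters. The two small helpers below are hypothetical names used only to spell out the arithmetic.

```cpp
#include <cassert>

// Instructions per cycle from the two raw perf counters.
inline double ipc(double instructions, double cycles) {
    assert(cycles > 0);
    return instructions / cycles;
}

// How many times larger `before` is than `after`
// (works for elapsed times and for cycle counts alike).
inline double ratio(double before, double after) {
    assert(after > 0);
    return before / after;
}
```

Plugging in the numbers from the two runs: `ipc(1602565788, 2385620324)` is about 0.672 and `ipc(1602565643, 1508069432)` is about 1.063, which matches the `0.67` and `1.06` that `perf` printed. `ratio(2385620324, 1508069432)` is about 1.58 for cycles, and `ratio(503, 297)` is about 1.69 for the wall-clock time.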

content/cpp/index.md

Lines changed: 1 addition & 0 deletions
@@ -8,3 +8,4 @@ title: C++
- [[constructors]]
- [[templates]]
- [[concepts]]
- [[false-sharing]]
