Reorder statements to improve spatial locality #43
|
Looks good. However, I would appreciate more explanation on why prioritizing row-wise checks improves spatial locality. |
|
I have not yet conducted a performance benchmark for this optimization. The improvement was motivated by insights from the following references:

After printing the board indices being checked by check_win and check_line_segment_win, we can observe the access patterns:

---- Line type: COL ----
---- Line type: ROW ----
---- Line type: PRIMARY ----
---- Line type: SECONDARY ----

Since the chances of winning via ROW and COLUMN are the same, let's consider a case where the current board already contains a winning sequence like ROW XXX or ROW OOO. Even in this situation, the current implementation still performs a complete COLUMN-major traversal of the board before identifying a winner.

Given that the board is only 4×4, the performance loss from accessing non-contiguous memory might not be significant at this scale. However, the access pattern is still worth considering for potential optimization. |
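To make the access-pattern argument concrete, here is a minimal sketch of which array indices a row scan and a column scan touch on a 4×4 board stored as a row-major 1D array. The names `SIDE`, `cell_index`, `row_indices`, and `col_indices` are hypothetical and are not taken from the project's source.

```c
#include <stddef.h>

#define SIDE 4 /* board side length, assumed from the 4x4 board discussed above */

/* Row-major index of cell (row, col) in a 1D board array. */
static size_t cell_index(size_t row, size_t col)
{
    return row * SIDE + col;
}

/* Indices visited when scanning one row: consecutive, stride 1. */
static void row_indices(size_t row, size_t out[SIDE])
{
    for (size_t col = 0; col < SIDE; col++)
        out[col] = cell_index(row, col);
}

/* Indices visited when scanning one column: stride SIDE. */
static void col_indices(size_t col, size_t out[SIDE])
{
    for (size_t row = 0; row < SIDE; row++)
        out[row] = cell_index(row, col);
}
```

Scanning row 1 touches indices 4, 5, 6, 7, which are adjacent in memory, while scanning column 1 touches 1, 5, 9, 13, which are `SIDE` elements apart.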
Maybe consider including them in the commit message using the Link: tag?
I agree that, in theory, prioritizing row-wise checks can lead to better cache locality. However, the current commit message only states that spatial locality improves, without explaining WHY. Maybe consider tweaking the commit message to clarify the relationship between row-wise access, memory layout, spatial locality, and potential performance benefits? |
ddf7e45 to c77acdb
|
Instead of appending the references supporting this proposed change, show experimental evidence. |
Experiment 1: Fairly Generated Test Data
To eliminate bias from board position and access order, I applied the Fisher–Yates shuffle to randomly permute the entire board after inserting the winning condition. This ensures the spatial location of the win does not favor any particular access pattern and provides a fair baseline for performance comparison. Since all potential [...]
Experiment 2: Biased Sample Distribution (Favoring Rows)
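For reference, the Fisher–Yates shuffle mentioned in Experiment 1 can be sketched as follows. This is a minimal illustration over a 1D board array, not the code used in the experiment; `shuffle_board` is a hypothetical name, and `rand()` is assumed to be an acceptable random source for test data.

```c
#include <stddef.h>
#include <stdlib.h>

/* Shuffle n cells in place with the Fisher-Yates algorithm.
 * Note: rand() % (i + 1) carries a slight modulo bias, which is
 * tolerable for a sketch but not for statistically strict sampling. */
static void shuffle_board(char *board, size_t n)
{
    if (n < 2)
        return;
    for (size_t i = n - 1; i > 0; i--) {
        size_t j = (size_t) rand() % (i + 1); /* pick j in [0, i] */
        char tmp = board[i];
        board[i] = board[j];
        board[j] = tmp;
    }
}
```

The shuffle only permutes cells, so the number of each mark on the board is unchanged; call `srand()` with a fixed seed beforehand if the generated boards need to be reproducible.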
Under this biased condition, the row-major implementation clearly outperforms column-major, with a much wider performance gap. Perf analysis summary:
These results demonstrate a clear performance benefit for row-major storage when handling row-aligned winning conditions. Therefore, if the AI algorithm is designed to favor row-based placements, it can reduce the cost of victory checks and improve overall efficiency.
Experiment 3: Simulating Real Gameplay
The previous experiments assume "guaranteed wins," which is unrealistic in actual gameplay, where wins often occur only after many turns. For example:
 O | O | O | O | O
---+---+---+---+---
 O |   |   | X |
---+---+---+---+---
 O |   | X |   |
---+---+---+---+---
 O |   | X |   | X
---+---+---+---+---
 X |   |   | X |
In this scenario, scanning from the top-left may require multiple invalid checks before identifying a win at the bottom-left, which can be costly for column-major access. To simulate this, I created two 5 × 5 boards that are transposes of each other:
Both require several invalid checks before locating a winning pattern, representing a more realistic edge-case scenario.
 O | O |   |   |
---+---+---+---+---
 O | O |   |   |
---+---+---+---+---
   |   |   |   | O
---+---+---+---+---
 O | O |   | O | O
---+---+---+---+---
 O | O |   | O | O

 O | O |   | O | O
---+---+---+---+---
 O | O |   | O | O
---+---+---+---+---
   |   |   |   |
---+---+---+---+---
   |   |   | O | O
---+---+---+---+---
   |   | O | O | O
After running 1,000K iterations with perf, the metrics were as follows:
Even under near-realistic conditions, row-major consistently outperformed column-major across almost all metrics. Its cache miss rate is higher, but only because the total number of cache references is nearly halved; the overall execution time is still significantly lower.
Summary & Outlook
Across all three experiments, row-major storage clearly benefits from better spatial locality, especially when the game favors horizontal wins or involves frequent win-checking logic. As a result, future AI algorithms for board games could benefit from encouraging row-oriented placement strategies. This not only simplifies decision-making but also enhances performance, particularly on large boards or scenarios requiring frequent win evaluations. |
[...]
I'm not sure if "column-major storage" here refers to actually storing the board in column-major order. If so, I'm unclear about the point of this experiment, since we're currently using a 1D array in row-major order and aren't planning to change that.
[...]
I'm also unsure about the purpose of this experiment. For fairness, shouldn't we also test column-based wins and prioritize scanning columns?
[...]
This experiment seems more reasonable, but I'm a bit confused - I expected spatial locality to help by reducing cache misses, but the results show more cache misses. The faster execution seems to come from fewer instructions instead.
I didn't really see a clear benefit from the first experiment.
|
|
@visitorckw — just wanted to follow up on this PR when you have time. Appreciate your thoughts! |
Yes, I'm aware of how cache miss rate is calculated. I just initially thought the improved efficiency came from a lower miss rate, not fewer cache references. |
Just to summarize the improvements observed:
Both metrics show a reasonable reduction. Do you have any suggestions for improving the experiment further? |
|
Please include at least your benchmark observations in the commit message. Currently, the message mainly repeats that the change improves spatial locality. I'm asking because I noticed the instruction count dropped by nearly 50%, so I'm wondering if the performance gain is actually due to fewer instructions being executed, rather than cache behavior alone. |
c77acdb to e404f67
Reorder win-check logic to favor row-wise over column-wise checks. This aligns better with C’s row-major memory layout, enhancing spatial locality. It reduces cache line crossings, cuts down total instruction count, and improves sequential memory access, which is especially useful for large boards or frequent evaluations.

Benchmark on a 5×5 board (1M iterations):
- Instructions: 20,042,179,354 → 9,528,171,690 (-52.5%)
- Cache refs: 205,075 → 115,548 (-43.65%)
- Cache misses: 28,503 → 20,176 (-29.23%)
- Time elapsed: 0.971s → 0.597s (-38.5%)

Despite a minor rise in miss rate, total misses declined due to fewer cache references.
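The last sentence of the commit message can be checked with a small helper that computes the miss rate as misses divided by references. This is only an illustrative sketch; `miss_rate_pct` is a hypothetical name, and the inputs in the note below are the figures quoted in the benchmark above.

```c
/* Cache miss rate as a percentage of cache references.
 * Returns 0.0 when there are no references to avoid dividing by zero. */
static double miss_rate_pct(unsigned long misses, unsigned long refs)
{
    if (refs == 0)
        return 0.0;
    return 100.0 * (double) misses / (double) refs;
}
```

With the benchmark figures, `miss_rate_pct(28503, 205075)` is roughly 13.9% and `miss_rate_pct(20176, 115548)` is roughly 17.5%. The rate rises because references fall faster than misses, while the absolute number of misses still drops.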
e404f67 to cfe9548
|
Thank @yy214123 for contributing! |





Since row-wise and column-wise checks are logically equivalent in terms of win probability, evaluating row-wise directions first is more favorable due to better spatial locality.
This reordering does not alter functional behavior but may lead to more efficient execution in practice.
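A rough sketch of why the check order matters when a row win is already on the board: the scan below returns as soon as any direction finds a winning segment, so trying rows first skips the whole column pass. This is not the project's `check_win`; `SIDE`, `GOAL`, `segment_wins`, `check_win_ordered`, and the `reads` counter are all hypothetical, and the sketch shows only the early-exit effect on the amount of work, not cache behavior.

```c
#define SIDE 4 /* board side length (assumed for this sketch) */
#define GOAL 3 /* equal marks in a line needed to win (assumed) */

enum line_type { ROW, COL, PRIMARY, SECONDARY, LINE_TYPES };

/* Per-direction step (d_row, d_col) between consecutive cells of a line. */
static const int step[LINE_TYPES][2] = {
    [ROW] = {0, 1},
    [COL] = {1, 0},
    [PRIMARY] = {1, 1},
    [SECONDARY] = {1, -1},
};

/* Return 1 if GOAL equal, non-blank cells start at (i, j) along direction t.
 * Every board cell that is read increments *reads. */
static int segment_wins(const char *board, int i, int j, enum line_type t,
                        unsigned long *reads)
{
    int end_i = i + (GOAL - 1) * step[t][0];
    int end_j = j + (GOAL - 1) * step[t][1];
    if (end_i >= SIDE || end_j < 0 || end_j >= SIDE)
        return 0; /* segment would leave the board */

    char first = board[i * SIDE + j];
    (*reads)++;
    if (first == ' ')
        return 0;
    for (int k = 1; k < GOAL; k++) {
        int idx = (i + k * step[t][0]) * SIDE + (j + k * step[t][1]);
        (*reads)++;
        if (board[idx] != first)
            return 0;
    }
    return 1;
}

/* Try each direction in the given order and return the winning mark,
 * or ' ' if no direction contains a winning segment. */
static char check_win_ordered(const char *board,
                              const enum line_type order[LINE_TYPES],
                              unsigned long *reads)
{
    for (int d = 0; d < LINE_TYPES; d++)
        for (int i = 0; i < SIDE; i++)
            for (int j = 0; j < SIDE; j++)
                if (segment_wins(board, i, j, order[d], reads))
                    return board[i * SIDE + j];
    return ' ';
}
```

For a board whose first row starts with XXX, the order {ROW, COL, PRIMARY, SECONDARY} finds the winner after three cell reads, whereas {COL, ROW, PRIMARY, SECONDARY} must first exhaust every column segment. Both orders return the same winner, which matches the statement that functional behavior is unchanged.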