Expand lookup tables to KWG and cross set computation#33
Expand lookup tables to KWG and cross set computation#33
Conversation
Extract BIT_MASK from movegen.c to shared inc/bit_tables.h and add BITS_BELOW_MASK table for (1U << n) - 1 patterns. This consolidates the 68000 shift optimization across the codebase. Updated locations: - kwg.h: kwg_get_letter_sets() uses BIT_MASK for letter set bitmaps - kwg.c: compute_cross_set() uses BIT_MASK for cross set computation - klv.c: leave_map_init() uses BIT_MASK and BITS_BELOW_MASK - graphics.c/main.c: blanks bitmask operations (consistency) - movegen.c: now includes shared header instead of local table On 68000, variable shifts cost 2 cycles/bit (up to 52 cycles for n=26), while table lookups cost ~8 cycles regardless of shift amount. https://claude.ai/code/session_01Wfwz5QT34PfaYXp8kK6Q8V
There was a problem hiding this comment.
Pull request overview
This PR consolidates bit manipulation operations across the codebase by extracting the BIT_MASK lookup table from movegen.c into a shared header file (inc/bit_tables.h) and adding a new BITS_BELOW_MASK table. The optimization targets the Motorola 68000 architecture where variable shifts are significantly slower (2 cycles/bit, up to 52 cycles) than table lookups (~8 cycles).
Changes:
- Extracted BIT_MASK table to shared inc/bit_tables.h header and added BITS_BELOW_MASK table for (1U << n) - 1 patterns
- Updated cross-set computation, leave map indexing, and blank tracking to use lookup tables instead of shift operations
- Consolidated the 68000 optimization pattern across 6 source files
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| inc/bit_tables.h | New shared header defining BIT_MASK[32] and BITS_BELOW_MASK[33] lookup tables with documentation |
| src/movegen.c | Removed local BIT_MASK definition, now includes shared header |
| inc/kwg.h | Uses BIT_MASK in kwg_get_letter_sets() for letter set bitmap operations |
| src/kwg.c | Uses BIT_MASK in compute_cross_set() for cross-set computation |
| src/klv.c | Uses BIT_MASK and BITS_BELOW_MASK in leave_map_init() for rack indexing |
| src/graphics.c | Uses BIT_MASK for blank bitmask operations in draw_history() |
| src/main.c | Uses BIT_MASK for blank bitmask operations in add_to_history() |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
inc/bit_tables.h
Outdated
| static const uint32_t BIT_MASK[32] = { | ||
| 0x00000001, 0x00000002, 0x00000004, 0x00000008, | ||
| 0x00000010, 0x00000020, 0x00000040, 0x00000080, | ||
| 0x00000100, 0x00000200, 0x00000400, 0x00000800, | ||
| 0x00001000, 0x00002000, 0x00004000, 0x00008000, | ||
| 0x00010000, 0x00020000, 0x00040000, 0x00080000, | ||
| 0x00100000, 0x00200000, 0x00400000, 0x00800000, | ||
| 0x01000000, 0x02000000, 0x04000000, 0x08000000, | ||
| 0x10000000, 0x20000000, 0x40000000, 0x80000000 | ||
| }; | ||
|
|
||
| /* | ||
| * BITS_BELOW_MASK: Mask with all bits below n set. | ||
| * BITS_BELOW_MASK[n] = (1U << n) - 1 = bits 0..(n-1) set | ||
| * Valid for n in [0, 32]. | ||
| * BITS_BELOW_MASK[0] = 0 (no bits set) | ||
| * BITS_BELOW_MASK[7] = 0x7F (bits 0-6 set) | ||
| * BITS_BELOW_MASK[32] = 0xFFFFFFFF (all bits set) | ||
| */ | ||
| static const uint32_t BITS_BELOW_MASK[33] = { | ||
| 0x00000000, /* n=0: no bits */ | ||
| 0x00000001, 0x00000003, 0x00000007, 0x0000000F, | ||
| 0x0000001F, 0x0000003F, 0x0000007F, 0x000000FF, | ||
| 0x000001FF, 0x000003FF, 0x000007FF, 0x00000FFF, | ||
| 0x00001FFF, 0x00003FFF, 0x00007FFF, 0x0000FFFF, | ||
| 0x0001FFFF, 0x0003FFFF, 0x0007FFFF, 0x000FFFFF, | ||
| 0x001FFFFF, 0x003FFFFF, 0x007FFFFF, 0x00FFFFFF, | ||
| 0x01FFFFFF, 0x03FFFFFF, 0x07FFFFFF, 0x0FFFFFFF, | ||
| 0x1FFFFFFF, 0x3FFFFFFF, 0x7FFFFFFF, 0xFFFFFFFF | ||
| }; |
There was a problem hiding this comment.
The use of static const arrays in this header creates a separate copy in each translation unit that includes it. The existing codebase uses extern const for data tables (see TILE_SCORES, IS_VOWEL, TILE_COUNTS in scrabble.h). While static const in headers is a valid pattern for small lookup tables used in performance-critical code, it deviates from the established convention in this codebase and will increase binary size.
Consider using extern const declarations here with definitions in a corresponding bit_tables.c file to match the existing pattern. This would reduce code duplication across object files while still allowing the compiler to optimize access patterns.
Speed Test Benchmark ResultsPerformance Comparison
Detailed ResultsTest Results (refs/pull/33/merge){
"num_games": 50,
"lexicons": [
{
"name": "NWL23",
"shadow_mean": 308.2,
"noshadow_mean": 332.6,
"hybrid_mean": 285.8,
"move_count": 1148
},
{
"name": "CSW24",
"shadow_mean": 335.4,
"noshadow_mean": 386.4,
"hybrid_mean": 315.3,
"move_count": 1091
}
]
}Baseline Results (main){
"num_games": 50,
"lexicons": [
{
"name": "NWL23",
"shadow_mean": 308.2,
"noshadow_mean": 332.6,
"hybrid_mean": 285.8,
"move_count": 1148
},
{
"name": "CSW24",
"shadow_mean": 335.4,
"noshadow_mean": 386.4,
"hybrid_mean": 315.3,
"move_count": 1091
}
]
} |
Address review feedback: the existing codebase uses extern const declarations in headers with definitions in .c files (see TILE_SCORES, IS_VOWEL in scrabble.h). Using static const in headers creates duplicate copies in each translation unit, increasing binary size. Changes: - inc/bit_tables.h: Changed to extern const declarations - src/bit_tables.c: New file with table definitions - tools/genesis.bzl: Added bit_tables.c to sources and bit_tables.h to hdrs https://claude.ai/code/session_01Wfwz5QT34PfaYXp8kK6Q8V
The Makefile has a separate hardcoded list of source files for native builds (test_batch, test_native, etc.). Add bit_tables.c to this list so the linker can find BIT_MASK and BITS_BELOW_MASK symbols. https://claude.ai/code/session_01Wfwz5QT34PfaYXp8kK6Q8V
|
no speedup, closing |
Extract BIT_MASK from movegen.c to shared inc/bit_tables.h and add BITS_BELOW_MASK table for (1U << n) - 1 patterns. This consolidates the 68000 shift optimization across the codebase.
Updated locations:
On 68000, variable shifts cost 2 cycles/bit (up to 52 cycles for n=26), while table lookups cost ~8 cycles regardless of shift amount.
https://claude.ai/code/session_01Wfwz5QT34PfaYXp8kK6Q8V