Skip to content

Expand lookup tables to KWG and cross set computation#33

Closed
olaugh wants to merge 3 commits intomainfrom
claude/expand-lookup-tables-5V0Ae
Closed

Expand lookup tables to KWG and cross set computation#33
olaugh wants to merge 3 commits intomainfrom
claude/expand-lookup-tables-5V0Ae

Conversation

@olaugh
Copy link
Owner

@olaugh olaugh commented Jan 26, 2026

Extract BIT_MASK from movegen.c to shared inc/bit_tables.h and add BITS_BELOW_MASK table for (1U << n) - 1 patterns. This consolidates the 68000 shift optimization across the codebase.

Updated locations:

  • kwg.h: kwg_get_letter_sets() uses BIT_MASK for letter set bitmaps
  • kwg.c: compute_cross_set() uses BIT_MASK for cross set computation
  • klv.c: leave_map_init() uses BIT_MASK and BITS_BELOW_MASK
  • graphics.c/main.c: blanks bitmask operations (consistency)
  • movegen.c: now includes shared header instead of local table

On 68000, variable shifts cost 2 cycles/bit (up to 52 cycles for n=26), while table lookups cost ~8 cycles regardless of shift amount.

https://claude.ai/code/session_01Wfwz5QT34PfaYXp8kK6Q8V

Extract BIT_MASK from movegen.c to shared inc/bit_tables.h and add
BITS_BELOW_MASK table for (1U << n) - 1 patterns. This consolidates
the 68000 shift optimization across the codebase.

Updated locations:
- kwg.h: kwg_get_letter_sets() uses BIT_MASK for letter set bitmaps
- kwg.c: compute_cross_set() uses BIT_MASK for cross set computation
- klv.c: leave_map_init() uses BIT_MASK and BITS_BELOW_MASK
- graphics.c/main.c: blanks bitmask operations (consistency)
- movegen.c: now includes shared header instead of local table

On 68000, variable shifts cost 2 cycles/bit (up to 52 cycles for n=26),
while table lookups cost ~8 cycles regardless of shift amount.

https://claude.ai/code/session_01Wfwz5QT34PfaYXp8kK6Q8V
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR consolidates bit manipulation operations across the codebase by extracting the BIT_MASK lookup table from movegen.c into a shared header file (inc/bit_tables.h) and adding a new BITS_BELOW_MASK table. The optimization targets the Motorola 68000 architecture where variable shifts are significantly slower (2 cycles/bit, up to 52 cycles) than table lookups (~8 cycles).

Changes:

  • Extracted BIT_MASK table to shared inc/bit_tables.h header and added BITS_BELOW_MASK table for (1U << n) - 1 patterns
  • Updated cross-set computation, leave map indexing, and blank tracking to use lookup tables instead of shift operations
  • Consolidated the 68000 optimization pattern across 6 source files

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
inc/bit_tables.h New shared header defining BIT_MASK[32] and BITS_BELOW_MASK[33] lookup tables with documentation
src/movegen.c Removed local BIT_MASK definition, now includes shared header
inc/kwg.h Uses BIT_MASK in kwg_get_letter_sets() for letter set bitmap operations
src/kwg.c Uses BIT_MASK in compute_cross_set() for cross-set computation
src/klv.c Uses BIT_MASK and BITS_BELOW_MASK in leave_map_init() for rack indexing
src/graphics.c Uses BIT_MASK for blank bitmask operations in draw_history()
src/main.c Uses BIT_MASK for blank bitmask operations in add_to_history()

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

inc/bit_tables.h Outdated
Comment on lines +23 to +52
static const uint32_t BIT_MASK[32] = {
0x00000001, 0x00000002, 0x00000004, 0x00000008,
0x00000010, 0x00000020, 0x00000040, 0x00000080,
0x00000100, 0x00000200, 0x00000400, 0x00000800,
0x00001000, 0x00002000, 0x00004000, 0x00008000,
0x00010000, 0x00020000, 0x00040000, 0x00080000,
0x00100000, 0x00200000, 0x00400000, 0x00800000,
0x01000000, 0x02000000, 0x04000000, 0x08000000,
0x10000000, 0x20000000, 0x40000000, 0x80000000
};

/*
* BITS_BELOW_MASK: Mask with all bits below n set.
* BITS_BELOW_MASK[n] = (1U << n) - 1 = bits 0..(n-1) set
* Valid for n in [0, 32].
* BITS_BELOW_MASK[0] = 0 (no bits set)
* BITS_BELOW_MASK[7] = 0x7F (bits 0-6 set)
* BITS_BELOW_MASK[32] = 0xFFFFFFFF (all bits set)
*/
static const uint32_t BITS_BELOW_MASK[33] = {
0x00000000, /* n=0: no bits */
0x00000001, 0x00000003, 0x00000007, 0x0000000F,
0x0000001F, 0x0000003F, 0x0000007F, 0x000000FF,
0x000001FF, 0x000003FF, 0x000007FF, 0x00000FFF,
0x00001FFF, 0x00003FFF, 0x00007FFF, 0x0000FFFF,
0x0001FFFF, 0x0003FFFF, 0x0007FFFF, 0x000FFFFF,
0x001FFFFF, 0x003FFFFF, 0x007FFFFF, 0x00FFFFFF,
0x01FFFFFF, 0x03FFFFFF, 0x07FFFFFF, 0x0FFFFFFF,
0x1FFFFFFF, 0x3FFFFFFF, 0x7FFFFFFF, 0xFFFFFFFF
};
Copy link

Copilot AI Jan 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The use of static const arrays in this header creates a separate copy in each translation unit that includes it. The existing codebase uses extern const for data tables (see TILE_SCORES, IS_VOWEL, TILE_COUNTS in scrabble.h). While static const in headers is a valid pattern for small lookup tables used in performance-critical code, it deviates from the established convention in this codebase and will increase binary size.

Consider using extern const declarations here with definitions in a corresponding bit_tables.c file to match the existing pattern. This would reduce code duplication across object files while still allowing the compiler to optimize access patterns.

Copilot uses AI. Check for mistakes.
@github-actions
Copy link

github-actions bot commented Jan 26, 2026

Speed Test Benchmark Results

Performance Comparison

Metric Test (refs/pull/33/merge) Baseline (main) Change
NWL23 Hybrid 285.8 frames 285.8 frames 0%
CSW24 Hybrid 315.3 frames 315.3 frames 0%

Detailed Results

Test Results (refs/pull/33/merge)
{
  "num_games": 50,
  "lexicons": [
    {
      "name": "NWL23",
      "shadow_mean": 308.2,
      "noshadow_mean": 332.6,
      "hybrid_mean": 285.8,
      "move_count": 1148
    },
    {
      "name": "CSW24",
      "shadow_mean": 335.4,
      "noshadow_mean": 386.4,
      "hybrid_mean": 315.3,
      "move_count": 1091
    }
  ]
}
Baseline Results (main)
{
  "num_games": 50,
  "lexicons": [
    {
      "name": "NWL23",
      "shadow_mean": 308.2,
      "noshadow_mean": 332.6,
      "hybrid_mean": 285.8,
      "move_count": 1148
    },
    {
      "name": "CSW24",
      "shadow_mean": 335.4,
      "noshadow_mean": 386.4,
      "hybrid_mean": 315.3,
      "move_count": 1091
    }
  ]
}

View full workflow run

Address review feedback: the existing codebase uses extern const
declarations in headers with definitions in .c files (see TILE_SCORES,
IS_VOWEL in scrabble.h). Using static const in headers creates duplicate
copies in each translation unit, increasing binary size.

Changes:
- inc/bit_tables.h: Changed to extern const declarations
- src/bit_tables.c: New file with table definitions
- tools/genesis.bzl: Added bit_tables.c to sources and bit_tables.h to hdrs

https://claude.ai/code/session_01Wfwz5QT34PfaYXp8kK6Q8V
The Makefile has a separate hardcoded list of source files for native
builds (test_batch, test_native, etc.). Add bit_tables.c to this list
so the linker can find BIT_MASK and BITS_BELOW_MASK symbols.

https://claude.ai/code/session_01Wfwz5QT34PfaYXp8kK6Q8V
@olaugh
Copy link
Owner Author

olaugh commented Jan 26, 2026

no speedup, closing

@olaugh olaugh closed this Jan 26, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants