@Gromak123
Contributor

Hi! 👋

This PR fixes intra-opening ambiguities (“collisions”) in the Memchess opening book and updates the UI counters accordingly.

Fixes issue #3: “Collisions (intra-opening ambiguities): 11 unique cases found + HTML report attached” (#3)


Summary

1) Remove intra-bucket ambiguities (Stockfish-guided prune)

Within a single opening bucket, two or more lines can share the same move prefix and then continue with different next moves for the repertoire side. In Memchess this can produce a “the move I played is wrong” result even when the move is a valid book move, because the app expects a single next move for that bucket/prefix.

This PR:

  • detects these intra-bucket collisions,
  • attributes each collision to the deepest bucket that fully contains all involved lines (so each collision is fixed once),
  • uses Stockfish to evaluate each candidate next move at the collision position (POV = repertoire side),
  • keeps the best-scoring move and removes all lines that play any other option at that collision prefix.
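The detection step can be illustrated with a minimal sketch. It is not the code in audit_collisions.py: the function name `find_collisions` is hypothetical, and it assumes each line of a bucket is a space-separated string of SAN moves, which may not match the actual layout of lines.js.

```python
from collections import defaultdict


def find_collisions(lines, side):
    """Return {prefix: set_of_next_moves} for prefixes with 2+ options.

    `lines` is an iterable of space-separated SAN strings (assumed format).
    Only plies where `side` ("white" or "black") is to move are examined.
    """
    first_ply = 0 if side == "white" else 1
    options = defaultdict(set)
    for line in lines:
        moves = line.split()
        # Walk every ply played by the repertoire side.
        for ply in range(first_ply, len(moves), 2):
            options[tuple(moves[:ply])].add(moves[ply])
    # A collision is a prefix followed by more than one distinct move.
    return {prefix: nxt for prefix, nxt in options.items() if len(nxt) > 1}


bucket = ["e4 e5 Nf3 Nc6 Bb5", "e4 e5 Nf3 Nc6 Bc4", "e4 c5 Nf3"]
print(find_collisions(bucket, "white"))
```

In this toy bucket White has one collision (Bb5 vs Bc4 after 1. e4 e5 2. Nf3 Nc6), which is the kind of position the engine evaluation would then have to arbitrate.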

Files / tooling:

  • ✅ updated: memchess/js/lines.js
  • ✅ added: audit_collisions.py
  • (report outputs not committed): collisions.html, after.html

Run used (apply fix):

python audit_collisions.py --root . --side both --stockfish /path/to/stockfish \
  --apply-fix --in-place --preserve-format --out collisions.html

Results (before fix):

  • Unique collision buckets: 10

Fix applied:

  • WHITE

    • removed unique lines: 1
    • removed occurrences: 8
    • buckets touched: 8
  • BLACK

    • removed unique lines: 14
    • removed occurrences: 40
    • buckets touched: 22

Why “10 unique buckets” but “30 buckets touched”?

  • A collision often appears in broader ancestor buckets too, because ancestors contain the same underlying line set.

  • The audit uses deepest-bucket attribution:

    • the KEEP decision (and the collision itself) is attributed to the most specific descendant bucket that contains all involved lines.
  • When applying the fix, the “losing” line strings are removed globally for that side (from every bucket where they appear), so ancestor buckets get updated too.

  • That’s why only 10 buckets have unique collisions, but 8+22 = 30 buckets have their line lists modified by removal propagation.
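The difference between “unique lines”, “occurrences” and “buckets touched” can be made concrete with a small sketch of the removal propagation. The function name `remove_globally` and the dict-of-lists bucket layout are assumptions for illustration, not the script's actual structures.

```python
def remove_globally(buckets, losing_lines):
    """Drop every losing line from every bucket of one side.

    `buckets` maps a bucket key to its list of line strings (assumed layout).
    Returns a pruned copy plus the three counters reported in the summary.
    """
    losing = set(losing_lines)
    pruned = {}
    removed_unique = set()
    occurrences = 0
    buckets_touched = 0
    for key, lines in buckets.items():
        hits = [line for line in lines if line in losing]
        if hits:
            buckets_touched += 1          # this bucket's list changes
            occurrences += len(hits)      # one count per removed copy
            removed_unique.update(hits)   # one count per distinct line
        pruned[key] = [line for line in lines if line not in losing]
    stats = {
        "unique_lines": len(removed_unique),
        "occurrences": occurrences,
        "buckets_touched": buckets_touched,
    }
    return pruned, stats


buckets = {
    "Open Game": ["A", "B", "C"],   # ancestor bucket
    "Ruy Lopez": ["A", "B"],        # deepest bucket holding the collision
    "Italian Game": ["C"],
}
pruned, stats = remove_globally(buckets, ["B"])
print(stats)
```

Here a single losing line touches two buckets, the deepest one and its ancestor, which is the same effect that turns 10 collision buckets into 30 modified ones.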

Verification (after fix):

python audit_collisions.py --root . --side both --stockfish /path/to/stockfish --out after.html

  • Unique collision buckets: 0 (all detected intra-bucket ambiguities removed)

2) Refresh UI “Lines” counters (opening_book totals)

After pruning lines.js, the “Lines” counts shown in the Memchess UI can remain unchanged, because those totals are read from opening_book in opening_names.js rather than recounted from whiteLines/blackLines at runtime.

So this PR also refreshes those counters so the UI matches the updated opening book data.

Files / tooling:

  • ✅ updated: memchess/js/opening_names.js
  • ✅ added: recount_opening_names.py

Run used:

python recount_opening_names.py --root . --in-place

Results:

  • keys in lines.js: white=350 · black=345
  • opening_book entries parsed: 2915
  • counts changed: 79
  • opening_book entries absent from lines.js (left untouched): 2463

What does “absent from lines.js” mean?

  • opening_names.js contains a global opening tree (opening_book) which is much larger than the subset of buckets that have actual line arrays in lines.js.
  • Those 2463 entries have neither a whiteLines[key] nor a blackLines[key] bucket, so they are intentionally left untouched.
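As a rough illustration of the recount rule, here is a sketch under strong assumptions: `opening_book` is modelled as a dict of entries sharing keys with the line buckets, each entry stores its total in a hypothetical `"lines"` field, and the total is taken as white plus black line counts. The real recount_opening_names.py may parse and count differently.

```python
def recount(opening_book, white_lines, black_lines):
    """Refresh per-entry line totals in place and report what happened.

    Entries with no bucket on either side are counted as absent and skipped.
    The "lines" field name and the white+black sum are assumptions.
    """
    changed = 0
    absent = 0
    for key, entry in opening_book.items():
        if key not in white_lines and key not in black_lines:
            absent += 1  # no line array for this entry: leave it untouched
            continue
        new_count = len(white_lines.get(key, [])) + len(black_lines.get(key, []))
        if entry.get("lines") != new_count:
            entry["lines"] = new_count
            changed += 1
    return {"changed": changed, "absent": absent}


book = {"a": {"lines": 3}, "b": {"lines": 1}, "c": {"lines": 9}}
report = recount(book, {"a": ["x", "y"]}, {"b": ["z"]})
print(report, book)
```

Entry "c" keeps its stale value on purpose, mirroring the “left untouched” behaviour described above.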

Reproduce locally (end-to-end)

1) Report only

python audit_collisions.py --root . --side both --stockfish /path/to/stockfish --out collisions.html

2) Apply fix (prune) + keep diffs clean

python audit_collisions.py --root . --side both --stockfish /path/to/stockfish \
  --apply-fix --in-place --preserve-format --out collisions.html

3) Verify collisions are gone

python audit_collisions.py --root . --side both --stockfish /path/to/stockfish --out after.html

4) Refresh UI counters

python recount_opening_names.py --root . --in-place

Notes

  • Both scripts create timestamped backups when running --in-place.
  • --preserve-format keeps lines.js formatting aligned with upstream, so the Git diff stays focused on actual removed line strings.
  • This PR is strictly about intra-opening ambiguities (not cross-opening collisions).

Thanks

@grondilu
Owner

To prevent information loss, I would rather not change js/lines.js. I would prefer it if any modification of the lines is done dynamically in javascript, for instance during import.

Also, if I'm not mistaken, this collision detection method is based on move sequences, so it does not take transpositions into account. Granted, working with move sequences is a good start, but since it's not the ultimate goal, that's all the more reason to do it in JavaScript first.

@Gromak123
Contributor Author

Thanks for the feedback — that’s fair, and I understand the concern about information loss.

About not changing js/lines.js

Yes, pruning lines.js is “destructive” in the sense that it permanently removes book data. If you want lines.js to remain the canonical raw dataset, then the safer approach is indeed to keep it untouched and handle ambiguity dynamically (during import/runtime).

There are a few viable non-destructive ways to do that:

  1. Allow multiple correct moves when ambiguity exists
    • Instead of expecting a single next move, treat the state as a set of acceptable next moves.
    • This avoids arbitrarily choosing a “best” line.
    • Trade-off: training becomes less deterministic unless the UI explicitly helps the user pick/lock a sub-line.
  2. Resolve ambiguities in-memory at import/runtime (without modifying lines.js)
    • Detect collisions and build a filtered “effective” structure in memory (e.g. resolvedLines) while keeping the original data intact.
    • The resolution policy could be:
      • heuristic (e.g. prefer the move with most supporting lines),
      • user choice (prompt once, store preference),
      • or optional engine-based (Stockfish/WASM), but that is heavier (bundle size, load time, complexity), so I’m not sure it fits unless it’s clearly optional.
  3. Patch in-memory from an offline-generated audit result
    • Another option is to run the audit offline and generate a small “patch file” (JSON) that says:
      “for bucket X at prefix P, keep move M / drop others”
    • Then Memchess applies that patch at runtime to build the effective training set in-memory, again leaving lines.js unchanged.
    • This keeps runtime logic simple and avoids doing heavy computation in the browser, while still preventing ambiguous training behavior.
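To make the heuristic policy concrete, here is a minimal sketch of “prefer the move with most supporting lines”, written in Python for consistency with the audit tooling (a runtime version would be ported to JavaScript). The name `resolve_by_majority` and the space-separated SAN line format are assumptions.

```python
from collections import Counter


def resolve_by_majority(lines, prefix):
    """Return a filtered copy of `lines`; the input list is not modified.

    Among lines that reach `prefix`, keep only those playing the next move
    backed by the most lines. Ties go to the move encountered first.
    """
    prefix = list(prefix)
    n = len(prefix)
    split = [line.split() for line in lines]

    def reaches(moves):
        return len(moves) > n and moves[:n] == prefix

    votes = Counter(moves[n] for moves in split if reaches(moves))
    if len(votes) < 2:
        return list(lines)  # nothing ambiguous at this prefix
    keep = votes.most_common(1)[0][0]
    return [
        line
        for line, moves in zip(lines, split)
        if not reaches(moves) or moves[n] == keep
    ]


raw = [
    "e4 e5 Nf3 Nc6 Bb5 a6",
    "e4 e5 Nf3 Nc6 Bb5 Nf6",
    "e4 e5 Nf3 Nc6 Bc4",
]
resolved = resolve_by_majority(raw, ["e4", "e5", "Nf3", "Nc6"])
print(resolved)
```

Because the function returns a new list, the raw data stays intact and the result can be stored separately as the “effective” set.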

About transpositions

You’re also correct: collision detection based on SAN move sequences does not account for transpositions.

I want to be transparent: transposition-aware detection is more correct, but also meaningfully heavier, because it requires replaying moves and building a position key for many prefixes (effectively generating many FENs or equivalent keys).

Since Memchess already uses chess.js, it is feasible in JS by:

  • replaying each line with chess.js,
  • computing a position key (FEN or a normalized subset: piece placement + side-to-move + castling + en-passant),
  • then detecting “same position key (where it’s repertoire-to-play) → multiple next moves”.

But the trade-offs are:

  • extra compute time at startup/import unless cached,
  • more code complexity and edge cases (notably around en-passant if using raw FEN).
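The position-key idea can be sketched without a chess library if the replay step is assumed to have already produced a FEN for each prefix. The helper names `position_key` and `collisions_by_position` are hypothetical, and the `(fen_before_move, san_move)` record shape is an assumption about what a replay pass would emit.

```python
from collections import defaultdict


def position_key(fen):
    """Normalize a FEN to its first four fields.

    Keeps piece placement, side to move, castling rights and the en-passant
    square; drops the halfmove clock and fullmove number.
    """
    return " ".join(fen.split()[:4])


def collisions_by_position(records, side):
    """Group next moves by position key where `side` is to move.

    `records` is an iterable of (fen_before_move, san_move) pairs.
    Returns only keys that have two or more distinct next moves.
    """
    to_move = "w" if side == "white" else "b"
    options = defaultdict(set)
    for fen, san in records:
        if fen.split()[1] == to_move:
            options[position_key(fen)].add(san)
    return {key: moves for key, moves in options.items() if len(moves) > 1}


# Same position reached with different halfmove clocks (two move orders).
fen_a = "r1bqkbnr/pppp1ppp/2n5/4p3/4P3/5N2/PPPP1PPP/RNBQKB1R w KQkq - 2 3"
fen_b = "r1bqkbnr/pppp1ppp/2n5/4p3/4P3/5N2/PPPP1PPP/RNBQKB1R w KQkq - 1 3"
print(collisions_by_position([(fen_a, "Bb5"), (fen_b, "Bc4")], "white"))
```

Note that the en-passant field is kept exactly as given, so a FEN generator that always records the en-passant square after a double pawn push would still split otherwise identical positions. That is the edge case mentioned above and would need extra normalization.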

If you prefer avoiding that cost in the browser, we could also do transposition-aware detection offline (Python tool) and optionally feed Memchess a small patch/collision index (as above). That gives you correctness without pushing heavy computation to JS.

Question

Before going further, would any of these directions be acceptable for you?

  • A) Keep lines.js immutable, and accept multiple next moves as valid when ambiguity exists (UI/UX handles it).
  • B) Keep lines.js immutable, and resolve ambiguities at runtime/import (heuristics or user selection; no engine).
  • C) Same as B but transposition-aware (more correct, heavier unless cached).
  • D) Offline audit generates a patch (optionally transposition-aware), and the app applies it in-memory at runtime.

Also, if you have a different/better approach in mind (maybe something that aligns more closely with Memchess’ training philosophy), I’d love to hear it — happy to adapt.

@Gromak123 Gromak123 marked this pull request as draft January 31, 2026 12:21
@Gromak123 Gromak123 marked this pull request as ready for review January 31, 2026 12:21
@Gromak123 Gromak123 marked this pull request as draft January 31, 2026 12:21
@grondilu
Owner

D.

I think for now an offline audit is fine. We can duck-tape the runtime code to process the problematic lines separately (possibly even ignoring them if there aren't too many of them).

@Gromak123
Contributor Author

Sounds good — option D makes a lot of sense as a first step.

I’ll look into updating the offline collision audit to become transposition-aware (i.e. detecting ambiguities by position rather than strictly by SAN move sequence). Since Memchess already uses chess.js, I can mirror that logic offline by replaying lines and keying positions (FEN/normalized FEN components) so different move orders that reach the same position get grouped together.

Once we have that transposition-aware report, we can keep the fix non-destructive and lightweight:

  • generate a small patch/exception JSON that lists only the problematic cases (bucket + position/prefix key → allowed moves / preferred move),
  • and then “duck-tape” the runtime code to treat those few cases separately (or even ignore them initially if the set stays small), without touching lines.js.
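As a starting point for discussion, one possible shape for the patch and its in-memory application is sketched below. Everything about the schema is hypothetical (the `bucket`, `prefix` and `keep` field names, and keying by SAN prefix rather than by position), and the runtime version would live in JavaScript; Python is used here only to match the offline tooling.

```python
import json


def apply_patch(buckets, patch_json):
    """Build an effective copy of `buckets` with patch rules applied.

    Each rule says: in `bucket`, at `prefix`, keep only lines whose next
    move is `keep`. The input `buckets` dict is left untouched.
    """
    effective = {key: list(lines) for key, lines in buckets.items()}
    for rule in json.loads(patch_json):
        bucket, prefix, keep = rule["bucket"], rule["prefix"], rule["keep"]
        if bucket not in effective:
            continue  # rule refers to a bucket this book does not have
        n = len(prefix)
        kept = []
        for line in effective[bucket]:
            moves = line.split()
            at_collision = len(moves) > n and moves[:n] == prefix
            if not at_collision or moves[n] == keep:
                kept.append(line)
        effective[bucket] = kept
    return effective


patch = json.dumps([
    {"bucket": "Ruy Lopez", "prefix": ["e4", "e5", "Nf3", "Nc6"], "keep": "Bb5"}
])
book = {"Ruy Lopez": ["e4 e5 Nf3 Nc6 Bb5 a6", "e4 e5 Nf3 Nc6 Bc4"]}
print(apply_patch(book, patch))
```

A rule list like this stays small when only a handful of cases are problematic, and swapping `keep` for a list of allowed moves would cover the “allowed moves” variant as well.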

I’ll share the updated report/patch format once I have the transposition-aware detection working, and then we can iterate on the simplest runtime handling strategy.
