Make headword matches trump fts rank #2112

myieye · 2025-11-28T19:21:56Z

In a real project, an exact headword match was shown as roughly the 10th item. The reason turned out to be that the entry had more gloss text than the entries above it, which causes the FTS rank to penalize it. I added a test that demonstrates this; it initially failed.
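For context on the penalty, here is a sketch that assumes the index is SQLite FTS5 and that results are ordered by its built-in `bm25()` ranking (the thread itself only says "fts rank"). Under that assumption, each query term contributes a score that shrinks as the matched row's token count grows relative to the average row:

```latex
\mathrm{score}(D, Q) = \sum_{i} \mathrm{IDF}(q_i)\,
  \frac{f(q_i, D)\,(k_1 + 1)}
       {f(q_i, D) + k_1\left(1 - b + b\,\frac{|D|}{\mathrm{avgdl}}\right)},
\qquad k_1 = 1.2,\; b = 0.75
```

Here $|D|$ is the number of tokens in the row and $\mathrm{avgdl}$ is the average over all rows. With a single occurrence ($f = 1$), a row of average length gets a factor of $2.2 / 2.2 = 1$, while a row five times the average length gets $2.2 / 5.8 \approx 0.38$. So an entry with a long gloss can rank below shorter entries even when its headword is an exact match.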

I think it makes sense to give the headword a special status rather than just a higher weight, because the headword is what we show front and center as if it's the primary thing we match against, so I think it should be the primary thing we match against.

So, this PR always puts headword matches above non-headword matches and sorts the headword matches by their length.


coderabbitai · 2025-11-28T19:22:08Z

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

📝 Walkthrough

Multiple backend files were refactored to standardize writing system handling by replacing WritingSystem object references with WritingSystemId identifiers throughout the sorting and filtering pipeline. MiniLcmRepository now ensures that a default writing system is populated during entry retrieval, and public APIs were simplified by removing ordering directives from method signatures.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Writing System Infrastructure: `backend/FwLite/LcmCrdt/Data/SqlSortingExtensions.cs`, `backend/FwLite/LcmCrdt/Data/SetupCollationInterceptor.cs` | Updated three method signatures to accept WritingSystemId instead of WritingSystem objects; collation name computation refactored to use WsId property |
| Repository Layer: `backend/FwLite/LcmCrdt/Data/MiniLcmRepository.cs` | Introduced EnsureWritingSystemIsPopulated helper to enforce default Vernacular writing system in QueryOptions; simplified FilterAndRank call by removing Ascending parameter; adjusted sorting logic with WritingSystem validation |
| Full-Text Search Service: `backend/FwLite/LcmCrdt/FullTextSearch/EntrySearchService.cs` | Removed orderAscending parameter from FilterAndRank public method; consolidated ranking logic into single path with conditional headword length check, Rank, and Id ordering |
| Test Data: `backend/FwLite/MiniLcm.Tests/QueryEntryTestsBase.cs` | Added test case with aap entry including related words/glosses to RankedOrder dataset |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Pay special attention to MiniLcmRepository.cs for duplicate EnsureWritingSystemIsPopulated logic and interaction between default writing system population and sorting behavior
Verify that removal of orderAscending parameter from FilterAndRank doesn't introduce unwanted ranking order changes across callers
Confirm WritingSystemId type substitution maintains proper collation behavior in SqlSortingExtensions

Possibly related PRs

Fix search + missing senses #2006: Modifies MiniLcmRepository.cs GetEntries/filtering pipeline with changes to how exemplar/gridify filtering interacts with FTS
Fix default crdt WS doesn't respect order #2008: Updates default writing system resolution using WritingSystemsOrdered collection, affecting order resolution and lookup behavior

Suggested labels

💻 FW Lite

Suggested reviewers

imnasnainaec
hahn-kev

Poem

🐰 A rabbit hops through sorting schemes,
WritingSystemIds dance through the beams,
No more ascending flags to hold,
Just pure collation, clean and bold!
✨ The database shines with refactored grace.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | You can run `@coderabbitai generate docstrings` to improve docstring coverage. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description check | ✅ Passed | The description is directly related to the changeset, explaining the problem (headword matches buried due to gloss text penalties) and the solution (giving headword matches special status). |
| Title check | ✅ Passed | The title accurately describes the main structural change: refactoring code to populate the sort writing system earlier in the flow, which is the core technical improvement enabling the fix. |


argos-ci · 2025-11-28T19:23:35Z

The latest updates on your projects. Learn more about Argos notifications ↗︎

| Build | Status | Details | Updated (UTC) |
|---|---|---|---|
| default (Inspect) | ✅ No changes detected | - | Dec 3, 2025, 10:57 AM |

github-actions · 2025-11-28T19:24:11Z

UI unit Tests

1 files ± 0 45 suites +42 18s ⏱️ +18s
111 tests +101 111 ✅ +101 0 💤 ±0 0 ❌ ±0
160 runs +150 160 ✅ +150 0 💤 ±0 0 ❌ ±0

Results for commit cb8052c. ± Comparison against base commit c0957ed.

♻️ This comment has been updated with latest results.

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (2)

backend/FwLite/LcmCrdt/FullTextSearch/EntrySearchService.cs (1)
47-49: Consider extracting the magic number 10_000 as a named constant.

The ranking logic correctly prioritizes headword matches (sorted by length) above non-headword matches. However, the magic number 10_000 could be more self-documenting:
```diff
+private const int NonHeadwordMatchPenalty = 10_000;
+
 if (rankResults)
 {
     filtered = filtered
-        .OrderBy(t => SqlHelpers.ContainsIgnoreCaseAccents(t.searchRecord.Headword, query) ? t.searchRecord.Headword.Length : 10_000)
+        .OrderBy(t => SqlHelpers.ContainsIgnoreCaseAccents(t.searchRecord.Headword, query) ? t.searchRecord.Headword.Length : NonHeadwordMatchPenalty)
         .ThenBy(t => Sql.Ext.SQLite().Rank(t.searchRecord)).ThenBy(t => t.entry.Id);
 }
```

backend/FwLite/LcmCrdt/Data/MiniLcmRepository.cs (1)

208-211: Defensive validation is good, but consider the exception type.

This validation throws ArgumentException if the writing system is missing after EnsureWritingSystemIsPopulated was called. Since GetEntries always calls EnsureWritingSystemIsPopulated first, this acts as a defensive safeguard for direct ApplySorting calls.

Consider whether InvalidOperationException might be more appropriate here, as the issue would be a programming error (caller forgot to populate) rather than an invalid argument per se. This is a minor observation.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c0957ed and 1157079.

📒 Files selected for processing (5)

backend/FwLite/LcmCrdt/Data/MiniLcmRepository.cs (4 hunks)
backend/FwLite/LcmCrdt/Data/SetupCollationInterceptor.cs (1 hunks)
backend/FwLite/LcmCrdt/Data/SqlSortingExtensions.cs (1 hunks)
backend/FwLite/LcmCrdt/FullTextSearch/EntrySearchService.cs (2 hunks)
backend/FwLite/MiniLcm.Tests/QueryEntryTestsBase.cs (1 hunks)

🧰 Additional context used

🧠 Learnings (2)

📓 Common learnings

Learnt from: imnasnainaec
Repo: sillsdev/languageforge-lexbox PR: 1867
File: platform.bible-extension/src/main.ts:239-246
Timestamp: 2025-07-31T19:10:41.178Z
Learning: In the sillsdev/languageforge-lexbox repository, user imnasnainaec prefers to defer code improvements when there are related TODO comments indicating planned refactoring work, choosing to bundle related changes together rather than making incremental improvements that would need to be modified again during the larger refactoring.

📚 Learning: 2025-06-13T09:25:37.958Z

Learnt from: hahn-kev
Repo: sillsdev/languageforge-lexbox PR: 1698
File: backend/FwLite/LcmCrdt/Data/Filtering.cs:25-35
Timestamp: 2025-06-13T09:25:37.958Z
Learning: In backend/FwLite/LcmCrdt/Data/Filtering.cs `FtsFilter`, the `&&` combination between the FTS `MATCH` result and the `SearchValue` fallback is intentional to maintain existing search behavior; any future change to use `||` (or another approach) will be considered later.

Applied to files:

backend/FwLite/LcmCrdt/FullTextSearch/EntrySearchService.cs

🧬 Code graph analysis (3)

backend/FwLite/LcmCrdt/FullTextSearch/EntrySearchService.cs (2)

backend/FwLite/LcmCrdt/SqlHelpers.cs (2)

SqlHelpers (10-36)

Sql (34-35)

backend/FwLite/MiniLcm/Models/Entry.cs (1)

Headword (34-42)

backend/FwLite/LcmCrdt/Data/SetupCollationInterceptor.cs (1)

backend/FwLite/LcmCrdt/Data/SqlSortingExtensions.cs (2)

SqlSortingExtensions (6-29)

CollationName (23-28)

backend/FwLite/LcmCrdt/Data/MiniLcmRepository.cs (3)

backend/FwLite/FwDataMiniLcmBridge/Api/FwDataMiniLcmApi.cs (2)

Order (742-760)

WritingSystem (100-116)

backend/FwLite/MiniLcm/Exceptions/NotFoundException.cs (6)

NotFoundException (6-44)

NotFoundException (8-10)

NotFoundException (23-26)

NotFoundException (28-31)

NotFoundException (33-36)

NotFoundException (38-41)

backend/FwLite/MiniLcm/Models/Entry.cs (1)

Headword (34-42)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)

GitHub Check: Build FwHeadless / publish-fw-headless
GitHub Check: Analyze (csharp)
GitHub Check: frontend-component-unit-tests
GitHub Check: frontend
GitHub Check: Build FW Lite and run tests

🔇 Additional comments (9)

backend/FwLite/LcmCrdt/Data/SetupCollationInterceptor.cs (1)

125-127: LGTM!

The change correctly passes writingSystem.WsId to match the updated CollationName(WritingSystemId wsId) signature, maintaining consistency with the refactor to use WritingSystemId throughout the sorting/collation pipeline.

backend/FwLite/MiniLcm.Tests/QueryEntryTestsBase.cs (1)

389-389: Good test case for validating the headword ranking fix.

This test case directly validates the PR objective: the entry "aap" with a very long gloss that would penalize BM25 ranking now correctly appears first as an exact headword match, followed by shorter headword matches. The expected order "aap,maap,liaap,maap-aap" confirms headword matches are prioritized and sorted by length.
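The expected order can be reproduced in memory with plain LINQ. The sketch below is not the repository's query, which runs against the FTS table. The tuples, the `Rank` values, the extra gloss-only "monkey" entry, the ordinal `Contains` check (standing in for `SqlHelpers.ContainsIgnoreCaseAccents`), and the final tiebreak on headword text are all illustrative assumptions.

```csharp
using System;
using System.Linq;

// Hypothetical search hits: (Headword, Rank). A lower Rank is treated as a better
// full-text match, as when ordering ascending by an FTS rank. Values are made up.
var hits = new (string Headword, double Rank)[]
{
    ("maap-aap", -2.0),
    ("liaap", -1.8),
    ("monkey", -3.0), // stand-in for an entry matched only via its gloss, with the best raw rank
    ("maap", -1.9),
    ("aap", -0.5),    // exact headword match, penalized for its long gloss
};

// Headword matches sort by headword length; all other hits share int.MaxValue,
// so they land after every headword match and fall back to Rank.
static string[] RankHits((string Headword, double Rank)[] source, string query) =>
    source
        .OrderBy(h => h.Headword.Contains(query, StringComparison.OrdinalIgnoreCase)
            ? h.Headword.Length
            : int.MaxValue)
        .ThenBy(h => h.Rank)
        .ThenBy(h => h.Headword, StringComparer.Ordinal) // assumed tiebreaker for this sketch
        .Select(h => h.Headword)
        .ToArray();

var ordered = RankHits(hits, "aap");
Console.WriteLine(string.Join(",", ordered));
// prints "aap,maap,liaap,maap-aap,monkey"
```

Even though "monkey" has the best raw rank in this made-up data, it sorts last because its headword does not contain the query.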

backend/FwLite/LcmCrdt/FullTextSearch/EntrySearchService.cs (1)

32-34: LGTM - Simplified signature.

Removing the orderAscending parameter simplifies the API since ranking now follows a fixed priority order (headword matches first, then BM25 rank).

backend/FwLite/LcmCrdt/Data/MiniLcmRepository.cs (3)

131-132: LGTM - Writing system is now populated early in the flow.

This ensures the writing system is resolved before filtering and sorting operations, aligning with the PR objective to populate the sort writing system earlier.

158-166: LGTM - Clean helper for ensuring writing system is populated.

The method correctly:

Short-circuits if a writing system is already specified

Retrieves the default vernacular writing system

Throws a descriptive exception if none exists

Returns a new QueryOptions with the populated writing system
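The four steps above can be sketched in isolation. This is a simplified illustration, not the code in MiniLcmRepository.cs: the real `QueryOptions` and `WritingSystemId` types are replaced by a plain tuple and strings, the "default" vernacular writing system is assumed to be the first in an ordered list, and the exception type is a placeholder.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Simplified stand-in: a sort option is (Field, WritingSystem), where a null
// WritingSystem means the caller did not specify one.
static (string Field, string WritingSystem) EnsureWritingSystemIsPopulated(
    (string Field, string? WritingSystem) sort,
    IReadOnlyList<string> vernacularWritingSystems)
{
    // Short-circuit: a writing system was already specified.
    if (sort.WritingSystem is not null) return (sort.Field, sort.WritingSystem);

    // Otherwise fall back to the default vernacular writing system
    // (assumed here to be the first one in the list).
    var defaultWs = vernacularWritingSystems.FirstOrDefault()
        ?? throw new InvalidOperationException(
            "No vernacular writing system found; cannot sort entries.");

    // Return a new value instead of mutating the input.
    return (sort.Field, defaultWs);
}

var populated = EnsureWritingSystemIsPopulated(("Headword", null), new[] { "en", "fr" });
Console.WriteLine(populated.WritingSystem); // prints "en"
```

Resolving the writing system once, up front, means later filtering and sorting steps can rely on it being present rather than each resolving a default on their own.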

214-218: LGTM - Sorting logic correctly uses wsId for collation.

The headword sorting now uses the local wsId for both the Headword() call and CollateUnicode(), ensuring consistent writing system handling throughout the sorting pipeline.

backend/FwLite/LcmCrdt/Data/SqlSortingExtensions.cs (3)

11-15: LGTM - Simplified parameter type.

Using WritingSystemId instead of the full WritingSystem object is a cleaner API design since only the identifier is needed for collation purposes.

17-21: LGTM - Expression updated consistently.

The expression signature and lambda correctly use WritingSystemId throughout.

23-28: LGTM - Direct access to wsId.Code is cleaner.

The change from ws.WsId.Code to wsId.Code simplifies the code while maintaining the same collation name format.

myieye added 4 commits November 28, 2025 16:44

Refactor to populate sort ws earlier

ab567b7

Add fts test with undesireable headword order

12b24d6

Remove order from rank sorting

d7c7cc0

It doesn't make any sense to sort bad matches first

Always prefer headword matches

1157079

because that's what we primarily display - it has a special status not just a higher weight

github-actions bot added the 💻 FW Lite label (issues related to the fw lite application, not miniLcm or crdt related) Nov 28, 2025

myieye changed the title ~~Refactor to populate sort ws earlier~~ Make headword matches trump fts rank Nov 28, 2025

coderabbitai bot reviewed Nov 28, 2025


myieye added 2 commits December 1, 2025 09:31

Add clarifying comments and use MaxInt instead of magic number

2315adc

Then sort matching fts headwords by text

cb8052c

myieye merged commit 91f3f7e into develop Dec 3, 2025
18 of 19 checks passed

myieye deleted the fix-headword-rank branch December 3, 2025 12:09
