Skip to content

Conversation

@myieye
Copy link
Collaborator

@myieye myieye commented Nov 28, 2025

In a real project, an exact headword match was show as the approx. 10th item. The reason ended up being that the entry had more gloss text than the entries above it, which causes it to be penalized. I added a test that demonstrates that and was initially failing.

I think it makes sense to give the headword a special status rather than just a higher weight, because the headword is what we show front and center as if it's the primary thing we match against, so I think it should be the primary thing we match against.

So, this PR always puts headword matches above non-headword matches and sorts the headword matches by their length.

It doesn't make any sense to sort bad matches first
because that's what we primarily display - it has a special status not just a higher weight
@github-actions github-actions bot added the 💻 FW Lite issues related to the fw lite application, not miniLcm or crdt related label Nov 28, 2025
@coderabbitai
Copy link

coderabbitai bot commented Nov 28, 2025

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

📝 Walkthrough

Walkthrough

Multiple backend files refactored to standardize writing system handling by replacing WritingSystem object references with WritingSystemId identifiers throughout the sorting and filtering pipeline. MiniLcmRepository ensures default writing systems are populated during entry retrieval, while public APIs simplified by removing ordering directives from method signatures.

Changes

Cohort / File(s) Summary
Writing System Infrastructure
backend/FwLite/LcmCrdt/Data/SqlSortingExtensions.cs, backend/FwLite/LcmCrdt/Data/SetupCollationInterceptor.cs
Updated three method signatures to accept WritingSystemId instead of WritingSystem objects; collation name computation refactored to use WsId property
Repository Layer
backend/FwLite/LcmCrdt/Data/MiniLcmRepository.cs
Introduced EnsureWritingSystemIsPopulated helper to enforce default Vernacular writing system in QueryOptions; simplified FilterAndRank call by removing Ascending parameter; adjusted sorting logic with WritingSystem validation
Full-Text Search Service
backend/FwLite/LcmCrdt/FullTextSearch/EntrySearchService.cs
Removed orderAscending parameter from FilterAndRank public method; consolidated ranking logic into single path with conditional headword length check, Rank, and Id ordering
Test Data
backend/FwLite/MiniLcm.Tests/QueryEntryTestsBase.cs
Added test case with aap entry including related words/glosses to RankedOrder dataset

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • Pay special attention to MiniLcmRepository.cs for duplicate EnsureWritingSystemIsPopulated logic and interaction between default writing system population and sorting behavior
  • Verify that removal of orderAscending parameter from FilterAndRank doesn't introduce unwanted ranking order changes across callers
  • Confirm WritingSystemId type substitution maintains proper collation behavior in SqlSortingExtensions

Possibly related PRs

Suggested labels

💻 FW Lite

Suggested reviewers

  • imnasnainaec
  • hahn-kev

Poem

🐰 A rabbit hops through sorting schemes,
WritingSystemIds dance through the beams,
No more ascending flags to hold,
Just pure collation, clean and bold!
✨ The database shines with refactored grace.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description check ✅ Passed The description is directly related to the changeset, explaining the problem (headword matches buried due to gloss text penalties) and the solution (giving headword matches special status).
Title check ✅ Passed The title accurately describes the main structural change: refactoring code to populate the sort writing system earlier in the flow, which is the core technical improvement enabling the fix.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@myieye myieye changed the title Refactor to populate sort ws earlier Make headword matches trump fts rank Nov 28, 2025
@argos-ci
Copy link

argos-ci bot commented Nov 28, 2025

The latest updates on your projects. Learn more about Argos notifications ↗︎

Build Status Details Updated (UTC)
default (Inspect) ✅ No changes detected - Dec 3, 2025, 10:57 AM

@github-actions
Copy link

github-actions bot commented Nov 28, 2025

UI unit Tests

  1 files  ±  0   45 suites  +42   18s ⏱️ +18s
111 tests +101  111 ✅ +101  0 💤 ±0  0 ❌ ±0 
160 runs  +150  160 ✅ +150  0 💤 ±0  0 ❌ ±0 

Results for commit cb8052c. ± Comparison against base commit c0957ed.

♻️ This comment has been updated with latest results.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
backend/FwLite/LcmCrdt/FullTextSearch/EntrySearchService.cs (1)

47-49: Consider extracting the magic number 10_000 as a named constant.

The ranking logic correctly prioritizes headword matches (sorted by length) above non-headword matches. However, the magic number 10_000 could be more self-documenting:

+private const int NonHeadwordMatchPenalty = 10_000;
+
 if (rankResults)
 {
     filtered = filtered
-        .OrderBy(t => SqlHelpers.ContainsIgnoreCaseAccents(t.searchRecord.Headword, query) ? t.searchRecord.Headword.Length : 10_000)
+        .OrderBy(t => SqlHelpers.ContainsIgnoreCaseAccents(t.searchRecord.Headword, query) ? t.searchRecord.Headword.Length : NonHeadwordMatchPenalty)
         .ThenBy(t => Sql.Ext.SQLite().Rank(t.searchRecord)).ThenBy(t => t.entry.Id);
 }
backend/FwLite/LcmCrdt/Data/MiniLcmRepository.cs (1)

208-211: Defensive validation is good, but consider the exception type.

This validation throws ArgumentException if the writing system is missing after EnsureWritingSystemIsPopulated was called. Since GetEntries always calls EnsureWritingSystemIsPopulated first, this acts as a defensive safeguard for direct ApplySorting calls.

Consider whether InvalidOperationException might be more appropriate here, as the issue would be a programming error (caller forgot to populate) rather than an invalid argument per se. This is a minor observation.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c0957ed and 1157079.

📒 Files selected for processing (5)
  • backend/FwLite/LcmCrdt/Data/MiniLcmRepository.cs (4 hunks)
  • backend/FwLite/LcmCrdt/Data/SetupCollationInterceptor.cs (1 hunks)
  • backend/FwLite/LcmCrdt/Data/SqlSortingExtensions.cs (1 hunks)
  • backend/FwLite/LcmCrdt/FullTextSearch/EntrySearchService.cs (2 hunks)
  • backend/FwLite/MiniLcm.Tests/QueryEntryTestsBase.cs (1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: imnasnainaec
Repo: sillsdev/languageforge-lexbox PR: 1867
File: platform.bible-extension/src/main.ts:239-246
Timestamp: 2025-07-31T19:10:41.178Z
Learning: In the sillsdev/languageforge-lexbox repository, user imnasnainaec prefers to defer code improvements when there are related TODO comments indicating planned refactoring work, choosing to bundle related changes together rather than making incremental improvements that would need to be modified again during the larger refactoring.
📚 Learning: 2025-06-13T09:25:37.958Z
Learnt from: hahn-kev
Repo: sillsdev/languageforge-lexbox PR: 1698
File: backend/FwLite/LcmCrdt/Data/Filtering.cs:25-35
Timestamp: 2025-06-13T09:25:37.958Z
Learning: In backend/FwLite/LcmCrdt/Data/Filtering.cs `FtsFilter`, the `&&` combination between the FTS `MATCH` result and the `SearchValue` fallback is intentional to maintain existing search behavior; any future change to use `||` (or another approach) will be considered later.

Applied to files:

  • backend/FwLite/LcmCrdt/FullTextSearch/EntrySearchService.cs
🧬 Code graph analysis (3)
backend/FwLite/LcmCrdt/FullTextSearch/EntrySearchService.cs (2)
backend/FwLite/LcmCrdt/SqlHelpers.cs (2)
  • SqlHelpers (10-36)
  • Sql (34-35)
backend/FwLite/MiniLcm/Models/Entry.cs (1)
  • Headword (34-42)
backend/FwLite/LcmCrdt/Data/SetupCollationInterceptor.cs (1)
backend/FwLite/LcmCrdt/Data/SqlSortingExtensions.cs (2)
  • SqlSortingExtensions (6-29)
  • CollationName (23-28)
backend/FwLite/LcmCrdt/Data/MiniLcmRepository.cs (3)
backend/FwLite/FwDataMiniLcmBridge/Api/FwDataMiniLcmApi.cs (2)
  • Order (742-760)
  • WritingSystem (100-116)
backend/FwLite/MiniLcm/Exceptions/NotFoundException.cs (6)
  • NotFoundException (6-44)
  • NotFoundException (8-10)
  • NotFoundException (23-26)
  • NotFoundException (28-31)
  • NotFoundException (33-36)
  • NotFoundException (38-41)
backend/FwLite/MiniLcm/Models/Entry.cs (1)
  • Headword (34-42)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Build FwHeadless / publish-fw-headless
  • GitHub Check: Analyze (csharp)
  • GitHub Check: frontend-component-unit-tests
  • GitHub Check: frontend
  • GitHub Check: Build FW Lite and run tests
🔇 Additional comments (9)
backend/FwLite/LcmCrdt/Data/SetupCollationInterceptor.cs (1)

125-127: LGTM!

The change correctly passes writingSystem.WsId to match the updated CollationName(WritingSystemId wsId) signature, maintaining consistency with the refactor to use WritingSystemId throughout the sorting/collation pipeline.

backend/FwLite/MiniLcm.Tests/QueryEntryTestsBase.cs (1)

389-389: Good test case for validating the headword ranking fix.

This test case directly validates the PR objective: the entry "aap" with a very long gloss that would penalize BM25 ranking now correctly appears first as an exact headword match, followed by shorter headword matches. The expected order "aap,maap,liaap,maap-aap" confirms headword matches are prioritized and sorted by length.

backend/FwLite/LcmCrdt/FullTextSearch/EntrySearchService.cs (1)

32-34: LGTM - Simplified signature.

Removing the orderAscending parameter simplifies the API since ranking now follows a fixed priority order (headword matches first, then BM25 rank).

backend/FwLite/LcmCrdt/Data/MiniLcmRepository.cs (3)

131-132: LGTM - Writing system is now populated early in the flow.

This ensures the writing system is resolved before filtering and sorting operations, aligning with the PR objective to populate the sort writing system earlier.


158-166: LGTM - Clean helper for ensuring writing system is populated.

The method correctly:

  1. Short-circuits if a writing system is already specified
  2. Retrieves the default vernacular writing system
  3. Throws a descriptive exception if none exists
  4. Returns a new QueryOptions with the populated writing system

214-218: LGTM - Sorting logic correctly uses wsId for collation.

The headword sorting now uses the local wsId for both the Headword() call and CollateUnicode(), ensuring consistent writing system handling throughout the sorting pipeline.

backend/FwLite/LcmCrdt/Data/SqlSortingExtensions.cs (3)

11-15: LGTM - Simplified parameter type.

Using WritingSystemId instead of the full WritingSystem object is a cleaner API design since only the identifier is needed for collation purposes.


17-21: LGTM - Expression updated consistently.

The expression signature and lambda correctly use WritingSystemId throughout.


23-28: LGTM - Direct access to wsId.Code is cleaner.

The change from ws.WsId.Code to wsId.Code simplifies the code while maintaining the same collation name format.

@myieye myieye merged commit 91f3f7e into develop Dec 3, 2025
18 of 19 checks passed
@myieye myieye deleted the fix-headword-rank branch December 3, 2025 12:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

💻 FW Lite issues related to the fw lite application, not miniLcm or crdt related

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants