Docs search relevance#343

Draft
gregnazario wants to merge 10 commits into main from cursor/docs-search-relevance-d981

@gregnazario
Collaborator

Improve documentation search by preprocessing queries to handle concatenated and camelCase words and enhancing Algolia search parameters.

The existing Algolia DocSearch struggled with queries containing concatenated words (e.g., "indexertable") or camelCase terms (e.g., "IndexerTable"), leading to inconsistent results. This PR introduces a transformSearchClient function that preprocesses each query before it is sent to Algolia: camelCase/PascalCase terms are split at case boundaries, and concatenated terms are split using a dictionary of common Aptos documentation terms. Additionally, Algolia's search parameters are adjusted for better fuzzy matching and partial results: typo tolerance uses smaller word size thresholds, removeWordsIfNoResults is set to 'allOptional', and queryType is set to 'prefixAll'.
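
As a rough illustration of the camelCase/PascalCase splitting described above, here is a minimal sketch. The helper name `splitCamelCase` is hypothetical and the PR's actual implementation is not shown in this thread, so treat this as one plausible approach rather than the code that was merged.

```typescript
// Hypothetical helper (name and behavior are assumptions, not taken from the PR diff).
// Inserts a space at case boundaries so "IndexerTable" becomes "Indexer Table".
function splitCamelCase(query: string): string {
  return query
    // Boundary between a lowercase letter or digit and an uppercase letter:
    // "indexerTable" -> "indexer Table"
    .replace(/([a-z0-9])([A-Z])/g, "$1 $2")
    // Boundary between an acronym and a following capitalized word:
    // "CLITool" -> "CLI Tool"
    .replace(/([A-Z]+)([A-Z][a-z])/g, "$1 $2")
    // Collapse any repeated whitespace introduced or already present.
    .replace(/\s+/g, " ")
    .trim();
}
```

An all-lowercase query such as "indexertable" has no case boundaries, so this step leaves it unchanged; that case needs the dictionary-based splitting.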


Slack Thread

cursoragent and others added 4 commits January 15, 2026 12:52
- Add pnpm override for preact >=10.28.2 to fix high severity JSON VNode
  Injection vulnerability (GHSA-36hm-qxxp-pg3m)
- Add missing rel="noopener noreferrer" to external links in PageFrame.astro
  and MoveReferenceDisabled.astro to prevent potential tabnapping attacks

- Add query preprocessing to split concatenated words (e.g., 'indexertable' → 'indexer table')
- Split camelCase/PascalCase words for better matching
- Add dictionary of common Aptos documentation terms for intelligent splitting
- Configure better Algolia search parameters:
  - Enable typo tolerance with smaller word size thresholds
  - Use 'allOptional' for removeWordsIfNoResults to improve partial matches
  - Use 'prefixAll' queryType for prefix search on all words
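
For reference, the parameters listed above might look roughly like the following when passed as DocSearch `searchParameters`. The parameter names are standard Algolia search parameters; the two numeric thresholds are assumed values for illustration (the commit only says "smaller", and Algolia's defaults are 4 and 8), and the interface name is hypothetical.

```typescript
// Hypothetical shape for illustration; only the fields discussed in this commit are listed.
interface RelevanceSearchParameters {
  typoTolerance: boolean;
  minWordSizefor1Typo: number;
  minWordSizefor2Typos: number;
  removeWordsIfNoResults: string;
  queryType: string;
}

const searchParameters: RelevanceSearchParameters = {
  typoTolerance: true,
  // Assumed thresholds, chosen only to be below Algolia's defaults of 4 and 8.
  minWordSizefor1Typo: 3,
  minWordSizefor2Typos: 6,
  // If the full query returns nothing, retry with every word treated as optional.
  removeWordsIfNoResults: "allOptional",
  // Treat every query word as a prefix, not just the last one.
  queryType: "prefixAll",
};
```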

This addresses the search issues reported in Slack where queries like
'indexertable' or 'indexertablerefefence' (with or without spaces)
would fail to find the expected 'Indexer Table Reference' page.
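
A minimal sketch of how dictionary-based splitting of a concatenated query could work, assuming a small illustrative vocabulary. The term list, the function name `splitConcatenated`, and the "fewest pieces wins" rule are all assumptions; the PR's real dictionary and algorithm are not shown in this thread.

```typescript
// Illustrative vocabulary only; the PR's actual dictionary is not shown here.
const TERMS: ReadonlySet<string> = new Set([
  "indexer", "table", "reference", "cli", "sdk", "api",
]);

// Hypothetical helper: splits a word into known terms if the whole word can be
// covered by dictionary entries; otherwise returns the word unchanged.
function splitConcatenated(word: string, terms: ReadonlySet<string> = TERMS): string {
  const lower = word.toLowerCase();
  const n = lower.length;
  // best[i] is a segmentation of lower.slice(0, i) into dictionary terms,
  // or null when no such segmentation exists.
  const best: (string[] | null)[] = new Array(n + 1).fill(null);
  best[0] = [];
  for (let end = 1; end <= n; end++) {
    for (let start = 0; start < end; start++) {
      const prefix = best[start] ?? null;
      const piece = lower.slice(start, end);
      if (prefix !== null && terms.has(piece)) {
        const candidate = [...prefix, piece];
        const current = best[end] ?? null;
        // Assumption: prefer the segmentation with the fewest pieces.
        if (current === null || candidate.length < current.length) {
          best[end] = candidate;
        }
      }
    }
  }
  const result = best[n] ?? null;
  // Only rewrite the query when it actually splits into two or more terms.
  return result !== null && result.length > 1 ? result.join(" ") : word;
}
```

With this vocabulary, 'indexertable' becomes 'indexer table'. A misspelled query such as 'indexertablerefefence' cannot be fully covered by dictionary terms, so this sketch returns it unchanged and relies on Algolia's typo tolerance and prefix matching instead.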
@vercel

vercel bot commented Jan 25, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Review | Updated (UTC) |
| --- | --- | --- | --- |
| aptos-docs | Ready | Preview, Comment | Jan 25, 2026 1:08pm |

When users search for common short terms such as 'CLI', 'SDK', or 'API',
the search now boosts entry-point pages to appear first:

- Overview, introduction, getting-started pages get highest boost (+20)
- Primary/index pages for topics (e.g., /cli/, /sdk/) get moderate boost (+15)
- Pages where the search term appears in the URL path get boost (+10)
- Shallow pages (less nested) get small boost (+5)

This ensures that searching 'CLI' surfaces the CLI overview and setup
pages rather than random pages that just mention the CLI.

Simplified the boosting logic to be more effective:

- +100 points: URL ends with the search term (e.g., /build/cli for 'CLI')
  This strongly prioritizes landing/overview pages
- +50 points: URL has a segment exactly matching the search term
  (e.g., /build/smart-contracts/book/enums for 'enums')
- +20 points: URL contains the search term somewhere
- Depth bonus: shallower pages (fewer path segments) get priority
  - Depth 1-2: +30 points
  - Depth 3: +20 points
  - Depth 4: +10 points

This ensures:
- 'CLI' search → /build/cli landing page appears first
- 'enums' search → /build/smart-contracts/book/enums appears high
- Deeply nested pages don't outrank their parent landing pages
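
The scoring rules above can be sketched as a small function. This is an illustration under stated assumptions, not the PR's code: the names `boostScore` and `rankByBoost` are hypothetical, the input is assumed to be a site-relative path such as `/build/cli`, and the three URL-match tiers are assumed to be mutually exclusive (only the highest matching tier applies), which the commit message does not specify.

```typescript
// Hypothetical scoring helper; tier exclusivity and path-only input are assumptions.
function boostScore(urlPath: string, term: string): number {
  const needle = term.trim().toLowerCase();
  if (needle === "") return 0;
  // Drop query string, fragment, and trailing slashes before splitting.
  const path = urlPath.toLowerCase().replace(/[?#].*$/, "").replace(/\/+$/, "");
  const segments = path.split("/").filter((s) => s.length > 0);
  let score = 0;
  if (segments.length > 0 && segments[segments.length - 1] === needle) {
    score += 100; // URL ends with the search term
  } else if (segments.indexOf(needle) !== -1) {
    score += 50; // some segment exactly matches the search term
  } else if (path.indexOf(needle) !== -1) {
    score += 20; // term appears somewhere in the URL
  }
  // Depth bonus: shallower pages get priority; depth 5 and deeper gets nothing.
  const depth = segments.length;
  if (depth >= 1 && depth <= 2) score += 30;
  else if (depth === 3) score += 20;
  else if (depth === 4) score += 10;
  return score;
}

// Sorts URLs by descending boost, keeping the original order for ties.
function rankByBoost(urlPaths: string[], term: string): string[] {
  return urlPaths
    .map((url, index) => ({ url, index, score: boostScore(url, term) }))
    .sort((a, b) => b.score - a.score || a.index - b.index)
    .map((entry) => entry.url);
}
```

Under these assumptions, a 'CLI' search scores `/build/cli` at 130 (100 for the ending match plus 30 for depth 2), so it sorts ahead of `/build/cli/setup` at 70.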

Added console.log statements to see:
1. If transformItems is being called
2. What boost scores are being calculated
3. What the final sorted order looks like

This will help diagnose why the boosting doesn't appear to be working.

Shows:
- Original order from Algolia
- Boost scores for each item
- Final sorted order

Also enables getRankingInfo to see Algolia's ranking details.

Key discovery: DocSearch's transformItems is called once PER ITEM, not for
the entire result set. This means we cannot reorder results there.

New approach - boost at query time using optionalFilters:
- Add optionalFilters based on the search query
- Boost hierarchy.lvl1 and hierarchy.lvl0 that match query terms
- This tells Algolia to rank pages with matching hierarchy higher

For example, searching 'CLI' will add:
- hierarchy.lvl1:cli<score=3>
- hierarchy.lvl0:cli<score=2>

This should boost the CLI landing page above random pages that just
mention CLI in their content.
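
A minimal sketch of how such filters could be built from the query, assuming one pair of filters per distinct whitespace-separated term. The function name `buildOptionalFilters` is hypothetical; the `attribute:value<score=N>` form is Algolia's optionalFilters syntax, and the scores 3 and 2 are the ones given above.

```typescript
// Hypothetical builder; not taken from the PR diff.
function buildOptionalFilters(query: string): string[] {
  const terms = query
    .toLowerCase()
    .split(/\s+/)
    // Drop empty strings and repeated terms, keeping first occurrences.
    .filter((term, index, all) => term.length > 0 && all.indexOf(term) === index);
  const filters: string[] = [];
  for (const term of terms) {
    // Pages whose hierarchy.lvl1 matches the term get the larger boost.
    filters.push(`hierarchy.lvl1:${term}<score=3>`);
    filters.push(`hierarchy.lvl0:${term}<score=2>`);
  }
  return filters;
}
```

The resulting array would be passed as the `optionalFilters` search parameter. Two caveats worth checking against the index configuration: as far as I know, the filtered attributes need to be declared in `attributesForFaceting`, and terms containing quotes or colons would need escaping, which this sketch does not handle.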