refactor SearchService to optimize candidate note retrieval#33

Merged
PrivateGER merged 1 commit into develop from fix/tsvector
Jan 6, 2026
PrivateGER (Owner) commented Jan 6, 2026

…ly use indexes

What

Why

Additional info (optional)

Checklist

  • Read the contribution guide
  • Test working in a local environment
  • (If needed) Add story of storybook
  • (If needed) Update CHANGELOG.md
  • (If possible) Add tests

Summary by CodeRabbit

  • Refactor
    • Optimized note search: candidate note IDs are now selected with an index-friendly query before the full notes are loaded, reducing load times for note searches.

coderabbitai bot commented Jan 6, 2026

Walkthrough

This PR refactors note search in the search service from a single monolithic query into a two-phase approach: the first query selects note IDs using indexed conditions and text filters, and the second query fetches the complete note records with visibility, blocking, and muting rules enforced.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Search Query Refactoring: packages/backend/src/core/SearchService.ts | Restructures the searchNoteByLike flow into a two-phase query pattern: a candidate selection phase (applies full-text filters, user/channel/host/filetype conditions, and sorting) followed by a full-note retrieval phase (enforces visibility, block, and mute rules); adds an early exit for empty candidates. |
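The two-phase flow summarized above can be sketched roughly as follows. This is an in-memory model, not the SearchService code: the `Note` and `Viewer` shapes, the function names, and the simplified visibility rule (only `public` and `home` pass) are all assumptions made for illustration. The real implementation builds TypeORM queries instead of filtering arrays.

```typescript
// Hypothetical in-memory model of the two-phase flow; not the actual SearchService code.
type Visibility = 'public' | 'home' | 'followers' | 'specified';

interface Note {
	id: string;
	text: string;
	userId: string;
	visibility: Visibility;
}

interface Viewer {
	mutedUserIds: Set<string>;
	blockedByUserIds: Set<string>;
}

// Newest first, assuming IDs sort lexicographically by creation time.
const byIdDesc = (a: { id: string }, b: { id: string }): number =>
	a.id < b.id ? 1 : a.id > b.id ? -1 : 0;

// Phase 1: cheap, index-friendly filters only; returns IDs in sort order.
function selectCandidateIds(notes: Note[], query: string, limit: number): string[] {
	const needle = query.toLowerCase();
	return notes
		.filter((n) => n.text.toLowerCase().includes(needle))
		.sort(byIdDesc)
		.slice(0, limit)
		.map((n) => n.id);
}

// Phase 2: load the full records for the candidates and enforce
// visibility, block, and mute rules (simplified here).
function fetchVisibleNotes(notes: Note[], ids: string[], viewer: Viewer, limit: number): Note[] {
	if (ids.length === 0) return []; // early exit: nothing to fetch
	const wanted = new Set(ids);
	return notes
		.filter((n) => wanted.has(n.id))
		.filter((n) => n.visibility === 'public' || n.visibility === 'home')
		.filter((n) => !viewer.mutedUserIds.has(n.userId) && !viewer.blockedByUserIds.has(n.userId))
		.sort(byIdDesc)
		.slice(0, limit);
}

function searchNotes(notes: Note[], query: string, viewer: Viewer, limit: number): Note[] {
	// Over-fetch candidates because phase 2 will drop some of them.
	const ids = selectCandidateIds(notes, query, limit * 5);
	return fetchVisibleNotes(notes, ids, viewer, limit);
}
```

The point of the split is that phase 1 touches only columns that can be served from indexes, while the expensive rules run against a small, already-narrowed ID set.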

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Description check | ⚠️ Warning | The PR description follows the required template structure but has all substantive sections (What, Why, Additional info) left completely empty with no explanation of changes. | Fill in the What section describing the two-phase query approach, the Why section explaining the optimization rationale, and Additional info with relevant testing considerations. |
✅ Passed checks (2 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title directly and clearly describes the main change: refactoring SearchService to optimize candidate note retrieval, which aligns perfectly with the code summary. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (2)
packages/backend/src/core/SearchService.ts (2)

328-334: Consider adding basic visibility filtering to reduce false candidates.

The 5x multiplier accounts for filtering, but if the database has many follower-only or specified-visibility notes, this could still result in underfetching. Adding a simple pre-filter on the candidate query could improve hit rates without adding complex joins:

🔎 Proposed enhancement
 		if (opts.filetype) {
 			candidateQuery.andWhere('note."attachedFileTypes" && :types', { types: fileTypes[opts.filetype] });
 		}
+
+		// Pre-filter to searchable visibility levels to reduce false candidates
+		candidateQuery.andWhere('note.visibility IN (:...visibilities)', { 
+			visibilities: ['public', 'home'] 
+		});

 		// Fetch more candidates than needed since some will likely be filtered by visibility checks
 		const candidateRows = await candidateQuery.limit(pagination.limit * 5).getRawMany();

This is safe because generateVisibilityQuery in the second phase already filters to these visibility levels for non-authenticated or non-follower users, and notes with restricted visibility won't match text search indexes anyway in most configurations.
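To make the underfetching concern concrete, here is a back-of-the-envelope sketch. It assumes, purely for illustration, that each candidate independently survives the second-phase checks with the same probability; the function names are hypothetical and nothing here is taken from SearchService.

```typescript
// Expected number of candidates that survive phase 2, assuming every
// candidate passes the visibility/block/mute checks with the same
// probability `visibleFraction` (an illustrative simplification).
function expectedSurvivors(limit: number, multiplier: number, visibleFraction: number): number {
	return limit * multiplier * visibleFraction;
}

// Under that assumption a page is expected to come up short whenever
// fewer than 1/multiplier of the candidates are visible to the viewer.
function expectsUnderfetch(multiplier: number, visibleFraction: number): boolean {
	return visibleFraction < 1 / multiplier;
}
```

With the 5x multiplier, the break-even point is one visible note in every five candidates. For a page of 10, a visible fraction of 0.1 leaves about 5 of the 50 candidates, so the page is expected to be half empty.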


425-428: Pre-existing: MeiliSearch result ordering doesn't match SQL path behavior.

Not introduced by this PR, but worth noting: the MeiliSearch path hardcodes descending sort (a.id > b.id ? -1 : 1) regardless of pagination direction, while the refactored SQL path correctly uses sortOrder based on sinceId/untilId. Consider aligning this behavior for consistency in a follow-up.
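One possible shape for that follow-up is sketched below. The rule that `sinceId` without `untilId` means ascending order is an assumption about how the pagination direction is derived, and `resolveSortOrder` and `sortById` are hypothetical helpers rather than functions from the codebase.

```typescript
type SortOrder = 'ASC' | 'DESC';

// Assumption: paging forward from sinceId alone is ascending;
// every other combination is newest-first.
function resolveSortOrder(sinceId?: string, untilId?: string): SortOrder {
	return sinceId && !untilId ? 'ASC' : 'DESC';
}

// Sorts a copy of the results by id in the requested direction,
// instead of hardcoding descending order.
function sortById<T extends { id: string }>(items: T[], order: SortOrder): T[] {
	const dir = order === 'ASC' ? 1 : -1;
	return [...items].sort((a, b) => (a.id === b.id ? 0 : a.id > b.id ? dir : -dir));
}
```

Applying a helper like this to the MeiliSearch results would give both search paths the same ordering for a given pair of pagination parameters.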

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7b4ae91 and b9559c2.

📒 Files selected for processing (1)
  • packages/backend/src/core/SearchService.ts
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-12-19T14:14:22.397Z
Learnt from: CR
Repo: PrivateGER/hydrus-nextbooru PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-19T14:14:22.397Z
Learning: Applies to src/app/api/tags/search/**/*.{ts,tsx} : Implement progressive tag search filtering - only show tags that co-occur with already-selected tags

Applied to files:

  • packages/backend/src/core/SearchService.ts
🔍 Remote MCP

Summary of Relevant Context for PR Review

Project Context

Sharkey is a Misskey fork that follows upstream changes when possible while adding its own features. It is ActivityPub software that can interface with the fediverse—an interconnected social network connecting with other software such as Akkoma, Mastodon, and Pixelfed. Sharkey uses TypeORM as a database tool to make database migrations easier, and the PR targets the SearchService component in the backend.

Database Search Architecture Context

This PR optimizes note search through a two-phase query approach. The refactor is strategically aligned with PostgreSQL full-text search best practices:

Full-Text Search Technologies Referenced:

  • PostgreSQL uses tsvector and tsquery data types for full text searches, and GIN indexes are recommended for full-text search vectors (tsvector)
  • In PostgreSQL, a tsvector is a special data type used to represent text in a form optimized for full-text search. A tsvector is a vector of lexemes (the basic units of text)—it simplifies language for search operations
  • The combination of tsvector and tsquery provides dynamic search capabilities, while the strategic use of GIN indexes on generated tsvector columns significantly enhances search performance

Query Optimization Principles

The PR's two-phase candidate selection pattern aligns with recognized PostgreSQL optimization techniques:

  • PostgreSQL generates various execution plans and the optimizer evaluates these with the intention of choosing the most efficient plan, considering factors like available indexes, join strategies, and data distribution
  • The planner/optimizer's task is to create an optimal execution plan. A given SQL query can be executed in many different ways, and if computationally feasible, the query optimizer will examine each possible execution plan, selecting the one expected to run fastest
  • Reducing the amount of data to be sorted is vital to performance and important to the user experience

Relevant Indexing Considerations

For Sharkey/Akkoma instances, additional indexes can be created on note and user tables to provide much faster search at the cost of additional disk space, with the PostgreSQL pg_trgm extension enabled to support queries using LIKE and ILIKE operators.

Key Review Points for This PR

The refactor decouples candidate selection (using indexed conditions with full-text filters) from full data retrieval (with visibility/blocking/muting enforcement), which reduces query planner complexity by narrowing the candidate set before expensive joins are performed. This approach leverages indexed access patterns identified as performance optimizations in PostgreSQL documentation and aligns with best practices for full-text search implementations using tsvector and GIN indexes.

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Analyze (javascript-typescript)
  • GitHub Check: semgrep-cloud-platform/scan
🔇 Additional comments (2)
packages/backend/src/core/SearchService.ts (2)

279-308: Candidate query setup looks solid.

The two-phase approach correctly separates ID selection from full data retrieval, allowing PostgreSQL to utilize indexes more effectively. The pagination logic properly mirrors makePaginationQuery behavior, and the provider-specific full-text search filters are appropriate.


336-354: Full notes retrieval with visibility checks is well-structured.

The second query correctly fetches complete note data with all necessary joins and applies comprehensive visibility, blocking, and muting checks. The sort order is preserved via the orderBy clause matching the candidate query's order.

@PrivateGER PrivateGER merged commit 1ca64e4 into develop Jan 6, 2026
5 checks passed