-
Notifications
You must be signed in to change notification settings - Fork 108
Open
Labels
Description
Currently, searching for Package.name, Canon.name, Canon.url, and URL.url relies on basic database indexing and trigram matching (added in migration 20250508_1752-add_trgm_indexes.py). While the trigram indexes improved search performance, the current approach has limitations in terms of:
- Case sensitivity handling
- Search accuracy
- Performance at scale
- Consistency across different search fields
Current State
- Trigram indexes exist on
urls.urlandcanons.name - Basic database indexes on other searchable fields
- No standardized approach to case handling
- Search operations may require case-sensitive matches
Proposed Enhancement
Add dedicated search columns for key searchable fields:
-- Example schema changes
ALTER TABLE packages ADD COLUMN search_name TEXT GENERATED ALWAYS AS (LOWER(name)) STORED;
ALTER TABLE canons ADD COLUMN search_name TEXT GENERATED ALWAYS AS (LOWER(name)) STORED;
ALTER TABLE canons ADD COLUMN search_url TEXT GENERATED ALWAYS AS (LOWER(url)) STORED;
ALTER TABLE urls ADD COLUMN search_url TEXT GENERATED ALWAYS AS (LOWER(url)) STORED;Benefits
- Consistent case-insensitive search across all fields
- Hopefully Better performance through dedicated indexed columns
- Cleaner search queries in application code
Implementation Areas
Database Changes
- Add new generated columns
- Create appropriate indexes
- Migration script for existing data
Core Changes
- Update search-related queries to use new columns
- Transformers should write to that field as well (or db defaults, gotta figure out which approach is better)
- Bigram indexes might be better for lower case?