Search Architecture

Issue: #192 · Status: Implemented (Phase 1–2, Phase 3 stub) Last updated: 2026-02-26

Overview

Product search is the primary user interaction in TryVit. This document formalizes the search ranking model, multi-language strategy, synonym infrastructure, quality metrics plan, and scale roadmap.

Architecture Layers

┌─────────────────────────────────────────────────────────────┐
│                     SEARCH LAYER                            │
│                                                             │
│  ┌──────────────┐  ┌──────────────┐  ┌─────────────────┐  │
│  │ Query         │  │ Ranking       │  │ Quality          │  │
│  │ Parser        │  │ Engine        │  │ Monitor          │  │
│  │               │  │               │  │                  │  │
│  │ unaccent()    │  │ search_rank() │  │ search_quality   │  │
│  │ Tokenize      │  │ 5-signal      │  │ _report()        │  │
│  │ Synonym       │  │ weighted      │  │ (Phase 3/#190)   │  │
│  │ expand        │  │ composite     │  │                  │  │
│  └──────┬───────┘  └──────┬───────┘  └─────────────────┘  │
│         │                  │                                │
│         ▼                  ▼                                │
│  ┌─────────────────────────────────────────────────────┐   │
│  │   PostgreSQL FTS + Trigram Hybrid                    │   │
│  │   tsvector GIN index + pg_trgm GIN index            │   │
│  │   build_search_vector() per-country tokenization     │   │
│  └─────────────────────────────────────────────────────┘   │
│                                                             │
│  ┌─────────────────────────────────────────────────────┐   │
│  │   PHASE 3 (future): Typesense / Meilisearch         │   │
│  │   Triggered when P95 > 200ms at product count        │   │
│  └─────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────┘

Ranking Model

5-Signal Composite Score

The search_rank() function computes a weighted composite of five signals:

#	Signal	Weight	Range	Source
1	Full-text ts_rank	0.35	0–1	`ts_rank(search_vector, tsquery)`
2	Trigram similarity	0.30	0–1	`similarity()` across name/brand fields
3	Synonym match	0.15	0–1	`ts_rank()` on expanded synonym terms
4	Category context	0.10	0/0.5/1	Query-category string overlap
5	Data completeness	0.10	0–1	`data_completeness_pct / 100`

Formula:

$$\text{score} = \sum_{i=1}^{5} w_i \cdot s_i$$

Where $w_i$ are weights from search_ranking_config and $s_i$ are signal values.

Configurable Weights

Weights are stored in the search_ranking_config table:

SELECT config_name, weights, active
FROM search_ranking_config;

Only one config may be active at a time (enforced by partial unique index). The new_search_ranking feature flag gates usage of search_rank() vs the legacy inline ranking in api_search_products().

Legacy Ranking (fallback)

When new_search_ranking flag is disabled, the original 3-component additive formula is used:

relevance = ts_rank(vector, query)
           + GREATEST(similarity(name), similarity(name_en), similarity(brand) * 0.8)
           + ts_rank(vector, synonym_query) * 0.9

Multi-Language Strategy

Text Search Configurations

Country	Config	Stemming	Notes
PL	`simple`	None	Upgrade to Polish ispell when ready
DE	`german`	Yes	Built-in PostgreSQL German stemmer
EN/UK	`english`	Yes	Built-in PostgreSQL English stemmer
CZ	`simple`	None	Future: Czech stemmer

The build_search_vector() function selects the appropriate configuration based on the product's country field. The trigger fires on INSERT/UPDATE of product_name, product_name_en, brand, category, or country.

Synonym Dictionary

Cross-language synonyms are stored in search_synonyms:

Direction	Pairs	Example
PL ↔ EN	~50	mleko ↔ milk
DE ↔ EN	~50	milch ↔ milk

The expand_search_query() function returns all synonym terms for a query, enabling cross-language search without the user needing to know both languages.

Adding Synonyms

INSERT INTO search_synonyms (term_original, term_target, language_from, language_to)
VALUES
    ('new_term_pl', 'new_term_en', 'pl', 'en'),
    ('new_term_en', 'new_term_pl', 'en', 'pl');

Always add both directions for bidirectional lookup.

Search Infrastructure

Database Objects

Object	Type	Purpose
`products.search_vector`	Column	Pre-computed tsvector (GIN indexed)
`build_search_vector()`	Function	Language-aware tsvector builder
`search_rank()`	Function	5-signal formalized ranking
`search_ranking_config`	Table	Configurable ranking weights
`search_synonyms`	Table	Cross-language synonym dictionary
`expand_search_query()`	Function	Synonym expansion
`api_search_products()`	RPC	Main search endpoint
`api_search_autocomplete()`	RPC	Prefix autocomplete
`api_search_did_you_mean()`	RPC	Fuzzy suggestions for zero-result queries
`search_quality_report()`	RPC (stub)	Quality dashboard (pending #190)
`trg_products_search_vector()`	Trigger	Auto-updates search_vector on product changes

Indexes

Index	Type	Table	Purpose
`idx_products_search_vector`	GIN	products	Full-text search
`idx_search_synonyms_lookup`	btree	search_synonyms	Fast synonym lookup
`idx_search_ranking_config_single_active`	unique (partial)	search_ranking_config	Enforces single active config

Feature Flags

Flag	Type	Default	Purpose
`new_search_ranking`	boolean	disabled	Gates `search_rank()` in search RPC
`new_search_ui`	percentage	disabled	Gates frontend search UI changes

Frontend Components

Component	File	Purpose
SearchAutocomplete	`components/search/SearchAutocomplete`	Debounced prefix autocomplete
FilterPanel	`components/search/FilterPanel`	Multi-facet filter sidebar
ActiveFilterChips	`components/search/ActiveFilterChips`	Active filter chip bar
DidYouMean	`components/search/DidYouMean`	Zero-result fuzzy suggestions
SaveSearchDialog	`components/search/SaveSearchDialog`	Save search query + filters

Zod Contracts

Contract	Validates
`SearchProductsContract`	`api_search_products` response
`SearchAutocompleteContract`	`api_search_autocomplete` response
`FilterOptionsContract`	`api_get_filter_options` response
`SavedSearchesContract`	`api_get_saved_searches` response
`SearchQualityReportContract`	`search_quality_report` response

Performance Targets

Operation	Target P95	Current (2.5K)	Approach
Full search	< 150ms	~50ms	tsvector + trigram hybrid
Autocomplete	< 50ms	~20ms	Prefix index + LIMIT
Synonym expansion	< 5ms	< 2ms	Indexed lookup

Scale Tipping Points

Scale	Strategy	Trigger
< 10K	PostgreSQL native (current)	Now
10K–50K	PostgreSQL + read replica	Search P95 > 100ms
50K–200K	Dedicated search engine (Typesense)	Search P95 > 200ms
> 200K	Search engine + ML re-ranking	CTR < 25%

Quality Metrics (Phase 3 — requires #190)

The search_quality_report() function is deployed as a stub. When Event Analytics (#190) provides the analytics_events table, it will calculate:

Metric	Description	Alert Threshold
Zero-result rate	% of queries returning 0 results	> 15%
Click-through rate	% of searches where user clicks	< 30%
Mean Reciprocal Rank	Average 1/position of first click	< 0.3
Avg results per query	Mean result count	< 3

Security

Concern	Mitigation
Query injection	`plainto_tsquery()` / parameterized queries
Search abuse (DoS)	Rate limiting (#182) on search endpoint
Data leakage	RLS enforced; country-scoped results only
Autocomplete harvesting	Rate limited per IP
Synonym manipulation	`search_synonyms` writable by service_role only
Ranking config tampering	`search_ranking_config` writable by service_role

A/B Testing Ranking Models

Create a new config in search_ranking_config with different weights
Set active = false on the new config
Enable new_search_ranking feature flag with percentage rollout
Activate the new config: UPDATE search_ranking_config SET active = true WHERE config_name = 'experiment_v1'
Monitor quality metrics via search_quality_report() (when #190 is live)
Either graduate or roll back based on CTR/MRR delta

Adding Search Support for a New Country

Add language to language_ref if not present
Add synonym pairs to search_synonyms (both directions)
Update build_search_vector() CASE statement if a new TSC is needed
Update expand_search_query() if language-specific logic is needed
Backfill search_vector for existing products: UPDATE products SET search_vector = build_search_vector(...) WHERE country = 'XX'
Verify with QA tests

Dependencies

Dependency	Status	Impact
#185 Performance	✅ Done	Query monitoring for search RPCs
#189 Scoring Engine	✅ Done	`data_completeness_pct` signal
#190 Event Analytics	❌ Blocked	Search quality metrics (CTR, MRR)
#191 Feature Flags	✅ Done	A/B testing ranking weights
#182 Rate Limiting	❌ Open	Search endpoint protection

QA Coverage

23 SQL-based QA tests in db/qa/QA__search_architecture.sql:

T01–T03: Ranking config integrity (active config, weight sum, key presence)
T04–T06: build_search_vector() correctness (normal, NULL, DE)
T07–T10: search_rank() behavior (positive return, exact > partial, category boost, completeness boost)
T11–T13: German synonym pairs + expansion
T14–T15: Feature flag existence and default state
T16–T17: Quality report stub structure
T18: api_search_products() structural validity
T19: Partial unique index enforcement
T20: All products have search_vector populated
T21–T22: Security posture (anon revocation)
T23: Trigger existence

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Search Architecture

Overview

Architecture Layers

Ranking Model

5-Signal Composite Score

Configurable Weights

Legacy Ranking (fallback)

Multi-Language Strategy

Text Search Configurations

Synonym Dictionary

Adding Synonyms

Search Infrastructure

Database Objects

Indexes

Feature Flags

Frontend Components

Zod Contracts

Performance Targets

Scale Tipping Points

Quality Metrics (Phase 3 — requires #190)

Security

A/B Testing Ranking Models

Adding Search Support for a New Country

Dependencies

QA Coverage

FilesExpand file tree

SEARCH_ARCHITECTURE.md

Latest commit

History

SEARCH_ARCHITECTURE.md

File metadata and controls

Search Architecture

Overview

Architecture Layers

Ranking Model

5-Signal Composite Score

Configurable Weights

Legacy Ranking (fallback)

Multi-Language Strategy

Text Search Configurations

Synonym Dictionary

Adding Synonyms

Search Infrastructure

Database Objects

Indexes

Feature Flags

Frontend Components

Zod Contracts

Performance Targets

Scale Tipping Points

Quality Metrics (Phase 3 — requires #190)

Security

A/B Testing Ranking Models

Adding Search Support for a New Country

Dependencies

QA Coverage