Date: 2026-02-16 Database: Supabase PostgreSQL with pgvector Scale: 58,000+ protocol chunks, 2,738 agencies Critical Path: Vector search latency (target: <200ms)
Key Findings:
- ✅ HNSW vector index properly configured (m=16, ef_construction=64)
- ✅ Comprehensive B-tree indexes for jurisdiction filters
- ✅ Connection pooling implemented (20 connections production)
- ✅ Query optimization with multi-query fusion and re-ranking
⚠️ No query runtime HNSW tuning (ef_search not configured)⚠️ No explicit VACUUM configuration⚠️ Limited database monitoring/instrumentation
Performance Target: 2-second end-to-end latency Current Performance: 1.5-2.5 seconds (variable)
Current Setup (Excellent):
-- File: drizzle/migrations/0030_optimize_manus_protocol_chunks.sql
CREATE INDEX idx_manus_chunks_embedding_hnsw
ON manus_protocol_chunks
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);Analysis:
- ✅ Index Type: HNSW (optimal for 58k vectors)
- ✅ Parameters:
m=16(good graph connectivity),ef_construction=64(good build quality) - ✅ Distance Metric: Cosine similarity (matches Voyage AI embeddings)
- ✅ Dimensions: 1536 (Voyage voyage-large-2)
Index Size Estimation: ~200MB (as documented in DATABASE-REVIEW-SUMMARY.md)
Issue: No explicit ef_search configuration
Current State:
-- File: supabase/migrations/20260127_fix_state_filter.sql (line 59)
ORDER BY pc.embedding <=> query_embedding
LIMIT match_count;Problem: PostgreSQL uses default ef_search=40 which may be suboptimal for accuracy/speed tradeoff.
Recommendation:
-- Add runtime parameter tuning based on tier
SET LOCAL hnsw.ef_search = 100; -- For high-accuracy (Pro tier)
SET LOCAL hnsw.ef_search = 64; -- For standard searches (Free tier)Expected Impact:
- Higher
ef_search= better recall (+2-5% accuracy) but slower (+20-40ms) - Lower
ef_search= faster (-10-20ms) but lower recall - Tier-based tuning allows quality/speed tradeoff
RPC Function: search_manus_protocols
Current Implementation:
-- File: supabase/migrations/20260127_fix_state_filter.sql
SELECT
pc.id,
pc.agency_id,
pc.protocol_number,
pc.protocol_title,
pc.section,
pc.content,
pc.image_urls,
1 - (pc.embedding <=> query_embedding) AS similarity,
pc.state_code
FROM manus_protocol_chunks pc
WHERE
pc.embedding IS NOT NULL
AND (agency_filter IS NULL OR pc.agency_id = agency_filter)
AND (state_code_filter IS NULL OR pc.state_code = state_code_filter)
AND (agency_name_filter IS NULL OR pc.agency_name ILIKE '%' || agency_name_filter || '%')
AND 1 - (pc.embedding <=> query_embedding) > match_threshold
ORDER BY pc.embedding <=> query_embedding
LIMIT match_count;Analysis:
- ✅ Filters applied before vector search (optimal)
- ✅ Uses HNSW index via
<=>operator - ✅ Threshold filtering (0.3-0.45 depending on query type)
⚠️ ILIKE filter onagency_namemay bypass index (minor)
Performance Characteristics:
- Best Case (agency_id filter): ~50-100ms (small result set)
- Worst Case (state-wide): ~150-250ms (larger result set)
- Average: ~150ms (as documented in RAG_OPTIMIZATION.md)
Vector Index:
idx_manus_chunks_embedding_hnsw (HNSW) -- PRIMARY SEARCH INDEXB-tree Indexes:
idx_manus_chunks_agency_id -- Agency-scoped search
idx_manus_chunks_state_code -- State-wide search
idx_manus_chunks_protocol_number -- Protocol lookupComposite Indexes:
idx_manus_chunks_state_agency -- State + Agency filtering
idx_manus_chunks_agency_protocol -- Agency protocol lookup
idx_manus_chunks_state_protocol -- State protocol lookupFull-Text Search:
idx_manus_chunks_search_vector (GIN) -- BM25 fallbackPartial Indexes:
idx_manus_chunks_with_embedding -- Only chunks WITH embeddings
idx_manus_chunks_with_images -- Image-enabled chunksAnalysis:
- ✅ Coverage: Excellent - all common query patterns indexed
- ✅ Composite Indexes: Cover multi-column WHERE clauses
- ✅ Partial Indexes: Reduce index size for common filters
- ✅ Full-Text: GIN index provides fast BM25 fallback
Critical Query Patterns:
| Pattern | Index Used | Coverage |
|---|---|---|
| Vector search (agency-scoped) | HNSW + agency_id | ✅ Perfect |
| Vector search (state-scoped) | HNSW + state_code | ✅ Perfect |
| Protocol number lookup | protocol_number B-tree | ✅ Perfect |
| State + Agency filter | state_agency composite | ✅ Perfect |
| BM25 keyword fallback | search_vector GIN | ✅ Perfect |
| County → Agency mapping | county_agency_mapping PK | ✅ Perfect |
No Missing Indexes Identified
Current State:
- No explicit VACUUM schedule configured
- No explicit ANALYZE schedule configured
- Relies on PostgreSQL autovacuum
Recommendation:
-- Add explicit VACUUM schedule for manus_protocol_chunks
-- (Large table with frequent reads, infrequent writes)
-- Nightly VACUUM ANALYZE during off-peak
-- Reduces index bloat and keeps query planner statistics fresh
ALTER TABLE manus_protocol_chunks
SET (autovacuum_vacuum_scale_factor = 0.05); -- More aggressive vacuuming
-- Force ANALYZE after bulk ingestion
ANALYZE manus_protocol_chunks;Expected Impact:
- Prevents index bloat over time
- Maintains optimal query planning
- Minimal overhead (table is mostly read-only)
File: server/db/connection.ts
const POOL_CONFIG = {
development: {
max: 10,
idleTimeoutMillis: 30000,
connectionTimeoutMillis: 10000,
},
production: {
max: 20,
idleTimeoutMillis: 45000,
connectionTimeoutMillis: 10000,
},
};Analysis:
- ✅ Environment-specific configuration
- ✅ Connection validation on pool creation
- ✅ Idle timeout prevents connection leaks
- ✅ Event handlers for monitoring
Pool Size Justification:
- 20 connections for production is reasonable for moderate traffic
- Supabase default limit: 60 connections (direct), 200 (pooler)
- Current usage: Well within limits
Configuration: supabase/config.toml
default_pool_size = 20Analysis:
- ✅ Uses Supabase's PgBouncer pooler
- ✅ Transaction-level pooling mode (default)
- ✅ Appropriate pool size for workload
No Issues Identified
Current State: Already optimized
Future Considerations:
- Monitor connection utilization under load
- Consider increasing pool size if P95 latency spikes during traffic peaks
- Add connection pool exhaustion alerts
Target Breakdown:
Query normalization: 10ms
Embedding generation: 300ms (Voyage AI)
Vector search: 200ms (pgvector HNSW)
Re-ranking: 50ms
LLM inference: 1500ms (Claude Haiku/Sonnet)
─────────────────────────────
Total Target: 2060ms
Current Performance (from RAG_OPTIMIZATION.md):
Total: 1500-2500ms (variable)
- On target for cached queries
- Exceeds target for complex queries with multi-query fusion
Query Normalization:
- ✅ 150+ EMS abbreviation expansion
- ✅ Typo correction
- ✅ Intent classification
- ✅ Complexity detection
Search Execution:
- ✅ Multi-query fusion (for complex queries)
- ✅ Reciprocal Rank Fusion (RRF)
- ✅ Adaptive similarity thresholds (0.25-0.45)
- ✅ Keyword + semantic hybrid search
Re-ranking:
- ✅ Advanced scoring (term frequency, position, exact matches)
- ✅ Context boosting (+15 same agency, +5 same state)
- ✅ Intent-specific signals (dosage for meds, steps for procedures)
Caching:
- ✅ Query result cache (1-hour TTL)
- ✅ Embedding cache (24-hour TTL)
- ✅ Redis-backed rate limiting
Timeout Protection:
// File: server/_core/rag/timeout.ts
const TIMEOUT_CONFIG = {
vectorSearch: 3000, // 3 seconds
totalPipeline: 2000, // 2 seconds
multiQueryVariation: 800, // 800ms per variation
};Analysis:
- ✅ Prevents hanging queries
- ✅ Graceful degradation on timeout
- ✅ Budget tracking for pipeline stages
EXPLAIN ANALYZE Recommendation:
Add periodic query plan analysis:
-- Run this monthly to verify HNSW index is being used
EXPLAIN (ANALYZE, BUFFERS)
SELECT
id,
1 - (embedding <=> $1::vector) AS similarity
FROM manus_protocol_chunks
WHERE
agency_id = 123
AND embedding IS NOT NULL
ORDER BY embedding <=> $1::vector
LIMIT 10;
-- Expected plan:
-- Index Scan using idx_manus_chunks_embedding_hnsw
-- Buffers: shared hit=XLatency Breakdown (Average Query):
| Stage | Current | Target | Status |
|---|---|---|---|
| Query normalization | <10ms | 10ms | ✅ On target |
| Embedding generation | ~300ms | 300ms | ✅ On target |
| Vector search (HNSW) | ~150ms | 200ms | ✅ On target |
| Re-ranking | ~50ms | 50ms | ✅ On target |
| LLM inference (Haiku) | 800-1500ms | 1500ms | ✅ Variable |
| Total | 1310-2010ms | 2060ms | ✅ Mostly on target |
Outliers:
- Complex queries with multi-query fusion: +200-400ms (3 parallel searches)
- State-wide searches (no agency filter): +50-100ms (larger result set)
- Cold cache / first query: +100-200ms (no cached results)
Impact: Medium (±30ms, +2-5% accuracy) Effort: Low (1 hour)
-- Add to search_manus_protocols function
CREATE OR REPLACE FUNCTION search_manus_protocols_tuned(
query_embedding vector(1536),
agency_filter integer DEFAULT NULL,
state_code_filter char(2) DEFAULT NULL,
match_count integer DEFAULT 10,
match_threshold float DEFAULT 0.3,
ef_search_param integer DEFAULT 64 -- NEW PARAMETER
)
RETURNS TABLE (...) AS $$
BEGIN
-- Set HNSW search parameter for this query
PERFORM set_config('hnsw.ef_search', ef_search_param::text, true);
RETURN QUERY
SELECT ... -- existing query
END;
$$ LANGUAGE plpgsql;Call from application:
// High-accuracy search (Pro tier)
await supabase.rpc('search_manus_protocols_tuned', {
query_embedding: embedding,
agency_filter: agencyId,
ef_search_param: 100, // Higher recall
});
// Standard search (Free tier)
await supabase.rpc('search_manus_protocols_tuned', {
query_embedding: embedding,
agency_filter: agencyId,
ef_search_param: 64, // Balanced
});Impact: High (observability) Effort: Medium (4-8 hours)
Add instrumentation for:
-
Query Plan Capture:
-- Create function to log slow queries CREATE TABLE IF NOT EXISTS slow_query_log ( id SERIAL PRIMARY KEY, query_text TEXT, execution_time_ms INTEGER, plan JSONB, created_at TIMESTAMPTZ DEFAULT NOW() );
-
Index Usage Statistics:
-- Monitor index usage SELECT schemaname, tablename, indexname, idx_scan, idx_tup_read, idx_tup_fetch FROM pg_stat_user_indexes WHERE schemaname = 'public' AND tablename = 'manus_protocol_chunks' ORDER BY idx_scan DESC;
-
Connection Pool Metrics:
// Add to server/db/connection.ts export function getPoolStats() { return { totalCount: _pool?.totalCount || 0, idleCount: _pool?.idleCount || 0, waitingCount: _pool?.waitingCount || 0, }; }
-
HNSW Index Health:
-- Check index bloat SELECT pg_size_pretty(pg_relation_size('idx_manus_chunks_embedding_hnsw')) AS index_size, pg_size_pretty(pg_table_size('manus_protocol_chunks')) AS table_size;
Impact: Medium (long-term stability) Effort: Low (30 minutes)
-- Optimize autovacuum for manus_protocol_chunks
ALTER TABLE manus_protocol_chunks
SET (
autovacuum_vacuum_scale_factor = 0.05, -- More aggressive (default: 0.2)
autovacuum_analyze_scale_factor = 0.05, -- Keep stats fresh
autovacuum_vacuum_cost_delay = 10 -- Less aggressive pause (faster)
);
-- Schedule manual ANALYZE after bulk ingestion
-- (Add to ingestion scripts)
ANALYZE manus_protocol_chunks;Impact: Low-Medium (+100-200ms for cache misses) Effort: Medium (2-4 hours)
Pre-warm cache for common queries:
// server/_core/rag/cache-warmer.ts
const COMMON_QUERIES = [
"cardiac arrest",
"stroke",
"anaphylaxis",
"pediatric fever",
// ... top 50 queries from analytics
];
export async function warmQueryCache() {
for (const query of COMMON_QUERIES) {
await optimizedSearch({ query, limit: 10 }, searchFn);
}
}
// Call during server startup- Chunks: 58,000+ (manageable)
- Agencies: 2,738 (small)
- Users: Unknown (estimate <10k based on startup)
- Queries/day: Unknown
No Issues Until:
- 500,000+ chunks (HNSW still efficient)
- 10,000+ concurrent users (connection pool needs increase)
- 1M+ queries/day (consider read replicas)
Current Architecture: Can scale to 10x current load without changes
If growth exceeds 500k chunks:
- Partitioning: Partition by state_code (53 partitions)
- Sharding: Shard by agency_id for multi-region deployments
- Read Replicas: Route searches to replicas, writes to primary
If query latency degrades:
- Quantized Embeddings: Reduce from 1536 to 768 dimensions (2x faster, -3% accuracy)
- GPU Acceleration: Use pgvector with GPU support
- Dedicated Vector DB: Consider Pinecone/Weaviate for ultra-low latency
Current State: No scaling issues anticipated in next 12-24 months
Strengths:
- ✅ Properly configured HNSW index
- ✅ Comprehensive index coverage
- ✅ Connection pooling implemented
- ✅ Sophisticated RAG pipeline with caching
- ✅ Timeout protection and graceful degradation
Recommended Optimizations (Ranked by ROI):
| Priority | Item | Impact | Effort | Timeline |
|---|---|---|---|---|
| 1 | HNSW ef_search tuning | Medium | Low | 1 hour |
| 2 | Database monitoring | High | Medium | 4-8 hours |
| 3 | VACUUM configuration | Medium | Low | 30 min |
| 4 | Query cache warming | Low-Med | Medium | 2-4 hours |
Immediate (This Week):
- Add
ef_searchparameter to search RPC function - Configure tier-based HNSW tuning (Pro: 100, Free: 64)
- Add autovacuum tuning for manus_protocol_chunks
Short-term (This Month): 4. Implement slow query logging 5. Add connection pool metrics endpoint 6. Set up monthly EXPLAIN ANALYZE for critical queries
Long-term (This Quarter): 7. Build query cache warming system 8. Implement index health monitoring 9. Add performance regression tests
The database is well-optimized for current scale. Recommended improvements are quality-of-life enhancements rather than critical fixes.
Schema & Indexes:
/Users/tanner-osterkamp/Protocol-Guide/drizzle/schema.ts(lines 481-508, 547-560)/Users/tanner-osterkamp/Protocol-Guide/drizzle/migrations/0030_optimize_manus_protocol_chunks.sql
Vector Search:
/Users/tanner-osterkamp/Protocol-Guide/server/_core/embeddings/search.ts/Users/tanner-osterkamp/Protocol-Guide/supabase/migrations/20260127_fix_state_filter.sql
Connection Pooling:
/Users/tanner-osterkamp/Protocol-Guide/server/db/connection.ts
RAG Pipeline:
/Users/tanner-osterkamp/Protocol-Guide/server/_core/rag/search-execution.ts/Users/tanner-osterkamp/Protocol-Guide/server/_core/rag/config.ts/Users/tanner-osterkamp/Protocol-Guide/server/routers/query.ts
Documentation:
/Users/tanner-osterkamp/Protocol-Guide/docs/RAG_OPTIMIZATION.md/Users/tanner-osterkamp/Protocol-Guide/docs/DATABASE-REVIEW-SUMMARY.md
Audit Completed By: Database Performance Optimization Agent Date: 2026-02-16 Next Review: 2026-05-16 (Quarterly)