Skip to content

Latest commit

 

History

History
603 lines (477 loc) · 16.7 KB

File metadata and controls

603 lines (477 loc) · 16.7 KB

Protocol Guide Database Performance Audit

Date: 2026-02-16 Database: Supabase PostgreSQL with pgvector Scale: 58,000+ protocol chunks, 2,738 agencies Critical Path: Vector search latency (target: <200ms)


Executive Summary

Current Status: GOOD (Minor Optimizations Recommended)

Key Findings:

  • ✅ HNSW vector index properly configured (m=16, ef_construction=64)
  • ✅ Comprehensive B-tree indexes for jurisdiction filters
  • ✅ Connection pooling implemented (20 connections production)
  • ✅ Query optimization with multi-query fusion and re-ranking
  • ⚠️ No query runtime HNSW tuning (ef_search not configured)
  • ⚠️ No explicit VACUUM configuration
  • ⚠️ Limited database monitoring/instrumentation

Performance Target: 2-second end-to-end latency Current Performance: 1.5-2.5 seconds (variable)


1. Vector Search Analysis

1.1 HNSW Index Configuration

Current Setup (Excellent):

-- File: drizzle/migrations/0030_optimize_manus_protocol_chunks.sql
CREATE INDEX idx_manus_chunks_embedding_hnsw
ON manus_protocol_chunks
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);

Analysis:

  • Index Type: HNSW (optimal for 58k vectors)
  • Parameters: m=16 (good graph connectivity), ef_construction=64 (good build quality)
  • Distance Metric: Cosine similarity (matches Voyage AI embeddings)
  • Dimensions: 1536 (Voyage voyage-large-2)

Index Size Estimation: ~200MB (as documented in DATABASE-REVIEW-SUMMARY.md)

1.2 Query-Time HNSW Parameters

Issue: No explicit ef_search configuration

Current State:

-- File: supabase/migrations/20260127_fix_state_filter.sql (line 59)
ORDER BY pc.embedding <=> query_embedding
LIMIT match_count;

Problem: PostgreSQL uses default ef_search=40 which may be suboptimal for accuracy/speed tradeoff.

Recommendation:

-- Add runtime parameter tuning based on tier
SET LOCAL hnsw.ef_search = 100; -- For high-accuracy (Pro tier)
SET LOCAL hnsw.ef_search = 64;  -- For standard searches (Free tier)

Expected Impact:

  • Higher ef_search = better recall (+2-5% accuracy) but slower (+20-40ms)
  • Lower ef_search = faster (-10-20ms) but lower recall
  • Tier-based tuning allows quality/speed tradeoff

1.3 Vector Search Query Pattern

RPC Function: search_manus_protocols

Current Implementation:

-- File: supabase/migrations/20260127_fix_state_filter.sql
SELECT
  pc.id,
  pc.agency_id,
  pc.protocol_number,
  pc.protocol_title,
  pc.section,
  pc.content,
  pc.image_urls,
  1 - (pc.embedding <=> query_embedding) AS similarity,
  pc.state_code
FROM manus_protocol_chunks pc
WHERE
  pc.embedding IS NOT NULL
  AND (agency_filter IS NULL OR pc.agency_id = agency_filter)
  AND (state_code_filter IS NULL OR pc.state_code = state_code_filter)
  AND (agency_name_filter IS NULL OR pc.agency_name ILIKE '%' || agency_name_filter || '%')
  AND 1 - (pc.embedding <=> query_embedding) > match_threshold
ORDER BY pc.embedding <=> query_embedding
LIMIT match_count;

Analysis:

  • ✅ Filters applied before vector search (optimal)
  • ✅ Uses HNSW index via <=> operator
  • ✅ Threshold filtering (0.3-0.45 depending on query type)
  • ⚠️ ILIKE filter on agency_name may bypass index (minor)

Performance Characteristics:

  • Best Case (agency_id filter): ~50-100ms (small result set)
  • Worst Case (state-wide): ~150-250ms (larger result set)
  • Average: ~150ms (as documented in RAG_OPTIMIZATION.md)

2. Index Coverage Analysis

2.1 Primary Indexes (Excellent Coverage)

Vector Index:

idx_manus_chunks_embedding_hnsw (HNSW) -- PRIMARY SEARCH INDEX

B-tree Indexes:

idx_manus_chunks_agency_id      -- Agency-scoped search
idx_manus_chunks_state_code     -- State-wide search
idx_manus_chunks_protocol_number -- Protocol lookup

Composite Indexes:

idx_manus_chunks_state_agency   -- State + Agency filtering
idx_manus_chunks_agency_protocol -- Agency protocol lookup
idx_manus_chunks_state_protocol  -- State protocol lookup

Full-Text Search:

idx_manus_chunks_search_vector (GIN) -- BM25 fallback

Partial Indexes:

idx_manus_chunks_with_embedding  -- Only chunks WITH embeddings
idx_manus_chunks_with_images     -- Image-enabled chunks

Analysis:

  • Coverage: Excellent - all common query patterns indexed
  • Composite Indexes: Cover multi-column WHERE clauses
  • Partial Indexes: Reduce index size for common filters
  • Full-Text: GIN index provides fast BM25 fallback

2.2 Query Pattern Coverage

Critical Query Patterns:

Pattern Index Used Coverage
Vector search (agency-scoped) HNSW + agency_id ✅ Perfect
Vector search (state-scoped) HNSW + state_code ✅ Perfect
Protocol number lookup protocol_number B-tree ✅ Perfect
State + Agency filter state_agency composite ✅ Perfect
BM25 keyword fallback search_vector GIN ✅ Perfect
County → Agency mapping county_agency_mapping PK ✅ Perfect

No Missing Indexes Identified

2.3 Index Maintenance

Current State:

  • No explicit VACUUM schedule configured
  • No explicit ANALYZE schedule configured
  • Relies on PostgreSQL autovacuum

Recommendation:

-- Add explicit VACUUM schedule for manus_protocol_chunks
-- (Large table with frequent reads, infrequent writes)

-- Nightly VACUUM ANALYZE during off-peak
-- Reduces index bloat and keeps query planner statistics fresh
ALTER TABLE manus_protocol_chunks
SET (autovacuum_vacuum_scale_factor = 0.05); -- More aggressive vacuuming

-- Force ANALYZE after bulk ingestion
ANALYZE manus_protocol_chunks;

Expected Impact:

  • Prevents index bloat over time
  • Maintains optimal query planning
  • Minimal overhead (table is mostly read-only)

3. Connection Pooling

3.1 Current Configuration (Good)

File: server/db/connection.ts

const POOL_CONFIG = {
  development: {
    max: 10,
    idleTimeoutMillis: 30000,
    connectionTimeoutMillis: 10000,
  },
  production: {
    max: 20,
    idleTimeoutMillis: 45000,
    connectionTimeoutMillis: 10000,
  },
};

Analysis:

  • ✅ Environment-specific configuration
  • ✅ Connection validation on pool creation
  • ✅ Idle timeout prevents connection leaks
  • ✅ Event handlers for monitoring

Pool Size Justification:

  • 20 connections for production is reasonable for moderate traffic
  • Supabase default limit: 60 connections (direct), 200 (pooler)
  • Current usage: Well within limits

3.2 Supabase Pooler

Configuration: supabase/config.toml

default_pool_size = 20

Analysis:

  • ✅ Uses Supabase's PgBouncer pooler
  • ✅ Transaction-level pooling mode (default)
  • ✅ Appropriate pool size for workload

No Issues Identified

3.3 Connection Pooling Recommendations

Current State: Already optimized

Future Considerations:

  • Monitor connection utilization under load
  • Consider increasing pool size if P95 latency spikes during traffic peaks
  • Add connection pool exhaustion alerts

4. Query Optimization

4.1 RAG Pipeline Performance

Target Breakdown:

Query normalization:    10ms
Embedding generation:   300ms (Voyage AI)
Vector search:          200ms (pgvector HNSW)
Re-ranking:             50ms
LLM inference:          1500ms (Claude Haiku/Sonnet)
─────────────────────────────
Total Target:           2060ms

Current Performance (from RAG_OPTIMIZATION.md):

Total: 1500-2500ms (variable)
- On target for cached queries
- Exceeds target for complex queries with multi-query fusion

4.2 Optimizations Implemented (Excellent)

Query Normalization:

  • ✅ 150+ EMS abbreviation expansion
  • ✅ Typo correction
  • ✅ Intent classification
  • ✅ Complexity detection

Search Execution:

  • ✅ Multi-query fusion (for complex queries)
  • ✅ Reciprocal Rank Fusion (RRF)
  • ✅ Adaptive similarity thresholds (0.25-0.45)
  • ✅ Keyword + semantic hybrid search

Re-ranking:

  • ✅ Advanced scoring (term frequency, position, exact matches)
  • ✅ Context boosting (+15 same agency, +5 same state)
  • ✅ Intent-specific signals (dosage for meds, steps for procedures)

Caching:

  • ✅ Query result cache (1-hour TTL)
  • ✅ Embedding cache (24-hour TTL)
  • ✅ Redis-backed rate limiting

4.3 Database-Level Query Optimization

Timeout Protection:

// File: server/_core/rag/timeout.ts
const TIMEOUT_CONFIG = {
  vectorSearch: 3000,        // 3 seconds
  totalPipeline: 2000,       // 2 seconds
  multiQueryVariation: 800,  // 800ms per variation
};

Analysis:

  • ✅ Prevents hanging queries
  • ✅ Graceful degradation on timeout
  • ✅ Budget tracking for pipeline stages

EXPLAIN ANALYZE Recommendation:

Add periodic query plan analysis:

-- Run this monthly to verify HNSW index is being used
EXPLAIN (ANALYZE, BUFFERS)
SELECT
  id,
  1 - (embedding <=> $1::vector) AS similarity
FROM manus_protocol_chunks
WHERE
  agency_id = 123
  AND embedding IS NOT NULL
ORDER BY embedding <=> $1::vector
LIMIT 10;

-- Expected plan:
-- Index Scan using idx_manus_chunks_embedding_hnsw
-- Buffers: shared hit=X

5. Performance Bottlenecks & Priorities

5.1 Critical Path Analysis

Latency Breakdown (Average Query):

Stage Current Target Status
Query normalization <10ms 10ms ✅ On target
Embedding generation ~300ms 300ms ✅ On target
Vector search (HNSW) ~150ms 200ms ✅ On target
Re-ranking ~50ms 50ms ✅ On target
LLM inference (Haiku) 800-1500ms 1500ms ✅ Variable
Total 1310-2010ms 2060ms Mostly on target

Outliers:

  • Complex queries with multi-query fusion: +200-400ms (3 parallel searches)
  • State-wide searches (no agency filter): +50-100ms (larger result set)
  • Cold cache / first query: +100-200ms (no cached results)

5.2 Priority Recommendations

Priority 1: HNSW Query-Time Tuning

Impact: Medium (±30ms, +2-5% accuracy) Effort: Low (1 hour)

-- Add to search_manus_protocols function
CREATE OR REPLACE FUNCTION search_manus_protocols_tuned(
  query_embedding vector(1536),
  agency_filter integer DEFAULT NULL,
  state_code_filter char(2) DEFAULT NULL,
  match_count integer DEFAULT 10,
  match_threshold float DEFAULT 0.3,
  ef_search_param integer DEFAULT 64 -- NEW PARAMETER
)
RETURNS TABLE (...) AS $$
BEGIN
  -- Set HNSW search parameter for this query
  PERFORM set_config('hnsw.ef_search', ef_search_param::text, true);

  RETURN QUERY
  SELECT ... -- existing query
END;
$$ LANGUAGE plpgsql;

Call from application:

// High-accuracy search (Pro tier)
await supabase.rpc('search_manus_protocols_tuned', {
  query_embedding: embedding,
  agency_filter: agencyId,
  ef_search_param: 100, // Higher recall
});

// Standard search (Free tier)
await supabase.rpc('search_manus_protocols_tuned', {
  query_embedding: embedding,
  agency_filter: agencyId,
  ef_search_param: 64, // Balanced
});

Priority 2: Add Database Performance Monitoring

Impact: High (observability) Effort: Medium (4-8 hours)

Add instrumentation for:

  1. Query Plan Capture:

    -- Create function to log slow queries
    CREATE TABLE IF NOT EXISTS slow_query_log (
      id SERIAL PRIMARY KEY,
      query_text TEXT,
      execution_time_ms INTEGER,
      plan JSONB,
      created_at TIMESTAMPTZ DEFAULT NOW()
    );
  2. Index Usage Statistics:

    -- Monitor index usage
    SELECT
      schemaname,
      tablename,
      indexname,
      idx_scan,
      idx_tup_read,
      idx_tup_fetch
    FROM pg_stat_user_indexes
    WHERE schemaname = 'public'
      AND tablename = 'manus_protocol_chunks'
    ORDER BY idx_scan DESC;
  3. Connection Pool Metrics:

    // Add to server/db/connection.ts
    export function getPoolStats() {
      return {
        totalCount: _pool?.totalCount || 0,
        idleCount: _pool?.idleCount || 0,
        waitingCount: _pool?.waitingCount || 0,
      };
    }
  4. HNSW Index Health:

    -- Check index bloat
    SELECT
      pg_size_pretty(pg_relation_size('idx_manus_chunks_embedding_hnsw')) AS index_size,
      pg_size_pretty(pg_table_size('manus_protocol_chunks')) AS table_size;

Priority 3: Explicit VACUUM Configuration

Impact: Medium (long-term stability) Effort: Low (30 minutes)

-- Optimize autovacuum for manus_protocol_chunks
ALTER TABLE manus_protocol_chunks
SET (
  autovacuum_vacuum_scale_factor = 0.05,  -- More aggressive (default: 0.2)
  autovacuum_analyze_scale_factor = 0.05, -- Keep stats fresh
  autovacuum_vacuum_cost_delay = 10       -- Less aggressive pause (faster)
);

-- Schedule manual ANALYZE after bulk ingestion
-- (Add to ingestion scripts)
ANALYZE manus_protocol_chunks;

Priority 4: Query Result Cache Warming

Impact: Low-Medium (+100-200ms for cache misses) Effort: Medium (2-4 hours)

Pre-warm cache for common queries:

// server/_core/rag/cache-warmer.ts
const COMMON_QUERIES = [
  "cardiac arrest",
  "stroke",
  "anaphylaxis",
  "pediatric fever",
  // ... top 50 queries from analytics
];

export async function warmQueryCache() {
  for (const query of COMMON_QUERIES) {
    await optimizedSearch({ query, limit: 10 }, searchFn);
  }
}

// Call during server startup

6. Scaling Considerations

6.1 Current Scale

  • Chunks: 58,000+ (manageable)
  • Agencies: 2,738 (small)
  • Users: Unknown (estimate <10k based on startup)
  • Queries/day: Unknown

6.2 Scaling Thresholds

No Issues Until:

  • 500,000+ chunks (HNSW still efficient)
  • 10,000+ concurrent users (connection pool needs increase)
  • 1M+ queries/day (consider read replicas)

Current Architecture: Can scale to 10x current load without changes

6.3 Future Optimization Paths

If growth exceeds 500k chunks:

  1. Partitioning: Partition by state_code (53 partitions)
  2. Sharding: Shard by agency_id for multi-region deployments
  3. Read Replicas: Route searches to replicas, writes to primary

If query latency degrades:

  1. Quantized Embeddings: Reduce from 1536 to 768 dimensions (2x faster, -3% accuracy)
  2. GPU Acceleration: Use pgvector with GPU support
  3. Dedicated Vector DB: Consider Pinecone/Weaviate for ultra-low latency

Current State: No scaling issues anticipated in next 12-24 months


7. Summary & Action Items

Current State: GOOD

Strengths:

  • ✅ Properly configured HNSW index
  • ✅ Comprehensive index coverage
  • ✅ Connection pooling implemented
  • ✅ Sophisticated RAG pipeline with caching
  • ✅ Timeout protection and graceful degradation

Recommended Optimizations (Ranked by ROI):

Priority Item Impact Effort Timeline
1 HNSW ef_search tuning Medium Low 1 hour
2 Database monitoring High Medium 4-8 hours
3 VACUUM configuration Medium Low 30 min
4 Query cache warming Low-Med Medium 2-4 hours

Action Items

Immediate (This Week):

  1. Add ef_search parameter to search RPC function
  2. Configure tier-based HNSW tuning (Pro: 100, Free: 64)
  3. Add autovacuum tuning for manus_protocol_chunks

Short-term (This Month): 4. Implement slow query logging 5. Add connection pool metrics endpoint 6. Set up monthly EXPLAIN ANALYZE for critical queries

Long-term (This Quarter): 7. Build query cache warming system 8. Implement index health monitoring 9. Add performance regression tests

No Critical Issues Identified

The database is well-optimized for current scale. Recommended improvements are quality-of-life enhancements rather than critical fixes.


8. Reference Files

Schema & Indexes:

  • /Users/tanner-osterkamp/Protocol-Guide/drizzle/schema.ts (lines 481-508, 547-560)
  • /Users/tanner-osterkamp/Protocol-Guide/drizzle/migrations/0030_optimize_manus_protocol_chunks.sql

Vector Search:

  • /Users/tanner-osterkamp/Protocol-Guide/server/_core/embeddings/search.ts
  • /Users/tanner-osterkamp/Protocol-Guide/supabase/migrations/20260127_fix_state_filter.sql

Connection Pooling:

  • /Users/tanner-osterkamp/Protocol-Guide/server/db/connection.ts

RAG Pipeline:

  • /Users/tanner-osterkamp/Protocol-Guide/server/_core/rag/search-execution.ts
  • /Users/tanner-osterkamp/Protocol-Guide/server/_core/rag/config.ts
  • /Users/tanner-osterkamp/Protocol-Guide/server/routers/query.ts

Documentation:

  • /Users/tanner-osterkamp/Protocol-Guide/docs/RAG_OPTIMIZATION.md
  • /Users/tanner-osterkamp/Protocol-Guide/docs/DATABASE-REVIEW-SUMMARY.md

Audit Completed By: Database Performance Optimization Agent Date: 2026-02-16 Next Review: 2026-05-16 (Quarterly)