Technical documentation for the semantic search pipeline powering Protocol Guide's EMS protocol retrieval.
Protocol Guide uses a hybrid search system that combines vector embeddings with keyword matching to retrieve protocols quickly and accurately for field medics. The system targets an end-to-end latency of 2 seconds while maintaining high accuracy on life-critical medication dosing queries.
| Property | Value |
|---|---|
| Provider | Google Gemini (Voyage AI removed 2026-03-24) |
| Model | gemini-embedding-2-preview |
| Dimensions | 1536 |
| API Endpoint | https://generativelanguage.googleapis.com/v1beta/models/gemini-embedding-2-preview:embedContent |
| Max Input | ~8000 characters (truncated for safety) |
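As a rough illustration of how the values in this table fit together, the sketch below builds (but does not send) an embedding request. The helper name `buildEmbedRequest`, the `x-goog-api-key` header, and the `outputDimensionality` field are assumptions based on the public Gemini REST conventions; they are not taken from the Protocol Guide source and should be checked against the current API reference.

```typescript
// Endpoint, dimension count, and the 8000-character limit come from the table.
// Everything else (names, header, body fields) is an assumption for illustration.
const EMBED_ENDPOINT =
  "https://generativelanguage.googleapis.com/v1beta/models/gemini-embedding-2-preview:embedContent";
const MAX_INPUT_CHARS = 8000;
const OUTPUT_DIMENSIONS = 1536;

interface EmbedRequest {
  url: string;
  headers: Record<string, string>;
  body: string;
}

// Build the HTTP request without sending it, so its shape can be inspected.
function buildEmbedRequest(text: string, apiKey: string): EmbedRequest {
  // Truncate defensively before the text reaches the API.
  const truncated = text.slice(0, MAX_INPUT_CHARS);
  return {
    url: EMBED_ENDPOINT,
    headers: {
      "Content-Type": "application/json",
      "x-goog-api-key": apiKey,
    },
    body: JSON.stringify({
      content: { parts: [{ text: truncated }] },
      // Assumption: the preview model honours this field and returns 1536 values.
      outputDimensionality: OUTPUT_DIMENSIONS,
    }),
  };
}
```

The returned object can be passed to any HTTP client as a POST request. The vector is expected under `embedding.values` in the JSON response, which is likewise an assumption to verify.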
- Higher retrieval quality than Voyage on the EMS regression set (thresholds recalibrated 2026-03-19)
- 1536 dimensions retained for HNSW index compatibility (no schema change required)
- Consolidated embedding code path through `server/_core/embeddings/config.ts`
```typescript
// server/_core/embeddings/config.ts
export const EMBEDDING_CONFIG = {
  model: 'gemini-embedding-2-preview',
  dimension: 1536,
};
```

| Property | Value |
|---|---|
| Database | Supabase (PostgreSQL) |
| Extension | pgvector |
| Table | manus_protocol_chunks |
| Vector Column | embedding (vector(1536)) |
| Index Type | HNSW (via Supabase) |
```sql
-- manus_protocol_chunks (Supabase)
CREATE TABLE manus_protocol_chunks (
  id SERIAL PRIMARY KEY,
  agency_id INTEGER NOT NULL,
  protocol_number TEXT NOT NULL,
  protocol_title TEXT NOT NULL,
  section TEXT,
  content TEXT NOT NULL,
  image_urls TEXT[],
  embedding VECTOR(1536), -- Gemini Embedding 2 Preview (Voyage removed 2026-03-24)
  state_code CHAR(2),
  created_at TIMESTAMP DEFAULT NOW()
);

-- Indexes for fast retrieval
CREATE INDEX ON manus_protocol_chunks USING hnsw (embedding vector_cosine_ops);
CREATE INDEX ON manus_protocol_chunks (agency_id);
CREATE INDEX ON manus_protocol_chunks (state_code);
CREATE INDEX ON manus_protocol_chunks (protocol_number);
```

Protocol content is chunked along semantic boundaries so that related information, especially medication dosing, stays within a single chunk.
| Parameter | Value | Notes |
|---|---|---|
| Target Size | 1200 chars | Optimal for embedding quality |
| Minimum Size | 400 chars | Prevents tiny, context-less chunks |
| Maximum Size | 1800 chars | Prevents embedding quality degradation |
| Overlap | 150 chars | Maintains context across chunk boundaries |
The chunker (server/_core/protocol-chunker.ts) detects natural boundaries:
- Paragraph breaks (double newlines)
- Section headers (TREATMENT, INDICATIONS, DOSING, etc.)
- Sentence boundaries (fallback)
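The boundary rules and size limits above can be sketched as a small greedy packer. This is a simplified illustration, not the contents of `protocol-chunker.ts`: the function names are hypothetical, section-header detection is omitted, and it assumes the size limits apply to the chunk body with the overlap added on top.

```typescript
// Sizes come from the parameter table; the packing algorithm is an assumption.
const TARGET_SIZE = 1200;
const MIN_SIZE = 400;
const MAX_SIZE = 1800;
const OVERLAP = 150;

// Sentence fallback: break after ., ! or ? only when whitespace follows,
// so decimals such as "0.3 mg" stay intact.
function splitSentences(text: string): string[] {
  const sentences: string[] = [];
  let start = 0;
  for (let i = 0; i < text.length - 1; i++) {
    const endsSentence = ".!?".indexOf(text.charAt(i)) !== -1;
    if (endsSentence && /\s/.test(text.charAt(i + 1))) {
      sentences.push(text.slice(start, i + 1));
      start = i + 1;
    }
  }
  sentences.push(text.slice(start));
  return sentences;
}

// Break one oversized paragraph into pieces no longer than MAX_SIZE.
function splitLongParagraph(paragraph: string): string[] {
  const pieces: string[] = [];
  let current = "";
  for (const sentence of splitSentences(paragraph)) {
    if (current.length > 0 && current.length + sentence.length > TARGET_SIZE) {
      pieces.push(current.trim());
      current = "";
    }
    current += sentence;
    while (current.length > MAX_SIZE) {
      // No usable boundary: hard-slice as a last resort.
      pieces.push(current.slice(0, MAX_SIZE));
      current = current.slice(MAX_SIZE);
    }
  }
  if (current.trim().length > 0) pieces.push(current.trim());
  return pieces;
}

function chunkText(text: string): string[] {
  // 1. Prefer paragraph breaks; fall back to sentences for long paragraphs.
  const units: string[] = [];
  for (const raw of text.split(/\n{2,}/)) {
    const paragraph = raw.trim();
    if (paragraph.length === 0) continue;
    if (paragraph.length > MAX_SIZE) units.push(...splitLongParagraph(paragraph));
    else units.push(paragraph);
  }

  // 2. Greedily pack units: flush at the target once the minimum is met,
  //    and always flush before exceeding the maximum.
  const bodies: string[] = [];
  let current = "";
  for (const unit of units) {
    const joined = current.length > 0 ? `${current}\n\n${unit}` : unit;
    const overTarget = joined.length > TARGET_SIZE && current.length >= MIN_SIZE;
    if (current.length > 0 && (joined.length > MAX_SIZE || overTarget)) {
      bodies.push(current);
      current = unit;
    } else {
      current = joined;
    }
  }
  if (current.length > 0) bodies.push(current);

  // 3. Prefix each later chunk with the tail of the previous body.
  return bodies.map((body, i) =>
    i === 0 ? body : `${(bodies[i - 1] ?? "").slice(-OVERLAP)}\n${body}`,
  );
}
```

Splitting sentences only where whitespace follows the punctuation is a deliberate choice here: it keeps a dose such as "0.3 mg" from being cut at the decimal point.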
Each chunk is classified for re-ranking:
- `medication` - Contains dosing info (mg, mcg, routes)
- `procedure` - Step-by-step instructions
- `assessment` - Signs/symptoms, criteria
- `general` - Everything else
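A classifier along these lines could be as simple as a few ordered pattern checks. The sketch below is illustrative only: the regular expressions and the name `classifyChunk` are assumptions, and route detection is left out to keep the patterns short.

```typescript
type ContentType = "medication" | "procedure" | "assessment" | "general";

// All patterns are illustrative assumptions, not the production rules.
const DOSE_PATTERN = /\b\d+(?:\.\d+)?\s*(?:mg|mcg|g|ml|units?)\b/i;
const STEP_PATTERN = /^\s*(?:\d+[.)]|step\s+\d+)/im;
const ASSESSMENT_PATTERN = /\b(?:signs?|symptoms?|criteria|presentation)\b/i;

// Checks run from most to least safety-critical; the first match wins.
function classifyChunk(content: string): ContentType {
  if (DOSE_PATTERN.test(content)) return "medication";
  if (STEP_PATTERN.test(content)) return "procedure";
  if (ASSESSMENT_PATTERN.test(content)) return "assessment";
  return "general";
}
```

Checking for dosing first means a chunk that mixes steps and doses is treated as medication content, which errs toward the safety-critical category.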
Chunks are enriched with context before embedding:
```typescript
// server/_core/protocol-chunker.ts
function generateEmbeddingText(chunk: ProtocolChunk): string {
  // Assumes ProtocolChunk exposes these four fields.
  const { protocolTitle, section, contentType, content } = chunk;
  return [
    `Protocol: ${protocolTitle}`,
    section ? `Section: ${section}` : null,
    contentType !== 'general' ? `Type: ${contentType}` : null,
    content,
  ].filter(Boolean).join('\n\n');
}
```

The core search uses PostgreSQL's pgvector extension with cosine distance:
```sql
-- server/docs/update-search-rpc.sql
SELECT
  id,
  protocol_number,
  protocol_title,
  content,
  1 - (embedding <=> query_embedding) AS similarity -- Cosine similarity (1 - cosine distance)
FROM manus_protocol_chunks
WHERE 1 - (embedding <=> query_embedding) > match_threshold
ORDER BY embedding <=> query_embedding
LIMIT match_count;
```

Thresholds adjust based on query intent:
| Query Type | Threshold | Rationale |
|---|---|---|
| Medication dosing | 0.38 | Higher precision for safety-critical |
| Procedure steps | 0.35 | Standard precision |
| General queries | 0.30 | Better recall |
| Minimum acceptable | 0.20 | Below this = no results |
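One way to read the table is as a lookup with a hard floor. The sketch below assumes the 0.20 minimum acts as a clamp on any relaxed threshold; the names `resolveThreshold` and `filterByThreshold` are hypothetical.

```typescript
type QueryType = "medication_dosing" | "procedure_steps" | "general";

// Values copied from the threshold table.
const THRESHOLDS: Record<QueryType, number> = {
  medication_dosing: 0.38,
  procedure_steps: 0.35,
  general: 0.3,
};
const MIN_ACCEPTABLE = 0.2;

// Pick the threshold for a query type. An optional override (for example a
// temporarily relaxed value) is clamped so it never drops below the floor.
function resolveThreshold(type: QueryType, override?: number): number {
  const requested = override ?? THRESHOLDS[type];
  return Math.max(requested, MIN_ACCEPTABLE);
}

interface ScoredRow {
  id: number;
  similarity: number;
}

// Mirrors the SQL filter: keep rows whose similarity exceeds the threshold.
function filterByThreshold(rows: ScoredRow[], type: QueryType): ScoredRow[] {
  const threshold = resolveThreshold(type);
  return rows.filter((row) => row.similarity > threshold);
}
```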
Protocol number lookups use keyword matching first, then semantic search:
```typescript
// server/_core/embeddings/search.ts
function extractProtocolNumber(query: string): string | null {
  // Matches: "814", "Ref 502", "policy 510"
  const patterns = [
    /\b(?:ref\.?\s*(?:no\.?)?\s*)?(\d{3,4})\b/i,
    /\bpolicy\s+(\d{3,4})\b/i,
    /\bprotocol\s+(\d{3,4})\b/i,
  ];
  // ...
}
```

```
1. Check if query contains protocol number
   ├─ YES: Run keyword search (protocol_number ILIKE '%502%')
   └─ Merge with semantic results (deduplicated)
2. Generate query embedding (Gemini Embedding 2 Preview; Voyage removed 2026-03-24)
3. Execute pgvector search with:
   - agency_filter (optional)
   - state_code_filter (optional)
   - similarity threshold
4. Merge keyword + semantic results
5. Apply re-ranking
6. Return top N results
```
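The merge in step 4 can be sketched as a deduplicating concatenation. This is a minimal illustration that assumes keyword hits take precedence over semantic hits and that rows are identified by `id`; the `SearchHit` shape and `mergeHits` name are hypothetical.

```typescript
interface SearchHit {
  id: number;
  protocolNumber: string;
  similarity: number;
}

// Keyword hits come first; semantic hits fill in whatever is not already present.
function mergeHits(keyword: SearchHit[], semantic: SearchHit[], limit: number): SearchHit[] {
  const seen = new Set<number>();
  const merged: SearchHit[] = [];
  for (const hit of [...keyword, ...semantic]) {
    if (seen.has(hit.id)) continue; // drop duplicates, keeping the first occurrence
    seen.add(hit.id);
    merged.push(hit);
  }
  return merged.slice(0, limit);
}
```

Re-ranking (step 5) then runs over the merged list, so the initial order only matters for which duplicate is kept.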
Results are re-ranked using multiple signals after vector retrieval.
| Signal | Score Boost | Condition |
|---|---|---|
| Title match | +5 | Query term in protocol title |
| Medication match | +8 | Extracted medication in content |
| Condition match | +6 | Extracted condition in content |
| Section priority | +2 to +10 | Based on section type (dosing=10, overview=3) |
| Short content penalty | -5 | Content < 200 chars |
| Dosage info (for med queries) | +10 | Contains mg/mcg/units patterns |
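The signals in this table combine additively. The sketch below shows one possible scoring function; the `Candidate` and `QueryFeatures` shapes, the dosage regular expression, and the default section priority of 2 are assumptions, since the table only states dosing=10 and overview=3.

```typescript
interface Candidate {
  title: string;
  section: string;
  content: string;
  baseScore: number; // score carried over from retrieval (assumed field)
}

interface QueryFeatures {
  terms: string[];
  medications: string[];
  conditions: string[];
  isMedicationQuery: boolean;
}

// Only dosing and overview are given in the table; others default to 2 (assumption).
const SECTION_PRIORITY: Record<string, number> = { dosing: 10, overview: 3 };
const DOSAGE_PATTERN = /\b\d+(?:\.\d+)?\s*(?:mg|mcg|units?)\b/i;

function containsAny(haystack: string, needles: string[]): boolean {
  const lower = haystack.toLowerCase();
  return needles.some((needle) => lower.indexOf(needle.toLowerCase()) !== -1);
}

function rerankScore(candidate: Candidate, query: QueryFeatures): number {
  let score = candidate.baseScore;
  if (containsAny(candidate.title, query.terms)) score += 5; // title match
  if (containsAny(candidate.content, query.medications)) score += 8; // medication match
  if (containsAny(candidate.content, query.conditions)) score += 6; // condition match
  score += SECTION_PRIORITY[candidate.section.toLowerCase()] ?? 2; // section priority
  if (candidate.content.length < 200) score -= 5; // short content penalty
  if (query.isMedicationQuery && DOSAGE_PATTERN.test(candidate.content)) score += 10;
  return score;
}
```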
The system understands medical terminology synonyms:
```typescript
// server/_core/ems-query-normalizer.ts
const MEDICAL_SYNONYMS = {
  'cardiac arrest': ['code', 'asystole', 'vfib', 'vtach', 'pea', 'pulseless'],
  'heart attack': ['myocardial infarction', 'mi', 'stemi', 'nstemi', 'acs'],
  'epinephrine': ['epi', 'adrenaline', 'epipen'],
  'nitroglycerin': ['nitro', 'ntg', 'nitrostat'],
  // ... 80+ term mappings
};
```

| Intent | Additional Boosts |
|---|---|
| `medication_dosing` | +15 for dosage patterns, +8 for adult/pediatric mentions |
| `procedure_steps` | +10 for step patterns, +5 for equipment mentions |
| `contraindication_check` | +12 for warning/avoid patterns |
| `pediatric_specific` | +12 for weight-based/kg mentions |
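The intent-specific boosts can be expressed as a table of pattern/boost pairs per intent. In the sketch below the boost values come from the table, while every regular expression and the name `intentBoost` are assumptions made for illustration.

```typescript
type BoostedIntent =
  | "medication_dosing"
  | "procedure_steps"
  | "contraindication_check"
  | "pediatric_specific";

interface BoostRule {
  pattern: RegExp;
  boost: number;
}

// Boost values come from the table; the patterns are illustrative guesses.
const INTENT_BOOSTS: Record<BoostedIntent, BoostRule[]> = {
  medication_dosing: [
    { pattern: /\b\d+(?:\.\d+)?\s*(?:mg|mcg|units?)\b/i, boost: 15 },
    { pattern: /\b(?:adult|pediatric)\b/i, boost: 8 },
  ],
  procedure_steps: [
    { pattern: /^\s*(?:\d+[.)]|step\s+\d+)/im, boost: 10 },
    { pattern: /\b(?:equipment|bvm|laryngoscope)\b/i, boost: 5 },
  ],
  contraindication_check: [{ pattern: /\b(?:warning|avoid|contraindicat\w*)\b/i, boost: 12 }],
  pediatric_specific: [{ pattern: /\b(?:weight-based|kg)\b/i, boost: 12 }],
};

// Sum the boosts of every rule whose pattern appears in the chunk content.
function intentBoost(intent: BoostedIntent, content: string): number {
  return INTENT_BOOSTS[intent].reduce(
    (total, rule) => (rule.pattern.test(content) ? total + rule.boost : total),
    0,
  );
}
```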
For complex/safety-critical queries, the system searches with multiple query variations:
```typescript
// Original: "epi dose anaphylaxis peds"
// Variations:
[
  "epinephrine dose anaphylaxis allergic reaction pediatric", // Normalized
  "anaphylaxis allergic reaction protocol treatment", // Condition-focused
  "epinephrine dosage indication route", // Medication-focused
  "epinephrine dose anaphylaxis cardiac arrest allergic reaction", // Synonym-expanded
]
```

Multiple result lists are merged using Reciprocal Rank Fusion (RRF, k=60):

```typescript
// RRF Score = Σ 1/(k + rank)
// Higher k = more smoothing between rankings
```

Multi-query fusion is used for:

- Medication dosing queries
- Contraindication checks
- Complex queries (multiple medications/conditions)
- Pediatric + medication combinations
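The RRF merge described above can be sketched in a few lines. This is a generic implementation of the formula with k=60, assuming each result list is an array of chunk ids ordered best first; the function name is hypothetical.

```typescript
const RRF_K = 60;

interface FusedResult {
  id: number;
  score: number;
}

// rankedLists: one array of document ids per query variation, best match first.
function reciprocalRankFusion(rankedLists: number[][], k: number = RRF_K): FusedResult[] {
  const scores = new Map<number, number>();
  for (const list of rankedLists) {
    list.forEach((id, index) => {
      const rank = index + 1; // ranks are 1-based in the RRF formula
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  // Highest fused score first.
  return Array.from(scores, ([id, score]) => ({ id, score })).sort((a, b) => b.score - a.score);
}
```

Because only ranks enter the formula, lists produced with different similarity thresholds can be fused without normalizing their raw scores.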
Field medics often use abbreviations and rushed typing. The normalizer handles this:
```typescript
// 150+ EMS abbreviations
const EMS_ABBREVIATIONS = {
  'epi': 'epinephrine',
  'ntg': 'nitroglycerin',
  'vfib': 'ventricular fibrillation',
  'peds': 'pediatric',
  'sob': 'shortness of breath',
  'bvm': 'bag valve mask',
  'iv': 'intravenous',
  // ...
};
```

```typescript
const TYPO_CORRECTIONS = {
  'epinephrin': 'epinephrine',
  'defibralation': 'defibrillation',
  'siezure': 'seizure',
  'anaphylaxsis': 'anaphylaxis',
  // ...
};
```

Queries are classified into intents for routing:
| Intent | Example Query | Priority |
|---|---|---|
| `contraindication_check` | "can I give nitro with viagra" | 100 |
| `pediatric_specific` | "peds epi dose" | 90 |
| `differential_diagnosis` | "afib vs aflutter" | 80 |
| `protocol_lookup` | "protocol 502" | 75 |
| `procedure_steps` | "how to intubate" | 70 |
| `assessment_criteria` | "stroke criteria" | 60 |
| `medication_dosing` | "adenosine dose" | 50 |
| Property | Value |
|---|---|
| Type | LRU (Least Recently Used) |
| Max Size | 1000 entries |
| TTL | 24 hours |
| Key | SHA-256 hash of input text |
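The properties in this table describe a bounded cache with expiry. A minimal sketch, assuming keys are already hashed and that expired entries are dropped lazily on read (the documented cache also runs an hourly cleanup, which is omitted here); the class name `LruTtlCache` is hypothetical.

```typescript
interface TimedEntry<V> {
  value: V;
  expiresAt: number;
}

class LruTtlCache<V> {
  private readonly entries = new Map<string, TimedEntry<V>>();
  private readonly maxSize: number;
  private readonly ttlMs: number;

  // Defaults follow the table: 1000 entries, 24-hour TTL.
  constructor(maxSize: number = 1000, ttlMs: number = 24 * 60 * 60 * 1000) {
    this.maxSize = maxSize;
    this.ttlMs = ttlMs;
  }

  get(key: string, now: number = Date.now()): V | undefined {
    const entry = this.entries.get(key);
    if (entry === undefined) return undefined;
    if (entry.expiresAt <= now) {
      this.entries.delete(key); // expired: drop lazily
      return undefined;
    }
    // Re-insert so the key becomes the most recently used
    // (a Map iterates in insertion order).
    this.entries.delete(key);
    this.entries.set(key, entry);
    return entry.value;
  }

  set(key: string, value: V, now: number = Date.now()): void {
    this.entries.delete(key);
    if (this.entries.size >= this.maxSize) {
      // The first key in iteration order is the least recently used.
      const oldest = this.entries.keys().next();
      if (!oldest.done) this.entries.delete(oldest.value);
    }
    this.entries.set(key, { value, expiresAt: now + this.ttlMs });
  }

  get size(): number {
    return this.entries.size;
  }
}
```

Passing `now` explicitly is only a convenience for testing; callers would normally rely on the `Date.now()` default.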
```typescript
// server/_core/embeddings/cache.ts
class EmbeddingCache {
  private cache: Map<string, CacheEntry> = new Map();
  // Cleanup runs hourly
}
```

| Property | Value |
|---|---|
| Storage | Upstash Redis |
| TTL | 1 hour (3600s) |
| Key Format | search:{md5(query:agencyId:stateCode:limit)} |
| Headers | Cache-Control: public, max-age=3600, stale-while-revalidate=300 |
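```typescript
// server/_core/search-cache.ts
import { createHash } from 'node:crypto';

function getSearchCacheKey(params: SearchCacheParams): string {
  // Assumed normalization: trim and lowercase the query so equivalent searches share a key.
  const normalized = { ...params, query: params.query.trim().toLowerCase() };
  const hash = createHash('md5').update(JSON.stringify(normalized)).digest('hex');
  return `search:${hash}`;
}
```

For initial data import and bulk updates: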
```typescript
// scripts/generate-embeddings.ts
await generateAllEmbeddings({
  batchSize: 128, // Gemini batch size (Voyage removed 2026-03-24)
  onProgress: (current, total) => console.log(`${current}/${total}`)
});
```

- Query protocols without embeddings (`embedding IS NULL`)
- Combine title + section + content for each
- Truncate to 8000 chars
- Send batch to Gemini (max 128 per request; Voyage removed 2026-03-24)
- Update database with embeddings
- Repeat until complete
| Component | Simple Query | Complex Query |
|---|---|---|
| Query normalization | 10ms | 10ms |
| Embedding (cached) | 0ms | - |
| Embedding (new) | 250ms | 300ms (3 parallel) |
| Vector search | 150ms | 200ms (3 parallel) |
| RRF merge | - | 10ms |
| Advanced re-ranking | 40ms | 50ms |
| LLM inference (Haiku) | 1200ms | 1200ms |
| Total | ~1400ms (cached embedding; ~1650ms with a new embedding) | ~1770ms |
| Metric | Before Optimization | After |
|---|---|---|
| Medication query accuracy | ~78% | ~92% |
| General query accuracy | ~82% | ~90% |
| Recall (multi-query mode) | ~70% | ~85% |
| Protocol number lookup | ~60% | ~95% |
```typescript
// Generate embedding for text
import { generateEmbedding } from './server/_core/embeddings';
const embedding: number[] = await generateEmbedding("cardiac arrest protocol");

// Semantic search
import { semanticSearchProtocols } from './server/_core/embeddings';
const results = await semanticSearchProtocols({
  query: "epinephrine dose anaphylaxis",
  agencyId: 123,
  stateCode: "CA",
  limit: 10,
  threshold: 0.35
});

// Normalize EMS query
import { normalizeEmsQuery } from './server/_core/ems-query-normalizer';
const normalized = normalizeEmsQuery("epi dose anaph peds");
// { normalized: "epinephrine dose anaphylaxis pediatric", intent: "medication_dosing" }

// Chunk protocol
import { chunkProtocol } from './server/_core/protocol-chunker';
const chunks = chunkProtocol(text, "502", "Cardiac Arrest");
```

```typescript
// Semantic search (rate limited)
trpc.search.semantic.query({
  query: string,
  countyId?: number,
  limit?: number,
  stateFilter?: string
});

// Agency-specific search
trpc.search.searchByAgency.query({
  query: string,
  agencyId: number,
  limit?: number
});

// Summarize for field use
trpc.search.summarize.query({
  query: string,
  content: string,
  protocolTitle?: string
});
```

| File | Purpose |
|---|---|
| `server/_core/embeddings/index.ts` | Main exports, re-exports all modules |
| `server/_core/embeddings/generate.ts` | Single embedding generation |
| `server/_core/embeddings/batch.ts` | Batch embedding for imports |
| `server/_core/embeddings/cache.ts` | LRU embedding cache |
| `server/_core/embeddings/search.ts` | Semantic search + hybrid search |
| `server/_core/ems-query-normalizer.ts` | Query preprocessing, abbreviations, synonyms |
| `server/_core/protocol-chunker.ts` | Semantic-aware chunking |
| `server/_core/search-cache.ts` | Redis result caching |
| `server/_core/rag/*.ts` | RAG pipeline optimization, re-ranking |
| `server/routers/search.ts` | tRPC search endpoints |
| `docs/update-search-rpc.sql` | PostgreSQL search function |
```bash
# Required
GOOGLE_API_KEY=your-google-api-key
SUPABASE_URL=https://your-project.supabase.co
SUPABASE_SERVICE_ROLE_KEY=your-service-role-key

# Optional (for Redis caching)
UPSTASH_REDIS_REST_URL=https://your-redis.upstash.io
UPSTASH_REDIS_REST_TOKEN=your-token
```

Poor result quality:

- Check embedding cache stats: `embeddingCache.getStats()`
- Verify query is being normalized: log `normalizeEmsQuery()` output
- Lower similarity threshold temporarily
- Enable multi-query fusion for the query type

Slow searches:

- Check Redis cache hit rate: `getSearchCacheStats()`
- Verify embeddings are cached (24h TTL)
- Monitor Gemini response times
- Check pgvector index health

Expected protocol missing from results:

- Verify embedding exists for expected protocol
- Check similarity threshold isn't too high
- Try query variations manually
- Check if protocol is in correct agency/state

Future improvements:

- Evaluate `voyage-3` for improved medical performance (superseded by the Gemini Embedding 2 Preview migration, 2026-03-24)
- Add cross-encoder re-ranking for top-10 results
- Implement query expansion with SNOMED/ICD-10 ontologies
- Pre-compute embeddings for top 1000 common queries
- Add user feedback loop for retrieval quality

Last updated: 2025-01-28

See also: RAG_OPTIMIZATION.md