Protocol Guide — Data Gap Analysis

Generated: 2026-03-08
Source: manus_agencies + manus_protocol_chunks tables (Supabase)

Summary

| Metric | Count |
|---|---:|
| Total agencies (manus_agencies) | 22,839 |
| Agencies with protocol chunks | 2,037 (8.9%) |
| Agencies without chunks | 20,802 (91.1%) |
| Total protocol chunks | 64,397 |
| States with any coverage | 51 (all 50 states + DC) |
| Counties in counties table | 3,090 |
| Counties with uses_state_protocols=true | 0 |
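The summary percentages are ratios of distinct agency IDs: an agency counts as covered if at least one chunk references it. Below is a minimal TypeScript sketch of that calculation. It assumes rows have already been fetched from manus_agencies and manus_protocol_chunks and that they expose hypothetical `id` and `agency_id` columns. The actual column names are not given in this report.

```typescript
// Hypothetical row shapes: only the columns this calculation needs.
interface AgencyRow {
  id: number; // assumed primary key of manus_agencies
}

interface ChunkRow {
  agency_id: number; // assumed foreign key to manus_agencies.id
}

interface CoverageSummary {
  totalAgencies: number;
  withChunks: number;
  withoutChunks: number;
  pctWithChunks: number; // percentage rounded to one decimal place
  totalChunks: number;
}

function summarizeCoverage(agencies: AgencyRow[], chunks: ChunkRow[]): CoverageSummary {
  // Distinct agency IDs that own at least one chunk.
  const chunkedIds = new Set(chunks.map((c) => c.agency_id));
  // Only count agencies present in the directory; orphaned chunks are ignored here.
  const withChunks = agencies.filter((a) => chunkedIds.has(a.id)).length;
  const total = agencies.length;
  const pct = total === 0 ? 0 : Math.round((withChunks / total) * 1000) / 10;
  return {
    totalAgencies: total,
    withChunks,
    withoutChunks: total - withChunks,
    pctWithChunks: pct,
    totalChunks: chunks.length,
  };
}
```

When an agency_id is missing from the directory, its chunks do not count toward covered agencies, although they still count toward totalChunks.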

Key Finding

The earlier figure of "681 agencies with zero chunks" came from the Drizzle-managed agencies table (the admin/subscription system), not from the protocol data. The protocol data lives in manus_agencies (22,839 entries) and manus_protocol_chunks (64,397 chunks across 2,037 agencies).

91% of agencies have zero protocol chunks, but this is expected: most are placeholder entries from the NASEMSO national agency seed. Only agencies whose protocols have been actively ingested have chunks.

Coverage by State (Top 20 by Agencies with Chunks)

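| State | Agencies with Chunks | Total Agencies | Coverage |
|---|---:|---:|---:|
| GA | 159 | 695 | 22.9% |
| KY | 121 | 464 | 26.1% |
| MO | 115 | 624 | 18.4% |
| KS | 105 | 596 | 17.6% |
| IL | 102 | 1,083 | 9.4% |
| IA | 98 | 917 | 10.7% |
| NE | 93 | 524 | 17.7% |
| IN | 92 | 633 | 14.5% |
| MN | 87 | 533 | 16.3% |
| MI | 83 | 659 | 12.6% |
| MS | 82 | 442 | 18.6% |
| NC | 78 | 430 | 18.1% |
| AR | 75 | 419 | 17.9% |
| AL | 68 | 320 | 21.3% |
| FL | 68 | 772 | 8.8% |
| CO | 66 | 451 | 14.6% |
| LA | 64 | 338 | 18.9% |
| NY | 64 | 1,346 | 4.8% |
| MT | 56 | 238 | 23.5% |
| ID | 44 | 218 | 20.2% |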
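This table ranks states by the number of agencies with chunks, not by coverage percentage. The sketch below shows one way to derive such a ranking. It assumes each agency row carries a hypothetical `state` field and that the set of chunked agency IDs is already known.

```typescript
interface Agency {
  id: number;
  state: string; // hypothetical two-letter state code column
}

interface StateCoverage {
  state: string;
  withChunks: number;
  total: number;
  coveragePct: number; // rounded to one decimal place
}

function coverageByState(agencies: Agency[], chunkedIds: ReadonlySet<number>): StateCoverage[] {
  // Tally total agencies and chunked agencies per state.
  const byState = new Map<string, { withChunks: number; total: number }>();
  for (const a of agencies) {
    const entry = byState.get(a.state) ?? { withChunks: 0, total: 0 };
    entry.total += 1;
    if (chunkedIds.has(a.id)) entry.withChunks += 1;
    byState.set(a.state, entry);
  }
  return Array.from(byState.entries(), ([state, { withChunks, total }]) => ({
    state,
    withChunks,
    total,
    coveragePct: Math.round((withChunks / total) * 1000) / 10,
  }));
}

function topByChunkedAgencies(rows: StateCoverage[], n: number): StateCoverage[] {
  // Descending by agencies with chunks; ties broken alphabetically by state.
  return [...rows]
    .sort((a, b) => b.withChunks - a.withChunks || a.state.localeCompare(b.state))
    .slice(0, n);
}
```

The alphabetical tie-break happens to match the AL/FL and LA/NY orderings in the table, but the report does not say how its ties were ordered.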

Worst Coverage (Large States)

| State | Agencies with Chunks | Total Agencies | Coverage |
|---|---:|---:|---:|
| TX | 8 | 1,771 | 0.5% |
| PA | 7 | 1,388 | 0.5% |
| OH | 7 | 1,169 | 0.6% |
| NY | 64 | 1,346 | 4.8% |
| WI | 3 | 593 | 0.5% |
| NJ | 22 | 539 | 4.1% |
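The report does not define "large" or "worst." One way to produce a list like this is to filter on a minimum agency count and a maximum coverage ratio, then sort by coverage in ascending order. Both thresholds in this sketch are assumptions.

```typescript
interface StateTotals {
  state: string;
  withChunks: number;
  total: number;
}

// Both thresholds are assumptions; the report does not say how it picked these states.
function worstLargeStates(
  rows: StateTotals[],
  minAgencies: number, // what counts as a "large" state
  maxCoverage: number, // coverage ratio cutoff, e.g. 0.05 for 5%
): StateTotals[] {
  const ratio = (r: StateTotals): number => r.withChunks / r.total;
  return rows
    .filter((r) => r.total >= minAgencies && ratio(r) < maxCoverage)
    .sort((a, b) => ratio(a) - ratio(b)); // lowest coverage first
}
```

Given only the state rows from the two tables above, with minAgencies = 500 and maxCoverage = 0.05, the function returns TX, PA, WI, OH, NJ, NY. States missing from those tables are not accounted for.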

Data That Exists But Isn't Chunked

Only 1 agency has protocol_count > 0 but zero chunks:

  • Lake County EMS (CA) — protocol_count=10

This suggests the ingestion pipeline is consistent: apart from this one case, every agency with ingested protocol data has chunks.
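The check above is a filter for agencies whose protocol_count is positive but whose ID never appears in manus_protocol_chunks. Below is a sketch that assumes the agency rows are already in memory. In it, `id`, `name`, and `state` are hypothetical column names that sit alongside the protocol_count field this report describes.

```typescript
interface ManusAgency {
  id: number; // hypothetical primary key
  name: string; // hypothetical
  state: string; // hypothetical
  protocol_count: number | null; // null treated as 0
}

function ingestedButUnchunked(
  agencies: ManusAgency[],
  chunkedIds: ReadonlySet<number>, // distinct agency IDs found in manus_protocol_chunks
): ManusAgency[] {
  // Protocols recorded, but no chunks produced: ingestion did not finish.
  return agencies.filter((a) => (a.protocol_count ?? 0) > 0 && !chunkedIds.has(a.id));
}
```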

uses_state_protocols Status

The counties.uses_state_protocols field is false or null for all 3,090 counties (0 are set to true). It was designed to flag counties that defer to state-level protocols instead of having their own local EMS agency (LEMSA) protocols, but it was never populated.

Architecture Notes

  1. manus_agencies — NASEMSO-seeded national directory (22,839 entries). Most are placeholders.
  2. manus_protocol_chunks — Actual protocol text chunks with embeddings. Only populated for actively ingested agencies.
  3. agencies — Drizzle-managed table for the subscription/admin system. Separate from protocol data.
  4. county_agency_mapping — Maps counties to their LEMSA for jurisdiction-scoped search.
  5. Ingestion scripts: ingest-state.ts (multi-state CLI), ingest-ca-protocols.ts (CA-specific), ingest-local-pdfs.ts (manual PDF upload)
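The intended lookup for jurisdiction-scoped search can be sketched as follows. A county flagged with uses_state_protocols resolves to its state's protocols; otherwise, county_agency_mapping supplies the LEMSA. This is an assumption about how the pieces fit together, not the project's actual search code. The `fips` key and the in-memory Map built from county_agency_mapping are hypothetical.

```typescript
interface County {
  fips: string; // hypothetical county key
  state: string;
  uses_state_protocols: boolean | null;
}

type ProtocolScope =
  | { kind: "lemsa"; agencyId: number }
  | { kind: "state"; state: string }
  | { kind: "unmapped" };

function resolveProtocolScope(
  county: County,
  countyToLemsa: ReadonlyMap<string, number>, // hypothetical: built from county_agency_mapping rows
): ProtocolScope {
  // An explicit state-protocol flag takes precedence over any mapping row.
  if (county.uses_state_protocols === true) {
    return { kind: "state", state: county.state };
  }
  // Otherwise, search within the LEMSA mapped to this county, if any.
  const agencyId = countyToLemsa.get(county.fips);
  if (agencyId !== undefined) return { kind: "lemsa", agencyId };
  return { kind: "unmapped" };
}
```

Because no county currently has uses_state_protocols=true, every lookup today takes the LEMSA or unmapped branch.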

Categorization of 20,802 Uncovered Agencies

| Category | Est. Count | Description |
|---|---:|---|
| NASEMSO placeholder entries | ~19,800 | Seeded from national directory. No protocol source identified. |
| State-protocol agencies | ~500-800 | Follow state-level protocols (no unique LEMSA protocols). Should be marked with the uses_state_protocols flag. |
| Genuinely missing (protocols exist online) | ~200-400 | Have published protocols but haven't been ingested yet. |
| Lake County EMS (CA) | 1 | Has protocol_count=10 but zero chunks; ingestion incomplete. |
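The counts in this table are estimates. If the signals behind each category were recorded for every agency, triage could be applied mechanically. Below is a sketch in which `usesStateProtocols` and `publishedProtocolUrl` are hypothetical inputs; neither is populated in manus_agencies today.

```typescript
type GapCategory =
  | "ingestion_incomplete"
  | "state_protocols"
  | "missing_online"
  | "placeholder";

interface UncoveredAgency {
  protocol_count: number | null;
  usesStateProtocols: boolean; // hypothetical: derived from the county's uses_state_protocols once populated
  publishedProtocolUrl: string | null; // hypothetical: a known online protocol source
}

function categorize(a: UncoveredAgency): GapCategory {
  // Most specific signal first: data was recorded but never chunked.
  if ((a.protocol_count ?? 0) > 0) return "ingestion_incomplete";
  // Defers to state-level protocols, so no LEMSA-specific set is expected.
  if (a.usesStateProtocols) return "state_protocols";
  // Protocols are published somewhere but not yet ingested.
  if (a.publishedProtocolUrl !== null) return "missing_online";
  // No protocol source identified.
  return "placeholder";
}
```

The checks run from most to least specific, so an agency like Lake County EMS lands in the ingestion-incomplete bucket even if it also has a published URL.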

Actionable Next Steps

Immediate (This Week)

  1. Fix Lake County EMS — Only agency with protocol_count>0 but no chunks. Run: npx tsx scripts/ingest-ca-protocols.ts --lemsa "Lake County"
  2. Populate uses_state_protocols — Audit which counties/agencies defer to state protocols. Start with CA (well-documented LEMSA structure).

Short-Term (This Month)

  1. Prioritize TX, PA, OH — Largest states by agency count with <1% coverage. Research state EMS protocol structures.
  2. Audit NASEMSO seed data — Determine which of the 22K agencies are actual protocol-publishing LEMSAs vs. individual fire departments/ambulance services that follow a parent LEMSA.
  3. Add is_lemsa flag to manus_agencies — Distinguish protocol-publishing authorities from individual agencies. Most of the 22K are individual services that follow a LEMSA's protocols.

Medium-Term (This Quarter)

  1. State-level protocol ingestion — For states with centralized protocols (e.g., state EMS offices), ingest once and map all agencies in that state.
  2. Coverage dashboard — Add a /admin/coverage page showing ingestion status by state/county.
  3. Automated gap detection — Cron job to identify new agencies added without chunks.
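A gap-detection job only needs the agencies added since its last run, minus those that already have chunks. The sketch below covers that core filter and assumes a hypothetical `created_at` timestamp column on manus_agencies. Scheduling and alerting are left out.

```typescript
interface AgencyRecord {
  id: number;
  name: string;
  created_at: string; // hypothetical ISO-8601 timestamp column
}

function newAgenciesWithoutChunks(
  agencies: AgencyRecord[],
  chunkedIds: ReadonlySet<number>, // distinct agency IDs present in manus_protocol_chunks
  since: Date, // time of the previous run
): AgencyRecord[] {
  return agencies.filter(
    (a) => new Date(a.created_at).getTime() > since.getTime() && !chunkedIds.has(a.id),
  );
}
```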

Key Insight

The 20,802 "missing" agencies are mostly not a data gap — they're individual fire departments and ambulance services that follow their regional LEMSA's protocols. The real metric is LEMSA coverage, not individual agency coverage. There are roughly 200-300 LEMSAs nationally; Protocol Guide covers ~50 of them well (primarily through CA's 33 LEMSAs and scattered coverage in other states).

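True coverage: ~50 of ~250 LEMSAs nationally (20%). Ingesting the remaining ~200 LEMSAs would effectively cover nearly all 22,839 agencies, provided each agency can be linked to the LEMSA (or state protocol set) whose protocols it follows.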
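Measuring LEMSA coverage instead of agency coverage means an agency counts as served when its parent LEMSA has chunks. This sketch assumes the proposed is_lemsa flag exists and that each agency carries a hypothetical `parent_lemsa_id` linking it to the LEMSA it follows.

```typescript
interface DirectoryAgency {
  id: number;
  is_lemsa: boolean; // proposed flag, not yet in manus_agencies
  parent_lemsa_id: number | null; // hypothetical link to the LEMSA whose protocols this agency follows
}

interface LemsaCoverage {
  lemsasCovered: number;
  lemsasTotal: number;
  agenciesReachable: number; // agencies with their own chunks or a covered parent LEMSA
}

function lemsaCoverage(agencies: DirectoryAgency[], chunkedIds: ReadonlySet<number>): LemsaCoverage {
  const lemsas = agencies.filter((a) => a.is_lemsa);
  // LEMSAs whose own protocols have been chunked.
  const coveredLemsas = new Set(lemsas.filter((l) => chunkedIds.has(l.id)).map((l) => l.id));
  const agenciesReachable = agencies.filter(
    (a) =>
      chunkedIds.has(a.id) ||
      (a.parent_lemsa_id !== null && coveredLemsas.has(a.parent_lemsa_id)),
  ).length;
  return { lemsasCovered: coveredLemsas.size, lemsasTotal: lemsas.length, agenciesReachable };
}
```

Here lemsasCovered / lemsasTotal corresponds to the ~50/250 ratio above, and agenciesReachable counts the individual services that this coverage already reaches.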