Last updated: 2026-01-28
This document covers all automated processes, scheduled jobs, data ingestion pipelines, and CI/CD workflows.
- GitHub Actions Workflows
- Scheduled Jobs (Cron)
- Server Background Jobs
- Data Ingestion Pipeline
- PDF Processing Workflow
- Netlify Edge Functions
- Docker Automation
- npm Scripts
Located in: .github/workflows/
Triggers:
- Push to `main` or `develop` branches
- Pull requests to `main` or `develop` branches
Jobs:
| Job | Purpose | Timeout |
|---|---|---|
| `lint-and-test` | TypeScript check, ESLint, Vitest unit tests | 15 min |
| `security-scan` | TruffleHog secret scanning, .env file check | 10 min |
| `build` | Build server (esbuild) + web export (Expo) | 20 min |
| `e2e-tests` | Playwright E2E tests on Chromium | 30 min |
| `deploy-netlify-staging` | Deploy to Netlify staging (develop branch only) | 15 min |
| `deploy-netlify` | Deploy to Netlify production (main branch only) | 15 min |
| `deploy-railway` | Deploy API server to Railway (main branch only) | 15 min |
Pipeline Flow:
push/PR
│
├── lint-and-test ──┬── build ──┬── e2e-tests ──┬── deploy-netlify-staging (develop)
│                   │           │               │
└── security-scan ──┘           │               ├── deploy-netlify (main)
                                │               │
                                │               └── deploy-railway (main)
Key Features:
- Playwright browser caching by version (saves ~30-60s)
- pnpm store caching
- Build artifacts uploaded for 7 days
- Health check after Railway deployment
- Visual regression tests (currently disabled - see comments in workflow)
Schedule: Daily at 9:00 AM UTC (4 AM EST / 1 AM PST)
schedule:
  - cron: '0 9 * * *'

Purpose: Sends onboarding drip emails to users based on signup date:
- Day 3: Tips email - "3 tips to get the most out of Protocol Guide"
- Day 7: Pro pitch email - "Unlock unlimited Protocol Guide searches" (free users only)
How it Works:
- GitHub Action triggers daily
- Calls tRPC endpoint: `POST /api/trpc/jobs.runDripEmails`
- Authenticates with `CRON_SECRET` environment variable
- Server job queries users who signed up X days ago
- Sends emails via email service (Paubox; Resend is BANNED per Ops Rules, no BAA)
- Records sent emails in `drip_emails_sent` table
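
The request made by the workflow can be sketched as follows. This is a minimal sketch, not the project's actual client: the `Authorization: Bearer` header format, the empty JSON body, and the `buildDripEmailRequest` helper name are assumptions. Only the endpoint path and the `CRON_SECRET` / `API_URL` names come from this document.

```typescript
// Hypothetical helper: builds the HTTP request the scheduled workflow would send.
// Assumption: the server expects the secret as a Bearer token.
interface CronRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

function buildDripEmailRequest(apiUrl: string, cronSecret: string): CronRequest {
  if (!cronSecret) {
    throw new Error("CRON_SECRET is not set");
  }
  // Strip trailing slashes so the path is joined with exactly one slash.
  const base = apiUrl.replace(/\/+$/, "");
  return {
    url: `${base}/api/trpc/jobs.runDripEmails`,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${cronSecret}`,
      },
      body: JSON.stringify({}),
    },
  };
}
```

The result can be passed straight to `fetch(url, init)`. Failing early on an empty secret keeps a misconfigured workflow from sending unauthenticated requests.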
Required Secrets:
- `CRON_SECRET` - Authentication token for job endpoint
- `API_URL` (variable) - Defaults to Railway production URL
Schedule: Every 15 minutes
schedule:
  - cron: '*/15 * * * *'

Purpose: Production uptime monitoring with automatic issue creation on failure.
Endpoints Checked:
- `/api/health` - Full health check (database, services)
- `/api/live` - Liveness probe (basic availability)
Failure Handling:
- Creates GitHub issue with label `health-check-failure`
- If issue already exists, adds comment with new failure timestamp
- Issue includes runbook links and action items
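
The create-or-comment decision above might look like the sketch below. The `OpenIssue` and `FailureAction` shapes, the issue title, and the comment text are all hypothetical. The real workflow would read open issues from, and write to, the GitHub API.

```typescript
// Hypothetical shapes; only the label name comes from this document.
interface OpenIssue {
  number: number;
  labels: string[];
}

type FailureAction =
  | { kind: "create"; title: string; labels: string[] }
  | { kind: "comment"; issueNumber: number; body: string };

const FAILURE_LABEL = "health-check-failure";

function planFailureAction(openIssues: OpenIssue[], failedAt: Date): FailureAction {
  const existing = openIssues.find((issue) => issue.labels.includes(FAILURE_LABEL));
  const timestamp = failedAt.toISOString();
  if (existing) {
    // An issue is already open: record the new failure as a comment.
    return {
      kind: "comment",
      issueNumber: existing.number,
      body: `Health check failed again at ${timestamp}`,
    };
  }
  // No open issue carries the label yet: open a new one.
  return {
    kind: "create",
    title: `Health check failure (${timestamp})`,
    labels: [FAILURE_LABEL],
  };
}
```

Keeping the decision in a pure function makes it testable without network access.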
Manual Trigger:
gh workflow run "Health Monitor" --field environment=staging

| Job | Schedule | Source | Description |
|---|---|---|---|
| Drip Emails | Daily 9 AM UTC | GitHub Actions | Onboarding email sequence |
| Health Monitor | Every 15 min | GitHub Actions | Production uptime checks |
Note: All scheduled jobs run via GitHub Actions, not server-side cron. This provides:
- Built-in logging and history
- Failure notifications
- No server resource consumption
- Easy manual re-runs
Located in: server/jobs/
Type: On-demand (triggered by API)
Purpose: Processes uploaded PDF protocols through the full RAG pipeline.
Pipeline Steps:
- Download PDF from storage URL
- Extract text using `pdf-parse`
- Chunk text into semantic sections (~1500 chars with 200 char overlap)
- Generate embeddings via Google Gemini (`gemini-embedding-2-preview` model; Voyage removed 2026-03-24)
- Insert chunks into Supabase `manus_protocol_chunks` table
- Update upload status throughout
Status Flow:
pending → processing → chunking → embedding → completed
↘ failed
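
One way to encode this status flow is a transition table. The sketch below is illustrative only. It assumes any in-progress state may move to `failed`, which the diagram does not spell out, and `canTransition` is a hypothetical helper.

```typescript
type UploadStatus =
  | "pending"
  | "processing"
  | "chunking"
  | "embedding"
  | "completed"
  | "failed";

// Assumption: every non-terminal state may fail; terminal states have no successors.
const NEXT_STATUSES: Record<UploadStatus, UploadStatus[]> = {
  pending: ["processing", "failed"],
  processing: ["chunking", "failed"],
  chunking: ["embedding", "failed"],
  embedding: ["completed", "failed"],
  completed: [],
  failed: [],
};

function canTransition(from: UploadStatus, to: UploadStatus): boolean {
  return NEXT_STATUSES[from].includes(to);
}
```

Checking `canTransition` before each status update would reject skipped or backward steps such as `pending` to `completed`.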
Key Functions:
- `processProtocolUpload(uploadId)` - Process single upload
- `processPendingUploads()` - Batch process up to 5 pending uploads
Environment Variables:
- `SUPABASE_URL`
- `SUPABASE_SERVICE_ROLE_KEY`
- `GOOGLE_API_KEY`
Type: Cron-triggered (via GitHub Actions)
Purpose: Implements the drip email sequence for user onboarding.
Sequence:
| Day | Email Type | Template | Target |
|---|---|---|---|
| 3 | `tips` | `ONBOARDING_TIPS` | All users |
| 7 | `pro_pitch` | `ONBOARDING_PRO_PITCH` | Free tier users only |
Logic:
- Calculate target signup date (today - N days)
- Query users who signed up on that date
- Filter by tier (free users for pro_pitch)
- Check `drip_emails_sent` table to avoid duplicates
- Send email via Paubox (Resend is BANNED per Ops Rules, no BAA)
- Record sent email
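
The date calculation and filtering steps above can be sketched as pure functions. This is an illustration under assumptions, not the job's actual code: the `User` and `DripStep` shapes, the `"pro"` tier name, UTC calendar-date matching, and the in-memory `alreadySent` set standing in for the `drip_emails_sent` table are all hypothetical.

```typescript
type Tier = "free" | "pro"; // "pro" is an assumed name for the paid tier
type EmailType = "tips" | "pro_pitch";

interface User {
  id: string;
  email: string;
  tier: Tier;
  createdAt: Date;
}

interface DripStep {
  day: number;
  emailType: EmailType;
  freeOnly: boolean;
}

// UTC calendar date (YYYY-MM-DD) that lies `days` days before `today`.
function targetSignupDate(today: Date, days: number): string {
  const ms = Date.UTC(today.getUTCFullYear(), today.getUTCMonth(), today.getUTCDate() - days);
  return new Date(ms).toISOString().slice(0, 10);
}

// `alreadySent` holds "userId:emailType" keys for emails that were already recorded.
function selectRecipients(
  users: User[],
  step: DripStep,
  today: Date,
  alreadySent: Set<string>,
): User[] {
  const target = targetSignupDate(today, step.day);
  return users.filter(
    (user) =>
      user.createdAt.toISOString().slice(0, 10) === target &&
      (!step.freeOnly || user.tier === "free") &&
      !alreadySent.has(`${user.id}:${step.emailType}`),
  );
}
```

Comparing calendar dates in UTC matches the UTC cron schedule, so a user's eligibility does not depend on the server's local timezone.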
Located in: scripts/import-*.ts (40+ scripts, ~710KB total)
PDF Source (Web/Local)
    │
    ▼
Download Script ─────────────────┐
    │                            │
    ▼                            │
Import Script                    │
    │                            │
    ├── Extract metadata         │
    │   (protocol #, title)      │
    │                            │
    ├── Parse PDF text ──────────┤
    │                            │
    ├── Chunk content            │
    │                            │
    ├── Generate embeddings      │
    │   (Gemini Embedding 2)     │
    │                            │
    └── Insert to Supabase       │
        (manus_protocol_chunks)  │
| State | Script | Agency |
|---|---|---|
| CA | `import-alameda-protocols.ts` | Alameda County EMS |
| CA | `import-contra-costa-protocols.ts` | Contra Costa EMS |
| CA | `import-el-dorado-protocols.ts` | El Dorado County EMS |
| CA | `import-imperial-county-protocols.ts` | Imperial County EMS |
| CA | `import-kern-county-protocols.ts` | Kern County EMS |
| CA | `import-la-county-local-pdfs.ts` | Los Angeles County EMS |
| CA | `import-marin-protocols.ts` | Marin County EMS |
| CA | `import-merced-protocols.ts` | Merced County EMS |
| CA | `import-napa-protocols.ts` | Napa County EMS |
| CA | `import-orange-county-protocols.ts` | Orange County EMS |
| CA | `import-riverside-protocols.ts` | Riverside County EMS |
| CA | `import-sacramento-protocols.ts` | Sacramento County EMS |
| CA | `import-san-benito-protocols.ts` | San Benito County EMS |
| CA | `import-san-diego-protocols.ts` | San Diego County EMS |
| CA | `import-san-francisco-protocols.ts` | San Francisco EMS |
| CA | `import-san-joaquin-protocols.ts` | San Joaquin County EMS |
| CA | `import-san-luis-obispo-protocols.ts` | SLO County EMS |
| CA | `import-san-mateo-protocols.ts` | San Mateo County EMS |
| CA | `import-santa-barbara-protocols.ts` | Santa Barbara County EMS |
| CA | `import-santa-clara-protocols.ts` | Santa Clara County EMS |
| CA | `import-santa-cruz-protocols.ts` | Santa Cruz County EMS |
| CA | `import-slo-county-protocols.ts` | SLO County EMS (alt) |
| CA | `import-solano-protocols.ts` | Solano County EMS |
| CA | `import-ssvems-protocols.ts` | South Santa Barbara VEMS |
| CA | `import-ventura-county-protocols.ts` | Ventura County EMS |
| CA | `import-yolo-county-protocols.ts` | Yolo County EMS |
| NY | `import-ny-protocols.ts` | New York State |
| TX | `import-tx-fl-protocols.ts` | Texas |
| FL | `import-tx-fl-protocols.ts` | Florida |
| IL | `import-il-pa-protocols.ts` | Illinois |
| PA | `import-il-pa-protocols.ts` | Pennsylvania |
| OH | `import-oh-ga-protocols.ts` | Ohio |
| GA | `import-oh-ga-protocols.ts` | Georgia |
# Single agency import
npx tsx scripts/import-la-county-local-pdfs.ts
# With environment variables
SUPABASE_URL=xxx SUPABASE_SERVICE_ROLE_KEY=xxx GOOGLE_API_KEY=xxx npx tsx scripts/import-*.ts

| Script | Purpose |
|---|---|
| `download-el-dorado-pdfs.ts` | Download from El Dorado county website |
| `download-riverside-protocols.ps1` | PowerShell downloader for Riverside |
| `download-santa-clara-pdfs.ts` | Playwright-based PDF scraper |
| `cdp-download.js` | Chrome DevTools Protocol PDF downloader |
| `playwright-download-pdf.ts` | Playwright-based PDF downloader |
// Using pdf-parse library
const pdfParse = require('pdf-parse');
const data = await pdfParse(pdfBuffer);
const text = data.text;

Parameters:
- Max chunk size: 1500 characters
- Overlap: 200 characters
- Split on: paragraph breaks (`\n\n`)
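
A simplified sketch of chunking with these parameters is shown below. It is not the pipeline's actual chunker: `chunkText` is a hypothetical helper that slides a window over the text, cuts at the last paragraph break inside the window when one exists, and ignores the section detection that the real pipeline performs.

```typescript
function chunkText(text: string, maxSize = 1500, overlap = 200): string[] {
  if (overlap < 0 || overlap >= maxSize) {
    throw new Error("overlap must be non-negative and smaller than maxSize");
  }
  const chunks: string[] = [];
  let start = 0;
  while (start < text.length) {
    let end = Math.min(start + maxSize, text.length);
    if (end < text.length) {
      // Prefer to cut at the last paragraph break inside the window,
      // but only if that still moves the window forward past the overlap.
      const breakAt = text.lastIndexOf("\n\n", end);
      if (breakAt > start + overlap) {
        end = breakAt;
      }
    }
    const chunk = text.slice(start, end).trim();
    if (chunk.length > 0) {
      chunks.push(chunk);
    }
    if (end >= text.length) {
      break;
    }
    // Re-read the last `overlap` characters at the start of the next chunk.
    start = end - overlap;
  }
  return chunks;
}
```

Because `end` is always more than `overlap` characters past `start`, the window advances on every iteration and no chunk exceeds `maxSize`.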
Section Detection Patterns:
- Markdown headers: `^#{1,3}\s+(.+)$`
- Section markers: `^Section\s*(\d+[\.\d]*)[:\s]+(.+)$`
- Chapter markers: `^Chapter\s*(\d+)[:\s]+(.+)$`
- Numbered sections: `^\d+\.\d+[\.\d]*\s+(.+)$`
- Procedure headers: `^(PROCEDURE|TREATMENT|ASSESSMENT)[:\s]*(.*)$`
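
The patterns above could be applied line by line roughly as follows. The `detectSection` helper, the `kind` labels, and the first-match-wins ordering are assumptions made for illustration. The regular expressions themselves are copied from the list and used case-sensitively.

```typescript
const SECTION_PATTERNS: { kind: string; regex: RegExp }[] = [
  { kind: "markdown", regex: /^#{1,3}\s+(.+)$/ },
  { kind: "section", regex: /^Section\s*(\d+[\.\d]*)[:\s]+(.+)$/ },
  { kind: "chapter", regex: /^Chapter\s*(\d+)[:\s]+(.+)$/ },
  { kind: "numbered", regex: /^\d+\.\d+[\.\d]*\s+(.+)$/ },
  { kind: "procedure", regex: /^(PROCEDURE|TREATMENT|ASSESSMENT)[:\s]*(.*)$/ },
];

// Returns the first matching pattern's kind and title, or null for ordinary text.
function detectSection(line: string): { kind: string; title: string } | null {
  const trimmed = line.trim();
  for (const { kind, regex } of SECTION_PATTERNS) {
    const match = regex.exec(trimmed);
    if (match) {
      // In every pattern above, the last capture group holds the title text.
      const title = (match[match.length - 1] ?? "").trim();
      return { kind, title };
    }
  }
  return null;
}
```

For procedure headers the title is whatever follows the keyword, so a bare `ASSESSMENT` line yields an empty title.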
Service: Google Gemini (Voyage AI removed 2026-03-24)
Model: gemini-embedding-2-preview
Batch Size: 100 texts per request
Rate Limiting: 100ms delay between batches
// See server/_core/embeddings/config.ts for canonical config
const response = await fetch(
  `https://generativelanguage.googleapis.com/v1beta/models/gemini-embedding-2-preview:batchEmbedContents?key=${GOOGLE_API_KEY}`,
  {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      requests: texts.map((text) => ({
        model: 'models/gemini-embedding-2-preview',
        content: { parts: [{ text }] },
        taskType: 'RETRIEVAL_DOCUMENT',
      })),
    }),
  },
);

For bulk embedding generation of all protocols:

npx tsx scripts/generate-embeddings.ts

Features:
- Progress reporting with ETA
- Configurable batch size (default: 128)
- Error counting
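
The batching and rate limiting described in this section (100 texts per request, 100ms between batches) can be sketched as below. `toBatches` and `embedInBatches` are hypothetical helpers, and `embedBatch` is an injected callback standing in for the actual embedding request. The real configuration lives in `server/_core/embeddings/config.ts`.

```typescript
// Split `items` into consecutive groups of at most `batchSize` elements.
function toBatches<T>(items: T[], batchSize: number): T[][] {
  if (!Number.isInteger(batchSize) || batchSize <= 0) {
    throw new Error("batchSize must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// `embedBatch` is a hypothetical callback that performs one embedding request.
async function embedInBatches(
  texts: string[],
  embedBatch: (batch: string[]) => Promise<number[][]>,
  batchSize = 100,
  delayMs = 100,
): Promise<number[][]> {
  const vectors: number[][] = [];
  const batches = toBatches(texts, batchSize);
  for (const [index, batch] of batches.entries()) {
    vectors.push(...(await embedBatch(batch)));
    // Pause between batches only, not after the final one.
    if (index < batches.length - 1) {
      await sleep(delayMs);
    }
  }
  return vectors;
}
```

Injecting `embedBatch` keeps the pacing logic independent of the provider, so the same loop works with a different batch size such as the bulk script's default of 128.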
Located in: netlify/edge-functions/
Path: /api/static/*
Purpose: CDN-level caching for protocol statistics and coverage data.
Cache Durations:
| Path Pattern | TTL |
|---|---|
| `/api/static/stats/*` | 1 hour |
| `/api/static/coverage/*` | 1 hour |
| `/api/static/agencies/*` | 30 minutes |
| Other `/api/static/*` | 10 minutes |
Headers Added:
- `Cache-Control: public, max-age=X, s-maxage=X, stale-while-revalidate=2X`
- `X-Edge-Cache: MISS`
- `X-Edge-Cache-TTL: X`
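
How the TTL table maps to these headers can be sketched as a lookup. This is an illustration, not the edge function's source: `cacheHeadersFor` is a hypothetical helper, prefix matching is an assumption, and `2X` is read as twice the TTL in seconds.

```typescript
// TTLs in seconds, taken from the table above; the first matching prefix wins.
const TTL_RULES: { prefix: string; ttl: number }[] = [
  { prefix: "/api/static/stats/", ttl: 60 * 60 },
  { prefix: "/api/static/coverage/", ttl: 60 * 60 },
  { prefix: "/api/static/agencies/", ttl: 30 * 60 },
  { prefix: "/api/static/", ttl: 10 * 60 },
];

function cacheHeadersFor(pathname: string): Record<string, string> | null {
  const rule = TTL_RULES.find((r) => pathname.startsWith(r.prefix));
  if (!rule) {
    return null; // Path is not under /api/static/*, so no caching headers apply.
  }
  const { ttl } = rule;
  return {
    "Cache-Control": `public, max-age=${ttl}, s-maxage=${ttl}, stale-while-revalidate=${ttl * 2}`,
    "X-Edge-Cache": "MISS",
    "X-Edge-Cache-TTL": String(ttl),
  };
}
```

Listing the catch-all prefix last lets the more specific paths take precedence.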
Path: /* (excludes static assets)
Purpose: Adds geolocation headers for personalized state/region content.
Headers Added:
| Header | Description |
|---|---|
| `X-Geo-Country` | ISO country code |
| `X-Geo-Region` | ISO 3166-2 region code |
| `X-Geo-State` | US state abbreviation |
| `X-Geo-City` | City name |
| `X-Geo-Lat` | Latitude |
| `X-Geo-Lon` | Longitude |
| `X-Geo-Timezone` | IANA timezone |
| `X-Geo-Data` | JSON object with all geo data |
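
A minimal sketch of building these headers from a geolocation lookup follows. The `GeoInfo` input shape and the `buildGeoHeaders` helper are hypothetical. The edge runtime's actual geo object may use different field names, and setting `X-Geo-State` only for US visitors is an assumption.

```typescript
// Hypothetical input shape; the edge runtime's actual geo object may differ.
interface GeoInfo {
  countryCode?: string; // e.g. "US"
  subdivisionCode?: string; // e.g. "CA"
  city?: string;
  latitude?: number;
  longitude?: number;
  timezone?: string;
}

function buildGeoHeaders(geo: GeoInfo): Record<string, string> {
  const headers: Record<string, string> = {};
  // Only emit a header when the lookup produced a value for it.
  const set = (name: string, value: string | number | undefined): void => {
    if (value !== undefined && value !== "") {
      headers[name] = String(value);
    }
  };

  set("X-Geo-Country", geo.countryCode);
  if (geo.countryCode && geo.subdivisionCode) {
    // ISO 3166-2 codes have the form "<country>-<subdivision>".
    set("X-Geo-Region", `${geo.countryCode}-${geo.subdivisionCode}`);
  }
  if (geo.countryCode === "US") {
    set("X-Geo-State", geo.subdivisionCode);
  }
  set("X-Geo-City", geo.city);
  set("X-Geo-Lat", geo.latitude);
  set("X-Geo-Lon", geo.longitude);
  set("X-Geo-Timezone", geo.timezone);
  headers["X-Geo-Data"] = JSON.stringify(geo);
  return headers;
}
```

Header values must be ASCII-safe, so a real implementation would also need to encode city names containing non-ASCII characters. The sketch leaves that out.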
Located in: docker-compose.yml
| Service | Container | Port | Purpose |
|---|---|---|---|
| `api` | `protocol-guide-api` | 3000 | Express + tRPC API server |
| `web` | `protocol-guide-web` | 8081 | Expo web frontend |
| `dev` | `protocol-guide-dev` | 3000, 8081 | Full dev environment |
# Start production-like stack
pnpm docker:up
# Start development mode (with hot reload)
pnpm docker:dev
# View logs
pnpm docker:logs
# Stop all
pnpm docker:down
# Rebuild images
pnpm docker:build

API container has built-in health check:

healthcheck:
  test: ["CMD", "wget", "--spider", "http://localhost:3000/api/health"]
  interval: 30s
  timeout: 10s
  retries: 3
  start_period: 10s

| Script | Command | Purpose |
|---|---|---|
| `dev` | `concurrently dev:server dev:metro` | Start full dev environment |
| `dev:server` | `tsx watch server/_core/index.ts` | Start API with hot reload |
| `dev:metro` | `expo start --web --port 8081` | Start Expo web server |
| Script | Purpose |
|---|---|
| `build` | Build server with esbuild |
| `build:web` | Export Expo web, inject PWA meta, copy assets |
| `start` | Run production server |
| Script | Purpose |
|---|---|
| `test` | Run Vitest unit tests |
| `test:integration` | Run integration tests (single fork) |
| `test:e2e` | Run Playwright E2E tests |
| `test:e2e:ui` | Playwright with UI mode |
| `test:e2e:visual` | Visual regression tests |
| `test:all` | Vitest + Playwright |
| Script | Purpose |
|---|---|
| `db:push` | Generate and run Drizzle migrations |
| `sitemap` | Generate sitemap.xml |
| Script | Purpose |
|---|---|
| `analyze` | Build and analyze bundle sizes |
| `bench` | Run Vitest benchmarks |
| `bench:report` | Generate benchmark report |
| Variable | Used By | Purpose |
|---|---|---|
| `CRON_SECRET` | drip-emails.yml | Auth for job endpoint |
| `RAILWAY_TOKEN` | ci.yml | Railway deployment |
| `NETLIFY_AUTH_TOKEN` | ci.yml | Netlify deployment |
| `NETLIFY_SITE_ID` | ci.yml | Netlify site identifier |
| `SENTRY_DSN` | ci.yml | Error tracking |
| Variable | Purpose |
|---|---|
| `SUPABASE_URL` | Supabase project URL |
| `SUPABASE_SERVICE_ROLE_KEY` | Admin access to Supabase |
| `GOOGLE_API_KEY` | Embedding generation (Gemini; Voyage removed 2026-03-24) |
- View runs: `https://github.com/<owner>/Protocol-Guide/actions`
- Failure notifications: GitHub email notifications
- Health check failures: Creates GitHub issues
- Health endpoint: `/api/health`
- Liveness endpoint: `/api/live`
- Monitored every 15 minutes
- Error tracking enabled in production
- DSN configured via `SENTRY_DSN` environment variable
- Check GitHub Actions workflow run logs
- Verify `CRON_SECRET` matches server config
- Check `drip_emails_sent` table for duplicates
- Verify user has email and correct `createdAt`
- Check `protocol_uploads` table for status
- Look for `errorMessage` in failed uploads
- Verify `GOOGLE_API_KEY` is valid
- Check PDF is accessible at `fileUrl`
- Check GitHub issue for details
- Verify Railway deployment is healthy
- Check database connectivity
- Review Sentry for errors
- Copy existing script as template (e.g., `import-la-county-local-pdfs.ts`)
- Update agency name, state code, URL patterns
- Adjust PDF parsing for source format
- Test with single PDF first
- Run full import with monitoring
- Create job in `server/jobs/`
- Add tRPC endpoint in router
- Create GitHub Actions workflow with cron schedule
- Add `CRON_SECRET` authentication
- Document in this file