Protocol Guide - Automations & Jobs Documentation

Last updated: 2026-01-28
This document covers all automated processes, scheduled jobs, data ingestion pipelines, and CI/CD workflows.

GitHub Actions Workflows
Scheduled Jobs (Cron)
Server Background Jobs
Data Ingestion Pipeline
PDF Processing Workflow
Netlify Edge Functions
Docker Automation
npm Scripts

1. GitHub Actions Workflows

Located in: .github/workflows/

1.1 CI Pipeline (`ci.yml`)

Triggers:

Push to main or develop branches
Pull requests to main or develop branches

Jobs:

Job	Purpose	Timeout
`lint-and-test`	TypeScript check, ESLint, Vitest unit tests	15 min
`security-scan`	TruffleHog secret scanning, .env file check	10 min
`build`	Build server (esbuild) + web export (Expo)	20 min
`e2e-tests`	Playwright E2E tests on Chromium	30 min
`deploy-netlify-staging`	Deploy to Netlify staging (develop branch only)	15 min
`deploy-netlify`	Deploy to Netlify production (main branch only)	15 min
`deploy-railway`	Deploy API server to Railway (main branch only)	15 min

Pipeline Flow:

push/PR
   │
   ├── lint-and-test ──┬── build ──┬── e2e-tests ──┬── deploy-netlify-staging (develop)
   │                   │           │               │
   └── security-scan ──┘           │               ├── deploy-netlify (main)
                                   │               │
                                   │               └── deploy-railway (main)

Key Features:

Playwright browser caching by version (saves ~30-60s)
pnpm store caching
Build artifacts uploaded for 7 days
Health check after Railway deployment
Visual regression tests (currently disabled - see comments in workflow)

1.2 Drip Email Sender (`drip-emails.yml`)

Schedule: Daily at 9:00 AM UTC (4 AM EST / 1 AM PST; one hour later in local time during daylight saving time)

schedule:
  - cron: '0 9 * * *'

Purpose: Sends onboarding drip emails to users based on signup date:

Day 3: Tips email - "3 tips to get the most out of Protocol Guide"
Day 7: Pro pitch email - "Unlock unlimited Protocol Guide searches" (free users only)

How it Works:

GitHub Action triggers daily
Calls tRPC endpoint: POST /api/trpc/jobs.runDripEmails
Authenticates with CRON_SECRET environment variable
Server job queries users who signed up X days ago
Sends emails via the email service (Paubox; Resend is banned per Ops Rules because it has no BAA)
Records sent emails in drip_emails_sent table
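
The trigger step above can be sketched as a small client. This is a minimal illustration, not the workflow's actual code: the helper names `buildDripEmailRequest` and `triggerDripEmails` are hypothetical, and passing CRON_SECRET as a Bearer token is an assumption, since this document only states that the secret authenticates the call.

```typescript
// Hypothetical request shape used only by this sketch.
interface JobRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

// Builds the POST request for the drip email job endpoint.
function buildDripEmailRequest(apiUrl: string, cronSecret: string): JobRequest {
  if (!cronSecret) {
    throw new Error("CRON_SECRET is not set");
  }
  // Strip trailing slashes so the joined URL never contains "//api".
  const base = apiUrl.replace(/\/+$/, "");
  return {
    url: `${base}/api/trpc/jobs.runDripEmails`,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${cronSecret}`, // assumed auth scheme
      },
      body: JSON.stringify({}),
    },
  };
}

// Sends the request and throws on a non-2xx response so the run is marked as failed.
async function triggerDripEmails(apiUrl: string, cronSecret: string): Promise<void> {
  const { url, init } = buildDripEmailRequest(apiUrl, cronSecret);
  const res = await fetch(url, init);
  if (!res.ok) {
    throw new Error(`Drip email job failed with HTTP ${res.status}`);
  }
}
```

Keeping request construction separate from the network call makes the URL and header logic testable without hitting the API.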

Required Secrets:

CRON_SECRET - Authentication token for job endpoint
API_URL (variable) - Defaults to Railway production URL

1.3 Health Monitor (`health-monitor.yml`)

Schedule: Every 15 minutes

schedule:
  - cron: '*/15 * * * *'

Purpose: Production uptime monitoring with automatic issue creation on failure.

Endpoints Checked:

/api/health - Full health check (database, services)
/api/live - Liveness probe (basic availability)

Failure Handling:

Creates GitHub issue with label health-check-failure
If issue already exists, adds comment with new failure timestamp
Issue includes runbook links and action items
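
The create-or-comment rule above can be expressed as a pure decision function. This is a sketch under assumptions: the `OpenIssue` and `FailureAction` types and the function name are hypothetical, and the issue title and comment wording are illustrative rather than taken from the workflow.

```typescript
// Hypothetical, simplified view of an open GitHub issue.
interface OpenIssue {
  number: number;
  labels: string[];
}

type FailureAction =
  | { kind: "create"; title: string; labels: string[] }
  | { kind: "comment"; issueNumber: number; body: string };

const FAILURE_LABEL = "health-check-failure";

// Decides whether a failure opens a new issue or comments on an existing one.
function planFailureAction(
  openIssues: OpenIssue[],
  endpoint: string,
  failedAt: Date,
): FailureAction {
  const existing = openIssues.find((issue) => issue.labels.includes(FAILURE_LABEL));
  if (existing) {
    // An issue is already open: record the new failure timestamp as a comment.
    return {
      kind: "comment",
      issueNumber: existing.number,
      body: `Health check failed again for ${endpoint} at ${failedAt.toISOString()}`,
    };
  }
  return {
    kind: "create",
    title: `Health check failed: ${endpoint}`,
    labels: [FAILURE_LABEL],
  };
}
```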

Manual Trigger:

gh workflow run "Health Monitor" --field environment=staging

2. Scheduled Jobs (Cron)

Job	Schedule	Source	Description
Drip Emails	Daily 9 AM UTC	GitHub Actions	Onboarding email sequence
Health Monitor	Every 15 min	GitHub Actions	Production uptime checks

Note: All scheduled jobs run via GitHub Actions, not server-side cron. This provides:

Built-in logging and history
Failure notifications
No server resource consumption
Easy manual re-runs

3. Server Background Jobs

Located in: server/jobs/

3.1 Protocol Processor (`protocol-processor.ts`)

Type: On-demand (triggered by API)

Purpose: Processes uploaded PDF protocols through the full RAG pipeline.

Pipeline Steps:

Download PDF from storage URL
Extract text using pdf-parse
Chunk text into semantic sections (~1500 chars with 200 char overlap)
Generate embeddings via Google Gemini (gemini-embedding-2-preview model; Voyage removed 2026-03-24)
Insert chunks into Supabase manus_protocol_chunks table
Update upload status throughout

Status Flow:

pending → processing → chunking → embedding → completed
                                           ↘ failed
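
The status flow can be modeled as a small transition table. This is a sketch, not the processor's code: `NEXT_STATUS` and `canTransition` are hypothetical names, and allowing any in-flight status to move to `failed` is an assumption, since the diagram only draws the failure branch near the end of the flow.

```typescript
type UploadStatus =
  | "pending"
  | "processing"
  | "chunking"
  | "embedding"
  | "completed"
  | "failed";

// The single forward step for each status; terminal statuses map to null.
const NEXT_STATUS: Record<UploadStatus, UploadStatus | null> = {
  pending: "processing",
  processing: "chunking",
  chunking: "embedding",
  embedding: "completed",
  completed: null,
  failed: null,
};

// Returns true when moving from one status to another is allowed.
function canTransition(from: UploadStatus, to: UploadStatus): boolean {
  if (to === "failed") {
    // Assumption: any non-terminal status may fail.
    return NEXT_STATUS[from] !== null;
  }
  return NEXT_STATUS[from] === to;
}
```

Guarding status updates with a check like this keeps an upload from skipping a stage or leaving a terminal state.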

Key Functions:

processProtocolUpload(uploadId) - Process single upload
processPendingUploads() - Batch process up to 5 pending uploads

Environment Variables:

SUPABASE_URL
SUPABASE_SERVICE_ROLE_KEY
GOOGLE_API_KEY

3.2 Drip Email Job (`send-drip-emails.ts`)

Type: Cron-triggered (via GitHub Actions)

Purpose: Implements the drip email sequence for user onboarding.

Sequence:

Day	Email Type	Template	Target
3	`tips`	`ONBOARDING_TIPS`	All users
7	`pro_pitch`	`ONBOARDING_PRO_PITCH`	Free tier users only

Logic:

Calculate target signup date (today - N days)
Query users who signed up on that date
Filter by tier (free users for pro_pitch)
Check drip_emails_sent table to avoid duplicates
Send email via Paubox (Resend is banned per Ops Rules because it has no BAA)
Record sent email
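
The selection logic above can be sketched without a database. The following is illustrative only: the `DripUser` shape, the tier values `"free"` and `"pro"`, the `alreadySent` key format, and comparing signup dates by UTC calendar day are all assumptions rather than details confirmed by this document.

```typescript
type DripType = "tips" | "pro_pitch";

// Hypothetical user shape for this sketch.
interface DripUser {
  id: string;
  email: string | null;
  tier: "free" | "pro";
  createdAt: Date;
}

const DRIP_SCHEDULE: { day: number; type: DripType; freeOnly: boolean }[] = [
  { day: 3, type: "tips", freeOnly: false },
  { day: 7, type: "pro_pitch", freeOnly: true },
];

// Formats a date as a UTC calendar day, e.g. "2026-01-07".
function utcDay(date: Date): string {
  return date.toISOString().slice(0, 10);
}

// Returns the UTC calendar day that lies `daysAgo` days before `today`.
function targetSignupDay(today: Date, daysAgo: number): string {
  const target = new Date(
    Date.UTC(today.getUTCFullYear(), today.getUTCMonth(), today.getUTCDate() - daysAgo),
  );
  return utcDay(target);
}

// Picks (user, email type) pairs due today, skipping anything already sent.
// `alreadySent` holds keys of the form "<userId>:<type>" (assumed format).
function selectRecipients(
  users: DripUser[],
  alreadySent: Set<string>,
  today: Date,
): { userId: string; type: DripType }[] {
  const due: { userId: string; type: DripType }[] = [];
  for (const step of DRIP_SCHEDULE) {
    const day = targetSignupDay(today, step.day);
    for (const user of users) {
      if (!user.email) continue;
      if (utcDay(user.createdAt) !== day) continue;
      if (step.freeOnly && user.tier !== "free") continue;
      if (alreadySent.has(`${user.id}:${step.type}`)) continue;
      due.push({ userId: user.id, type: step.type });
    }
  }
  return due;
}
```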

4. Data Ingestion Pipeline

Located in: scripts/import-*.ts (40+ scripts, ~710KB total)

4.1 Architecture

PDF Source (Web/Local)
       │
       ▼
   Download Script ─────────────────┐
       │                            │
       ▼                            │
   Import Script                    │
       │                            │
       ├── Extract metadata         │
       │   (protocol #, title)      │
       │                            │
       ├── Parse PDF text ──────────┤
       │                            │
       ├── Chunk content            │
       │                            │
       ├── Generate embeddings      │
       │   (Gemini Embedding 2)     │
       │                            │
       └── Insert to Supabase       │
           (manus_protocol_chunks)  │

4.2 Import Scripts by Region

State	Script	Agency
CA	`import-alameda-protocols.ts`	Alameda County EMS
CA	`import-contra-costa-protocols.ts`	Contra Costa EMS
CA	`import-el-dorado-protocols.ts`	El Dorado County EMS
CA	`import-imperial-county-protocols.ts`	Imperial County EMS
CA	`import-kern-county-protocols.ts`	Kern County EMS
CA	`import-la-county-local-pdfs.ts`	Los Angeles County EMS
CA	`import-marin-protocols.ts`	Marin County EMS
CA	`import-merced-protocols.ts`	Merced County EMS
CA	`import-napa-protocols.ts`	Napa County EMS
CA	`import-orange-county-protocols.ts`	Orange County EMS
CA	`import-riverside-protocols.ts`	Riverside County EMS
CA	`import-sacramento-protocols.ts`	Sacramento County EMS
CA	`import-san-benito-protocols.ts`	San Benito County EMS
CA	`import-san-diego-protocols.ts`	San Diego County EMS
CA	`import-san-francisco-protocols.ts`	San Francisco EMS
CA	`import-san-joaquin-protocols.ts`	San Joaquin County EMS
CA	`import-san-luis-obispo-protocols.ts`	SLO County EMS
CA	`import-san-mateo-protocols.ts`	San Mateo County EMS
CA	`import-santa-barbara-protocols.ts`	Santa Barbara County EMS
CA	`import-santa-clara-protocols.ts`	Santa Clara County EMS
CA	`import-santa-cruz-protocols.ts`	Santa Cruz County EMS
CA	`import-slo-county-protocols.ts`	SLO County EMS (alt)
CA	`import-solano-protocols.ts`	Solano County EMS
CA	`import-ssvems-protocols.ts`	Sierra-Sacramento Valley EMS
CA	`import-ventura-county-protocols.ts`	Ventura County EMS
CA	`import-yolo-county-protocols.ts`	Yolo County EMS
NY	`import-ny-protocols.ts`	New York State
TX	`import-tx-fl-protocols.ts`	Texas
FL	`import-tx-fl-protocols.ts`	Florida
IL	`import-il-pa-protocols.ts`	Illinois
PA	`import-il-pa-protocols.ts`	Pennsylvania
OH	`import-oh-ga-protocols.ts`	Ohio
GA	`import-oh-ga-protocols.ts`	Georgia

4.3 Running Import Scripts

# Single agency import
npx tsx scripts/import-la-county-local-pdfs.ts

# With environment variables
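# (pass one script path; a shell glob such as scripts/import-*.ts expands to many files, and tsx would run only the first)
SUPABASE_URL=xxx SUPABASE_SERVICE_ROLE_KEY=xxx GOOGLE_API_KEY=xxx npx tsx scripts/import-la-county-local-pdfs.ts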

4.4 Download Scripts

Script	Purpose
`download-el-dorado-pdfs.ts`	Download from El Dorado county website
`download-riverside-protocols.ps1`	PowerShell downloader for Riverside
`download-santa-clara-pdfs.ts`	Playwright-based PDF scraper
`cdp-download.js`	Chrome DevTools Protocol PDF downloader
`playwright-download-pdf.ts`	Playwright-based PDF downloader

5. PDF Processing Workflow

5.1 Text Extraction

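// Using the pdf-parse library
import pdfParse from 'pdf-parse';

// pdfBuffer holds the raw bytes of the downloaded PDF
async function extractText(pdfBuffer: Buffer): Promise<string> {
  const data = await pdfParse(pdfBuffer);
  return data.text;
}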

5.2 Chunking Strategy

Parameters:

Max chunk size: 1500 characters
Overlap: 200 characters
Split on: paragraph breaks (\n\n)
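
A minimal sketch of a chunker that follows these parameters is shown below. It is not the project's implementation: the function name `chunkText`, the hard-split behavior for paragraphs longer than one chunk, and seeding each new chunk with the tail of the previous one are assumptions about how the overlap is applied.

```typescript
const MAX_CHUNK_CHARS = 1500;
const OVERLAP_CHARS = 200;

// Splits text on paragraph breaks into chunks of at most `maxChars` characters.
// Each chunk after the first starts with the last `overlap` characters of the previous one.
function chunkText(
  text: string,
  maxChars: number = MAX_CHUNK_CHARS,
  overlap: number = OVERLAP_CHARS,
): string[] {
  if (overlap < 0 || overlap + 2 >= maxChars) {
    throw new Error("overlap must be non-negative and leave room for new text");
  }
  const paragraphs = text
    .split(/\n{2,}/)
    .map((p) => p.trim())
    .filter((p) => p.length > 0);

  const chunks: string[] = [];
  let current = ""; // chunk being built (may start with an overlap seed)
  let hasNew = false; // true once `current` holds text beyond the seed

  const emit = (): void => {
    chunks.push(current);
    // slice(-0) would return the whole string, so handle zero overlap explicitly.
    current = overlap > 0 ? current.slice(-overlap) : "";
    hasNew = false;
  };

  for (const paragraph of paragraphs) {
    let rest = paragraph;
    while (rest.length > 0) {
      const sep = current.length > 0 ? "\n\n" : "";
      const room = maxChars - current.length - sep.length;
      if (rest.length <= room) {
        current += sep + rest;
        hasNew = true;
        rest = "";
      } else if (hasNew) {
        emit(); // close the current chunk, then retry this paragraph
      } else {
        // The paragraph does not fit even in a fresh chunk: hard-split it.
        current += sep + rest.slice(0, room);
        rest = rest.slice(room);
        emit();
      }
    }
  }
  if (hasNew) chunks.push(current);
  return chunks;
}
```

For example, three 700-character paragraphs produce two chunks: the first holds two paragraphs, and the second starts with the last 200 characters of the first.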

Section Detection Patterns:

Markdown headers: ^#{1,3}\s+(.+)$
Section markers: ^Section\s*(\d+[\.\d]*)[:\s]+(.+)$
Chapter markers: ^Chapter\s*(\d+)[:\s]+(.+)$
Numbered sections: ^\d+\.\d+[\.\d]*\s+(.+)$
Procedure headers: ^(PROCEDURE|TREATMENT|ASSESSMENT)[:\s]*(.*)$
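
The patterns above can be applied line by line to find section boundaries. The sketch below copies the regular expressions as listed; the `kind` labels, the function name, and the choice to use the last capture group as the title are assumptions made for illustration.

```typescript
const SECTION_PATTERNS: { kind: string; regex: RegExp }[] = [
  { kind: "markdown", regex: /^#{1,3}\s+(.+)$/ },
  { kind: "section", regex: /^Section\s*(\d+[\.\d]*)[:\s]+(.+)$/ },
  { kind: "chapter", regex: /^Chapter\s*(\d+)[:\s]+(.+)$/ },
  { kind: "numbered", regex: /^\d+\.\d+[\.\d]*\s+(.+)$/ },
  { kind: "procedure", regex: /^(PROCEDURE|TREATMENT|ASSESSMENT)[:\s]*(.*)$/ },
];

// Returns the header kind and title for a line, or null if it is not a header.
function detectSectionHeader(line: string): { kind: string; title: string } | null {
  const trimmed = line.trim();
  for (const { kind, regex } of SECTION_PATTERNS) {
    const match = regex.exec(trimmed);
    if (match) {
      // The title is the last capture group; fall back to the whole line when it is empty
      // (for example a bare "TREATMENT" line).
      const title = (match[match.length - 1] ?? "").trim() || trimmed;
      return { kind, title };
    }
  }
  return null;
}
```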

5.3 Embedding Generation

Service: Google Gemini (Voyage AI removed 2026-03-24)
Model: gemini-embedding-2-preview
Batch Size: 100 texts per request
Rate Limiting: 100ms delay between batches

// See server/_core/embeddings/config.ts for canonical config
const response = await fetch(
  `https://generativelanguage.googleapis.com/v1beta/models/gemini-embedding-2-preview:batchEmbedContents?key=${GOOGLE_API_KEY}`,
  {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      requests: texts.map((text) => ({
        model: 'models/gemini-embedding-2-preview',
        content: { parts: [{ text }] },
        taskType: 'RETRIEVAL_DOCUMENT',
      })),
    }),
  },
);
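
The batch size and delay can be wrapped around the request above with a small helper. In this sketch the network call is injected as `embedBatch`, a hypothetical callback that takes a batch of texts and returns one vector per text, so nothing here depends on the shape of the API response.

```typescript
const EMBED_BATCH_SIZE = 100;
const EMBED_BATCH_DELAY_MS = 100;

// Splits a list into consecutive batches of at most `size` items.
function toBatches<T>(items: T[], size: number): T[][] {
  if (size <= 0) throw new Error("batch size must be positive");
  const batches: T[][] = [];
  for (let start = 0; start < items.length; start += size) {
    batches.push(items.slice(start, start + size));
  }
  return batches;
}

function sleep(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Embeds all texts batch by batch, pausing between batches to stay under rate limits.
async function embedAll(
  texts: string[],
  embedBatch: (batch: string[]) => Promise<number[][]>,
  batchSize: number = EMBED_BATCH_SIZE,
  delayMs: number = EMBED_BATCH_DELAY_MS,
): Promise<number[][]> {
  const vectors: number[][] = [];
  let first = true;
  for (const batch of toBatches(texts, batchSize)) {
    if (!first) await sleep(delayMs); // delay only between batches, not before the first
    first = false;
    vectors.push(...(await embedBatch(batch)));
  }
  return vectors;
}
```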

5.4 One-Time Embedding Generation

For bulk embedding generation of all protocols:

npx tsx scripts/generate-embeddings.ts

Features:

Progress reporting with ETA
Configurable batch size (default: 128)
Error counting
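
Progress reporting with an ETA usually extrapolates from the average rate so far. The sketch below shows one way to do that; the function names and the output format are hypothetical and are not taken from `generate-embeddings.ts`.

```typescript
// Estimates the remaining time from the average time per processed item.
// Returns null until at least one item has been processed.
function estimateRemainingMs(processed: number, total: number, elapsedMs: number): number | null {
  if (processed <= 0 || total <= 0) return null;
  const remaining = Math.max(total - processed, 0);
  return Math.round((elapsedMs / processed) * remaining);
}

// Builds a one-line progress message including the error count.
function formatProgress(
  processed: number,
  total: number,
  elapsedMs: number,
  errorCount: number,
): string {
  const percent = total > 0 ? ((processed / total) * 100).toFixed(1) : "0.0";
  const etaMs = estimateRemainingMs(processed, total, elapsedMs);
  const eta = etaMs === null ? "unknown" : `${Math.ceil(etaMs / 1000)}s`;
  return `${processed}/${total} (${percent}%) ETA ${eta}, errors: ${errorCount}`;
}
```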

6. Netlify Edge Functions

Located in: netlify/edge-functions/

6.1 Static Cache (`cache-static.ts`)

Path: /api/static/*

Purpose: CDN-level caching for protocol statistics and coverage data.

Cache Durations:

Path Pattern	TTL
`/api/static/stats/*`	1 hour
`/api/static/coverage/*`	1 hour
`/api/static/agencies/*`	30 minutes
Other `/api/static/*`	10 minutes

Headers Added:

Cache-Control: public, max-age=X, s-maxage=X, stale-while-revalidate=2X
X-Edge-Cache: MISS
X-Edge-Cache-TTL: X
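
The TTL table and header template above can be combined as follows. This is a sketch of the mapping only: the names `ttlForPath` and `cacheHeaders` are hypothetical, prefix matching is assumed for the path patterns, and the real edge function may set `X-Edge-Cache` differently on cache hits.

```typescript
const TTL_RULES: { prefix: string; ttlSeconds: number }[] = [
  { prefix: "/api/static/stats/", ttlSeconds: 3600 },
  { prefix: "/api/static/coverage/", ttlSeconds: 3600 },
  { prefix: "/api/static/agencies/", ttlSeconds: 1800 },
];
const DEFAULT_TTL_SECONDS = 600;

// Returns the TTL in seconds for a path, or null when the path is not under /api/static/.
function ttlForPath(pathname: string): number | null {
  if (!pathname.startsWith("/api/static/")) return null;
  const rule = TTL_RULES.find((r) => pathname.startsWith(r.prefix));
  return rule ? rule.ttlSeconds : DEFAULT_TTL_SECONDS;
}

// Builds the cache headers for a freshly generated (uncached) response.
function cacheHeaders(pathname: string): Record<string, string> {
  const ttl = ttlForPath(pathname);
  if (ttl === null) return {};
  return {
    "Cache-Control": `public, max-age=${ttl}, s-maxage=${ttl}, stale-while-revalidate=${ttl * 2}`,
    "X-Edge-Cache": "MISS",
    "X-Edge-Cache-TTL": String(ttl),
  };
}
```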

6.2 Geo Routing (`geo-route.ts`)

Path: /* (excludes static assets)

Purpose: Adds geolocation headers for personalized state/region content.

Headers Added:

Header	Description
`X-Geo-Country`	ISO country code
`X-Geo-Region`	ISO 3166-2 region code
`X-Geo-State`	US state abbreviation
`X-Geo-City`	City name
`X-Geo-Lat`	Latitude
`X-Geo-Lon`	Longitude
`X-Geo-Timezone`	IANA timezone
`X-Geo-Data`	JSON object with all geo data
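
The header table above can be produced from a geolocation object along these lines. The `GeoInfo` shape is a hypothetical, flattened stand-in for whatever the edge runtime provides. Building the region code as country plus subdivision, setting `X-Geo-State` only for US requests, and URL-encoding the city and JSON values (so non-ASCII names stay valid in a header) are all assumptions of this sketch.

```typescript
// Hypothetical, flattened geolocation input for this sketch.
interface GeoInfo {
  city?: string;
  countryCode?: string; // e.g. "US"
  subdivisionCode?: string; // e.g. "CA"
  latitude?: number;
  longitude?: number;
  timezone?: string; // e.g. "America/Los_Angeles"
}

// Builds the X-Geo-* headers, skipping any value that is missing.
function buildGeoHeaders(geo: GeoInfo): Record<string, string> {
  const headers: Record<string, string> = {};
  const set = (name: string, value: string | number | undefined): void => {
    if (value !== undefined && value !== "") headers[name] = String(value);
  };

  set("X-Geo-Country", geo.countryCode);
  if (geo.countryCode && geo.subdivisionCode) {
    set("X-Geo-Region", `${geo.countryCode}-${geo.subdivisionCode}`);
  }
  if (geo.countryCode === "US") {
    set("X-Geo-State", geo.subdivisionCode);
  }
  // Encode free-text values so non-ASCII characters do not break the header.
  set("X-Geo-City", geo.city ? encodeURIComponent(geo.city) : undefined);
  set("X-Geo-Lat", geo.latitude);
  set("X-Geo-Lon", geo.longitude);
  set("X-Geo-Timezone", geo.timezone);
  set("X-Geo-Data", encodeURIComponent(JSON.stringify(geo)));
  return headers;
}
```

A consumer of `X-Geo-Data` under this scheme would call `decodeURIComponent` before `JSON.parse`.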

7. Docker Automation

Located in: docker-compose.yml

7.1 Services

Service	Container	Port	Purpose
`api`	`protocol-guide-api`	3000	Express + tRPC API server
`web`	`protocol-guide-web`	8081	Expo web frontend
`dev`	`protocol-guide-dev`	3000, 8081	Full dev environment

7.2 Commands

# Start production-like stack
pnpm docker:up

# Start development mode (with hot reload)
pnpm docker:dev

# View logs
pnpm docker:logs

# Stop all
pnpm docker:down

# Rebuild images
pnpm docker:build

7.3 Health Checks

The API container has a built-in health check:

healthcheck:
  test: ["CMD", "wget", "--spider", "http://localhost:3000/api/health"]
  interval: 30s
  timeout: 10s
  retries: 3
  start_period: 10s

8. npm Scripts

8.1 Development

Script	Command	Purpose
`dev`	`concurrently dev:server dev:metro`	Start full dev environment
`dev:server`	`tsx watch server/_core/index.ts`	Start API with hot reload
`dev:metro`	`expo start --web --port 8081`	Start Expo web server

8.2 Build & Deploy

Script	Purpose
`build`	Build server with esbuild
`build:web`	Export Expo web, inject PWA meta, copy assets
`start`	Run production server

8.3 Testing

Script	Purpose
`test`	Run Vitest unit tests
`test:integration`	Run integration tests (single fork)
`test:e2e`	Run Playwright E2E tests
`test:e2e:ui`	Playwright with UI mode
`test:e2e:visual`	Visual regression tests
`test:all`	Vitest + Playwright

8.4 Database

Script	Purpose
`db:push`	Generate and run Drizzle migrations
`sitemap`	Generate sitemap.xml

8.5 Analysis

Script	Purpose
`analyze`	Build and analyze bundle sizes
`bench`	Run Vitest benchmarks
`bench:report`	Generate benchmark report

Environment Variables Reference

Required for GitHub Actions

Variable	Used By	Purpose
`CRON_SECRET`	drip-emails.yml	Auth for job endpoint
`RAILWAY_TOKEN`	ci.yml	Railway deployment
`NETLIFY_AUTH_TOKEN`	ci.yml	Netlify deployment
`NETLIFY_SITE_ID`	ci.yml	Netlify site identifier
`SENTRY_DSN`	ci.yml	Error tracking

Required for Import Scripts

Variable	Purpose
`SUPABASE_URL`	Supabase project URL
`SUPABASE_SERVICE_ROLE_KEY`	Admin access to Supabase
`GOOGLE_API_KEY`	Embedding generation (Gemini; Voyage removed 2026-03-24)

Monitoring & Alerts

GitHub Actions

View runs: https://github.com/<owner>/Protocol-Guide/actions
Failure notifications: GitHub email notifications
Health check failures: Creates GitHub issues

Production Health

Health endpoint: /api/health
Liveness endpoint: /api/live
Monitored every 15 minutes

Sentry

Error tracking enabled in production
DSN configured via SENTRY_DSN environment variable

Troubleshooting

Drip Emails Not Sending

Check GitHub Actions workflow run logs
Verify CRON_SECRET matches server config
Check drip_emails_sent table for duplicates
Verify user has email and correct createdAt

PDF Processing Failing

Check protocol_uploads table for status
Look for errorMessage in failed uploads
Verify GOOGLE_API_KEY is valid
Check PDF is accessible at fileUrl

Health Check Failures

Check GitHub issue for details
Verify Railway deployment is healthy
Check database connectivity
Review Sentry for errors

Adding New Automations

New Import Script

Copy existing script as template (e.g., import-la-county-local-pdfs.ts)
Update agency name, state code, URL patterns
Adjust PDF parsing for source format
Test with single PDF first
Run full import with monitoring

New Scheduled Job

Create job in server/jobs/
Add tRPC endpoint in router
Create GitHub Actions workflow with cron schedule
Add CRON_SECRET authentication
Document in this file
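
For the CRON_SECRET authentication step, a new job endpoint might verify the caller as sketched below. This is illustrative only: the function name is hypothetical, reading the secret from a Bearer `Authorization` header is an assumption, and the existing endpoints may check it differently. Hashing both values before `timingSafeEqual` gives equal-length inputs and avoids leaking the secret's length.

```typescript
import { createHash, timingSafeEqual } from "node:crypto";

// Hashes a value to a fixed-length digest so comparisons always use equal-length buffers.
function sha256(value: string) {
  return createHash("sha256").update(value).digest();
}

// Returns true only when the request carries the expected secret as a Bearer token.
function isAuthorizedCronRequest(
  authorizationHeader: string | undefined,
  expectedSecret: string | undefined,
): boolean {
  // Fail closed when the server has no secret configured.
  if (!expectedSecret || !authorizationHeader) return false;
  const match = /^Bearer\s+(.+)$/.exec(authorizationHeader);
  if (!match) return false;
  const presented = match[1] ?? "";
  return timingSafeEqual(sha256(presented), sha256(expectedSecret));
}
```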
