Google Cloud Vision API Setup Guide

This guide explains how to set up Google Cloud Vision API for OCR on scanned protocol PDFs.

Why Vision API?

Problem: Orange County's 93 PDFs are scanned images (photos of paper documents).
Solution: Google Vision API provides high-accuracy OCR that handles poor-quality scans.

Cost: $1.50 per 1,000 pages (list price, after the free tier of 1,000 pages per month)

  • Orange County: 93 PDFs × ~3 pages ≈ 279 pages ≈ $0.42 one-time at list price
  • Future LEMSAs with scanned PDFs: ~$0.30-0.50 per LEMSA
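The arithmetic behind these figures can be sketched as a small helper. This is an illustration only: the function name and the optional freePages parameter are assumptions, and the price constant is simply the figure quoted in this guide.

```typescript
// Price per 1,000 pages, as quoted in this guide.
const PRICE_PER_1000_PAGES_USD = 1.5;

// Estimate OCR cost in USD for a number of pages.
// `freePages` is a hypothetical parameter for a monthly free allowance;
// the default of 0 gives the plain list-price figure.
function estimateOcrCostUsd(pages: number, freePages = 0): number {
  const billablePages = Math.max(0, pages - freePages);
  return (billablePages / 1000) * PRICE_PER_1000_PAGES_USD;
}

// Orange County: 93 PDFs at roughly 3 pages each.
const orangeCountyPages = 93 * 3;
console.log(orangeCountyPages, estimateOcrCostUsd(orangeCountyPages).toFixed(2));
```

Passing a free allowance, for example estimateOcrCostUsd(279, 1000), shows that a batch this small would fall entirely inside a 1,000-page free tier.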

Benefits:

  • 95%+ accuracy even on low-resolution scans
  • 10-20x faster than Tesseract.js
  • Handles multi-column layouts
  • Detects text orientation automatically

Setup Steps (10 minutes)

1. Create Google Cloud Project

  1. Go to Google Cloud Console
  2. Click "Select a project" → "New Project"
  3. Project name: protocol-guide-ocr
  4. Click "Create"

2. Enable Vision API

  1. Go to Vision API Library
  2. Select your project (protocol-guide-ocr)
  3. Click "Enable"
  4. Wait 1-2 minutes for the API to activate

3. Create Service Account

  1. Go to Service Accounts
  2. Click "Create Service Account"
  3. Name: protocol-guide-ocr-bot
  4. Description: OCR service for protocol PDF ingestion
  5. Click "Create and Continue"

4. Grant Permissions

  1. Role: Select "Cloud Vision AI Service Agent"
  2. Click "Continue"
  3. Click "Done"

5. Create JSON Key

  1. Click on the service account you just created
  2. Go to "Keys" tab
  3. Click "Add Key" → "Create new key"
  4. Key type: JSON
  5. Click "Create"
  6. Save the downloaded JSON file as google-cloud-key.json in the project root (for example, /Users/tanner-osterkamp/Protocol-Guide/google-cloud-key.json)

6. Update .env File

The .env file has been pre-configured with:

GOOGLE_APPLICATION_CREDENTIALS=./google-cloud-key.json

Verify that the key file exists in the project root:

ls -la google-cloud-key.json
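Beyond checking that the file exists, it can help to confirm that it is actually a service account key. The sketch below is not part of the project's scripts: the function name is hypothetical, and the field list covers only the standard fields a downloaded service account key contains.

```typescript
// Fields a downloaded service account key file is expected to contain.
const REQUIRED_FIELDS = ["type", "project_id", "private_key", "client_email"] as const;

// Returns a list of problems; an empty list means the key looks usable.
// (Hypothetical helper, not part of the ingestion scripts.)
function checkServiceAccountKey(rawJson: string): string[] {
  let parsed: unknown;
  try {
    parsed = JSON.parse(rawJson);
  } catch {
    return ["not valid JSON"];
  }
  if (typeof parsed !== "object" || parsed === null) {
    return ["not a JSON object"];
  }
  const key = parsed as Record<string, unknown>;
  const problems: string[] = [];
  for (const field of REQUIRED_FIELDS) {
    if (typeof key[field] !== "string" || key[field] === "") {
      problems.push(`missing field: ${field}`);
    }
  }
  // A key for a different credential type (e.g. a user credential) will not work here.
  if (key.type !== undefined && key.type !== "service_account") {
    problems.push(`unexpected type: ${String(key.type)}`);
  }
  return problems;
}
```

In a script, the input would typically come from reading google-cloud-key.json as UTF-8 text, for example with Node's fs.readFileSync.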

7. Add to .gitignore

The key file is already in .gitignore to prevent accidental commits:

google-cloud-key.json

Testing

Test Vision API on Single PDF

npx tsx scripts/test-oc-ocr-vision.ts

Expected output:

Testing Google Vision API OCR...
  Vision API: Processing PDF...
  Vision API: Extracted 2,847 characters from 2 pages

=== EXTRACTION RESULTS ===
Protocol: SO-M-15
Title: Allergic Reaction / Anaphylaxis
Content: [Full protocol text extracted]
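This guide does not show how the test script calls the Vision API. As a rough sketch only, assuming it uses the synchronous file annotation endpoint (which accepts PDF content and limits each request to 5 pages), the request payloads might be built as below. The function and interface names are hypothetical; sending the requests would need the @google-cloud/vision client, which is left out to keep the example self-contained.

```typescript
// The synchronous file annotation endpoint handles at most 5 pages per request.
const MAX_PAGES_PER_SYNC_REQUEST = 5;

// Shape of one file annotation request (only the fields used here).
interface AnnotateFileRequest {
  inputConfig: { content: string; mimeType: string };
  features: { type: string }[];
  pages: number[];
}

// Build one request per group of up to 5 pages (hypothetical helper).
// `pdfBase64` is the PDF file encoded as base64; pages are numbered from 1.
function buildPdfOcrRequests(pdfBase64: string, pageCount: number): AnnotateFileRequest[] {
  const requests: AnnotateFileRequest[] = [];
  for (let first = 1; first <= pageCount; first += MAX_PAGES_PER_SYNC_REQUEST) {
    const last = Math.min(first + MAX_PAGES_PER_SYNC_REQUEST - 1, pageCount);
    const pages: number[] = [];
    for (let page = first; page <= last; page++) {
      pages.push(page);
    }
    requests.push({
      inputConfig: { content: pdfBase64, mimeType: "application/pdf" },
      // DOCUMENT_TEXT_DETECTION is the feature intended for dense document text.
      features: [{ type: "DOCUMENT_TEXT_DETECTION" }],
      pages,
    });
  }
  return requests;
}
```

Under this assumption, a typical 2-3 page protocol PDF fits in a single request, which matches the one-call-per-PDF figure given under Cost Monitoring.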

Compare Vision API vs Tesseract.js

Tesseract.js result (current):

  • 2-page PDF → 84 characters (gibberish)
  • Accuracy: ~10%

Vision API result (expected):

  • 2-page PDF → 2,000-3,000 characters (clean text)
  • Accuracy: 95%+

Cost Monitoring

Current Usage (After Orange County)

  • Pages processed: 279 (93 PDFs × ~3 pages)
  • Cost: ~$0.42
  • API calls: 93 (one per PDF)

Monthly Estimates

If ingesting 10 new LEMSAs per month with scanned PDFs:

  • Pages: ~1,000 per month
  • Cost: ~$1.50 per month
  • Annual: ~$18

Set Billing Alert

  1. Go to Billing
  2. Click "Budgets & alerts"
  3. Create budget: $5/month
  4. Set alerts at 50%, 90%, and 100%

Security Best Practices

1. Service Account Permissions

Correct: Cloud Vision AI Service Agent (read-only access to Vision API)
Avoid: Owner, Editor (overly broad permissions)

2. Key Rotation

Rotate service account keys every 90 days:

# Revoke old key in Google Cloud Console
# Create new key
# Update google-cloud-key.json
# Restart ingestion services
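A 90-day rotation policy is easier to follow with a reminder. The helper below is a minimal sketch with hypothetical names; it assumes you record the key's creation date yourself (the Keys tab in the Cloud Console shows it).

```typescript
// Rotation interval recommended in this guide.
const ROTATION_INTERVAL_DAYS = 90;
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Days remaining before the key should be rotated.
// A negative result means the key is overdue by that many days.
function daysUntilRotation(keyCreatedAt: Date, now: Date = new Date()): number {
  const ageDays = Math.floor((now.getTime() - keyCreatedAt.getTime()) / MS_PER_DAY);
  return ROTATION_INTERVAL_DAYS - ageDays;
}

// Example: a key created on 2026-01-01, checked on 2026-02-18.
const remaining = daysUntilRotation(
  new Date("2026-01-01T00:00:00Z"),
  new Date("2026-02-18T00:00:00Z"),
);
console.log(remaining <= 0 ? "Rotate the key now" : `Rotate within ${remaining} days`);
```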

3. Key Storage

Local development: ./google-cloud-key.json (gitignored)
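Railway production: Upload the key contents as an environment variable:

# Minify JSON (remove whitespace)
jq -c . google-cloud-key.json > minified-key.json

# Set in Railway
railway variables set GOOGLE_APPLICATION_CREDENTIALS="$(cat minified-key.json)"

Note: Google's client libraries read GOOGLE_APPLICATION_CREDENTIALS as a path to a key file, not as the key itself. When the variable holds the JSON contents, the application must parse the value and pass the credentials to the Vision client explicitly (or write them to a file at startup) instead of relying on default credential discovery. Delete minified-key.json once the variable is set, since only google-cloud-key.json is gitignored.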
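One way to support both setups (a file path locally, JSON contents on Railway) is to inspect the variable's value before creating the client. The sketch below uses hypothetical names and is not taken from the project's code; it only decides which kind of value it was given.

```typescript
// Either inline credentials parsed from JSON, or a path to a key file.
type CredentialSource =
  | { kind: "inline"; credentials: { client_email: string; private_key: string }; projectId?: string }
  | { kind: "file"; path: string };

// Decide how to interpret the value of GOOGLE_APPLICATION_CREDENTIALS.
// (Hypothetical helper; the value is passed in rather than read from the environment.)
function resolveCredentialSource(value: string): CredentialSource {
  const trimmed = value.trim();
  if (trimmed.startsWith("{")) {
    // The value looks like the key's JSON contents.
    const key = JSON.parse(trimmed) as {
      client_email?: string;
      private_key?: string;
      project_id?: string;
    };
    if (!key.client_email || !key.private_key) {
      throw new Error("inline key is missing client_email or private_key");
    }
    return {
      kind: "inline",
      credentials: { client_email: key.client_email, private_key: key.private_key },
      projectId: key.project_id,
    };
  }
  // Otherwise treat the value as a path to the key file.
  return { kind: "file", path: trimmed };
}
```

With the Google Cloud Node.js clients, an inline result can be supplied through the credentials and projectId constructor options, and a file result through keyFilename.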

Troubleshooting

Error: "Could not load the default credentials"

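Cause: GOOGLE_APPLICATION_CREDENTIALS is not set, or it points to a file that does not exist. The path is relative, so it resolves against the directory the command is run from.

Fix (run from the project root):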

export GOOGLE_APPLICATION_CREDENTIALS="./google-cloud-key.json"
npx tsx scripts/ingest-ca-protocols.ts --lemsa "Orange"

Error: "Vision API has not been enabled"

Cause: API not enabled for your project

Fix: Go to Vision API Library and click "Enable"

Error: "The caller does not have permission"

Cause: Service account lacks Vision API permissions

Fix:

  1. Go to IAM
  2. Find your service account
  3. Click "Edit"
  4. Add role: "Cloud Vision AI Service Agent"

Error: "Quota exceeded"

Cause: Hit free tier limit (1,000 images/month)

Fix: Enable billing or wait until next month

Free tier: 1,000 images/month free
After free tier: $1.50 per 1,000 images


Production Deployment (Railway)

Option 1: Environment Variable (Recommended)

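# Minify JSON key
jq -c . google-cloud-key.json > minified-key.json

# Upload to Railway
# Note: Google's libraries treat GOOGLE_APPLICATION_CREDENTIALS as a file path,
# so the app must parse this JSON value itself and pass it to the Vision client.
railway link protocol-guide-production
railway variables set GOOGLE_APPLICATION_CREDENTIALS="$(cat minified-key.json)"

# Redeploy
railway up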

Option 2: Application Default Credentials

If Railway is hosted on Google Cloud:

railway variables set GOOGLE_CLOUD_PROJECT=protocol-guide-ocr

Fallback Behavior

The OCR extractor automatically falls back to Tesseract.js if Vision API is unavailable:

  1. Vision API configured? → Try Vision API first
  2. Vision API fails? → Fall back to Tesseract.js
  3. Tesseract fails? → Return error

Log example:

  Using Google Vision API for OCR...
  Vision API failed: Invalid authentication credentials
  Falling back to Tesseract.js...
  OCR: Processing 2 pages...
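The fallback order above can be sketched as a loop over an ordered list of engines. This is an illustration of the pattern, not the project's extractor: the OcrEngine type and extractWithFallback function are hypothetical, and the real Vision and Tesseract calls are replaced by placeholders.

```typescript
// An OCR engine is anything that turns PDF bytes into text (hypothetical type).
type OcrEngine = {
  name: string;
  extract: (pdf: Uint8Array) => Promise<string>;
};

// Try each engine in order; return the first successful result.
// If every engine fails, throw an error listing all failures.
async function extractWithFallback(
  pdf: Uint8Array,
  engines: OcrEngine[],
  log: (message: string) => void = console.log,
): Promise<string> {
  const failures: string[] = [];
  for (const engine of engines) {
    try {
      return await engine.extract(pdf);
    } catch (err) {
      const reason = err instanceof Error ? err.message : String(err);
      failures.push(`${engine.name}: ${reason}`);
      log(`${engine.name} failed: ${reason}`);
    }
  }
  throw new Error(`All OCR engines failed (${failures.join("; ")})`);
}

// Placeholder engines standing in for the real implementations.
const visionStub: OcrEngine = {
  name: "Vision API",
  extract: async () => {
    throw new Error("Invalid authentication credentials");
  },
};
const tesseractStub: OcrEngine = {
  name: "Tesseract.js",
  extract: async () => "clean text",
};
```

To mirror step 1, the caller would include the Vision engine in the list only when credentials are configured, so an unconfigured setup goes straight to Tesseract.js.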

Next Steps

After setup:

  1. Test: npx tsx scripts/test-oc-ocr-vision.ts
  2. Ingest Orange County: npx tsx scripts/ingest-ca-protocols.ts --lemsa "Orange"
  3. Verify: Check database for 600-800 new Orange County chunks

Last Updated: 2026-02-18
Status: Ready for testing