Production-ready medical document OCR and PHI anonymization using Google Cloud.
- ✅ High-accuracy OCR - Google Cloud Document AI
- ✅ PHI Detection - Automatic detection of patient data (DLP API)
- ✅ HIPAA workflows - Anonymize detected PHI with one click
- ✅ Production-ready - Enterprise-grade, scalable
- ✅ Smart extraction - Structured medical data (patient info, meds, labs, diagnoses)
- ✅ Cost-effective - Pay per use, ~$1.50 per 1,000 pages for Document AI OCR
```bash
npm install
```

⚡ AUTOMATED (Recommended) - One command:

```bash
./setup-google-cloud.sh
```

This script automatically:
- ✅ Creates Google Cloud project
- ✅ Enables all required APIs
- ✅ Creates Document AI processor
- ✅ Sets up service account
- ✅ Generates `.env.local`

Time: ~5 minutes
📝 MANUAL - Step-by-step guide:
Follow GOOGLE_CLOUD_SETUP.md for manual setup
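Whichever route you take, the app needs a few environment variables at runtime. The deploy command further down sets `GOOGLE_CLOUD_PROJECT_ID` and `DOCUMENT_AI_PROCESSOR_ID`, and `GOOGLE_APPLICATION_CREDENTIALS` is the standard variable Google's client libraries read to locate a service-account key file. A minimal fail-fast check in the spirit of `src/lib/google-cloud-config.ts` might look like the sketch below; the function name `loadGoogleCloudConfig`, the `DOCUMENT_AI_LOCATION` variable and its `us` default are assumptions, not the project's actual code.

```typescript
// Hypothetical sketch: the real src/lib/google-cloud-config.ts may differ.
type Env = Record<string, string | undefined>;

interface GoogleCloudConfig {
  projectId: string;
  processorId: string;
  location: string; // assumed DOCUMENT_AI_LOCATION variable, defaulting to "us"
  credentialsFile?: string; // optional: on Cloud Run the attached service account can be used instead
}

const REQUIRED_VARS = ["GOOGLE_CLOUD_PROJECT_ID", "DOCUMENT_AI_PROCESSOR_ID"] as const;

function loadGoogleCloudConfig(env: Env): GoogleCloudConfig {
  const missing = REQUIRED_VARS.filter((name) => !env[name]);
  if (missing.length > 0) {
    // Fail at startup rather than on the first upload.
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
  return {
    projectId: env["GOOGLE_CLOUD_PROJECT_ID"] as string,
    processorId: env["DOCUMENT_AI_PROCESSOR_ID"] as string,
    location: env["DOCUMENT_AI_LOCATION"] ?? "us",
    credentialsFile: env["GOOGLE_APPLICATION_CREDENTIALS"],
  };
}
```

Next.js loads `.env.local` into `process.env` automatically, so the check would be called once as `loadGoogleCloudConfig(process.env)`.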
```bash
npm run dev
```

```
Upload Document (PDF/Image)
        ↓
1. Try direct text extraction (PDF only) - Fast & Free
        ↓
2. If scanned → Document AI OCR - High accuracy
        ↓
3. Extract structured medical data
        ↓
4. Optional: Detect & anonymize PHI (DLP)
        ↓
Download JSON
```
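The fallback in steps 1-2 above can be sketched as a function that tries the free path first and only pays for OCR when a PDF has no usable text layer. The extractors are injected, so nothing here calls Google Cloud directly. The names `extractPdfText` and `runDocumentAiOcr`, the 50-character threshold for deciding that a PDF is "scanned", and the `pdf-text` method label are assumptions; only `document-ai-ocr` appears in this README's API response.

```typescript
// Hypothetical sketch of the extraction fallback; names and threshold are assumptions.
type Method = "pdf-text" | "document-ai-ocr";

interface Extractors {
  extractPdfText: (file: Uint8Array) => Promise<string>; // step 1: free, PDF only
  runDocumentAiOcr: (file: Uint8Array, mimeType: string) => Promise<string>; // step 2: billed per page
}

interface ExtractionResult {
  text: string;
  method: Method;
}

// Below this many non-whitespace characters a PDF is treated as scanned (assumed heuristic).
const MIN_TEXT_CHARS = 50;

async function extractText(
  file: Uint8Array,
  mimeType: string,
  extractors: Extractors,
): Promise<ExtractionResult> {
  if (mimeType === "application/pdf") {
    const text = await extractors.extractPdfText(file);
    if (text.replace(/\s/g, "").length >= MIN_TEXT_CHARS) {
      return { text, method: "pdf-text" };
    }
  }
  // Images, and PDFs without a usable text layer, go to OCR.
  const text = await extractors.runDocumentAiOcr(file, mimeType);
  return { text, method: "document-ai-ocr" };
}
```

Injecting the extractors keeps the decision logic testable with stubs and makes the cost trade-off explicit: OCR is only reached when the free path fails.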
The DLP API detects and anonymizes:
- 👤 Patient names
- 📅 Dates of birth
- 🏥 Medical record numbers (MRN)
- 📞 Phone numbers
- 📧 Email addresses
- 🏠 Addresses
- 🆔 Social Security Numbers
- And more...
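The bracketed placeholders in the example output below (such as `[PERSON_NAME]`) follow the style of DLP's replace-with-infoType de-identification, where each match is replaced by the name of its infoType. The sketch below reproduces that idea locally and builds the `replacements` and `stats` fields shown in the anonymized output. The `Finding` shape (character offsets plus an infoType) and the function name are assumptions for illustration, not the DLP response format.

```typescript
// Hypothetical local sketch; Finding is an assumed shape, not the DLP API's response type.
interface Finding {
  infoType: string; // e.g. "PERSON_NAME"
  start: number; // character offset, inclusive
  end: number; // character offset, exclusive
}

interface AnonymizationResult {
  text: string;
  replacements: number;
  stats: Record<string, number>;
}

function replaceWithInfoType(text: string, findings: Finding[]): AnonymizationResult {
  // Replace from the end of the string so earlier offsets stay valid.
  const sorted = [...findings].sort((a, b) => b.start - a.start);
  const stats: Record<string, number> = {};
  let result = text;
  let lastStart = Infinity;
  for (const f of sorted) {
    // Overlapping findings: keep the later-starting match, skip this one.
    if (f.end > lastStart) continue;
    result = result.slice(0, f.start) + `[${f.infoType}]` + result.slice(f.end);
    stats[f.infoType] = (stats[f.infoType] ?? 0) + 1;
    lastStart = f.start;
  }
  const replacements = Object.values(stats).reduce((sum, n) => sum + n, 0);
  return { text: result, replacements, stats };
}
```

In the app itself the findings would come from the DLP API; the local version is only meant to show how counts per infoType add up to the reported total.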
```
// Upload PDF/image → Get structured data
{
  "patient_info": {
    "name": "John Doe",
    "mrn": "MRN-123456",
    "date_of_birth": "1980-05-15"
  },
  "medications": [...],
  "lab_results": [...]
}
```

```
// Same upload, but with "Anonymize PHI" enabled
{
  "patient_info": {
    "name": "[PERSON_NAME]",
    "mrn": "[MEDICAL_RECORD_NUMBER]",
    "date_of_birth": "[DATE_OF_BIRTH]"
  },
  "phi_anonymization": {
    "replacements": 15,
    "stats": {
      "PERSON_NAME": 3,
      "DATE_OF_BIRTH": 2,
      "MEDICAL_RECORD_NUMBER": 1,
      ...
    }
  }
}
```

Extract data from medical documents.
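A browser client might build the multipart request for this endpoint as sketched here, using the form fields `file`, `mode` and `anonymize` that this README lists for the request. The `/api/extract` path is inferred from `src/app/api/extract/route.ts` (the Next.js App Router convention); the `POST` method and the helper name `buildExtractRequest` are assumptions.

```typescript
// Hypothetical client helper; the POST method and helper name are assumptions.
type ExtractMode = "structured" | "raw";

interface ExtractRequest {
  url: string;
  method: "POST";
  body: FormData;
}

function buildExtractRequest(
  file: Blob,
  fileName: string,
  mode: ExtractMode,
  anonymize: boolean,
): ExtractRequest {
  const form = new FormData();
  form.append("file", file, fileName);
  form.append("mode", mode);
  // The endpoint takes the strings 'true' / 'false', not a boolean.
  form.append("anonymize", anonymize ? "true" : "false");
  return { url: "/api/extract", method: "POST", body: form };
}
```

From the browser, the result would be sent with `fetch(req.url, { method: req.method, body: req.body })`; the browser sets the multipart `Content-Type` boundary itself, so no header needs to be added by hand.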
Request:

```ts
FormData {
  file: File,                  // PDF or image
  mode: 'structured' | 'raw',  // Extraction mode
  anonymize: 'true' | 'false'  // Enable PHI anonymization
}
```

Response:

```ts
{
  success: true,
  mode: 'structured',
  data: {...},                 // Extracted medical data
  method: 'document-ai-ocr',
  phi_anonymization: {         // If anonymize=true
    found: 15,
    replacements: 15,
    stats: {...}
  }
}
```

| Service | Free Tier | After Free Tier |
|---|---|---|
| Document AI | 1,000 pages/month | $1.50 per 1,000 pages |
| DLP API | 1GB text/month | $0.20 per GB |
Example costs (rough estimates; the actual bill depends on how many pages per document need OCR):
- 1,000 docs/month: ~$2-5/month
- 10,000 docs/month: ~$15-20/month
- 100,000 docs/month: ~$150-200/month
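The arithmetic behind such estimates can be checked with a small estimator using the rates from the table above: the first 1,000 Document AI pages and the first 1 GB of DLP text each month are free, then $1.50 per 1,000 pages and $0.20 per GB. Pages per document and DLP text volume per document are inputs you must supply, and only pages that actually go to OCR count, since step 1 of the pipeline is free. The function and constant names are illustrative, and the rates are the README's, not live Google Cloud pricing.

```typescript
// Illustrative estimator using the rates in the table above (not live pricing).
const DOC_AI_FREE_PAGES = 1_000;
const DOC_AI_USD_PER_1000_PAGES = 1.5;
const DLP_FREE_GB = 1;
const DLP_USD_PER_GB = 0.2;

interface CostEstimate {
  documentAi: number;
  dlp: number;
  total: number;
}

// pagesPerDoc counts only pages sent to OCR; text-layer PDFs contribute 0.
function estimateMonthlyCost(docs: number, pagesPerDoc: number, dlpGbPerDoc = 0): CostEstimate {
  const billablePages = Math.max(0, docs * pagesPerDoc - DOC_AI_FREE_PAGES);
  const documentAi = (billablePages / 1_000) * DOC_AI_USD_PER_1000_PAGES;
  const billableGb = Math.max(0, docs * dlpGbPerDoc - DLP_FREE_GB);
  const dlp = billableGb * DLP_USD_PER_GB;
  return { documentAi, dlp, total: documentAi + dlp };
}
```

For example, 10,000 single-page scanned documents a month come to (10,000 − 1,000) / 1,000 × $1.50 = $13.50 in Document AI charges before DLP.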
- ✅ Service account authentication
- ✅ Never expose API keys to frontend
- ✅ DLP API for PHI detection
- ✅ HIPAA-eligible infrastructure (Google Cloud, under a signed BAA)
- ✅ Encryption at rest and in transit
Never commit:
- `google-cloud-key.json`
- `.env.local`
See GOOGLE_CLOUD_SETUP.md for common issues.
Quick fixes:

```bash
# Check environment variables
cat .env.local
# Verify service account key exists
ls -la google-cloud-key.json
# Test API access
gcloud auth application-default print-access-token
```

Cloud Run:

```bash
gcloud run deploy clinera-mdt \
  --source . \
  --platform managed \
  --region us-central1 \
  --set-env-vars GOOGLE_CLOUD_PROJECT_ID=your-project-id,DOCUMENT_AI_PROCESSOR_ID=your-processor-id
```

App Engine:

```bash
gcloud app deploy
```

```
clinera-mdt-pdf-extraction-poc/
├── src/
│   ├── app/
│   │   ├── api/extract/route.ts    # Main API endpoint
│   │   └── page.tsx                # UI
│   └── lib/
│       ├── google-cloud-config.ts  # Config
│       ├── document-ai-service.ts  # Document AI integration
│       └── dlp-service.ts          # DLP API integration
├── .env.example                    # Environment template
├── GOOGLE_CLOUD_SETUP.md           # Complete setup guide
└── package.json
```
Private - Clinera MDT
Built with ❤️ for healthcare professionals