Automatically extract structured data from receipt images using Claude AI and generate formatted Excel reports.
Status: Active Development — core features complete, enhancements in progress
- Drag & drop receipt images (JPG, PNG, WebP, PDF) into the web UI
- Claude API extracts vendor, date, subtotal, tax, tip, total, category, and more
- Cross-validates amounts (subtotal + tax + tip = total) with confidence scoring
- Flags low-confidence items for manual review
- Downloads everything as a formatted Excel file (All Receipts / Needs Review / Summary)
| Layer | Tech |
|---|---|
| Backend | Python, Flask |
| AI | Claude API (Sonnet 4) |
| Image Processing | Pillow, PyMuPDF |
| Excel | pandas, openpyxl |
| Frontend | HTML, Tailwind CSS, Vanilla JS |
```bash
git clone https://github.com/YOUR_USERNAME/expense-automation.git receipt-scanner
cd receipt-scanner
```

Windows:

```bat
install.bat
```

macOS / Linux:

```bash
chmod +x install.sh run.sh
./install.sh
```

Copy the example config and add your API key (on Windows, use `copy` instead of `cp`):

```bash
cp config.example.json config.json
```

```
"api_key": "sk-ant-api03-your-actual-key-here"
```

Or use a `.env` file:

```
ANTHROPIC_API_KEY=sk-ant-api03-your-actual-key-here
```
Windows: run.bat | macOS/Linux: ./run.sh
Open http://localhost:5000 in your browser.
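The setup steps above accept the key from either `config.json` or a `.env` file. The README does not say which source wins, so the sketch below assumes a lookup order of `config.json`, then the `ANTHROPIC_API_KEY` environment variable, then `.env`. `load_api_key` is a hypothetical helper, not the project's actual code, and its `.env` parsing is deliberately minimal.

```python
import json
import os
from pathlib import Path


def load_api_key(base_dir="."):
    """Return the Anthropic API key, or None if none is configured.

    Hypothetical helper; the lookup order below is an assumption:
    config.json -> ANTHROPIC_API_KEY env var -> .env file.
    """
    base = Path(base_dir)

    # 1. config.json with an "api_key" field
    config_path = base / "config.json"
    if config_path.is_file():
        key = json.loads(config_path.read_text(encoding="utf-8")).get("api_key")
        if key:
            return key

    # 2. Environment variable already exported in the shell
    key = os.environ.get("ANTHROPIC_API_KEY")
    if key:
        return key

    # 3. Minimal .env parsing: KEY=VALUE lines, blank lines and '#' comments skipped
    env_path = base / ".env"
    if env_path.is_file():
        for line in env_path.read_text(encoding="utf-8").splitlines():
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            name, _, value = line.partition("=")
            if name.strip() == "ANTHROPIC_API_KEY":
                return value.strip().strip('"').strip("'")
    return None
```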
```
receipt-scanner/
├── app.py                # Flask server (routing, security, file handling)
├── processor.py          # Receipt analysis engine (Claude API, parsing, validation)
├── excel_generator.py    # Excel report generator (formatting, formulas, sheets)
├── config.example.json   # Configuration template
├── requirements.txt
├── templates/
│   └── index.html
├── static/
│   ├── style.css
│   ├── animations.css
│   └── script.js
├── install.sh / install.bat
└── run.sh / run.bat
```
Receipt Analysis
- Structured data extraction via Claude API
- Supports JPEG, PNG, WebP, PDF (PDF uses PyMuPDF → pdf2image → raw PDF fallback chain)
- Auto EXIF rotation correction and image resizing
- Date normalization across multiple formats → MM/DD/YYYY
- Auto-categorization into 10 expense categories with fuzzy matching
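Date normalization and fuzzy categorization could look roughly like this. The input formats, the ten category names, and the 0.6 similarity cutoff are placeholders (the README lists neither the accepted formats nor the categories), and the standard library's `difflib.get_close_matches` stands in for whatever fuzzy matcher `processor.py` actually uses.

```python
import difflib
from datetime import datetime

# Hypothetical input formats, tried in order. Order matters for ambiguous
# dates: "03/04/2024" is read month-first because "%m/%d/%Y" comes first.
DATE_FORMATS = ["%m/%d/%Y", "%m/%d/%y", "%Y-%m-%d", "%d.%m.%Y", "%b %d, %Y", "%B %d, %Y"]

# Hypothetical 10-category list; the project's real category names are not documented here.
CATEGORIES = [
    "Meals", "Travel", "Lodging", "Transportation", "Office Supplies",
    "Software", "Utilities", "Entertainment", "Groceries", "Other",
]


def normalize_date(raw):
    """Return the date as MM/DD/YYYY, or None if no known format matches."""
    text = raw.strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).strftime("%m/%d/%Y")
        except ValueError:
            continue  # try the next format
    return None


def match_category(raw, cutoff=0.6):
    """Map free-text category output to the closest known category, else 'Other'."""
    lookup = {c.lower(): c for c in CATEGORIES}
    hits = difflib.get_close_matches(raw.strip().lower(), lookup.keys(), n=1, cutoff=cutoff)
    return lookup[hits[0]] if hits else "Other"
```

A misspelled category such as `"Trvel"` still maps to `"Travel"`, while text with no close match falls back to `"Other"` instead of creating a new category.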
Validation & Confidence
- Amount cross-validation (subtotal + tax + tip = total, $0.02 tolerance)
- Confidence-based auto/review classification (configurable thresholds)
- Required field validation and amount range checks
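A minimal sketch of the cross-check and the auto/review decision, assuming amounts arrive as numbers and confidence as a float between 0 and 1. `Decimal` avoids binary floating-point drift when comparing against the $0.02 tolerance. The 0.85 threshold, the `MAX_TOTAL` range limit, and the dict field names are assumptions, not the project's configured defaults.

```python
from decimal import Decimal

TOLERANCE = Decimal("0.02")       # the documented $0.02 tolerance
AUTO_THRESHOLD = 0.85             # hypothetical default; thresholds are configurable
MAX_TOTAL = Decimal("10000")      # hypothetical upper bound for the range check


def amounts_consistent(subtotal, tax, tip, total, tolerance=TOLERANCE):
    """True if subtotal + tax + tip is within `tolerance` of total (missing parts count as 0)."""
    parts = [Decimal(str(x or 0)) for x in (subtotal, tax, tip)]
    return abs(sum(parts) - Decimal(str(total))) <= tolerance


def classify(receipt, threshold=AUTO_THRESHOLD):
    """Return 'auto' or 'review' for a dict of extracted receipt fields."""
    # Required fields must be present and non-empty.
    if any(not receipt.get(field) for field in ("vendor", "date", "total")):
        return "review"
    total = Decimal(str(receipt["total"]))
    # Amount range check.
    if not (Decimal("0") < total <= MAX_TOTAL):
        return "review"
    # Cross-validation: the parts must add up to the total.
    if not amounts_consistent(receipt.get("subtotal"), receipt.get("tax"),
                              receipt.get("tip"), total):
        return "review"
    # Only confident, consistent extractions skip manual review.
    return "auto" if receipt.get("confidence", 0) >= threshold else "review"
```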
Security
- File extension + magic byte dual verification
- UUID-based filename sanitization
- Per-IP rate limiting
- Localhost-only binding
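The first three checks can be sketched with the standard library alone. The byte signatures are the standard file headers for each format (WebP is a RIFF container with `WEBP` at byte offset 8); the function and class names and the 30-requests-per-60-seconds limit are illustrative assumptions rather than what `app.py` does. Localhost-only binding typically amounts to passing `host="127.0.0.1"` to Flask's `app.run`.

```python
import time
import uuid
from collections import defaultdict, deque
from pathlib import Path

# Standard leading bytes ("magic numbers") for each allowed extension.
SIGNATURES = {
    ".jpg": b"\xff\xd8\xff",
    ".jpeg": b"\xff\xd8\xff",
    ".png": b"\x89PNG\r\n\x1a\n",
    ".pdf": b"%PDF-",
    ".webp": b"RIFF",  # WebP additionally needs b"WEBP" at offset 8
}


def verify_upload(filename, head):
    """Dual check: the extension is allowed AND the file's first bytes match it."""
    ext = Path(filename).suffix.lower()
    sig = SIGNATURES.get(ext)
    if sig is None or not head.startswith(sig):
        return False
    return ext != ".webp" or head[8:12] == b"WEBP"


def safe_filename(filename):
    """Discard the user-supplied name entirely; keep only its (already verified) extension."""
    return f"{uuid.uuid4().hex}{Path(filename).suffix.lower()}"


class RateLimiter:
    """In-memory sliding window: at most `limit` requests per `window` seconds per IP."""

    def __init__(self, limit=30, window=60.0, clock=time.monotonic):
        self.limit, self.window, self.clock = limit, window, clock
        self.hits = defaultdict(deque)  # ip -> timestamps of accepted requests

    def allow(self, ip):
        now = self.clock()
        q = self.hits[ip]
        while q and now - q[0] >= self.window:
            q.popleft()  # forget requests that fell out of the window
        if len(q) >= self.limit:
            return False
        q.append(now)
        return True
```

Because the limiter state lives in process memory, it resets on restart and is not shared between worker processes, which is acceptable for a single-user localhost tool.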
Excel Reports
- 3 sheets: All Receipts / Needs Review / Summary
- SUM formulas, auto-filter, freeze panes, zebra striping
- Category-level aggregation
- Enhanced image preprocessing (crop, contrast) to reduce token usage
- Low-res first pass → high-res retry on failure (2-pass strategy)
- Prompt caching for batch processing
- Response schema optimization to minimize output tokens
- Few-shot examples per receipt type
- Receipt type detection → type-specific prompts
- Improved handwriting recognition (tips, signatures)
- Multi-language receipt support
- Monthly/category spending visualization
- Trend analysis charts
- Inline editing for flagged items
- Processing history and statistics
- SQLite/PostgreSQL persistence
- User authentication
- Docker support
- CI/CD pipeline
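Two items in the list above lend themselves to short sketches: the category-level aggregation behind the Summary sheet, and the low-res-first, high-res-retry strategy. Both are hypothetical illustrations using only the standard library. The project's `excel_generator.py` uses pandas and openpyxl, the `extract` and `downscale` callables stand in for the Claude API call and the image resize, and the 0.7 confidence cutoff and 1024-pixel edge are assumptions.

```python
from collections import defaultdict
from decimal import Decimal


def summarize_by_category(receipts):
    """Aggregate receipts into (category, count, total) rows, largest total first."""
    counts = defaultdict(int)
    totals = defaultdict(Decimal)  # Decimal() == Decimal("0")
    for r in receipts:
        cat = r.get("category") or "Other"
        counts[cat] += 1
        totals[cat] += Decimal(str(r["total"]))
    rows = [(cat, counts[cat], totals[cat]) for cat in totals]
    return sorted(rows, key=lambda row: row[2], reverse=True)


def sum_formula(column, first_row, last_row):
    """Excel SUM formula text for a totals row, e.g. '=SUM(F2:F101)'."""
    return f"=SUM({column}{first_row}:{column}{last_row})"


def two_pass_extract(image, extract, downscale, min_confidence=0.7, low_res_edge=1024):
    """Run a cheap low-res pass first; retry at full resolution only if it fails.

    Hypothetical hooks: extract(img) returns a dict with a 'confidence' key or
    raises; downscale(img, max_edge) returns a smaller copy of the image.
    """
    try:
        result = extract(downscale(image, low_res_edge))
        if result.get("confidence", 0) >= min_confidence:
            return result, "low"
    except Exception:
        pass  # unreadable low-res result: fall through to the high-res retry
    return extract(image), "high"
```

With openpyxl, the formula string is simply assigned to a cell (for example `ws["F102"] = sum_formula("F", 2, 101)`), and `ws.freeze_panes = "A2"` plus `ws.auto_filter.ref = ws.dimensions` cover freeze panes and auto-filter.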
| Format | Extensions | Max Size |
|---|---|---|
| JPEG | .jpg, .jpeg | 15MB |
| PNG | .png | 15MB |
| PDF | .pdf | 15MB |
| WebP | .webp | 15MB |
Up to 100 files per batch.
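Enforcing these limits server-side is a small check over each batch. This sketch assumes "15MB" means 15 × 1024 × 1024 bytes and that files arrive as `(filename, size)` pairs; `check_batch` is a hypothetical name.

```python
from pathlib import Path

ALLOWED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".pdf", ".webp"}
MAX_FILE_BYTES = 15 * 1024 * 1024  # 15MB per file (assumes MB = MiB)
MAX_FILES_PER_BATCH = 100


def check_batch(files):
    """files: list of (filename, size_in_bytes). Returns a list of error strings (empty if OK)."""
    errors = []
    if len(files) > MAX_FILES_PER_BATCH:
        errors.append(f"Too many files: {len(files)} > {MAX_FILES_PER_BATCH}")
    for name, size in files:
        if Path(name).suffix.lower() not in ALLOWED_EXTENSIONS:
            errors.append(f"{name}: unsupported format")
        elif size > MAX_FILE_BYTES:
            errors.append(f"{name}: exceeds 15MB")
    return errors
```

Flask's `MAX_CONTENT_LENGTH` setting caps the size of the whole request rather than each file, so a per-file check like this would complement it rather than replace it.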
| Volume | Est. Cost |
|---|---|
| 1 receipt | ~$0.02–0.04 |
| 100/month | ~$2–4 |
| 500/month | ~$10–20 |
Estimates assume Claude Sonnet 4 pricing and vary with image size and receipt complexity.

Design consideration: cost-aware processing. The system reduces token usage through image resizing, fallback logic, and response schema constraints.
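Monthly cost scales linearly with volume, so each row of the table is the per-receipt range multiplied by the receipt count:

$$
C_{\text{month}} \approx N \times c_{\text{receipt}}, \qquad 500 \times 0.02 = 10, \quad 500 \times 0.04 = 20 \ \text{(USD)}
$$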
| Issue | Solution |
|---|---|
| `python` not found | Install Python 3.9+ and add it to PATH |
| API key error | Check config.json or .env file |
| Port 5000 in use | Auto-tries 5001, 5002, 8080, 8888 |
| ModuleNotFoundError | Re-run install.bat or install.sh |
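The port fallback can be approximated by trying to bind each candidate in turn. This is a sketch, not the project's code: `find_free_port` is a hypothetical helper, the candidate list mirrors the table above, and there is a short window between the probe and the server binding in which another process could take the port.

```python
import socket

DEFAULT_PORTS = (5000, 5001, 5002, 8080, 8888)


def find_free_port(candidates=DEFAULT_PORTS, host="127.0.0.1"):
    """Return the first port in `candidates` that can be bound on `host`, else None."""
    for port in candidates:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            try:
                s.bind((host, port))
            except OSError:
                continue  # in use (or not permitted); try the next candidate
            return port  # the probe socket is closed when the `with` block exits
    return None
```

The result would then be passed to Flask, e.g. `app.run(host="127.0.0.1", port=port)`, with a clear error if every candidate is taken.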
- Runs on localhost only (no external access)
- `config.json` is in `.gitignore`; never commit it
- Uploaded images are auto-deleted after processing
- Images are sent to Anthropic's API over HTTPS
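One way to guarantee that uploads are removed even when the API call fails is a `try`/`finally` around a temporary file. This is an illustrative sketch: whether `app.py` uses `tempfile` or its own uploads folder is not stated, and `process_then_delete` is a hypothetical name.

```python
import os
import tempfile


def process_then_delete(data, suffix, process):
    """Write upload bytes to a temp file, run process(path), and always delete the file."""
    fd, path = tempfile.mkstemp(suffix=suffix)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        return process(path)
    finally:
        os.remove(path)  # runs whether process() returns or raises
```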
MIT
Powered by Claude AI (Anthropic)