minwoo-data/expense-automation

Receipt Scanner

Automatically extract structured data from receipt images using Claude AI and generate formatted Excel reports.

Status: Active Development — core features complete, enhancements in progress


What It Does

  1. Drag & drop receipt images (JPG, PNG, WebP) or PDFs into the web UI
  2. Claude API extracts vendor, date, subtotal, tax, tip, total, category, and more
  3. Cross-validates amounts (subtotal + tax + tip = total) with confidence scoring
  4. Flags low-confidence items for manual review
  5. Downloads everything as a formatted Excel file (All Receipts / Needs Review / Summary)

Tech Stack

Layer              Tech
Backend            Python, Flask
AI                 Claude API (Sonnet 4)
Image Processing   Pillow, PyMuPDF
Excel              pandas, openpyxl
Frontend           HTML, Tailwind CSS, Vanilla JS

Quick Start

1. Clone & Install

git clone https://github.com/YOUR_USERNAME/expense-automation.git
cd expense-automation

Windows:

install.bat

macOS / Linux:

chmod +x install.sh run.sh
./install.sh

2. Set API Key

Copy the example config:

cp config.example.json config.json

Then set the api_key field in config.json:

"api_key": "sk-ant-api03-your-actual-key-here"

Or use a .env file:

ANTHROPIC_API_KEY=sk-ant-api03-your-actual-key-here
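
The two options above could be resolved at startup roughly as follows. This is a minimal sketch, not the project's actual code: the helper name `load_api_key` and the precedence (environment variable first, then config.json) are assumptions, and loading the .env file into the environment is left out.

```python
import json
import os
from pathlib import Path


def load_api_key(config_path="config.json", env=None):
    """Return the Anthropic API key, or raise if none is configured.

    Hypothetical helper: checks ANTHROPIC_API_KEY first, then the
    "api_key" field of config.json. The precedence is an assumption.
    """
    env = os.environ if env is None else env
    key = env.get("ANTHROPIC_API_KEY", "").strip()
    if key:
        return key

    path = Path(config_path)
    if path.is_file():
        with path.open(encoding="utf-8") as fh:
            key = str(json.load(fh).get("api_key", "")).strip()
        if key:
            return key

    raise RuntimeError("No API key found: set ANTHROPIC_API_KEY or fill in config.json")
```

Failing fast with a clear message here would line up with the "API key error" entry in Troubleshooting.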

3. Run

Windows: run.bat | macOS/Linux: ./run.sh

Open http://localhost:5000 in your browser.


Project Structure

receipt-scanner/
├── app.py                  # Flask server (routing, security, file handling)
├── processor.py            # Receipt analysis engine (Claude API, parsing, validation)
├── excel_generator.py      # Excel report generator (formatting, formulas, sheets)
├── config.example.json     # Configuration template
├── requirements.txt
├── templates/
│   └── index.html
├── static/
│   ├── style.css
│   ├── animations.css
│   └── script.js
├── install.sh / install.bat
└── run.sh / run.bat

Features

Receipt Analysis

  • Structured data extraction via Claude API
  • Supports JPEG, PNG, WebP, PDF (PDF uses PyMuPDF → pdf2image → raw PDF fallback chain)
  • Auto EXIF rotation correction and image resizing
  • Date normalization across multiple formats → MM/DD/YYYY
  • Auto-categorization into 10 expense categories with fuzzy matching
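
As an illustration of the date normalization step, here is a minimal sketch using only the standard library. The function name `normalize_date` and the list of accepted input formats are assumptions; the formats processor.py actually handles may differ. Ambiguous numeric dates are read month-first.

```python
from datetime import datetime

# Candidate input formats, tried in order. This list is an assumption;
# ambiguous numeric dates are read month-first (US style).
_DATE_FORMATS = (
    "%m/%d/%Y",
    "%m/%d/%y",
    "%Y-%m-%d",
    "%m-%d-%Y",
    "%b %d, %Y",
    "%B %d, %Y",
    "%d %b %Y",
)


def normalize_date(raw):
    """Return the date as MM/DD/YYYY, or None if no format matches."""
    text = " ".join(raw.split())  # collapse stray whitespace
    for fmt in _DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).strftime("%m/%d/%Y")
        except ValueError:
            continue
    return None
```

Returning None instead of raising lets the caller treat an unparseable date as a reason to flag the receipt for review.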

Validation & Confidence

  • Amount cross-validation (subtotal + tax + tip = total, $0.02 tolerance)
  • Confidence-based auto/review classification (configurable thresholds)
  • Required field validation and amount range checks
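
The cross-validation rule can be sketched as below. Only the $0.02 tolerance comes from this README; the function names, the receipt dict keys, and the 0.85 threshold are hypothetical placeholders for the configurable values.

```python
from decimal import Decimal

TOLERANCE = Decimal("0.02")   # from the README: $0.02 tolerance
AUTO_THRESHOLD = 0.85         # hypothetical; the real thresholds are configurable


def amounts_match(subtotal, tax, tip, total, tolerance=TOLERANCE):
    """True if subtotal + tax + tip is within `tolerance` of total."""
    parts = sum((Decimal(str(x)) for x in (subtotal, tax, tip)), Decimal("0"))
    return abs(parts - Decimal(str(total))) <= tolerance


def classify(receipt, confidence, threshold=AUTO_THRESHOLD):
    """Return "auto" or "review" for a parsed receipt dict."""
    ok = amounts_match(
        receipt["subtotal"],
        receipt.get("tax", 0),
        receipt.get("tip", 0),
        receipt["total"],
    )
    return "auto" if ok and confidence >= threshold else "review"
```

Decimal is used rather than float so that the comparison against the two-cent tolerance is not disturbed by binary rounding.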

Security

  • File extension + magic byte dual verification
  • UUID-based filename sanitization
  • Per-IP rate limiting
  • Localhost-only binding
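
The first two items could look roughly like this. It is a sketch under assumptions: the helper names are hypothetical and app.py may implement the checks differently. The signatures themselves are the standard leading bytes for JPEG, PNG, PDF, and WebP files.

```python
import uuid
from pathlib import Path

# Leading bytes for each allowed extension. WebP is a RIFF container,
# so it also needs the "WEBP" tag at offset 8.
_MAGIC = {
    ".jpg": b"\xff\xd8\xff",
    ".jpeg": b"\xff\xd8\xff",
    ".png": b"\x89PNG\r\n\x1a\n",
    ".pdf": b"%PDF-",
    ".webp": b"RIFF",
}


def is_allowed_upload(filename, head):
    """Check that the extension is allowed and `head` (first bytes) matches it."""
    ext = Path(filename).suffix.lower()
    magic = _MAGIC.get(ext)
    if magic is None or not head.startswith(magic):
        return False
    if ext == ".webp":
        return head[8:12] == b"WEBP"
    return True


def safe_filename(filename):
    """Replace the client-supplied name with a random UUID, keeping the extension."""
    return f"{uuid.uuid4().hex}{Path(filename).suffix.lower()}"
```

Checking both the extension and the content means a renamed file (for example a PDF saved as .png) is rejected, and the UUID name keeps client-supplied path components off disk.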

Excel Reports

  • 3 sheets: All Receipts / Needs Review / Summary
  • SUM formulas, auto-filter, freeze panes, zebra striping
  • Category-level aggregation

Roadmap

Token Optimization

  • Enhanced image preprocessing (crop, contrast) to reduce token usage
  • Low-res first pass → high-res retry on failure (2-pass strategy)
  • Prompt caching for batch processing
  • Response schema optimization to minimize output tokens

Prompt Engineering

  • Few-shot examples per receipt type
  • Receipt type detection → type-specific prompts
  • Improved handwriting recognition (tips, signatures)
  • Multi-language receipt support

Dashboard

  • Monthly/category spending visualization
  • Trend analysis charts
  • Inline editing for flagged items
  • Processing history and statistics

Infrastructure

  • SQLite/PostgreSQL persistence
  • User authentication
  • Docker support
  • CI/CD pipeline

Supported Files

Format   Extensions     Max Size
JPEG     .jpg, .jpeg    15 MB
PNG      .png           15 MB
PDF      .pdf           15 MB
WebP     .webp          15 MB

Up to 100 files per batch.


Estimated API Cost

Volume       Est. Cost
1 receipt    ~$0.02–0.04
100/month    ~$2–4
500/month    ~$10–20

Estimates are based on Claude Sonnet 4 and vary with image size and complexity.

Design consideration: cost-aware processing. The system minimizes token usage through image resizing, fallback logic, and response schema constraints.
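
The monthly figures in the table are the per-receipt estimate scaled linearly by volume. A small sketch of that arithmetic, with a hypothetical helper name and the per-receipt range taken from the table:

```python
def monthly_cost_range(receipts_per_month, low=0.02, high=0.04):
    """Scale the per-receipt estimate (USD) to a monthly volume."""
    return round(receipts_per_month * low, 2), round(receipts_per_month * high, 2)
```

For example, `monthly_cost_range(500)` gives the 10 to 20 dollar range shown for 500 receipts per month.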


Troubleshooting

Issue                 Solution
python not found      Install Python 3.9+ and add to PATH
API key error         Check config.json or .env file
Port 5000 in use      Auto-tries 5001, 5002, 8080, 8888
ModuleNotFoundError   Re-run install.bat or install.sh
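
The port fallback could be implemented along these lines. This is a sketch only: the function names are hypothetical, and the probing method (a test bind on 127.0.0.1) is an assumption about how the app detects a busy port.

```python
import socket

# 5000 first, then the fallbacks listed in the table above.
CANDIDATE_PORTS = (5000, 5001, 5002, 8080, 8888)


def port_is_free(port, host="127.0.0.1"):
    """Try to bind the port on localhost; success means it is available."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
    return True


def pick_port(candidates=CANDIDATE_PORTS, is_free=port_is_free):
    """Return the first free candidate port, or None if all are taken."""
    for port in candidates:
        if is_free(port):
            return port
    return None
```

If the app falls back to another port, the URL to open changes accordingly (for example http://localhost:5001).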

Security Notes

  • Runs on localhost only (no external access)
  • config.json is in .gitignore — never commit it
  • Uploaded images are auto-deleted after processing
  • Images are sent to Anthropic's API over HTTPS

License

MIT


Powered by Claude AI (Anthropic)
