Lightweight CLI to extract invoice totals from a directory of PDFs and compute sums.
Approach:
- Prefer text extraction from PDF when available.
- Fall back to OCR for scanned PDFs.
- Optional LLM-based extraction as the last resort.
- Persist results as JSONL; compute totals in code (decimal.Decimal).
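
The fallback order above can be sketched as a chain of extractors tried in turn. This is a minimal illustration, not the project's actual code: the `Extractor` signature, the `parse_total` regex, and the extractor names are all assumptions made for the example, and real text, OCR, or LLM backends would be plugged in as callables.

```python
import re
from decimal import Decimal
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical extractor signature: takes a PDF path, returns text or None.
Extractor = Callable[[str], Optional[str]]

# Assumed pattern for a "Total" line; real invoices vary widely.
TOTAL_RE = re.compile(
    r"\btotal\s*(?:due)?\s*[:=]?\s*\$?\s*([0-9][0-9,]*\.[0-9]{2})",
    re.IGNORECASE,
)


def parse_total(text: str) -> Optional[Decimal]:
    """Return the first total found in the text as a Decimal, or None."""
    match = TOTAL_RE.search(text)
    if match is None:
        return None
    # Strip thousands separators before building the Decimal.
    return Decimal(match.group(1).replace(",", ""))


def extract_total(
    path: str, extractors: Sequence[Tuple[str, Extractor]]
) -> Optional[Tuple[str, Decimal]]:
    """Try each (name, extractor) in order; return the first usable total."""
    for name, extract in extractors:
        text = extract(path)
        if not text:
            continue  # e.g. a scanned PDF with no text layer
        total = parse_total(text)
        if total is not None:
            return name, total
    return None
```

Passing the extractors in order (text layer, then OCR, then the optional LLM) keeps the cheapest method first and records which method produced each total.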

Setup:
- Create a virtualenv and install deps

      python3 -m venv .venv
      . .venv/bin/activate
      pip install -r requirements.txt

- Configure LLM fallback (optional)
  - Copy .env.template to .env
  - Fill in OPENAI_API_KEY

      cp .env.template .env
      $EDITOR .env

Notes:
- .env is ignored by git (safe to keep keys locally).
- If you don't want LLM fallback, keep INVOICE_SUM_LLM=off.
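
One way the LLM setting could be resolved is sketched below. The precedence (the `--llm` flag over `INVOICE_SUM_LLM`, defaulting to off) and the rule that a missing `OPENAI_API_KEY` keeps the fallback disabled are assumptions made for illustration; the function name `llm_enabled` is hypothetical and the real CLI may behave differently.

```python
import argparse
from typing import Mapping, Sequence


def llm_enabled(argv: Sequence[str], env: Mapping[str, str]) -> bool:
    """Decide whether LLM fallback is active for this run (illustrative only)."""
    parser = argparse.ArgumentParser(prog="invoice_sum")
    parser.add_argument("--dir", required=True)
    parser.add_argument("--out", required=True)
    parser.add_argument("--llm", choices=["on", "off"], default=None)
    args = parser.parse_args(argv)

    # Assumed precedence: explicit flag first, then the environment, then "off".
    setting = args.llm or env.get("INVOICE_SUM_LLM", "off")
    if setting.strip().lower() != "on":
        return False

    # Assumed safeguard: without an API key the fallback stays disabled.
    return bool(env.get("OPENAI_API_KEY"))
```

In a real entry point this would be called as `llm_enabled(sys.argv[1:], os.environ)` after the `.env` file has been loaded into the environment.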

Run:

    python -m invoice_sum --dir /path/to/invoices --out out

Enable LLM fallback explicitly:

    python -m invoice_sum --dir /path/to/invoices --out out --llm on

Outputs:
- out/results.jsonl: one JSON object per processed PDF (append-only)
- out/invoices.csv: latest successful extraction per file
- out/summary.json: totals + stats

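
As a rough sketch of how the append-only JSONL log might be reduced to the latest successful extraction per file and then summed with `decimal.Decimal`: the field names `file`, `status`, and `total` are hypothetical, since the actual record schema is not specified here.

```python
import json
from decimal import Decimal
from typing import Dict, Iterable


def latest_successful(lines: Iterable[str]) -> Dict[str, Decimal]:
    """Keep the most recent successful total per file from JSONL lines."""
    latest: Dict[str, Decimal] = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        # Assumed schema: {"file": ..., "status": "ok" | "error", "total": "12.34"}
        if record.get("status") != "ok":
            continue  # a later failure does not erase an earlier success
        # Totals are assumed to be stored as strings so no float rounding occurs.
        latest[record["file"]] = Decimal(record["total"])
    return latest


def summarize(latest: Dict[str, Decimal]) -> Dict[str, object]:
    """Build a small summary; the total is serialized as a string."""
    total = sum(latest.values(), Decimal("0"))
    return {"count": len(latest), "total": str(total)}
```

Storing totals as strings and summing them as `Decimal` avoids binary floating-point drift (for example, `0.1 + 0.2` not equalling `0.3` with floats).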
MVP scaffold.