Jarvis is a lightweight personal assistant web app. The first module helps you read a water meter photo, review the detected value, and draft an email in Gmail.
- Docs index: `docs/README.md`
- OCR app logic flow: `docs/app-logic.md`
- Backend API guide: `docs/backend-api.md`
- OCR tuning playbook: `docs/ocr-tuning-playbook.md`
- Upload a meter photo and preview it.
- OCR from a neural-ROI crop with conservative acceptance (unsupported OCR guesses are rejected and fall back to manual input).
- Auto-fill an email draft with the current date in Italian format.
- Open a Gmail draft or use a mailto fallback.
- Run a built-in OCR test set table with `Detected`, `Absolute Error`, and `Failure Reason` columns plus MAE/exact-match/no-read summary stats.
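The summary stats in the test-set table can be reproduced offline. A minimal sketch, assuming hypothetical result records with `expected`/`detected` fields (not the app's actual schema), where `detected` is `None` for a no-read:

```python
def summarize(results):
    """Compute MAE, exact-match rate, and no-read rate over test-set results.

    Each result is a dict with an `expected` integer reading and a `detected`
    value that is None when OCR rejected the image (manual input required).
    Field names here are illustrative, not the app's real schema.
    """
    reads = [r for r in results if r["detected"] is not None]
    errors = [abs(r["detected"] - r["expected"]) for r in reads]
    return {
        "mae": sum(errors) / len(errors) if errors else None,
        "exact_match": sum(e == 0 for e in errors) / len(results),
        "no_read": (len(results) - len(reads)) / len(results),
    }

stats = summarize([
    {"expected": 1234, "detected": 1234},  # exact match
    {"expected": 1234, "detected": 1230},  # absolute error of 4
    {"expected": 1234, "detected": None},  # rejected -> no read
])
# stats["mae"] == 2.0; exact-match and no-read rates are each 1/3
```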
- Ensure Python 3 and Node.js are installed.
- Run the dev server:

  ```sh
  npm run serve
  ```

  Then open http://localhost:8000.
- If you also want to run Playwright checks, install JS dependencies once:

  ```sh
  npm install
  ```

You can run a Python backend that detects the meter digit window using a fine-tuned pretrained model.
- Open a second terminal and set up backend dependencies:

  ```sh
  cd backend
  python3 -m venv .venv
  source .venv/bin/activate
  pip install -r requirements.txt
  ```

  For CPU-only environments (for example Vercel), install:

  ```sh
  pip install -r requirements-cpu.txt
  ```

- Train/fine-tune a model (copies the best checkpoint to `backend/models/roi.pt`):

  ```sh
  python train_roi.py \
    --data data/roi_dataset.yaml \
    --base-model yolov8n.pt \
    --rotation-angles 90,180,270,360 \
    --heavy-augment
  ```

  The API default ROI checkpoint is pinned to `backend/models/roi-rotaug-e30-640.pt`.
  To run with a newly trained checkpoint, set `ROI_MODEL_PATH` explicitly before starting the backend.
  `train_roi.py` now enforces heavy augmentation + rotation expansion by default; weaker runs require an explicit `--allow-no-augment-policy`.
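The `ROI_MODEL_PATH` override boils down to an environment lookup with the pinned checkpoint as fallback. A minimal sketch; the helper name is ours, and the backend's actual resolution code may differ:

```python
import os

# Pinned default checkpoint, per the README; ROI_MODEL_PATH overrides it.
DEFAULT_ROI_CHECKPOINT = "backend/models/roi-rotaug-e30-640.pt"

def resolve_roi_model_path(env=None):
    """Return ROI_MODEL_PATH if set and non-empty, else the pinned default."""
    env = os.environ if env is None else env
    return env.get("ROI_MODEL_PATH") or DEFAULT_ROI_CHECKPOINT
```

Usage: `ROI_MODEL_PATH=backend/models/roi.pt uvicorn app:app ...` starts the API with a freshly trained checkpoint instead of the pinned one.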
- Optional: train the per-cell digit classifier checkpoint:

  ```sh
  python train_digit_classifier.py --device cpu
  ```

  For dataset expansion/QA before retraining:

  ```sh
  python plan_digit_expansion.py --target-train-per-digit 12 --priority-digits 4,5,6,9
  python validate_digit_dataset.py
  ```

- Start the API:

  ```sh
  uvicorn app:app --host 127.0.0.1 --port 8001 --reload
  ```

By default, the frontend calls `http://127.0.0.1:8001/roi/detect` and requires neural ROI detection before OCR.
The frontend can also call `http://127.0.0.1:8001/digit/predict-cells` when `OCR_CONFIG.digitClassifier.enabled` is set to `true`.
Check backend readiness with:

```sh
curl -s http://127.0.0.1:8001/health
```

Run Playwright checks for neural-ROI failure handling and OCR selection guard regressions:

```sh
npm run test:e2e
```

Generate a per-image ROI checkpoint comparison report (`roi-rotaug-e30-640.pt` vs `roi.pt`) with stage 5/6 debug snapshots:

```sh
npm run benchmark:roi-diff
```

Report artifacts are written under `output/roi-checkpoint-diff/<timestamp>/`.
Per-image diff tables include selected OCR metadata (`sourceLabel`, `method`, `preprocessMode`), and stage 6 exports use the last `6. OCR input candidate` frame from each debug session.
To benchmark with digit-classifier fallback enabled (gated to `ocr-no-digits`), run:

```sh
JARVIS_DIGIT_FALLBACK=1 npm run benchmark:roi-diff
```

CI runs these tests on every pull request and on pushes to master.
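The `/health` check above can also be scripted, for example to gate a benchmark run until the backend is up. A minimal sketch that assumes only that the endpoint answers HTTP 200 when ready (no response schema is assumed):

```python
import time
import urllib.error
import urllib.request

def wait_for_backend(url="http://127.0.0.1:8001/health",
                     attempts=5, delay=1.0):
    """Poll the health endpoint; True once it answers 200, False on give-up."""
    for _ in range(attempts):
        try:
            with urllib.request.urlopen(url, timeout=2) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # backend not up yet; retry after a short pause
        time.sleep(delay)
    return False
```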
- `index.html`: UI layout.
- `styles.css`: Styling.
- `app.js`: Thin entrypoint that imports `src/main.js`.
- `src/main.js`: UI orchestration and event wiring.
- `src/ocr/`: OCR pipeline and neural ROI integration.
- `src/testset/`: Manual OCR test-set runner.
- `backend/`: Optional FastAPI service for neural ROI and digit-classifier inference/training.
- `AGENTS.md`: Contributor guide.
- `assets/`: Static assets and example uploads.
- OCR runs fully in the browser using Tesseract.js.
- OCR now relies on neural ROI detection; if the backend is unavailable or ROI fails, the app asks for manual reading input.
- ROI word-pass defaults to raw strip input; the stage `6. OCR input candidate` snapshot mirrors the configured OCR input mode.
- Edge-derived ROI strip candidates are enabled by default and can be toggled with `OCR_CONFIG.roiDeterministic.useEdgeCandidates`.
- Digit decoding can optionally use a backend classifier (`src/ocr/config.js` -> `digitClassifier.enabled`), which is `false` by default.
- The selection layer is fail-safe: isolated edge-only single hits are rejected unless independently corroborated.
- Use the UI `Run test set` action plus `npm run test:e2e` for OCR regressions before and after tuning.
- The Gmail flow opens a draft; you always review and send manually.
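The fail-safe selection rule can be illustrated with a toy guard. Everything here (candidate shape, source names, function name) is hypothetical and only mirrors the stated policy, not the app's actual selection code:

```python
def accept(candidates):
    """Toy fail-safe selection: a value backed by a single edge-only hit is
    dropped; values corroborated by another source (or repeated) survive.
    Candidate shape and names are illustrative only."""
    by_value = {}
    for c in candidates:
        by_value.setdefault(c["value"], set()).add(c["source"])
    accepted = []
    for value, sources in by_value.items():
        hits = sum(c["value"] == value for c in candidates)
        if sources == {"edge"} and hits == 1:
            continue  # isolated edge-only single hit: reject
        accepted.append(value)
    return accepted

# A lone edge hit is rejected; the same value seen by a second source passes.
assert accept([{"value": "01234", "source": "edge"}]) == []
assert accept([
    {"value": "01234", "source": "edge"},
    {"value": "01234", "source": "raw"},
]) == ["01234"]
```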
- Use the EXIF `DateTimeOriginal` value as the source of truth for the acquisition date.
- Rename files to `meter_mmddyyyy` (zero-padded) and keep the original extension.
- If multiple images share the same date, keep one as-is and add numeric suffixes to the rest (e.g., `_1`, `_2`).
- If EXIF is missing, prefer a known date from the filename or capture notes and document it.
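The renaming convention above can be scripted. A minimal sketch that starts from an already-extracted EXIF `DateTimeOriginal` string (reading EXIF itself, e.g. with Pillow, is out of scope here); the helper name is ours:

```python
from datetime import datetime

def meter_filename(date_time_original, extension, taken=None):
    """Build a meter_mmddyyyy name from an EXIF DateTimeOriginal string
    ("YYYY:MM:DD HH:MM:SS" per the EXIF spec). `taken` is the set of names
    already assigned; same-date duplicates get _1, _2, ... suffixes."""
    dt = datetime.strptime(date_time_original, "%Y:%m:%d %H:%M:%S")
    base = f"meter_{dt:%m%d%Y}"          # zero-padded mmddyyyy
    taken = taken or set()
    name, n = f"{base}{extension}", 0
    while name in taken:                  # keep the first as-is, suffix the rest
        n += 1
        name = f"{base}_{n}{extension}"
    return name

assert meter_filename("2024:03:05 10:22:01", ".jpg") == "meter_03052024.jpg"
assert meter_filename("2024:03:05 18:00:00", ".jpg",
                      {"meter_03052024.jpg"}) == "meter_03052024_1.jpg"
```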