Jarvis is a lightweight personal assistant web app. The first module helps you read a water meter photo, review the detected value, and draft an email in Gmail.
- Docs index: `docs/README.md`
- OCR app logic flow: `docs/app-logic.md`
- Backend API guide: `docs/backend-api.md`
- OCR tuning playbook: `docs/ocr-tuning-playbook.md`
- Upload a meter photo and preview it.
- OCR from a neural-ROI crop with conservative acceptance (unsupported OCR guesses are rejected and fall back to manual input).
- Auto-fill an email draft with the current date in Italian format.
- Open a Gmail draft or use a mailto fallback.
- Run a built-in OCR test set table with `Detected`, `Absolute Error`, and `Failure Reason` columns plus MAE/exact-match/no-read summary stats.
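The summary stats in the test-set table can be reproduced offline. A minimal sketch, assuming hypothetical result records with `expected`/`detected` fields (not the app's actual schema), where `detected` is `None` for a no-read:

```python
def summarize(results):
    """Compute MAE, exact-match rate, and no-read rate over test-set results.

    Each result is a dict with an `expected` integer reading and a `detected`
    value that is None when OCR rejected the image (manual input required).
    Field names here are illustrative, not the app's real schema.
    """
    reads = [r for r in results if r["detected"] is not None]
    errors = [abs(r["detected"] - r["expected"]) for r in reads]
    return {
        "mae": sum(errors) / len(errors) if errors else None,
        "exact_match": sum(e == 0 for e in errors) / len(results),
        "no_read": (len(results) - len(reads)) / len(results),
    }

stats = summarize([
    {"expected": 1234, "detected": 1234},  # exact match
    {"expected": 1234, "detected": 1230},  # absolute error of 4
    {"expected": 1234, "detected": None},  # rejected -> no read
])
# stats["mae"] == 2.0; exact-match and no-read rates are each 1/3
```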
- Ensure Python 3 and Node.js are installed.
- Run the dev server:

  ```sh
  npm run serve
  ```

  Then open http://localhost:8000.
- If you also want to run Playwright checks, install JS dependencies once:

  ```sh
  npm install
  ```

You can run a Python backend that detects the meter digit window using a fine-tuned pretrained model.
- Open a second terminal and set up backend dependencies:

  ```sh
  cd backend
  python3 -m venv .venv
  source .venv/bin/activate
  pip install -r requirements.txt
  ```

  For CPU-only environments (for example Vercel), install:

  ```sh
  pip install -r requirements-cpu.txt
  ```

- Train/fine-tune a model (copies the best checkpoint to `backend/models/roi.pt`):

  ```sh
  python train_roi.py \
    --data data/roi_dataset.yaml \
    --base-model yolov8n.pt \
    --rotation-angles 90,180,270,360 \
    --heavy-augment
  ```

  The API default ROI checkpoint is pinned to `backend/models/roi-rotaug-e30-640.pt`.
  To run with a newly trained checkpoint, set `ROI_MODEL_PATH` explicitly before starting the backend.
  `train_roi.py` now enforces heavy augmentation + rotation expansion by default; weaker runs require an explicit `--allow-no-augment-policy`.
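The `ROI_MODEL_PATH` override boils down to an environment lookup with the pinned checkpoint as fallback. A minimal sketch; the helper name is ours, and the backend's actual resolution code may differ:

```python
import os

# Pinned default checkpoint, per the README; ROI_MODEL_PATH overrides it.
DEFAULT_ROI_CHECKPOINT = "backend/models/roi-rotaug-e30-640.pt"

def resolve_roi_model_path(env=None):
    """Return ROI_MODEL_PATH if set and non-empty, else the pinned default."""
    env = os.environ if env is None else env
    return env.get("ROI_MODEL_PATH") or DEFAULT_ROI_CHECKPOINT
```

Usage: `ROI_MODEL_PATH=backend/models/roi.pt uvicorn app:app ...` starts the API with a freshly trained checkpoint instead of the pinned one.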
- Optional: train the per-cell digit classifier checkpoint:

  ```sh
  python train_digit_classifier.py --device cpu
  ```

  For dataset expansion/QA before retraining:

  ```sh
  python plan_digit_expansion.py --target-train-per-digit 12 --priority-digits 4,5,6,9
  python validate_digit_dataset.py
  ```

- Start the API:

  ```sh
  uvicorn app:app --host 127.0.0.1 --port 8001 --reload
  ```

By default, the frontend calls `http://127.0.0.1:8001/roi/detect` and requires neural ROI detection before OCR.
The frontend can also call `http://127.0.0.1:8001/digit/predict-cells` when `OCR_CONFIG.digitClassifier.enabled` is set to `true`.
Check backend readiness with:

```sh
curl -s http://127.0.0.1:8001/health
```

Run Playwright checks for neural-ROI failure handling and OCR selection guard regressions:

```sh
npm run test:e2e
```

Generate a per-image ROI checkpoint comparison report (`roi-rotaug-e30-640.pt` vs `roi.pt`) with stage 5/6 debug snapshots:

```sh
npm run benchmark:roi-diff
```

Report artifacts are written under `output/roi-checkpoint-diff/<timestamp>/`.
Per-image diff tables include selected OCR metadata (`sourceLabel`, `method`, `preprocessMode`), and stage 6 exports use the last `6. OCR input candidate` frame from each debug session.
To benchmark with digit-classifier fallback enabled (gated to `ocr-no-digits`), run:

```sh
JARVIS_DIGIT_FALLBACK=1 npm run benchmark:roi-diff
```

CI runs these tests on every pull request and on pushes to master.
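The `/health` check above can also be scripted, for example to gate a benchmark run until the backend is up. A minimal sketch that assumes only that the endpoint answers HTTP 200 when ready (no response schema is assumed):

```python
import time
import urllib.error
import urllib.request

def wait_for_backend(url="http://127.0.0.1:8001/health",
                     attempts=5, delay=1.0):
    """Poll the health endpoint; True once it answers 200, False on give-up."""
    for _ in range(attempts):
        try:
            with urllib.request.urlopen(url, timeout=2) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # backend not up yet; retry after a short pause
        time.sleep(delay)
    return False
```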
- `index.html`: UI layout.
- `styles.css`: Styling.
- `app.js`: Thin entrypoint that imports `src/main.js`.
- `src/main.js`: UI orchestration and event wiring.
- `src/ocr/`: OCR pipeline and neural ROI integration.
- `src/testset/`: Manual OCR test-set runner.
- `backend/`: Optional FastAPI service for neural ROI and digit-classifier inference/training.
- `AGENTS.md`: Contributor guide.
- `assets/`: Static assets and example uploads.
- OCR runs fully in the browser using Tesseract.js.
- OCR now relies on neural ROI detection; if the backend is unavailable or ROI fails, the app asks for manual reading input.
- ROI word-pass defaults to raw strip input; the stage `6. OCR input candidate` snapshot mirrors the configured OCR input mode.
- Edge-derived ROI strip candidates are enabled by default and can be toggled with `OCR_CONFIG.roiDeterministic.useEdgeCandidates`.
- Digit decoding can optionally use a backend classifier (`src/ocr/config.js` -> `digitClassifier.enabled`), which is `false` by default.
- The selection layer is fail-safe: isolated edge-only single hits are rejected unless independently corroborated.
- Use the UI `Run test set` action plus `npm run test:e2e` for OCR regressions before and after tuning.
- The Gmail flow opens a draft; you always review and send manually.
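The fail-safe selection rule can be illustrated with a toy guard. Everything here (candidate shape, source names, function name) is hypothetical and only mirrors the stated policy, not the app's actual selection code:

```python
def accept(candidates):
    """Toy fail-safe selection: a value backed by a single edge-only hit is
    dropped; values corroborated by another source (or repeated) survive.
    Candidate shape and names are illustrative only."""
    by_value = {}
    for c in candidates:
        by_value.setdefault(c["value"], set()).add(c["source"])
    accepted = []
    for value, sources in by_value.items():
        hits = sum(c["value"] == value for c in candidates)
        if sources == {"edge"} and hits == 1:
            continue  # isolated edge-only single hit: reject
        accepted.append(value)
    return accepted

# A lone edge hit is rejected; the same value seen by a second source passes.
assert accept([{"value": "01234", "source": "edge"}]) == []
assert accept([
    {"value": "01234", "source": "edge"},
    {"value": "01234", "source": "raw"},
]) == ["01234"]
```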
- Use the EXIF `DateTimeOriginal` value as the source of truth for the acquisition date.
- Rename files to `meter_mmddyyyy` (zero-padded) and keep the original extension.
- If multiple images share the same date, keep one as-is and add numeric suffixes to the rest (e.g., `_1`, `_2`).
- If EXIF is missing, prefer a known date from the filename or capture notes and document it.
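The renaming convention above can be scripted. A minimal sketch that starts from an already-extracted EXIF `DateTimeOriginal` string (reading EXIF itself, e.g. with Pillow, is out of scope here); the helper name is ours:

```python
from datetime import datetime

def meter_filename(date_time_original, extension, taken=None):
    """Build a meter_mmddyyyy name from an EXIF DateTimeOriginal string
    ("YYYY:MM:DD HH:MM:SS" per the EXIF spec). `taken` is the set of names
    already assigned; same-date duplicates get _1, _2, ... suffixes."""
    dt = datetime.strptime(date_time_original, "%Y:%m:%d %H:%M:%S")
    base = f"meter_{dt:%m%d%Y}"          # zero-padded mmddyyyy
    taken = taken or set()
    name, n = f"{base}{extension}", 0
    while name in taken:                  # keep the first as-is, suffix the rest
        n += 1
        name = f"{base}_{n}{extension}"
    return name

assert meter_filename("2024:03:05 10:22:01", ".jpg") == "meter_03052024.jpg"
assert meter_filename("2024:03:05 18:00:00", ".jpg",
                      {"meter_03052024.jpg"}) == "meter_03052024_1.jpg"
```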