
AndreaPi/Jarvis


Jarvis

Jarvis is a lightweight personal assistant web app. The first module helps you read a water meter photo, review the detected value, and draft an email in Gmail.


Features

  • Upload a meter photo and preview it.
  • OCR on a neural-ROI crop with conservative acceptance: unsupported OCR guesses are rejected and the app falls back to manual input.
  • Auto-fill an email draft with the current date in Italian format.
  • Open a Gmail draft or use a mailto fallback.
  • Run a built-in OCR test set, shown as a table with Detected, Absolute Error, and Failure Reason columns plus summary stats for MAE, exact matches, and no-reads.
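The summary stats can also be reproduced offline, for example to compare tuning runs. This is a minimal sketch, not the code in src/testset/. It assumes that each test image has a numeric expected reading and that a rejected or missing OCR result counts as a no-read (detected is None). It also assumes MAE is computed only over images that produced a reading, while exact-match and no-read are reported as fractions of the whole set. The MeterResult type and the summarize function are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MeterResult:
    """One test-set row (hypothetical structure)."""
    expected: float            # ground-truth reading
    detected: Optional[float]  # None when OCR was rejected (no-read)


def summarize(results: list[MeterResult]) -> dict:
    """Compute MAE over read images plus exact-match and no-read rates over all images."""
    total = len(results)
    reads = [r for r in results if r.detected is not None]
    errors = [abs(r.detected - r.expected) for r in reads]
    return {
        # Assumption: no-reads are excluded from MAE instead of being penalized.
        "mae": sum(errors) / len(errors) if errors else None,
        "exact_match_rate": sum(e == 0 for e in errors) / total if total else 0.0,
        "no_read_rate": (total - len(reads)) / total if total else 0.0,
    }


stats = summarize([
    MeterResult(expected=1234, detected=1234),
    MeterResult(expected=1000, detected=1010),
    MeterResult(expected=500, detected=None),
])
```

Keeping no-reads out of MAE means the error metric only describes accepted readings. The no-read rate then tracks how often conservative acceptance sends the user to manual input.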

Local Development

  1. Ensure Python 3 and Node.js are installed.
  2. Run the dev server:
npm run serve

Then open http://localhost:8000.

If you also want to run Playwright checks, install JS dependencies once:

npm install

Optional Neural ROI Backend (recommended)

You can run a Python backend that detects the meter digit window using a fine-tuned pretrained model.

  1. Open a second terminal and set up backend dependencies:
cd backend
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

For CPU-only environments (for example, Vercel), install the CPU requirements instead:

pip install -r requirements-cpu.txt
  2. Train or fine-tune a model (the best checkpoint is copied to backend/models/roi.pt):
python train_roi.py \
  --data data/roi_dataset.yaml \
  --base-model yolov8n.pt \
  --rotation-angles 90,180,270,360 \
  --heavy-augment

The API's default ROI checkpoint is pinned to backend/models/roi-rotaug-e30-640.pt, so a newly trained roi.pt is not picked up automatically. To run with a newly trained checkpoint, set ROI_MODEL_PATH explicitly before starting the backend. By default, train_roi.py enforces heavy augmentation and rotation expansion; weaker runs require passing --allow-no-augment-policy explicitly.
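Rotation expansion adds rotated copies of each training image, and each copy's box labels have to be transformed to match. The following is an illustration only, not train_roi.py's code. It shows the label side for YOLO-format boxes, which store a normalized center and size (cx, cy, w, h). It assumes clockwise rotation by multiples of 90 degrees, so 360 maps back to the original orientation. If the images themselves are rotated with a counterclockwise API such as Pillow's Image.rotate, the angle direction must be flipped to keep the labels aligned. The function names are hypothetical.

```python
Box = tuple[float, float, float, float]  # (cx, cy, w, h), all normalized to [0, 1]


def rotate_yolo_box(box: Box, angle: int) -> Box:
    """Return the box as it appears after rotating the image clockwise by `angle` degrees."""
    cx, cy, w, h = box
    angle %= 360  # 360 behaves like 0 (no rotation)
    if angle == 0:
        return (cx, cy, w, h)
    if angle == 90:   # point (x, y) moves to (1 - y, x); width and height swap
        return (1.0 - cy, cx, h, w)
    if angle == 180:  # point (x, y) moves to (1 - x, 1 - y)
        return (1.0 - cx, 1.0 - cy, w, h)
    if angle == 270:  # point (x, y) moves to (y, 1 - x); width and height swap
        return (cy, 1.0 - cx, h, w)
    raise ValueError(f"only multiples of 90 degrees are supported, got {angle}")


def expand_with_rotations(boxes: list[Box], angles=(90, 180, 270, 360)) -> dict[int, list[Box]]:
    """Map each requested angle to the transformed labels for one image."""
    return {angle: [rotate_yolo_box(b, angle) for b in boxes] for angle in angles}
```

Applying the 90 degree case twice should give the 180 degree result. This is a quick consistency check for any such label transform.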

Optional: train the per-cell digit classifier checkpoint:

python train_digit_classifier.py --device cpu

For dataset expansion/QA before retraining:

python plan_digit_expansion.py --target-train-per-digit 12 --priority-digits 4,5,6,9
python validate_digit_dataset.py
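The planning step comes down to comparing per-digit training counts against a target. Below is a hypothetical sketch of that arithmetic, not the logic of plan_digit_expansion.py. It reports how many more training crops each digit needs to reach the --target-train-per-digit value. Digits passed via --priority-digits are listed first, followed by the larger gaps. The plan_expansion and parse_digits names and the count-dictionary input are assumptions.

```python
def plan_expansion(train_counts: dict[int, int], target_per_digit: int,
                   priority_digits: tuple[int, ...] = ()) -> list[tuple[int, int]]:
    """Return (digit, missing_samples) pairs for digits below the target."""
    needed = [
        (digit, target_per_digit - train_counts.get(digit, 0))
        for digit in range(10)
        if train_counts.get(digit, 0) < target_per_digit
    ]
    priority = set(priority_digits)
    # Priority digits first, then the largest deficit, then digit order for stable output.
    needed.sort(key=lambda item: (item[0] not in priority, -item[1], item[0]))
    return needed


def parse_digits(text: str) -> tuple[int, ...]:
    """Parse a comma-separated list such as "4,5,6,9"."""
    return tuple(int(part) for part in text.split(",") if part.strip())


counts = {d: 12 for d in range(10)}
counts.update({4: 5, 7: 2, 9: 10})
plan = plan_expansion(counts, target_per_digit=12, priority_digits=parse_digits("4,5,6,9"))
```

In this example, digit 7 has the largest gap but is listed last because it is not a priority digit.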
  3. Start the API:
uvicorn app:app --host 127.0.0.1 --port 8001 --reload

By default, the frontend calls http://127.0.0.1:8001/roi/detect and requires neural ROI detection before OCR. The frontend can also call http://127.0.0.1:8001/digit/predict-cells when OCR_CONFIG.digitClassifier.enabled is set to true. Check backend readiness with:

curl -s http://127.0.0.1:8001/health
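The same check can be scripted in Python's standard library, for example to wait for uvicorn to finish loading before running the E2E suite. This is a minimal sketch that assumes only that a healthy backend answers GET /health with HTTP 200. The response body format is not documented here, so it is not parsed. The function names are hypothetical.

```python
import http.client
import time
import urllib.request


def backend_is_healthy(url: str = "http://127.0.0.1:8001/health", timeout: float = 2.0) -> bool:
    """Return True if the endpoint answers with HTTP 200, False on any connection or HTTP error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (OSError, http.client.HTTPException):
        # OSError covers URLError, HTTPError, refused connections, and timeouts.
        return False


def wait_for_backend(url: str = "http://127.0.0.1:8001/health",
                     attempts: int = 10, delay: float = 1.0) -> bool:
    """Poll the health endpoint until it succeeds or the attempts run out."""
    for attempt in range(attempts):
        if backend_is_healthy(url):
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

A call such as wait_for_backend() right after launching uvicorn returns True once /health responds. It returns False after about ten seconds if the backend never comes up.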

E2E Tests

Run Playwright checks for neural-ROI failure handling and OCR selection guard regressions:

npm run test:e2e

Generate a per-image ROI checkpoint comparison report (roi-rotaug-e30-640.pt vs roi.pt) with stage 5/6 debug snapshots:

npm run benchmark:roi-diff

Report artifacts are written under output/roi-checkpoint-diff/<timestamp>/. Per-image diff tables include the selected OCR metadata (sourceLabel, method, preprocessMode), and stage 6 exports use the last "6. OCR input candidate" frame from each debug session. To benchmark with the digit-classifier fallback enabled (gated to the ocr-no-digits case), run:

JARVIS_DIGIT_FALLBACK=1 npm run benchmark:roi-diff

CI runs these tests on every pull request and on pushes to master.
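For each image, the diff answers one question: does exactly one of the two checkpoints get it right? The following is a hypothetical sketch of that comparison, not the benchmark script itself. It assumes each run yields a mapping from image name to detected reading (None for a no-read) and that a known expected reading exists for every image. Run A could stand for roi-rotaug-e30-640.pt and run B for roi.pt. Readings are kept as strings so leading zeros survive.

```python
from typing import Optional

Readings = dict[str, Optional[str]]


def diff_checkpoints(expected: dict[str, str], run_a: Readings, run_b: Readings) -> list[tuple]:
    """Return (image, detected_a, detected_b, winner) for images where exactly one run is correct."""
    rows = []
    for image, truth in sorted(expected.items()):
        a, b = run_a.get(image), run_b.get(image)
        correct_a, correct_b = a == truth, b == truth
        if correct_a != correct_b:
            rows.append((image, a, b, "A" if correct_a else "B"))
    return rows


expected = {"meter_01012024.jpg": "01234", "meter_02012024.jpg": "01250", "meter_03012024.jpg": "01270"}
run_a = {"meter_01012024.jpg": "01234", "meter_02012024.jpg": "01256", "meter_03012024.jpg": "01270"}
run_b = {"meter_01012024.jpg": "01234", "meter_02012024.jpg": "01250", "meter_03012024.jpg": None}
rows = diff_checkpoints(expected, run_a, run_b)
```

Images where both runs are equally right or wrong are omitted. This keeps the table focused on regressions and improvements.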

File Overview

  • index.html: UI layout.
  • styles.css: Styling.
  • app.js: Thin entrypoint that imports src/main.js.
  • src/main.js: UI orchestration and event wiring.
  • src/ocr/: OCR pipeline and neural ROI integration.
  • src/testset/: Manual OCR test-set runner.
  • backend/: Optional FastAPI service for neural ROI and digit-classifier inference/training.
  • AGENTS.md: Contributor guide.
  • assets/: Static assets and example uploads.

Notes

  • OCR runs fully in the browser using Tesseract.js.
  • OCR now relies on neural ROI detection; if the backend is unavailable or ROI fails, the app asks for manual reading input.
  • The ROI word pass defaults to raw strip input; the "6. OCR input candidate" stage mirrors the configured OCR input mode.
  • Edge-derived ROI strip candidates are enabled by default and can be toggled with OCR_CONFIG.roiDeterministic.useEdgeCandidates.
  • Digit decoding can optionally use a backend classifier, controlled by digitClassifier.enabled in src/ocr/config.js (false by default).
  • The selection layer is fail-safe: isolated edge-only single hits are rejected unless independently corroborated.
  • Use the UI's Run test set action together with npm run test:e2e to catch OCR regressions before and after tuning.
  • The Gmail flow opens a draft; you always review and send manually.
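A small voting sketch can illustrate the fail-safe rule in the selection layer. This is an illustration built on assumptions, not the code in src/ocr/. Each OCR candidate is reduced to a value and a source label; the labels "edge" and "raw-strip" are hypothetical. A value supported by exactly one candidate that came from an edge-derived strip is discarded. Among the remaining values, the one with the most supporting candidates wins. A return value of None stands for falling back to manual input.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Candidate:
    value: str   # digits read from one ROI strip candidate
    source: str  # hypothetical label, e.g. "edge" or "raw-strip"


def select_reading(candidates: list[Candidate]) -> Optional[str]:
    """Pick the best-supported reading, or None to ask the user for manual input."""
    support: dict[str, list[str]] = defaultdict(list)
    for candidate in candidates:
        support[candidate.value].append(candidate.source)

    best: Optional[str] = None
    for value, sources in support.items():
        if sources == ["edge"]:
            continue  # isolated edge-only single hit with no corroboration: reject
        if best is None or len(sources) > len(support[best]):
            best = value  # ties keep the value seen first
    return best
```

In this sketch a single non-edge hit is still accepted. The app's actual selection layer may apply further checks before accepting a reading.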

Asset Naming (Meter Images)

  • Use the EXIF DateTimeOriginal value as the source of truth for the acquisition date.
  • Rename files to meter_mmddyyyy (zero-padded) and keep the original extension.
  • If multiple images share the same date, keep one as-is and add numeric suffixes to the rest (e.g., _1, _2).
  • If EXIF is missing, prefer a known date from the filename or capture notes and document it.
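The renaming rule is mechanical enough to script. This is a minimal sketch that assumes the DateTimeOriginal string has already been read (for example with Pillow) in the standard EXIF form YYYY:MM:DD HH:MM:SS. The meter_filename function and the taken set that tracks used names are hypothetical. The first file for a given date keeps the plain name, later ones get _1, _2, and so on, and the original extension is preserved.

```python
from datetime import datetime
from pathlib import PurePath


def meter_filename(original: str, exif_datetime: str, taken: set[str]) -> str:
    """Return meter_mmddyyyy[_N]<ext> and record the chosen stem in `taken`."""
    shot = datetime.strptime(exif_datetime, "%Y:%m:%d %H:%M:%S")  # EXIF DateTimeOriginal format
    base = f"meter_{shot:%m%d%Y}"  # zero-padded month, day, and four-digit year
    stem, suffix_number = base, 0
    while stem in taken:  # same date already used: append _1, _2, ...
        suffix_number += 1
        stem = f"{base}_{suffix_number}"
    taken.add(stem)
    return stem + PurePath(original).suffix  # keep the original extension


used: set[str] = set()
names = [
    meter_filename("IMG_0001.JPG", "2024:03:05 08:15:00", used),
    meter_filename("IMG_0002.JPG", "2024:03:05 19:40:12", used),
    meter_filename("scan.jpeg", "2024:11:20 07:00:00", used),
]
```

Seed taken with the stems already present in assets/ so that rerunning the script never reuses an existing name.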

About

My personal home assistant
