AndreaPi · AndreaPi · Mar 4, 2026 · Feb 5, 2026 · Feb 10, 2026 · Feb 10, 2026
diff --git a/.github/workflows/e2e.yml b/.github/workflows/e2e.yml
@@ -0,0 +1,38 @@
+name: E2E
+
+on:
+  pull_request:
+  push:
+    branches:
+      - master
+
+jobs:
+  playwright:
+    runs-on: ubuntu-latest
+    timeout-minutes: 20
+    steps:
+      - name: Checkout
+        uses: actions/checkout@v4
+
+      - name: Setup Node.js
+        uses: actions/setup-node@v4
+        with:
+          node-version: 20
+          cache: npm
+
+      - name: Install dependencies
+        run: npm ci
+
+      - name: Install Playwright browsers
+        run: npx playwright install --with-deps chromium
+
+      - name: Run e2e tests
+        run: npm run test:e2e -- --project=chromium
+
+      - name: Upload Playwright report
+        if: always()
+        uses: actions/upload-artifact@v4
+        with:
+          name: playwright-report
+          path: playwright-report
+          retention-days: 7
diff --git a/.gitignore b/.gitignore
@@ -1,21 +1,32 @@
-# Node
+*:Zone.Identifier
 node_modules/
-
-# Logs
-npm-debug.log*
-yarn-debug.log*
-yarn-error.log*
-pnpm-debug.log*
-
-# OS files
-.DS_Store
-Thumbs.db
-
-# Env files
+output/
+assets/*.jpg
+assets/*.jpeg
+assets/*.png
+assets/*.JPG
+assets/*.JPEG
+assets/*.PNG
+.venv/
+backend/.venv/
+__pycache__/
+*.pyc
+backend/__pycache__/
+backend/runs/
+backend/models/*.pt
+backend/models/*.onnx
+backend/yolov8*.pt
+/yolov8*.pt
+backend/data/roi_dataset/labels/*.cache
+backend/data/roi_dataset/previews/
+backend/data/roi_dataset/qa_previews/
+backend/data/roi_dataset/roi_boxes.json
 .env
-.env.local
-.env.*.local
 
-# Build output
-coverage/
-dist/
+# Playwright CLI generated artifacts
+.playwright-cli/
+playwright-report/
+test-results/
+
+# Local parking area for unrelated files
+AOB/
diff --git a/AGENTS.md b/AGENTS.md
@@ -3,27 +3,53 @@
 ## Project Structure & Module Organization
 - `index.html`: Single-page UI layout and content.
 - `styles.css`: Global styles and visual system.
-- `app.js`: Client-side logic (OCR flow, email draft generation).
+- `app.js`: Thin module entrypoint that imports `src/main.js`.
+- `src/main.js`: UI orchestration and event wiring.
+- `src/ocr/`: Neural-ROI-first OCR pipeline with strip-first decoding and selection safeguards.
+- `src/email/`: Email draft generation and link helpers.
+- `src/testset/`: Manual test-set runner logic.
+- `src/debug/`: Debug overlay rendering helpers.
+- `backend/`: Optional FastAPI service for neural ROI + digit classifier inference and training scripts.
+- `backend/build_digit_dataset.py`: Export strip/cell OCR datasets + QA previews from ROI labels.
+- `backend/generate_synthetic_digit_dataset.py`: Build synthetic train-only digit sections (direct cell augmentation plus optional composed windows re-split into equally spaced cells).
+- `backend/plan_digit_expansion.py`: Generate prioritized capture plan for underrepresented digits.
+- `backend/validate_digit_dataset.py`: Validate manifest consistency and QA preview coverage.
+- `backend/train_digit_classifier.py`: Train per-cell digit classifier checkpoint.
 - `package.json`: Local dev scripts.
 - `README.md`: Project overview and setup notes.
 - `assets/`: Static assets and example uploads.
-- `assets/meter_13012026.jpg`: Example upload asset.
 
 ## Build, Test, and Development Commands
 - `npm run serve`: Start a simple local web server on port 8000.
 - `npm run dev`: Alias of `npm run serve`.
+- `cd backend && python3 -m venv .venv && source .venv/bin/activate && pip install -r requirements.txt`: Backend setup.
+- `cd backend && source .venv/bin/activate && python train_roi.py --data data/roi_dataset.yaml --base-model yolov8n.pt --rotation-angles 90,180,270,360 --heavy-augment`: Fine-tune pretrained ROI detector with enforced augmentation policy.
+- `cd backend && source .venv/bin/activate && python build_digit_dataset.py --clean`: Rebuild digit strip/cell exports and QA previews.
+- `cd backend && source .venv/bin/activate && python validate_digit_dataset.py`: Validate dataset/manifests before training.
+- `cd backend && source .venv/bin/activate && python generate_synthetic_digit_dataset.py --clean --direct-per-real 6 --compose-window-count 180`: Generate synthetic train-only digit sections from real train labels.
+- `cd backend && source .venv/bin/activate && python plan_digit_expansion.py --target-train-per-digit 12 --priority-digits 4,5,6,9`: Refresh targeted capture checklist.
+- `cd backend && source .venv/bin/activate && python train_digit_classifier.py --device cpu`: Train per-cell digit classifier model (real-only).
+- `cd backend && source .venv/bin/activate && python train_digit_classifier.py --device cpu --synthetic-root data/digit_dataset/sections_synthetic --synthetic-target-ratio 2.0`: Train on mixed real + synthetic train split while keeping val/test real-only.
+- `cd backend && source .venv/bin/activate && uvicorn app:app --host 127.0.0.1 --port 8001 --reload`: Run neural ROI API.
 
-Open `http://localhost:8000` after running a serve command.
+Open `http://localhost:8000` after running a serve command. Backend endpoints default to `http://127.0.0.1:8001/roi/detect` and `http://127.0.0.1:8001/digit/predict-cells`.
 
 ## Coding Style & Naming Conventions
 - Use 2-space indentation in HTML/CSS/JS.
 - Keep files ASCII-only unless there is a strong reason for Unicode.
 - Use descriptive, lower-case IDs and class names (e.g., `photo-input`, `module-grid`).
-- Prefer clear, small functions in `app.js` and avoid deep nesting.
+- Prefer clear, small functions in `src/` modules and avoid deep nesting.
 
 ## Testing Guidelines
-- No automated tests are configured.
-- Manual checks: upload image, run OCR, verify email draft fields, and confirm Gmail draft link.
+- Automated browser tests are configured with Playwright.
+- `npm run test:e2e`: Runs `tests/e2e/neural-roi.spec.js` (neural ROI failure handling + ROI geometry + strip-only OCR behavior).
+- CI: `.github/workflows/e2e.yml` runs on each pull request and on pushes to `master`.
+- Frontend manual checks: upload image, run OCR, verify email draft fields, and confirm Gmail draft link.
+- OCR test-set checks: click "Run test set" in the UI and inspect `MAE`, `Exact Match`, `No-read`, `Failure Reason`, and debug stages.
+- Backend sanity checks: `GET /health` and confirm `ready: true`, `roi_ready: true`, and expected `model_path`.
+- Prefer running the test set from UI with debug overlay enabled.
+- Before committing OCR changes, run both `npm run test:e2e` and the UI "Run test set".
+- ROI training policy: always use heavy augmentation and rotation expansion (`90,180,270,360`). `train_roi.py` enforces this by default and only allows weaker runs with `--allow-no-augment-policy`.
 
 ## Commit & Pull Request Guidelines
 - No commit message convention is established in this repo.
@@ -33,3 +59,61 @@ Open `http://localhost:8000` after running a serve command.
 ## Security & Configuration Tips
 - The Gmail draft flow opens a client-side draft; no credentials are stored in code.
 - OCR runs in the browser; avoid adding API keys to the client without a secure proxy.
+- Backend is intended for local use; keep host/CORS scoped to localhost unless explicitly deploying.
+
+## IMPORTANT
+- When using Playwright in this environment, the global `playwright-cli` may be more reliable than the wrapper if npm network access is flaky.
+
+## OCR Working State
+
+- App and backend run locally on `127.0.0.1:8000` and `127.0.0.1:8001`, respectively.
+- Neural ROI is mandatory in the frontend OCR flow (heuristic ROI fallback removed).
+- On neural ROI failure, the UI shows an explicit reason and asks for manual measurement input.
+- Backend default ROI model is pinned to `backend/models/roi-rotaug-e30-640.pt` (override with `ROI_MODEL_PATH`).
+- `train_roi.py` enforces augmentation policy by default: heavy online augmentation + rotation expansion `90,180,270,360`.
+- Digit-classifier inference is optional behind `OCR_CONFIG.digitClassifier.enabled` (default `false`).
+- Backend serves ROI + digit endpoints and reports readiness via `GET /health`.
+- Test-set table includes `Detected`, `Absolute Error`, `Failure Reason`, and `Result`.
+- Frontend OCR branch evaluation is strip-only (word-pass + sparse scan); the 4-cell refine stage is removed from the active pipeline.
+- ROI word-pass defaults to raw candidate input (`roiDeterministic.wordPassModes: ['raw']`); debug stage `6. OCR input candidate` mirrors this mode.
+- `roiDeterministic.minWordPassHits` is `1`, but isolated edge-only single hits are rejected unless corroborated by non-edge evidence or very strong per-cell confidence.
+- Edge-derived candidate generation is toggleable via `roiDeterministic.useEdgeCandidates` (default `true`) for controlled A/B experiments.
+- Current local benchmark set has `15` images.
+- Historical checkpoint comparison (March 2, 2026, fallback `OFF`, 14-image snapshot):
+  - `roi-rotaug-e30-640.pt` (default pinned): exact-match `0/14`, failure mix `ocr-no-digits` (7), `mismatch` (6), `no-detection` (1).
+  - `roi.pt` (challenger): exact-match `0/14`, failure mix `ocr-no-digits` (10), `mismatch` (4), `no-detection` (0).
+- Automated diff workflow is available via `npm run benchmark:roi-diff` (recent artifacts: `output/roi-checkpoint-diff/20260303-194206-fallback-off/roi-diff-report.md`).
+- Gated digit-classifier fallback is implemented in the pipeline but remains disabled by default (`digitClassifier.enabled: false`).
+- Historical fallback benchmark (March 2, 2026, 14-image snapshot):
+  - Fallback `OFF` (`output/roi-checkpoint-diff/20260302-083324-fallback-off`): baseline `mismatch` 6 / `ocr-no-digits` 7; challenger `mismatch` 4 / `ocr-no-digits` 10.
+  - Fallback `ON` (`output/roi-checkpoint-diff/20260302-083529-fallback-on`): baseline `mismatch` 10 / `ocr-no-digits` 3; challenger `mismatch` 13 / `ocr-no-digits` 1.
+  - Net: no exact-match gain (`0/14` stays `0/14`), with strong false-positive shift (`ocr-no-digits` -> `mismatch`), so fallback stays disabled.
+- Promotion and rollback decisions should now use `MAE` from `roi-diff-report` as the primary signal, with exact-match and no-read as guardrails.
+- ROI diff reports now include per-image selected metadata columns (`sourceLabel`, `method`, `preprocessMode`) and explicitly export the last stage `6. OCR input candidate` snapshot.
+
+## Next TODOs
+
+1. Keep `roi-rotaug-e30-640.pt` as default until a challenger beats it on end-to-end OCR metrics, not only detection presence.
+2. Re-run `npm run benchmark:roi-diff` after each ROI challenger to track per-image movement (`Detected`, stage `5/6` snapshots, reject reason), then summarize deltas in notes/PR.
+3. Tune strip preprocessing and candidate ranking for the current hard failures (`meter_07012020.JPEG`, `meter_02192026.JPEG`, `meter_02202026.JPEG`, `meter_02242026.JPEG`).
+4. Keep classifier fallback disabled until it beats fallback-off on `MAE` while respecting exact-match and no-read guardrails; focus on stricter fallback acceptance/ranking before re-testing.
+5. Enforce checkpoint promotion gates from docs: no MAE regression, no exact-match regression, no no-read regression, and no regression in `ocr-no-digits`.
+6. Keep running both `npm run test:e2e` and UI `Run test set` before commits; include histogram deltas in commit/PR notes.
+7. Medium-term: evaluate YOLO OBB ROI detection to reduce rotation/edge ambiguity; this requires OBB relabeling, retraining, and backend response/schema changes before frontend adoption.
+
+### OBB Notes (Re-verify Before Implementation)
+
+- OBB inference outputs rotated geometry (`xywhr`) and polygon corners.
+- OBB training labels use corners format: `class x1 y1 x2 y2 x3 y3 x4 y4`.
+- OBB angle handling has constraints (Ultralytics OBB restricts angles to the `0-90` degree range, exclusive of 90).
+
+## Dataset Expansion Loop (`4/5/6/9`)
+
+1. Refresh capture planning:
+   - `cd backend && source .venv/bin/activate && python plan_digit_expansion.py --target-train-per-digit 12 --priority-digits 4,5,6,9`
+2. Add labeled captures with QA previews.
+3. Validate manifests after each dataset update:
+   - `cd backend && source .venv/bin/activate && python validate_digit_dataset.py`
+4. Retrain classifier only after class coverage improves:
+   - `cd backend && source .venv/bin/activate && python train_digit_classifier.py --device cpu`
+5. Keep classifier fallback disabled by default; only enable if benchmarked `MAE` improves without exact-match/no-read guardrail regressions.
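
The promotion gates described in the AGENTS.md changes (no MAE regression, no exact-match regression, no no-read regression, no `ocr-no-digits` regression) can be sketched as a small check. This is a minimal illustration, not code from the repository: `ImageResult`, `Summary`, `summarize`, and `failed_gates` are hypothetical names, and computing MAE only over images that produced a reading is an assumption about how the benchmark defines it.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class ImageResult:
    """One benchmark row (hypothetical shape, not the repo's report schema)."""
    expected: float
    detected: Optional[float]  # None means the pipeline produced no reading
    failure_reason: str = ""   # e.g. "ocr-no-digits", "mismatch", "no-detection"


@dataclass(frozen=True)
class Summary:
    mae: float
    exact_match: int
    no_read: int
    ocr_no_digits: int


def summarize(results: Sequence[ImageResult]) -> Summary:
    """Aggregate per-image rows into the four gated metrics."""
    read = [r for r in results if r.detected is not None]
    # Assumption: MAE is averaged over images with a reading; infinite if none.
    mae = (
        sum(abs(r.detected - r.expected) for r in read) / len(read)
        if read
        else float("inf")
    )
    return Summary(
        mae=mae,
        exact_match=sum(1 for r in read if r.detected == r.expected),
        no_read=len(results) - len(read),
        ocr_no_digits=sum(
            1 for r in results if r.failure_reason == "ocr-no-digits"
        ),
    )


def failed_gates(baseline: Summary, challenger: Summary) -> List[str]:
    """Return the names of gates the challenger fails; empty means promotable."""
    checks = [
        ("mae", challenger.mae <= baseline.mae),
        ("exact_match", challenger.exact_match >= baseline.exact_match),
        ("no_read", challenger.no_read <= baseline.no_read),
        ("ocr_no_digits", challenger.ocr_no_digits <= baseline.ocr_no_digits),
    ]
    return [name for name, ok in checks if not ok]
```

A challenger that converts no-reads into wrong readings can pass the `no_read` and `ocr_no_digits` gates while failing `mae`, which is the false-positive shift the fallback benchmark notes describe. Treating MAE as the primary signal is what catches that case.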
diff --git a/README.md b/README.md
@@ -2,14 +2,22 @@
 
 Jarvis is a lightweight personal assistant web app. The first module helps you read a water meter photo, review the detected value, and draft an email in Gmail.
 
+## Documentation
+
+- Docs index: [`docs/README.md`](./docs/README.md)
+- OCR app logic flow: [`docs/app-logic.md`](./docs/app-logic.md)
+- Backend API guide: [`docs/backend-api.md`](./docs/backend-api.md)
+- OCR tuning playbook: [`docs/ocr-tuning-playbook.md`](./docs/ocr-tuning-playbook.md)
+
 ## Features
 - Upload a meter photo and preview it.
-- OCR the reading (manual override supported).
+- OCR from a neural-ROI crop with conservative acceptance (unsupported OCR guesses are rejected and the user is asked for manual input).
 - Auto-fill an email draft with the current date in Italian format.
 - Open a Gmail draft or use a mailto fallback.
+- Run a built-in OCR test set table with `Detected`, `Absolute Error`, and `Failure Reason` columns plus MAE/exact-match/no-read summary stats.
 
 ## Local Development
-1. Install dependencies (none required beyond Python).
+1. Ensure Python 3 and Node.js are installed.
 2. Run the dev server:
 
 ```bash
@@ -18,19 +26,119 @@ npm run serve
 
 Then open `http://localhost:8000`.
 
+If you also want to run Playwright checks, install JS dependencies once:
+
+```bash
+npm install
+```
+
+### Optional Neural ROI Backend (recommended)
+You can run a Python backend that detects the meter digit window using a fine-tuned pretrained model.
+
+1. Open a second terminal and set up backend dependencies:
+
+```bash
+cd backend
+python3 -m venv .venv
+source .venv/bin/activate
+pip install -r requirements.txt
+```
+
+For CPU-only environments (for example Vercel), install:
+
+```bash
+pip install -r requirements-cpu.txt
+```
+
+
+2. Train/fine-tune a model (copies best checkpoint to `backend/models/roi.pt`):
+
+```bash
+python train_roi.py \
+  --data data/roi_dataset.yaml \
+  --base-model yolov8n.pt \
+  --rotation-angles 90,180,270,360 \
+  --heavy-augment
+```
+
+The API default ROI checkpoint is pinned to `backend/models/roi-rotaug-e30-640.pt`.
+To run with a newly trained checkpoint, set `ROI_MODEL_PATH` explicitly before starting the backend.
+`train_roi.py` now enforces heavy augmentation + rotation expansion by default; weaker runs require explicit `--allow-no-augment-policy`.
+
+Optional: train the per-cell digit classifier checkpoint:
+
+```bash
+python train_digit_classifier.py --device cpu
+```
+
+For dataset expansion/QA before retraining:
+
+```bash
+python plan_digit_expansion.py --target-train-per-digit 12 --priority-digits 4,5,6,9
+python validate_digit_dataset.py
+```
+
+3. Start the API:
+
+```bash
+uvicorn app:app --host 127.0.0.1 --port 8001 --reload
+```
+
+By default, the frontend calls `http://127.0.0.1:8001/roi/detect` and requires neural ROI detection before OCR.
+The frontend can also call `http://127.0.0.1:8001/digit/predict-cells` when `OCR_CONFIG.digitClassifier.enabled` is set to `true`.
+Check backend readiness with:
+
+```bash
+curl -s http://127.0.0.1:8001/health
+```
+
+### E2E Tests
+
+Run Playwright checks for neural-ROI failure handling and OCR selection guard regressions:
+
+```bash
+npm run test:e2e
+```
+
+Generate a per-image ROI checkpoint comparison report (`roi-rotaug-e30-640.pt` vs `roi.pt`) with stage `5/6` debug snapshots:
+
+```bash
+npm run benchmark:roi-diff
+```
+
+Report artifacts are written under `output/roi-checkpoint-diff/<timestamp>/`.
+Per-image diff tables include selected OCR metadata (`sourceLabel`, `method`, `preprocessMode`) and stage `6` exports use the last `6. OCR input candidate` frame from each debug session.
+To benchmark with digit-classifier fallback enabled (gated to `ocr-no-digits`), run:
+
+```bash
+JARVIS_DIGIT_FALLBACK=1 npm run benchmark:roi-diff
+```
+
+CI runs the Playwright e2e tests (`npm run test:e2e`) on every pull request and on pushes to `master`.
+
 ## File Overview
 - `index.html`: UI layout.
 - `styles.css`: Styling.
-- `app.js`: OCR + email draft logic.
+- `app.js`: Thin entrypoint that imports `src/main.js`.
+- `src/main.js`: UI orchestration and event wiring.
+- `src/ocr/`: OCR pipeline and neural ROI integration.
+- `src/testset/`: Manual OCR test-set runner.
+- `backend/`: Optional FastAPI service for neural ROI and digit-classifier inference/training.
 - `AGENTS.md`: Contributor guide.
 - `assets/`: Static assets and example uploads.
 
 ## Notes
 - OCR runs fully in the browser using Tesseract.js.
+- OCR now relies on neural ROI detection; if the backend is unavailable or ROI fails, the app asks for manual reading input.
+- ROI word-pass defaults to raw strip input; stage `6. OCR input candidate` mirrors the configured OCR input mode.
+- Edge-derived ROI strip candidates are enabled by default and can be toggled with `OCR_CONFIG.roiDeterministic.useEdgeCandidates`.
+- Digit decoding can optionally use a backend classifier (`src/ocr/config.js` -> `digitClassifier.enabled`), which is `false` by default.
+- The selection layer is fail-safe: isolated edge-only single hits are rejected unless independently corroborated.
+- Use the UI `Run test set` action plus `npm run test:e2e` to check for OCR regressions before and after tuning.
 - The Gmail flow opens a draft; you always review and send manually.
 
 ## Asset Naming (Meter Images)
 - Use the EXIF `DateTimeOriginal` value as the source of truth for the acquisition date.
 - Rename files to `meter_mmddyyyy` (zero-padded) and keep the original extension.
-- If multiple images share the same date, keep one as-is and add suffixes to the rest (e.g., `_b`, `_c`).
+- If multiple images share the same date, keep one as-is and add numeric suffixes to the rest (e.g., `_1`, `_2`).
 - If EXIF is missing, prefer a known date from the filename or capture notes and document it.
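
The asset naming rules in the README (zero-padded `meter_mmddyyyy`, original extension kept, numeric suffixes for same-date images) can be sketched as a small helper. This is an illustrative sketch only: `meter_asset_name` is a hypothetical function, and it assumes the capture date has already been read from EXIF `DateTimeOriginal` or another documented source.

```python
from datetime import date
from pathlib import PurePath
from typing import Iterable


def meter_asset_name(
    original_name: str,
    captured: date,
    existing: Iterable[str] = (),
) -> str:
    """Build a `meter_mmddyyyy` file name, adding `_1`, `_2`, ... on date clashes."""
    ext = PurePath(original_name).suffix  # keep the original extension and its case
    stem = f"meter_{captured:%m%d%Y}"     # zero-padded month, day, four-digit year

    # Compare stems so that images with different extensions still count
    # as sharing the same date.
    taken = {PurePath(name).stem for name in existing}

    candidate = stem
    suffix = 0
    while candidate in taken:
        suffix += 1
        candidate = f"{stem}_{suffix}"
    return f"{candidate}{ext}"
```

For example, `meter_asset_name("IMG_0002.jpg", date(2026, 2, 19), ["meter_02192026.JPEG"])` yields `meter_02192026_1.jpg`, leaving the first image of that date unsuffixed.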