Phase 1 — Core pipeline MVP (capture → OCR → translate → terminal) #1
Open
Purpose
Implement the end-to-end minimal pipeline to capture a frame, run OCR, translate, and output to terminal. This phase enables a CLI-based MVP that proves pipeline wiring before adding overlays, voice, or UI.
Tasks (atomic, AI-sized)
- 1.1 Implement `capture_region` and `capture_full`
  - File: `src/capture/screen.py`
  - Work: Add `capture_region(region: Tuple[int,int,int,int]) -> numpy.ndarray` and `capture_full() -> numpy.ndarray` using `mss`.
  - Tests: `tests/test_capture.py` that mocks `mss` and validates return type and shape.
  - DoD: functions exist, documented, and unit tests pass locally.
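A minimal sketch of what 1.1 could look like, not a final implementation. It assumes `region` means `(left, top, width, height)` and adds a hypothetical optional `sct` parameter (not part of the signatures above) so tests can inject a fake instead of mocking `mss` globally.

```python
from typing import Any, Callable, Optional, Tuple

import numpy as np


def _with_sct(fn: Callable[[Any], np.ndarray], sct: Optional[Any] = None) -> np.ndarray:
    """Run fn against an injected screenshot object, or a real mss one."""
    if sct is not None:
        return fn(sct)
    import mss  # lazy import so unit tests can run without a display

    with mss.mss() as real_sct:
        return fn(real_sct)


def _to_bgr(shot: Any) -> np.ndarray:
    """Convert an mss screenshot (BGRA) to an H x W x 3 BGR array."""
    return np.array(shot)[:, :, :3]


def capture_region(
    region: Tuple[int, int, int, int], sct: Optional[Any] = None
) -> np.ndarray:
    # Assumption: region is (left, top, width, height) in screen pixels.
    left, top, width, height = region
    monitor = {"left": left, "top": top, "width": width, "height": height}
    return _with_sct(lambda s: _to_bgr(s.grab(monitor)), sct)


def capture_full(sct: Optional[Any] = None) -> np.ndarray:
    # In mss, monitors[0] is the bounding box of all monitors combined.
    return _with_sct(lambda s: _to_bgr(s.grab(s.monitors[0])), sct)
```

Dropping the alpha channel here is a design choice for the sketch; whether downstream OCR wants BGR or RGB is still open.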
- 1.2 Implement Region dataclass and profile save/load
  - File: `src/capture/regions.py`
  - Work: `Region` dataclass (id, name, coords, profile metadata). Add `save_profile(name, regions)` and `load_profile(name)` storing JSON under XDG path or repo-local `.kanjilens/profiles`.
  - Tests: `tests/test_regions.py` saves and loads a temp profile.
  - DoD: roundtrip save/load works and is used by capture calls.
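One possible shape for 1.2, as a sketch under assumptions: the field types, the `profiles_dir` helper, the `base` override parameter, the JSON layout, and the choice of `XDG_CONFIG_HOME` (rather than the data directory or the repo-local path) are all illustrative and not decided in this issue.

```python
import json
import os
from dataclasses import asdict, dataclass, field
from pathlib import Path
from typing import Dict, List, Optional, Tuple


@dataclass
class Region:
    id: str
    name: str
    coords: Tuple[int, int, int, int]
    metadata: Dict[str, str] = field(default_factory=dict)


def profiles_dir(base: Optional[str] = None) -> Path:
    """Hypothetical helper: explicit base wins, else an XDG-style config dir."""
    if base is not None:
        return Path(base)
    xdg = os.environ.get("XDG_CONFIG_HOME")
    root = Path(xdg) if xdg else Path.home() / ".config"
    return root / "kanjilens" / "profiles"


def save_profile(name: str, regions: List[Region], base: Optional[str] = None) -> Path:
    directory = profiles_dir(base)
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / f"{name}.json"
    payload = {"name": name, "regions": [asdict(r) for r in regions]}
    path.write_text(json.dumps(payload, ensure_ascii=False, indent=2), encoding="utf-8")
    return path


def load_profile(name: str, base: Optional[str] = None) -> List[Region]:
    path = profiles_dir(base) / f"{name}.json"
    payload = json.loads(path.read_text(encoding="utf-8"))
    return [
        Region(
            id=r["id"],
            name=r["name"],
            coords=tuple(r["coords"]),  # JSON stores tuples as lists
            metadata=r.get("metadata", {}),
        )
        for r in payload["regions"]
    ]
```

The `base` override is what lets the test write to a temp directory without touching the real profile location.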
- 1.3 Implement frame change detection
  - File: `src/capture/change_detector.py`
  - Work: `has_changed(prev, new, threshold=0.02) -> bool` using OpenCV/numpy diffs; add debounce helper `should_ocr`.
  - Tests: `tests/test_change_detector.py` verifying identical vs different frames for multiple thresholds.
  - DoD: change detection used by pipeline to skip OCR when unchanged.
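A numpy-only sketch of 1.3. The issue does not define what `threshold` measures, so this assumes it is the mean absolute pixel difference normalized to the range 0 to 1. The `should_ocr` signature and its `min_interval` parameter are also assumptions; here it is read as a simple rate limit rather than a full stabilization debounce.

```python
from typing import Optional

import numpy as np


def has_changed(prev: Optional[np.ndarray], new: np.ndarray, threshold: float = 0.02) -> bool:
    """True when the frames differ by more than `threshold` on average."""
    # No previous frame, or a resized capture region, always counts as a change.
    if prev is None or prev.shape != new.shape:
        return True
    # Widen to a signed type so uint8 subtraction cannot wrap around.
    diff = np.abs(prev.astype(np.int16) - new.astype(np.int16))
    return float(diff.mean()) / 255.0 > threshold


def should_ocr(
    changed: bool, last_ocr_time: float, now: float, min_interval: float = 0.3
) -> bool:
    """Hypothetical debounce: OCR only on change and at most once per interval."""
    return changed and (now - last_ocr_time) >= min_interval
```

A mean-based metric is cheap but insensitive to small text changes in a large frame; a per-pixel changed-fraction metric may suit subtitle regions better.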
- 1.4 Add CRAFT detector wrapper (interface only)
  - File: `src/ocr/detector.py`
  - Work: `class CraftDetector` with `load_model()` and `detect_text_regions(image) -> List[Rect]`. Provide a mockable interface; implement a no-op stub mode for CI.
  - Tests: `tests/test_detector.py` verifies API usage with a mock.
  - DoD: detector class present, documented I/O, tests pass.
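A sketch of the interface in 1.4 with only the stub path filled in. The `Rect` layout `(x, y, w, h)`, the `stub` constructor flag, and the stub's behavior of returning one rectangle covering the whole frame are assumptions; real CRAFT loading is deliberately left unimplemented rather than guessed at.

```python
from typing import Any, List, NamedTuple


class Rect(NamedTuple):
    # Assumed layout: top-left corner plus size, in image pixels.
    x: int
    y: int
    w: int
    h: int


class CraftDetector:
    def __init__(self, stub: bool = True) -> None:
        self.stub = stub
        self.loaded = False

    def load_model(self) -> None:
        if not self.stub:
            # Placeholder: the real CRAFT integration is not specified here.
            raise NotImplementedError("real CRAFT loading is out of scope for this sketch")
        self.loaded = True

    def detect_text_regions(self, image: Any) -> List[Rect]:
        """Return text rectangles; the stub treats the whole frame as one region."""
        if not self.loaded:
            self.load_model()
        height, width = image.shape[:2]
        return [Rect(0, 0, width, height)]
```

Returning the full frame in stub mode (instead of an empty list) keeps the downstream reader exercised in CI.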
- 1.5 Add MangaOCR reader wrapper (interface only)
  - File: `src/ocr/reader.py`
  - Work: `class MangaOcrReader` with `load_model()` and `read_region(image) -> (text, confidence)`. Provide a fallback/no-op mode for CI.
  - Tests: `tests/test_reader.py` using a fake model.
  - DoD: reader class present and callable from pipeline.
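A sketch of 1.5 using constructor injection so tests can pass a fake model. The `model` and `stub` parameters are hypothetical. As far as I know, calling a `manga_ocr.MangaOcr` instance returns only a string, so the confidence value below is a placeholder (1.0 for non-empty text, 0.0 otherwise) until a real score is available.

```python
from typing import Any, Callable, Optional, Tuple


class MangaOcrReader:
    def __init__(self, model: Optional[Callable[[Any], str]] = None, stub: bool = False) -> None:
        self._model = model
        self.stub = stub

    def load_model(self) -> None:
        """Load the real model unless stubbed or a model was injected."""
        if self.stub or self._model is not None:
            return
        from manga_ocr import MangaOcr  # heavy import, kept lazy for CI

        self._model = MangaOcr()

    def read_region(self, image: Any) -> Tuple[str, float]:
        if self.stub:
            return "", 0.0
        if self._model is None:
            self.load_model()
        # Note: the real model expects a PIL image or a file path,
        # so a numpy crop would need converting first.
        text = self._model(image)
        # Placeholder confidence, not a real model score.
        return text, (1.0 if text else 0.0)
```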
- 1.6 Compose OCR pipeline
  - File: `src/ocr/pipeline.py`
  - Work: Create `translate_frame(image) -> List[WordResult]` combining detector + reader returning bounding box, surface text, and confidence.
  - Tests: `tests/test_pipeline.py` mocking detector/reader to assert output schema.
  - DoD: pipeline returns deterministic structured output.
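A sketch of how 1.6 might wire the two wrappers together. Passing `detector` and `reader` as arguments departs from the `translate_frame(image)` signature above and is an assumption made for testability. The `WordResult` field names, the `(x, y, w, h)` box layout, the skipping of empty reads, and the top-to-bottom, left-to-right sort used for determinism are also illustrative.

```python
from dataclasses import dataclass
from typing import Any, List, Tuple

Box = Tuple[int, int, int, int]  # assumed (x, y, w, h)


@dataclass(frozen=True)
class WordResult:
    box: Box
    surface: str
    confidence: float


def translate_frame(image: Any, detector: Any, reader: Any) -> List[WordResult]:
    """Detect text regions, read each crop, and return ordered results."""
    results: List[WordResult] = []
    for box in detector.detect_text_regions(image):
        x, y, w, h = box
        crop = image[y : y + h, x : x + w]  # numpy-style row/column slicing
        text, confidence = reader.read_region(crop)
        if text:  # drop regions where nothing was read
            results.append(WordResult((x, y, w, h), text, confidence))
    # Sort by row then column so output order does not depend on the detector.
    results.sort(key=lambda r: (r.box[1], r.box[0]))
    return results
```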
- 1.7 Minimal terminal runner (single-frame mode)
  - Files: `src/core/app.py`, adds CLI flag `--mode terminal`
  - Work: Capture single frame, run pipeline, print numbered words (surface + confidence).
  - Tests: `tests/test_cli_terminal.py` that runs main in dry-run with mocks.
  - DoD: `python -m src.core.app --mode terminal` prints numbered words in CI dry-run.
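A sketch of the runner in 1.7 using `argparse`. Only `--mode terminal` comes from this issue; the `get_words` injection hook, the `format_words` helper, and the `N. surface (confidence)` line format are assumptions standing in for the real capture and pipeline calls.

```python
import argparse
from typing import Callable, List, Optional, Sequence, Tuple

Word = Tuple[str, float]  # (surface text, confidence)


def format_words(words: Sequence[Word]) -> List[str]:
    """Render words as numbered lines, starting at 1."""
    return [f"{i}. {surface} ({conf:.2f})" for i, (surface, conf) in enumerate(words, 1)]


def main(
    argv: Optional[Sequence[str]] = None,
    get_words: Optional[Callable[[], Sequence[Word]]] = None,
) -> int:
    parser = argparse.ArgumentParser(prog="src.core.app")
    parser.add_argument("--mode", choices=["terminal"], default="terminal")
    parser.parse_args(argv)  # only validates flags while a single mode exists

    # Hypothetical hook: the real app would capture a frame and run the pipeline.
    words = get_words() if get_words is not None else []
    for line in format_words(words):
        print(line)
    return 0
```

In the real module, a `raise SystemExit(main())` guard under `if __name__ == "__main__":` would make `python -m src.core.app --mode terminal` work as the DoD describes.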
Notes
- Keep model-loading optional in Phase 1 (tests use stubs).
- Focus on interfaces and contract stability for downstream phases.