
Phase 1 — Core pipeline MVP (capture → OCR → translate → terminal) #1

@Swiftburn

Description


Purpose

Implement the end-to-end minimal pipeline to capture a frame, run OCR, translate, and output to terminal. This phase enables a CLI-based MVP that proves pipeline wiring before adding overlays, voice, or UI.

Tasks (atomic, AI-sized)

  • 1.1 Implement capture_region and capture_full

    • File: src/capture/screen.py
    • Work: Add capture_region(region: Tuple[int,int,int,int]) -> numpy.ndarray and capture_full() -> numpy.ndarray using mss.
    • Tests: tests/test_capture.py that mocks mss and validates return type and shape.
    • DoD: functions exist, documented, and unit tests pass locally.
  • 1.2 Implement Region dataclass and profile save/load

    • File: src/capture/regions.py
    • Work: Region dataclass (id, name, coords, profile metadata). Add save_profile(name, regions) and load_profile(name) storing JSON under XDG path or repo-local .kanjilens/profiles.
    • Tests: tests/test_regions.py saves and loads a temp profile.
    • DoD: roundtrip save/load works and is used by capture calls.
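A possible shape for 1.2, using only the stdlib. The `meta` field name and the repo-local default directory are assumptions; the issue leaves the choice between an XDG path and `.kanjilens/profiles` open:

```python
import json
from dataclasses import asdict, dataclass, field
from pathlib import Path
from typing import Dict, List, Tuple


@dataclass
class Region:
    id: str
    name: str
    coords: Tuple[int, int, int, int]  # (left, top, width, height)
    meta: Dict[str, str] = field(default_factory=dict)  # profile metadata


# Repo-local fallback; an XDG path (~/.config/kanjilens/profiles) is the alternative.
PROFILE_DIR = Path(".kanjilens") / "profiles"


def save_profile(name: str, regions: List[Region], base: Path = PROFILE_DIR) -> Path:
    """Serialise regions to <base>/<name>.json and return the written path."""
    base.mkdir(parents=True, exist_ok=True)
    path = base / f"{name}.json"
    path.write_text(json.dumps([asdict(r) for r in regions], indent=2))
    return path


def load_profile(name: str, base: Path = PROFILE_DIR) -> List[Region]:
    """Inverse of save_profile; restores coords as a tuple for dataclass equality."""
    raw = json.loads((base / f"{name}.json").read_text())
    return [Region(d["id"], d["name"], tuple(d["coords"]), d.get("meta", {})) for d in raw]
```

The `base` parameter makes the temp-profile roundtrip in tests/test_regions.py trivial to write.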
  • 1.3 Implement frame change detection

    • File: src/capture/change_detector.py
    • Work: has_changed(prev, new, threshold=0.02) -> bool using OpenCV/numpy diffs; add debounce helper should_ocr.
    • Tests: tests/test_change_detector.py verifying identical vs different frames for multiple thresholds.
    • DoD: change detection used by pipeline to skip OCR when unchanged.
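One way to sketch 1.3 with plain numpy (OpenCV is an alternative backend). The mean-absolute-difference metric and the `Debouncer` class wrapping `should_ocr` are assumptions, not a settled design:

```python
import time
from typing import Optional

import numpy as np


def has_changed(prev: Optional[np.ndarray], new: np.ndarray, threshold: float = 0.02) -> bool:
    """True when the mean absolute pixel difference (normalised to 0..1) exceeds threshold."""
    if prev is None or prev.shape != new.shape:
        return True  # no baseline, or a resize: always treat as changed
    diff = np.abs(prev.astype(np.float32) - new.astype(np.float32)) / 255.0
    return float(diff.mean()) > threshold


class Debouncer:
    """should_ocr helper: allow at most one OCR run per min_interval seconds."""

    def __init__(self, min_interval: float = 0.5):
        self.min_interval = min_interval
        self._last = float("-inf")

    def should_ocr(self, changed: bool, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now  # injectable clock for tests
        if changed and (now - self._last) >= self.min_interval:
            self._last = now
            return True
        return False
```

Passing `now` explicitly keeps the multi-threshold tests in tests/test_change_detector.py deterministic.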
  • 1.4 Add CRAFT detector wrapper (interface only)

    • File: src/ocr/detector.py
    • Work: class CraftDetector with load_model() and detect_text_regions(image) -> List[Rect]. Provide a mockable interface; implement a no-op stub mode for CI.
    • Tests: tests/test_detector.py verifies API usage with a mock.
    • DoD: detector class present, documented I/O, tests pass.
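The interface-only contract in 1.4 might look like this. The `stub` constructor flag and the `Rect` dataclass are assumptions about how the no-op CI mode is toggled:

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Rect:
    left: int
    top: int
    width: int
    height: int


class CraftDetector:
    """Wrapper around CRAFT text detection; stub=True skips model loading for CI."""

    def __init__(self, stub: bool = False):
        self.stub = stub

    def load_model(self) -> None:
        if self.stub:
            return  # no-op: CI never downloads weights
        raise NotImplementedError("real CRAFT loading lands in a later phase")

    def detect_text_regions(self, image) -> List[Rect]:
        if self.stub:
            return []  # deterministic empty result keeps the contract testable
        raise NotImplementedError
```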
  • 1.5 Add MangaOCR reader wrapper (interface only)

    • File: src/ocr/reader.py
    • Work: class MangaOcrReader with load_model() and read_region(image) -> (text, confidence). Provide a fallback/no-op mode for CI.
    • Tests: tests/test_reader.py using a fake model.
    • DoD: reader class present and callable from pipeline.
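A matching sketch for 1.5, mirroring the detector's assumed `stub` flag; the empty-string/zero-confidence fallback is likewise an assumption:

```python
from typing import Tuple


class MangaOcrReader:
    """Wrapper around MangaOCR; stub mode returns an empty, zero-confidence read."""

    def __init__(self, stub: bool = False):
        self.stub = stub

    def load_model(self) -> None:
        if self.stub:
            return  # no-op fallback for CI
        raise NotImplementedError("real MangaOCR loading lands in a later phase")

    def read_region(self, image) -> Tuple[str, float]:
        if self.stub:
            return ("", 0.0)
        raise NotImplementedError
```

tests/test_reader.py can subclass this and override `read_region` to act as the fake model.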
  • 1.6 Compose OCR pipeline

    • File: src/ocr/pipeline.py
    • Work: Create translate_frame(image) -> List[WordResult] combining detector + reader returning bounding box, surface text, and confidence.
    • Tests: tests/test_pipeline.py mocking detector/reader to assert output schema.
    • DoD: pipeline returns deterministic structured output.
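A sketch of the 1.6 composition. Note one deliberate deviation, flagged as an assumption: the issue's signature is `translate_frame(image)`, but detector and reader are injected as parameters here to keep the sketch self-contained and mockable; module-level instances would recover the one-argument form:

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class WordResult:
    box: Tuple[int, int, int, int]  # (left, top, width, height)
    surface: str
    confidence: float


def translate_frame(image, detector, reader) -> List[WordResult]:
    """Detect text regions, then read each one; any object matching the
    1.4/1.5 interfaces (detect_text_regions / read_region) works."""
    results: List[WordResult] = []
    for rect in detector.detect_text_regions(image):
        # Real code would crop `image` to `rect` before reading; stubs ignore the crop.
        text, conf = reader.read_region(image)
        results.append(WordResult(box=rect, surface=text, confidence=conf))
    return results
```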
  • 1.7 Minimal terminal runner (single-frame mode)

    • File: src/core/app.py; add CLI flag --mode terminal
    • Work: Capture single frame, run pipeline, print numbered words (surface + confidence).
    • Tests: tests/test_cli_terminal.py that runs main in dry-run with mocks.
    • DoD: python -m src.core.app --mode terminal prints numbered words in CI dry-run.
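The runner in 1.7 could be wired roughly like this. The `run_terminal` helper, its dict-shaped input, and the exact output format (`N. surface (confidence)`) are all illustrative assumptions:

```python
import argparse
import sys


def run_terminal(words, out=sys.stdout) -> None:
    """Print numbered OCR words as 'N. surface (confidence)' lines."""
    for i, word in enumerate(words, start=1):
        out.write(f"{i}. {word['surface']} ({word['confidence']:.2f})\n")


def main(argv=None) -> int:
    parser = argparse.ArgumentParser(prog="kanjilens")
    parser.add_argument("--mode", choices=["terminal"], default="terminal")
    args = parser.parse_args(argv)
    if args.mode == "terminal":
        # Real wiring: capture_full() -> translate_frame() -> run_terminal().
        run_terminal([])  # empty results until capture/OCR are plugged in
    return 0
```

Splitting printing from argument parsing lets tests/test_cli_terminal.py assert on output without patching stdout globally.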

Notes

  • Keep model-loading optional in Phase 1 (tests use stubs).
  • Focus on interfaces and contract stability for downstream phases.
