Smart image preprocessor for e-ink picture frames. Uses local ML to detect art subjects in museum, gallery, and street art photos, then crops and zooms to highlight them for portrait display.
- ML-powered smart cropping -- YOLO-World + Grounding DINO ensemble detects art, sculptures, murals, and more
- VLM fallback -- Qwen3-VL-2B grounding pass via llama.cpp (~20s/image, cached) runs when YOLO/DINO are uncertain (enabled by default)
- Text detection -- EasyOCR filters signs, labels, and text-heavy regions from primary selection and secondary crops
- Focal point detection -- for large murals that fill the frame, a second Grounding DINO pass finds faces/figures inside the primary to use as the crop anchor
- Contextual zoom -- zooms in on small or distant subjects, leaves large ones untouched
- Multi-crop -- detects multiple art pieces and produces separate crops for each (enabled by default)
- Batch processing -- parallel workers with model caching
- Local processing -- no cloud dependencies, optional OpenVINO acceleration on Intel
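As a rough illustration of the contextual zoom idea in the list above, the sketch below picks a zoom factor from how much of the frame the subject occupies. The function name, the 15% "small subject" threshold, the 40% target fill, and the 3x cap are all assumptions made for this example, not values taken from frame-prep itself.

```python
def zoom_factor(subject_area: float, image_area: float,
                small_threshold: float = 0.15,  # assumed: below this, the subject counts as "small"
                target_fill: float = 0.40,      # assumed: desired subject share after zooming
                max_zoom: float = 3.0) -> float:  # assumed: cap to avoid heavy upscaling
    """Return a linear zoom factor; 1.0 means leave the image untouched."""
    if subject_area <= 0 or image_area <= 0:
        raise ValueError("areas must be positive")

    fill = subject_area / image_area
    if fill >= small_threshold:
        # Large subjects are left as they are.
        return 1.0

    # Area grows with the square of the linear zoom, hence the square root.
    zoom = (target_fill / fill) ** 0.5
    return min(zoom, max_zoom)
```

Under these assumed values, a subject covering 10% of the frame would be zoomed about 2x, while one covering half the frame would not be zoomed at all.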
# Setup
python3 -m venv venv && source venv/bin/activate
pip install -e .
python scripts/download_models.py
# Process a single image (VLM + multi-crop enabled by default)
frame-prep process -i photo.jpg -o output/ -v
# Without VLM (faster, slightly lower accuracy)
frame-prep process -i photo.jpg -o output/ --no-vlm -v
# Batch process a directory
frame-prep batch -i ~/photos/art/ -o ~/photos/processed/ --skip-existing
Output is 480x800 JPEG by default (3:5 portrait ratio for e-ink frames).
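To make the 3:5 portrait output concrete, here is a minimal sketch of how a crop box with that ratio could be placed around a detected subject. The helper name `portrait_crop` and the centre-then-clamp strategy are assumptions for illustration; the actual cropping strategies are described in the Usage Reference.

```python
def portrait_crop(img_w, img_h, cx, cy, ratio_w=3, ratio_h=5):
    """Return (left, top, right, bottom) for the largest ratio_w:ratio_h box
    that fits in the image, centred on the anchor (cx, cy) where possible."""
    # Largest box with the target aspect ratio that still fits in the image.
    crop_h = min(img_h, img_w * ratio_h / ratio_w)
    crop_w = crop_h * ratio_w / ratio_h

    # Centre on the anchor, then clamp so the box stays inside the image.
    left = min(max(cx - crop_w / 2, 0), img_w - crop_w)
    top = min(max(cy - crop_h / 2, 0), img_h - crop_h)

    return round(left), round(top), round(left + crop_w), round(top + crop_h)


# A 4000x3000 landscape photo with the subject near the right edge.
box = portrait_crop(4000, 3000, cx=3500, cy=1500)
print(box)  # (2200, 0, 4000, 3000)
```

The resulting box is 1800x3000 pixels, which is 3:5, and could then be resized to 480x800, for example with Pillow's `Image.crop` and `Image.resize`.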
Gallery art -- painting and sculpture detection with smart crop:
Street art -- rotated photo with subject detection:
Focal point detection -- wide mural fills the frame, second pass finds the face/figure to use as crop anchor:
Text detection -- EasyOCR filters text-heavy detections (signs, labels), selecting the actual artwork instead:
- Usage Reference -- full CLI options, cropping strategies, performance tuning
- Testing Guide -- quality assessment with interactive HTML reports
- Contextual Zoom -- how zoom logic works
- Hardware Acceleration -- OpenVINO and threading optimization
# Generate interactive detection report
frame-prep report
# Opens reports/interactive_detection_report.html
# Rate results, export feedback as JSON
Current accuracy: 94% IoU hit rate (115/122) on the ground-truth test set with VLM enabled (the default); 88% with --no-vlm.
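For readers unfamiliar with the metric, an IoU hit rate can be computed roughly as sketched below: a prediction counts as a hit when its intersection-over-union with the ground-truth box reaches a threshold. The 0.5 threshold and the `(x1, y1, x2, y2)` box format are assumptions; the threshold used by the test suite is not stated here.

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    # Width and height of the overlap; zero when the boxes do not intersect.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def hit_rate(predicted, ground_truth, threshold=0.5):
    """Fraction of images whose predicted box reaches the IoU threshold."""
    if len(predicted) != len(ground_truth):
        raise ValueError("need one prediction per ground-truth box")
    hits = sum(iou(p, g) >= threshold for p, g in zip(predicted, ground_truth))
    return hits / len(ground_truth)
```

With 115 hits out of 122 images, this ratio is about 0.943, which rounds to the 94% quoted above.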
- onedrive-album-download -- download photo albums from OneDrive
- librespot-epd-nowplaying -- Spotify now-playing display for e-ink frames





