Picture Frame Preprocessor

Smart image preprocessor for e-ink picture frames. Uses local ML to detect art subjects in museum, gallery, and street art photos, then crops and zooms to highlight them for portrait display.

Features

ML-powered smart cropping -- YOLO-World + Grounding DINO ensemble detects art, sculptures, murals, and more
VLM fallback -- Qwen3-VL-2B grounding pass via llama.cpp (~20s/image, cached) that activates when YOLO/DINO are uncertain. Enabled by default; disable with --no-vlm
Text detection -- EasyOCR filters signs, labels, and text-heavy regions from primary selection and secondary crops
Focal point detection -- for large murals that fill the frame, a second Grounding DINO pass finds faces/figures inside the primary to use as the crop anchor
Contextual zoom -- zooms in on small or distant subjects, leaves large ones untouched
Multi-crop -- detects multiple art pieces and produces separate crops for each (enabled by default)
Batch processing -- parallel workers with model caching
Local processing -- no cloud dependencies, optional OpenVINO acceleration on Intel
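
As an illustration of the contextual zoom idea in the list above, here is a minimal sketch. It is not the project's actual implementation: the function name zoom_factor, the 40% target fill, and the 3x cap are assumptions chosen for this example. The Contextual Zoom doc listed under Documentation describes the real logic.

```python
import math


def zoom_factor(box, img_w, img_h, target_fill=0.4, max_zoom=3.0):
    """Return a linear zoom multiplier for a detected subject.

    box is (x1, y1, x2, y2) in pixels. target_fill and max_zoom are
    hypothetical tuning values, not taken from frame-prep.
    """
    x1, y1, x2, y2 = box
    # Fraction of the image area covered by the subject.
    fill = max(0, x2 - x1) * max(0, y2 - y1) / (img_w * img_h)
    # Large subjects (or empty boxes) are left untouched.
    if fill <= 0 or fill >= target_fill:
        return 1.0
    # Area grows with the square of linear zoom, so take the square root,
    # then cap the result so tiny detections are not magnified too far.
    return min(max_zoom, math.sqrt(target_fill / fill))
```

Under these assumed values, a subject covering 10% of the image would be zoomed about 2x, while one covering 64% would be left at 1x.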

Quick Start

# Setup
python3 -m venv venv && source venv/bin/activate
pip install -e .
python scripts/download_models.py

# Process a single image (VLM + multi-crop enabled by default)
frame-prep process -i photo.jpg -o output/ -v

# Without VLM (faster, slightly lower accuracy)
frame-prep process -i photo.jpg -o output/ --no-vlm -v

# Batch process a directory
frame-prep batch -i ~/photos/art/ -o ~/photos/processed/ --skip-existing

Output is 480x800 JPEG by default (3:5 portrait ratio for e-ink frames).
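
To make the 3:5 portrait ratio concrete, the sketch below computes the largest 3:5 crop box that fits inside an image while staying as close as possible to a subject anchor point. The function name portrait_crop_box and its parameters are hypothetical and only illustrate the geometry; frame-prep's own cropping strategies are covered in the Usage Reference.

```python
def portrait_crop_box(img_w, img_h, cx, cy, ratio_w=3, ratio_h=5):
    """Return (left, top, right, bottom) of a ratio_w:ratio_h crop.

    (cx, cy) is the anchor the crop should be centered on, e.g. the
    center of a detected subject. All values are integer pixels.
    """
    # Largest box with the target aspect ratio that fits in the image.
    crop_h = min(img_h, img_w * ratio_h // ratio_w)
    crop_w = crop_h * ratio_w // ratio_h
    # Center on the anchor, then clamp so the box stays inside the image.
    left = min(max(cx - crop_w // 2, 0), img_w - crop_w)
    top = min(max(cy - crop_h // 2, 0), img_h - crop_h)
    return left, top, left + crop_w, top + crop_h


# A 4000x3000 landscape photo with the subject left of center.
print(portrait_crop_box(4000, 3000, 1000, 1500))  # (100, 0, 1900, 3000)
```

The resulting box could then be passed to Pillow's Image.crop and resized to 480x800 with Image.resize.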

More Samples

Gallery art -- painting and sculpture detection with smart crop:

Street art -- rotated photo with subject detection:

Focal point detection -- wide mural fills the frame, second pass finds the face/figure to use as crop anchor:

Text detection -- EasyOCR filters text-heavy detections (signs, labels), selecting the actual artwork instead:

Documentation

Usage Reference -- full CLI options, cropping strategies, performance tuning
Testing Guide -- quality assessment with interactive HTML reports
Contextual Zoom -- how zoom logic works
Hardware Acceleration -- OpenVINO and threading optimization

Quality Assessment

# Generate interactive detection report
frame-prep report

# Opens reports/interactive_detection_report.html
# Rate results, export feedback as JSON

Current accuracy: 94% IoU hit rate (115/122) on the ground-truth test set with the VLM fallback enabled (the default); 88% with --no-vlm.
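
For readers unfamiliar with the metric, an IoU hit rate can be computed roughly as follows. This is a sketch under stated assumptions: the 0.5 threshold and the function names iou and hit_rate are illustrative and may differ from what the project's evaluation uses.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    # Width and height of the overlap, zero if the boxes are disjoint.
    iw = max(0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def hit_rate(pairs, threshold=0.5):
    """Share of (predicted, ground_truth) pairs whose IoU meets the threshold.

    threshold=0.5 is an assumed value, not taken from the project.
    """
    if not pairs:
        return 0.0
    hits = sum(1 for pred, truth in pairs if iou(pred, truth) >= threshold)
    return hits / len(pairs)
```

With this definition, 115 hits out of 122 images gives 115/122, or about 94.3%.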

Related Projects

onedrive-album-download -- download photo albums from OneDrive
librespot-epd-nowplaying -- Spotify now-playing display for e-ink frames

License

MIT
