Description
Summary
Add the ability to record a short video clip of a bin's contents and use AI vision to automatically identify and list the items inside.
Motivation
Manually typing out bin contents is tedious, especially for bins with many small items. A short video pan across a bin could automatically populate the items list, saving time and improving accuracy.
Proposed Solution
Extract key frames from a short video clip and send them to an AI vision API for analysis.
- Capture: User records a short video clip (3-10s) of a bin's contents, or selects an existing video from their device
- Frame extraction: Client-side extraction of N key frames (e.g., 3-5) from the video using canvas/`<video>` element
- Analysis: Send extracted frames to a vision-capable AI API (configurable provider) with a prompt to identify and list visible items
- Review & confirm: Display detected items for the user to review, edit, and confirm before saving to the bin
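
The frame-extraction step needs some rule for choosing which moments of the clip to sample. The issue does not say how key frames are chosen, so the following is only a minimal sketch that assumes evenly spaced sampling; `pickFrameTimes` is a hypothetical helper name, not existing code.

```typescript
// Hypothetical helper: choose `count` evenly spaced timestamps (in seconds)
// from a clip of length `durationSec`. Sampling the midpoint of each equal
// segment skips the very first and last frames, which may be blurry or
// pointed away from the bin (an assumption, not something the issue states).
function pickFrameTimes(durationSec: number, count: number): number[] {
  if (!isFinite(durationSec) || durationSec <= 0) return [];
  const n = Math.max(1, Math.floor(count));
  const segment = durationSec / n;
  const times: number[] = [];
  for (let i = 0; i < n; i++) {
    times.push((i + 0.5) * segment);
  }
  return times;
}
```

In the browser, each timestamp would be assigned to the `<video>` element's `currentTime`; once the `seeked` event fires, the frame can be drawn to a canvas with `drawImage` and exported with `toDataURL("image/jpeg")`.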
This approach is provider-agnostic: any API that accepts image input works (OpenAI GPT-4o, Anthropic Claude, Google Gemini, etc.). The provider and API key would be configurable in server settings.
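
One way to keep the analysis step provider-agnostic is to isolate the only part that differs between providers: the shape of the message content. The sketch below is illustrative, not a definitive implementation. It assumes frames arrive as base64-encoded JPEG strings, and the content shapes follow the OpenAI Chat Completions and Anthropic Messages image formats as commonly documented, so they should be checked against current provider docs. `Provider`, `PROMPT`, and `buildUserContent` are hypothetical names.

```typescript
// Assumption: frames are base64-encoded JPEG strings without a "data:" prefix.
type Provider = "openai" | "anthropic";

// Hypothetical prompt text; the issue does not specify one.
const PROMPT = "List every distinct item visible in these images of a storage bin.";

// Build the user-message content array for the configured provider.
// Only this shape differs; the rest of the pipeline can stay the same.
function buildUserContent(provider: Provider, framesBase64: string[]): object[] {
  const text = { type: "text", text: PROMPT };
  if (provider === "openai") {
    // OpenAI-style: images are passed as data URLs in `image_url` parts.
    const images = framesBase64.map((data) => ({
      type: "image_url",
      image_url: { url: "data:image/jpeg;base64," + data },
    }));
    return [text, ...images];
  }
  // Anthropic-style: images are passed as base64 `source` blocks.
  const images = framesBase64.map((data) => ({
    type: "image",
    source: { type: "base64", media_type: "image/jpeg", data },
  }));
  return [...images, text];
}
```

The returned array would become the `content` of a single user message; the model name, authentication headers, and endpoint come from the server settings and are left out here.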
Note: Google Gemini natively accepts video file uploads, so the frame-extraction step could be skipped entirely. This could be offered as an optimized path when Gemini is the configured provider.
Acceptance Criteria
- User can record or select a short video clip from the bin detail page
- Frames are extracted client-side from the video
- Extracted frames are sent to a configurable AI vision API
- Detected items are presented for user review before saving
- AI provider and API key are configurable in settings
- Works on mobile (primary use case — phone pointed at bin)
- Graceful fallback if AI analysis fails or returns low-confidence results
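
The fallback criterion could be handled by a single function that turns the model's reply into either a review list or a manual-entry state. This is a sketch under stated assumptions: the issue defines neither a response format nor a confidence scale, so the JSON array of `{ name, confidence }` objects, the 0..1 range, and the 0.5 default threshold are all hypothetical, as are the names `DetectedItem`, `ReviewState`, and `toReviewState`.

```typescript
// Hypothetical shape for one detected item.
interface DetectedItem {
  name: string;
  confidence: number; // assumed to be in the range 0..1
}

type ReviewState =
  | { kind: "review"; items: DetectedItem[] }
  | { kind: "manual"; reason: string };

// Parse the model's raw text reply and decide what to show the user.
// A parse failure, a non-list reply, or an empty filtered list all fall
// back to manual entry instead of surfacing an error.
function toReviewState(rawReply: string, minConfidence = 0.5): ReviewState {
  let parsed: unknown;
  try {
    parsed = JSON.parse(rawReply);
  } catch {
    return { kind: "manual", reason: "Response was not valid JSON" };
  }
  if (!Array.isArray(parsed)) {
    return { kind: "manual", reason: "Response was not a list of items" };
  }
  // Keep only well-formed entries at or above the threshold.
  const items = parsed.filter(
    (x): x is DetectedItem =>
      typeof x === "object" &&
      x !== null &&
      typeof (x as DetectedItem).name === "string" &&
      typeof (x as DetectedItem).confidence === "number" &&
      (x as DetectedItem).confidence >= minConfidence
  );
  return items.length > 0
    ? { kind: "review", items }
    : { kind: "manual", reason: "No items met the confidence threshold" };
}
```

With this shape, the bin detail page only has to branch on `kind`: show the editable list for `"review"`, or open the existing manual entry form with the `reason` as a notice for `"manual"`.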