Releases: aritropaul/voom
v3.1.0 — Bring Your Own AI
Bring Your Own AI
Connect your own API key from OpenAI, Anthropic, Google, or xAI to power AI-generated titles, summaries, and chapters. No middleman — Voom calls provider APIs directly.
What's new
- BYOK AI integration — paste your API key in Settings and Voom auto-detects the provider (OpenAI, Anthropic, Google, xAI) from the key prefix
- Direct provider APIs — no OpenRouter or proxy services; Voom calls each provider's native API (OpenAI chat completions, Anthropic Messages, Google Gemini, xAI)
- Current models — GPT-5.4, Claude Opus/Sonnet 4.6, Gemini 3.1 Pro, Grok 4.1 Fast, and more
- Seamless fallback — without an API key, everything works exactly as before using Apple's on-device Foundation Models
- New VoomAI package — clean separation of AI provider logic into its own Swift package
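As a rough illustration of prefix-based provider detection, the sketch below maps a pasted key to a provider name. The release notes do not list the prefixes Voom checks, so the table here is an assumption based on commonly seen key formats, and `detect_provider` is a hypothetical name. It is written in Python for brevity and is not the VoomAI Swift code.

```python
from typing import Optional

# Assumed prefixes (not taken from the Voom source). Order matters:
# "sk-ant-" must be checked before the broader "sk-".
KEY_PREFIXES = [
    ("sk-ant-", "anthropic"),
    ("xai-", "xai"),
    ("AIza", "google"),
    ("sk-", "openai"),
]


def detect_provider(api_key: str) -> Optional[str]:
    """Return a provider name guessed from the key prefix, or None."""
    key = api_key.strip()
    for prefix, provider in KEY_PREFIXES:
        if key.startswith(prefix):
            return provider
    # Unknown or empty key: the caller can fall back to on-device models.
    return None
```

Returning `None` for an unrecognized key matches the fallback behavior described above: with no usable key, the on-device path stays in charge.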
Providers & models
| Provider | Models |
|---|---|
| OpenAI | GPT-5.4, GPT-5 Mini, GPT-5 Nano |
| Anthropic | Claude Opus 4.6, Claude Sonnet 4.6, Claude Haiku 4.5 |
| Google | Gemini 3.1 Pro, Gemini 3 Flash, Gemini 3.1 Flash Lite |
| xAI | Grok 4.1 Fast, Grok 4 Fast |
Zero regression
If you don't configure an API key, Voom behaves identically to v3.0.2 — on-device Apple Foundation Models handle all AI features automatically.
Full Changelog: v3.0.2...v3.1.0
v3.0.2 — Meeting Recorder Pipeline & AI Fixes
What's New
Meeting Recording Pipeline
- Meeting recordings now use the dedicated `MeetingRecorder`: HD/2K resolution (capped at 2560), 30fps, with split-track audio for speaker diarization
- Previously, meetings incorrectly used `ScreenRecorder` at native retina resolution with no diarization
AI Generation Fixes
- Title and summary generation now subsamples transcripts (60-80 segments) to fit Apple Foundation Models' context window — fixes failures on long recordings
- Speaker labels from diarization are included in AI prompts for meeting-aware titles and summaries
- Lowered word count thresholds (title: 5→2, summary: 10→3) so short recordings still get AI-generated metadata
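The subsampling and threshold changes above can be sketched roughly as follows. This is a minimal Python illustration, not Voom's Swift code: the even-spacing strategy, the `>=` comparison, and the names `subsample_evenly`, `MIN_WORDS`, and `has_enough_words` are all assumptions. Only the segment budget (60-80) and the new thresholds (2 and 3 words) come from the notes.

```python
from typing import Dict, List, Sequence, TypeVar

T = TypeVar("T")

# New minimum word counts from this release (previously 5 and 10).
MIN_WORDS: Dict[str, int] = {"title": 2, "summary": 3}


def subsample_evenly(segments: Sequence[T], max_segments: int = 80) -> List[T]:
    """Pick at most max_segments items spread evenly over the whole list."""
    if max_segments < 1:
        raise ValueError("max_segments must be at least 1")
    n = len(segments)
    if n <= max_segments:
        return list(segments)
    if max_segments == 1:
        return [segments[0]]
    # Integer index math keeps the first and last segments and avoids duplicates.
    last = max_segments - 1
    return [segments[i * (n - 1) // last] for i in range(max_segments)]


def has_enough_words(transcript: str, kind: str) -> bool:
    """Assumed guard: generate metadata only if the transcript is long enough."""
    return len(transcript.split()) >= MIN_WORDS[kind]
```

Sampling across the full list rather than truncating keeps the end of a long recording represented in the prompt.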
UI Improvements
- Summary card in player view fits content height without internal scrolling
- Edit summary via pencil icon in the section header; commits on focus loss
Full Changelog: v3.0.1...v3.0.2
v3.0.1 — Web-Optimized Sharing
What's New
- Web-Optimized Sharing — Shared videos are re-encoded to H.264 for universal browser playback with smaller file sizes and faster loading.
- Captions & Speaker Labels — Share pages now serve WebVTT captions with speaker names, visible in the player and in fullscreen.
- Chapters on Share Pages — Auto-generated chapters appear on share pages with clickable timestamps and seekbar markers.
- Share Pipeline Progress — Optimization progress is shown in the share sheet before upload begins.
- What's New Sheet — First launch after update highlights v3.0.1 features via WhatsNewKit.
Worker Changes
- New `/vtt/:code` route serves WebVTT captions with speaker labels.
- Chapters and speaker metadata included in share uploads.
- D1 migration `0003_chapters_speakers.sql` adds chapters table, speaker column, and is_meeting flag.
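For readers unfamiliar with how WebVTT carries speaker names, here is a small sketch that renders cues using the format's voice span (`<v Name>`). It is a Python illustration only: the cue tuple layout and the function names are hypothetical, and the actual Worker route may build its output differently.

```python
import html
from typing import Iterable, Optional, Tuple

# (start_seconds, end_seconds, speaker_or_None, text) - a hypothetical cue shape.
Cue = Tuple[float, float, Optional[str], str]


def format_timestamp(seconds: float) -> str:
    """Format seconds as a WebVTT timestamp, hh:mm:ss.ttt."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def to_webvtt(cues: Iterable[Cue]) -> str:
    """Render cues as a WebVTT document with optional voice spans."""
    lines = ["WEBVTT", ""]
    for start, end, speaker, text in cues:
        lines.append(f"{format_timestamp(start)} --> {format_timestamp(end)}")
        # Escape &, < and > so cue text is not parsed as markup.
        body = html.escape(text, quote=False)
        if speaker:
            body = f"<v {html.escape(speaker, quote=False)}>{body}"
        lines.append(body)
        lines.append("")  # a blank line ends each cue
    return "\n".join(lines)
```

Players that understand voice spans can show or style the speaker name; others typically fall back to the plain cue text.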
Full Changelog
https://github.com/aritropaul/voom/blob/main/CHANGELOG.md
Full Changelog: v3.0.0...v3.0.1
v3.0.0
Voom 3.0.0
Major Changes
- FluidAudio migration — replaced WhisperKit with FluidAudio for on-device ASR and speaker diarization
- Architecture split — refactored into VoomCore, VoomApp, and VoomMeetings packages
Speaker Diarization
- Split-track diarization: saves separate mic and system audio during meeting recordings for accurate speaker separation
- "You" identification: mic audio diarized separately so the local user's segments are labeled "You"
- Remote speaker separation: system audio diarized independently for cleaner multi-speaker identification
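One way to picture the split-track approach: diarize each track on its own, relabel everything from the mic track as "You", then interleave both tracks by time. The sketch below is a simplified Python illustration with a hypothetical `Segment` type and `merge_tracks` helper. It assumes the diarizer has already produced per-track segments and does not reflect FluidAudio's API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    start: float  # seconds from the start of the recording
    end: float
    speaker: str
    text: str = ""


def merge_tracks(mic: List[Segment], system: List[Segment]) -> List[Segment]:
    """Label mic segments "You" and interleave them with system-audio segments."""
    local = [Segment(s.start, s.end, "You", s.text) for s in mic]
    # Remote speaker labels from the system track are kept unchanged.
    return sorted(local + list(system), key=lambda s: (s.start, s.end))
```

Because the local user never appears on the system-audio track, the remote diarizer only has to tell remote voices apart.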
Auto Chapters
- Chapters now auto-generate after transcription for both regular and meeting recordings
- Fixed chapter generation only covering first ~2 minutes by subsampling transcript evenly across full duration
Bug Fixes
- Fixed garbled mic audio caused by AVAudioEngine voice processing conflicting with ScreenCaptureKit audio capture
- Speaker labels displayed in transcript view
Debug
- Added debug mode in settings (debug builds only)
Full Changelog: v2.8.1...v3.0.0
v2.8.1
What's Changed
Performance
- ~40% smaller video files — Optimized HEVC compression with 8 Mbps bitrate, B-frames enabled, and 4-second GOP intervals. Screen capture frame rate synced to 30fps to match encoding.
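To put these settings in concrete terms, the arithmetic below derives the keyframe spacing and a rough size for the video stream. It is a back-of-the-envelope Python sketch: it assumes the encoder averages the 8 Mbps target, uses decimal megabytes, and ignores audio and container overhead. The function names are illustrative.

```python
BITRATE_MBPS = 8   # target video bitrate from the release notes
FPS = 30           # capture and encode frame rate
GOP_SECONDS = 4    # interval between keyframes


def keyframe_interval_frames(fps: int = FPS, gop_seconds: int = GOP_SECONDS) -> int:
    """Number of frames between keyframes."""
    return fps * gop_seconds


def video_megabytes(duration_seconds: float, bitrate_mbps: float = BITRATE_MBPS) -> float:
    """Approximate video stream size in MB (megabits divided by 8)."""
    return bitrate_mbps * duration_seconds / 8
```

Under these assumptions a keyframe lands every 120 frames, and one minute of video comes to about 60 MB.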
Audio
- Hardware echo cancellation & noise suppression: Replaced raw `AVCaptureSession` mic input with `AVAudioEngine` voice processing unit for cleaner audio in all recordings.
AI Improvements
- Smarter titles — AI title generation now uses the full transcript instead of only the first 2 minutes. Prompts tuned to focus on primary work topics, ignoring small talk and filler.
- Better summaries — AI summaries now produce 4–8 sentences covering key decisions, outcomes, and action items.
Meeting Detection
- Instant auto-stop — Meeting recordings now stop immediately when camera turns off, instead of waiting for 10 seconds of audio silence.
Full Changelog: v2.8.0...v2.8.1
v2.8.0 — Meeting Detection & Theme Polish
What's New
Meeting Detection
- Voom can now detect when you're in a meeting by checking your calendar events and prompting you to record
- Upcoming meetings appear in the menu bar tray with a quick-join link
- Auto-start recording when a meeting begins (skips countdown) and auto-stop when it ends
- System audio activity tracking ensures recordings don't stop while participants are still talking
- New setting toggle in preferences with calendar permission request
Theme & UI
- ShareSettingsSheet fully redesigned with VoomTheme — custom styled buttons, consistent typography, proper dividers
- FillerWordSheet upgraded with themed progress indicator, ScrollView-based list, and improved layout
- Removed camera/mic/system audio badges from player detail view for a cleaner look
AI Improvements
- Title and summary generation now handles short or trivial transcripts gracefully (minimum word count guards)
- Hardened prompts prevent the AI from misinterpreting raw transcript text as instructions
- Recordings with very short transcripts get their filename restored instead of a confused AI title
Full Changelog: v2.7.3...v2.8.0
v2.7.3
Full Changelog: v2.7.2...v2.7.3
v2.7.2
Full Changelog: v2.7.1...v2.7.2
v2.7.1
Full Changelog: v2.7.0...v2.7.1
v2.7.0
Full Changelog: v2.6.2...v2.7.0