Extract video frames and transcripts from YouTube videos into Obsidian-compatible markdown notes.
Watching a lecture, tutorial, or presentation on YouTube? ytcapture turns any video into a searchable, skimmable markdown note with:
- Embedded frame images at regular intervals so you can see what's on screen
- Timestamped transcript segments aligned to each frame
- Obsidian-ready format with YAML frontmatter and ![[wikilink]] embeds
- Smart deduplication that removes redundant frames (great for slide-based content)
No more scrubbing through hour-long videos to find that one slide. Your notes become a visual index of the entire video.
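As a rough illustration of how transcript segments can be aligned to frames, the sketch below groups segments by the frame interval in which they start. It is not ytcapture's actual code: the `Segment` type, the `group_by_frame` helper, and the bucketing rule are assumptions made for this example.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Segment:
    """Hypothetical transcript segment: start time in seconds plus its text."""
    start: float
    text: str


def format_timestamp(seconds: float) -> str:
    """Format seconds as HH:MM:SS, the style used for note headings."""
    total = int(seconds)
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def group_by_frame(segments: list[Segment], interval: int = 15) -> dict[str, str]:
    """Assign each segment to the frame whose interval contains its start time."""
    buckets: dict[int, list[str]] = defaultdict(list)
    for seg in segments:
        frame_index = int(seg.start // interval)
        buckets[frame_index].append(seg.text)
    # Key each bucket by the timestamp of the frame that opens the interval.
    return {
        format_timestamp(index * interval): " ".join(texts)
        for index, texts in sorted(buckets.items())
    }
```

With the default 15-second interval, a segment starting at 7.5 seconds would land under the `00:00:00` frame and one starting at 16.2 seconds under `00:00:15`.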
On macOS:
brew install ffmpeg yt-dlp
# Clone the repository
git clone https://github.com/jdmonaco/ytcapture.git
cd ytcapture
# Install as a CLI tool with uv (recommended)
uv tool install -e .
# Or install with pip
pip install -e .
# Basic usage - outputs to current directory
ytcapture "https://www.youtube.com/watch?v=VIDEO_ID"
# Multiple videos at once
ytcapture URL1 URL2 URL3
# Process an entire playlist (auto-expands)
ytcapture "https://www.youtube.com/playlist?list=PLAYLIST_ID"
# On macOS, copy a YouTube video or playlist URL to the clipboard and run without arguments
ytcapture
# Skip confirmation for large playlists (>10 videos)
ytcapture "https://www.youtube.com/playlist?list=PLAYLIST_ID" -y
# Specify output directory
ytcapture URL -o my-notes/
# Adjust frame interval (default: 15 seconds)
ytcapture URL --interval 30
# Extract more frames with aggressive deduplication
ytcapture URL --interval 5 --dedup-threshold 0.80
./
├── images/
│   └── VIDEO_ID/
│       ├── frame-0000.jpg
│       ├── frame-0001.jpg
│       └── ...
├── transcripts/
│   └── raw-transcript-VIDEO_ID.json
└── Video Title (Channel Name) 20241120.md
Assets are organized by video ID to support multiple video captures in the same directory.
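To make the per-video layout concrete, here is a minimal sketch of how asset paths and the matching Obsidian embed could be derived from a video ID. The helper names `asset_paths` and `embed_link` are hypothetical and only mirror the directory tree shown above; they are not part of ytcapture's API.

```python
from __future__ import annotations

from pathlib import Path


def asset_paths(output_dir: str, video_id: str, frame_index: int,
                frame_format: str = "jpg") -> dict[str, Path]:
    """Build frame and transcript paths that follow the layout shown above."""
    root = Path(output_dir)
    return {
        "frame": root / "images" / video_id / f"frame-{frame_index:04d}.{frame_format}",
        "transcript": root / "transcripts" / f"raw-transcript-{video_id}.json",
    }


def embed_link(output_dir: str, frame_path: Path) -> str:
    """Return an Obsidian embed whose path is relative to the output directory."""
    relative = frame_path.relative_to(output_dir).as_posix()
    return f"![[{relative}]]"
```

Because every path includes the video ID, two captures written to the same output directory do not overwrite each other's frames or transcripts.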
The generated markdown looks like this:
---
title: Understanding Neural Networks
source: https://www.youtube.com/watch?v=abc123
author:
- Deep Learning Channel
created: '2024-12-15'
published: '2024-11-20'
description: An introduction to neural networks and deep learning fundamentals...
tags:
- youtube
---
# Understanding Neural Networks
> An introduction to neural networks and deep learning fundamentals.
## 00:00:00
![[images/abc123/frame-0000.jpg]]
Welcome to this tutorial on neural networks. Today we'll cover the basics.
## 00:00:15
![[images/abc123/frame-0001.jpg]]
Let's start by understanding what a neuron is and how it processes information.

| Option | Default | Description |
|---|---|---|
| -o, --output | . | Output directory |
| --interval | 15 | Frame extraction interval in seconds |
| --max-frames | None | Maximum number of frames to extract |
| --frame-format | jpg | Frame format: jpg or png |
| --language | en | Transcript language code |
| --dedup-threshold | 0.85 | Similarity threshold for removing duplicate frames (0.0-1.0) |
| --no-dedup | - | Disable frame deduplication |
| --prefer-manual | - | Only use manual transcripts |
| --keep-video | - | Keep downloaded video file after frame extraction |
| -y, --yes | - | Skip confirmation prompt for large batches (>10 videos) |
| -v, --verbose | - | Verbose output |
| -h, --help | - | Show help message |
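
The effect of a similarity threshold such as `--dedup-threshold` can be sketched as follows. This README does not state which similarity metric ytcapture uses, so the mean absolute pixel difference below is a stand-in, and the `similarity` and `deduplicate` helpers are hypothetical. The sketch assumes a frame is dropped when its similarity to the last kept frame is at or above the threshold, which is consistent with a lower threshold removing more frames.

```python
from __future__ import annotations


def similarity(a: list[int], b: list[int]) -> float:
    """Return 1.0 for identical grayscale frames, 0.0 for maximally different ones."""
    if not a or len(a) != len(b):
        raise ValueError("frames must be non-empty and have the same number of pixels")
    total_diff = sum(abs(x - y) for x, y in zip(a, b))
    return 1.0 - total_diff / (255 * len(a))


def deduplicate(frames: list[list[int]], threshold: float = 0.85) -> list[int]:
    """Return indices of frames to keep, comparing each frame to the last kept one."""
    kept: list[int] = []
    for index, frame in enumerate(frames):
        # Keep the first frame, then any frame that differs enough from the last kept.
        if not kept or similarity(frames[kept[-1]], frame) < threshold:
            kept.append(index)
    return kept
```

In this toy model, two nearly identical captures of the same slide collapse into one kept frame, while a slide change produces a low similarity score and is kept.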
Use a shorter interval with deduplication to catch slide transitions:
ytcapture URL --interval 5 --dedup-threshold 0.90
Disable deduplication to keep all frames:
ytcapture URL --interval 10 --no-dedup
Limit the number of frames to avoid huge output:
ytcapture URL --max-frames 50
If you have mdformat installed, ytcapture will automatically format the output markdown:
pip install mdformat mdformat-gfm mdformat-frontmatter
MIT