A flexible, open-source pipeline to process YouTube videos using Google's Gemini models. It supports transcript analysis, native video understanding (multimodality), batch processing, and structured outputs.
- Flexible Input: Process single videos, lists of URLs, or entire channels.
- Advanced Filters: Filter channel videos by date (e.g., "1y"), duration (e.g., "medium"), and limit.
- Multimodality:
  - Transcript Mode: Fast, text-based analysis using video captions.
  - Video Mode: Deep visual and audio analysis using Gemini's native YouTube support (no downloads required).
- Structured Output: Get results in plain text or structured JSON using defined schemas.
- Clean Interface: Simple class-based API built on `Source`, `Command`, and `Output`.
- Clone the repository:
  ```bash
  git clone https://github.com/GtPluto/YoutubePipeline.git
  cd YoutubePipeline
  ```
- Create and activate a virtual environment:
  ```bash
  python -m venv venv
  source venv/bin/activate
  ```
- Install the package:
  ```bash
  pip install -e .
  ```
Create a `.env` file in the root directory with your API keys:
```
YOUTUBE_API_KEY=your_youtube_api_key
GEMINI_API_KEY=your_gemini_api_key
```
- YouTube API Key: Get it from the Google Cloud Console.
- Gemini API Key: Get it from Google AI Studio.
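The examples read both keys with `os.getenv`, which returns `None` when a variable is missing, so a typo in `.env` may only surface later as an authentication error from an API call. A minimal fail-fast check might look like this; `require_env` is a hypothetical helper (not part of `yt_pipeline`) and assumes `load_dotenv()` has already run:

```python
import os
from typing import Dict


def require_env(*names: str) -> Dict[str, str]:
    """Hypothetical helper: return the named variables or raise, listing every missing one."""
    values = {name: os.environ.get(name, "") for name in names}
    missing = [name for name, value in values.items() if not value]
    if missing:
        raise RuntimeError(f"Missing required environment variables: {', '.join(missing)}")
    return values
```

Calling `require_env("YOUTUBE_API_KEY", "GEMINI_API_KEY")` right after `load_dotenv()` reports all missing keys at once instead of failing on the first request.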
```python
import os
from dotenv import load_dotenv
from yt_pipeline.core import Pipeline, Source, Command, Output

load_dotenv()

pipeline = Pipeline(
    youtube_api_key=os.getenv("YOUTUBE_API_KEY"),
    gemini_api_key=os.getenv("GEMINI_API_KEY"),
)

# 1. Define the source
source = Source("https://www.youtube.com/watch?v=dQw4w9WgXcQ")

# 2. Define the command
command = Command(prompt="Summarize this video.", modality="transcript")

# 3. Define the output
output = Output(format="text")

# 4. Process
result = pipeline.process(source, command, output)
print(result)
```

Fetch the last 5 "medium"-length videos that a channel published in the last year:
```python
# Reuses `pipeline` and the imports from the first example.
source = Source(
    value="https://www.youtube.com/@GoogleDevelopers",  # Google Developers channel URL
    filters={
        "limit": 5,
        "duration": "medium",  # short (<4m), medium (4-20m), long (>20m)
        "date": "1y",          # 1y, 30d, 24h, or YYYY-MM-DD
        "order": "date",
    },
)
command = Command(prompt="What is the main topic?", modality="transcript")
output = Output(format="text")

# A channel source returns a list of dicts with video_id, title, and result
results = pipeline.process(source, command, output)
for res in results:
    print(f"[{res['title']}] {res['result']}")
```

Analyze the visual content of a video directly using Gemini's native video understanding:
```python
source = Source("https://www.youtube.com/watch?v=98DcoXwGX6I")
command = Command(
    prompt="Describe the visual style and key scenes.",
    modality="video",  # Uses native video understanding
)
output = Output(format="text")
result = pipeline.process(source, command, output)
print(result)
```

Force the model to return a specific JSON structure:
```python
schema = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "sentiment": {"type": "string", "enum": ["positive", "neutral", "negative"]},
        "tags": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["title", "sentiment", "tags"],
}

source = Source("https://www.youtube.com/watch?v=dQw4w9WgXcQ")
command = Command(prompt="Analyze sentiment and extract tags.", modality="transcript")
output = Output(format="json", schema=schema)
result = pipeline.process(source, command, output)
print(result)
# Example output: {'title': '...', 'sentiment': 'positive', 'tags': ['music', '80s']}
```

`Source` defines the input content and filtering rules.
```python
Source(value: Union[str, List[str]], filters: Dict[str, Any] = None)
```
- `value`: A single YouTube video URL, a channel ID (starting with `UC`), a handle (e.g., `@google`), a channel URL, or a list of any of these.
- `filters`: A dictionary of filters to apply (used mainly when fetching videos from a channel).
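How the library tells these `value` forms apart is not documented here. As a rough sketch, assuming the common YouTube URL shapes (`watch?v=`, `youtu.be/`, `/@handle`, `/channel/<id>`) and the usual 24-character `UC...` channel ID format, a hypothetical `classify_source` helper could resolve a single value like this:

```python
import re
from urllib.parse import parse_qs, urlparse

# Assumed format: "UC" followed by 22 URL-safe base64 characters (24 in total).
CHANNEL_ID_RE = re.compile(r"^UC[0-9A-Za-z_-]{22}$")


def classify_source(value: str) -> tuple[str, str]:
    """Hypothetical helper: return (kind, identifier) for one Source value."""
    value = value.strip()
    if CHANNEL_ID_RE.match(value):
        return "channel_id", value
    if value.startswith("@"):
        return "handle", value
    parsed = urlparse(value)
    host = parsed.netloc.lower().removeprefix("www.")
    if host == "youtu.be":
        # Short links carry the video ID as the path.
        return "video", parsed.path.lstrip("/")
    if host in ("youtube.com", "m.youtube.com"):
        if parsed.path == "/watch":
            video_ids = parse_qs(parsed.query).get("v")
            if video_ids:
                return "video", video_ids[0]
        if parsed.path.startswith("/@"):
            return "handle", parsed.path.split("/")[1]
        if parsed.path.startswith("/channel/"):
            return "channel_id", parsed.path.split("/")[2]
    raise ValueError(f"Unrecognized YouTube source: {value!r}")
```

A list `value` would simply be resolved element by element, e.g. `[classify_source(v) for v in values]`.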
Supported filters:
- `limit` (int): Maximum number of videos to fetch from a channel (default: 10).
- `order` (str): Sort order for channel videos. Values: `"date"`, `"rating"`, `"relevance"`, `"title"`, `"videoCount"`, `"viewCount"`.
- `date` (str): Filter by publish date.
  - Relative: `"1y"` (1 year), `"30d"` (30 days), `"24h"` (24 hours).
  - Absolute: `"YYYY-MM-DD"`.
- `duration` (str): Filter by video duration.
  - Values: `"short"` (<4m), `"medium"` (4-20m), `"long"` (>20m).
  - Aliases: `"<4m"`, `"4-20m"`, `">20m"`.
- `language` (str): Filter by language code (e.g., `"en"`, `"es"`).
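The filter names line up closely with parameters of the YouTube Data API v3 `search.list` endpoint (`maxResults`, `order`, `publishedAfter`, `videoDuration`, `relevanceLanguage`). The sketch below assumes the pipeline translates filters that way; `to_search_params` and `parse_date_filter` are hypothetical, and approximating `"1y"` as 365 days is an assumption. The caller would still add `channelId` and the API key.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, Optional

ORDERS = {"date", "rating", "relevance", "title", "videoCount", "viewCount"}
DURATIONS = {"short": "short", "medium": "medium", "long": "long",
             "<4m": "short", "4-20m": "medium", ">20m": "long"}
# Assumption: "y" means 365 days; leap years are ignored.
UNITS = {"y": timedelta(days=365), "d": timedelta(days=1), "h": timedelta(hours=1)}


def parse_date_filter(value: str, now: datetime) -> datetime:
    """Turn "1y", "30d", "24h", or "YYYY-MM-DD" into a UTC cutoff datetime."""
    match = re.fullmatch(r"(\d+)([ydh])", value)
    if match:
        return now - int(match.group(1)) * UNITS[match.group(2)]
    return datetime.strptime(value, "%Y-%m-%d").replace(tzinfo=timezone.utc)


def to_search_params(filters: Dict[str, Any], now: Optional[datetime] = None) -> Dict[str, Any]:
    """Hypothetical mapping from Source filters to search.list query parameters."""
    now = now or datetime.now(timezone.utc)
    # videoDuration is only accepted together with type=video.
    params: Dict[str, Any] = {"part": "snippet", "type": "video"}
    # search.list returns at most 50 items per page; a larger limit needs pagination.
    params["maxResults"] = min(int(filters.get("limit", 10)), 50)
    if "order" in filters:
        if filters["order"] not in ORDERS:
            raise ValueError(f"Unsupported order: {filters['order']!r}")
        params["order"] = filters["order"]
    if "date" in filters:
        cutoff = parse_date_filter(filters["date"], now)
        params["publishedAfter"] = cutoff.strftime("%Y-%m-%dT%H:%M:%SZ")  # RFC 3339
    if "duration" in filters:
        duration = DURATIONS.get(filters["duration"])
        if duration is None:
            raise ValueError(f"Unsupported duration: {filters['duration']!r}")
        params["videoDuration"] = duration
    if "language" in filters:
        # relevanceLanguage biases ranking; it does not strictly exclude other languages.
        params["relevanceLanguage"] = filters["language"]
    return params
```

Passing `now` explicitly keeps relative dates reproducible in tests; in normal use it defaults to the current UTC time.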
`Command` defines the instruction and processing mode.
```python
Command(prompt: str, modality: str = "transcript")
```
- `prompt`: The natural-language instruction for Gemini (e.g., "Summarize this").
- `modality`: The method of analysis.
  - `"transcript"`: Uses the video captions. Fast and cost-effective; best for speech-heavy content.
  - `"video"`: Uses Gemini's native video understanding to analyze visual frames and audio; best for visual content.
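The two modalities differ mainly in what is sent to Gemini. A sketch, assuming the Gemini REST `generateContent` body shape, in which a public YouTube URL can be passed as a `file_data` part: `build_gemini_contents` is hypothetical, and where the transcript text comes from is left to the caller because the library's caption source is not documented here.

```python
from typing import Any, Dict, Optional


def build_gemini_contents(prompt: str, modality: str, video_url: str,
                          transcript: Optional[str] = None) -> Dict[str, Any]:
    """Hypothetical helper: build a generateContent request body for one video."""
    if modality == "transcript":
        if not transcript:
            raise ValueError("transcript modality needs caption text")
        # Captions travel as plain text next to the instruction; the URL is not sent.
        parts = [{"text": f"{prompt}\n\nTranscript:\n{transcript}"}]
    elif modality == "video":
        # Gemini fetches the public YouTube URL itself; nothing is downloaded locally.
        parts = [{"file_data": {"file_uri": video_url}}, {"text": prompt}]
    else:
        raise ValueError(f"Unknown modality: {modality!r}")
    return {"contents": [{"parts": parts}]}
```

This makes the trade-off concrete: transcript mode sends only text, while video mode hands the whole video to the model.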
`Output` defines the structure and format of the result.
```python
Output(format: str = "text", schema: Any = None, destination: str = None)
```
- `format`: `"text"` (default) or `"json"`.
- `schema`: A Python dictionary (JSON Schema) or Pydantic model defining the structure. Required if `format="json"`.
- `destination`: (Optional) Path to save the output.
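If you post-process JSON results yourself, a cheap local check before persisting them can catch surprises. The sketch below is hypothetical (`check_against_schema` and `save_output` are not part of `yt_pipeline`), and it deliberately checks only the `required` and `enum` keywords, not full JSON Schema; with a Pydantic v2 model, `Model.model_validate(result)` performs a complete check instead.

```python
import json
from pathlib import Path
from typing import Any, Dict, List, Union


def check_against_schema(result: Dict[str, Any], schema: Dict[str, Any]) -> List[str]:
    """Report problems using only the "required" and "enum" keywords (not full JSON Schema)."""
    problems = [f"missing key: {key}" for key in schema.get("required", []) if key not in result]
    for key, spec in schema.get("properties", {}).items():
        allowed = spec.get("enum")
        if allowed is not None and key in result and result[key] not in allowed:
            problems.append(f"{key}={result[key]!r} not in {allowed}")
    return problems


def save_output(result: Union[str, Dict, List], destination: str) -> Path:
    """Write dicts and lists as indented JSON and anything else as UTF-8 text."""
    path = Path(destination)
    path.parent.mkdir(parents=True, exist_ok=True)
    if isinstance(result, (dict, list)):
        path.write_text(json.dumps(result, indent=2, ensure_ascii=False), encoding="utf-8")
    else:
        path.write_text(str(result), encoding="utf-8")
    return path
```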
`Pipeline.process` executes the pipeline.
```python
def process(self, source: Source, command: Command, output: Output) -> Union[str, List[Dict]]
```
Returns:
- If processing a single video: the result string (or a dict if `format="json"`).
- If processing multiple videos (batch or channel): a list of dictionaries containing `video_id`, `title`, and `result`.
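Because `process` returns a bare string or dict for one video but a list for batches, calling code has to branch on the shape. A small adapter, assuming exactly the return shapes described above, keeps downstream code uniform; `as_records` is a hypothetical helper, and `title` is set to `None` for single videos because that case does not return one.

```python
from typing import Any, Dict, List, Union

Result = Union[str, Dict[str, Any], List[Dict[str, Any]]]


def as_records(result: Result, video_id: str = "") -> List[Dict[str, Any]]:
    """Hypothetical adapter: normalize process() output to a list of records."""
    if isinstance(result, list):
        # Batch or channel call: already a list of {video_id, title, result}.
        return result
    # Single video: wrap the bare string (or JSON dict) in the batch record shape.
    return [{"video_id": video_id, "title": None, "result": result}]
```

For example, `for rec in as_records(pipeline.process(source, command, output), video_id="dQw4w9WgXcQ"):` iterates the same way whether `source` is one video or a whole channel.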