Releases: kaya70875/ytfetcher

v2.3.1 (HOTFIX)

19 Apr 14:00

YTFetcher now handles network outages and YouTube-related exceptions better. It retries with tenacity and adds a recovery pass, so users do not lose their progress on long-running tasks.

Added

  • Added retry logic for _fetch_single using tenacity.

Changed

  • Changed retryable exceptions.

Fixed

  • Fixed general exceptions hiding video_id information.
  • Fixed ytfetcher retrying IPBlocked exceptions, which caused unnecessary resource usage.
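
The retry behavior described in this release can be sketched with the standard library alone. This is not the tenacity-based code in ytfetcher: the exception classes, the fetch_with_retry helper, and its parameters are all hypothetical names chosen for illustration. It shows three ideas from the notes above: transient errors are retried with backoff, a blocked IP fails immediately, and every error carries its video_id.

```python
import time


class TransientFetchError(Exception):
    """Hypothetical: a failure worth retrying, such as a dropped connection."""


class IPBlockedError(Exception):
    """Hypothetical stand-in for the IPBlocked case, which is never retried."""


class VideoFetchError(Exception):
    """Hypothetical wrapper that keeps the video_id next to the original cause."""

    def __init__(self, video_id, cause):
        super().__init__(f"{video_id}: {cause!r}")
        self.video_id = video_id
        self.cause = cause


def fetch_with_retry(fetch, video_id, attempts=3, base_delay=0.5):
    """Call fetch(video_id), retrying only transient errors with exponential backoff."""
    for attempt in range(1, attempts + 1):
        try:
            return fetch(video_id)
        except TransientFetchError as exc:
            if attempt == attempts:
                # Out of attempts: surface the failure together with its video_id.
                raise VideoFetchError(video_id, exc) from exc
            time.sleep(base_delay * 2 ** (attempt - 1))
        except Exception as exc:
            # Anything else, including IPBlockedError, fails fast without retrying.
            raise VideoFetchError(video_id, exc) from exc
```

Retrying a blocked IP only burns requests, so the sketch treats it like any other non-transient error and raises on the first attempt.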

v2.3

14 Apr 19:50
917db15

What's Changed

This update improves transcript fetching reliability, error handling, and logging. It adds structured failure reporting, automatic retries for temporary transcript errors, and smarter caching that stores only permanent failures. Validation has been tightened, CLI logging now uses built-in logging with colorful output, and dependencies were updated, including yt-dlp to 2026.03.17. It also improves type consistency and replaces broad exception handling with pydantic.ValidationError for cleaner, safer error management.

Added

  • Added new exceptions for the _youtube_dl file and its classes.
  • Added new exceptions and improved error handling for TranscriptFetcher.
  • Simplified logging by removing the custom log method and using built-in logging for colorful CLI logs.
  • Removed --quiet argument from CLI.
  • Updated yt-dlp to latest version 2026.03.17.
  • Added YTFetcher.get_failed_transcripts() to expose structured failures (video_id, reason, message) after fetch calls.
  • Added explicit transient failure categories used by the retry and caching pipeline for transcript fetching.

Changed

  • Used ValidationError from pydantic instead of using general Exception class.
  • Transcript fetching now performs an automatic retry pass for transient failures before marking them as final failures.
  • Cache behavior now stores only permanent transcript failures, so transient failures can recover in future runs.
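
A minimal sketch of how a retry pass and a permanent-only failure cache could fit together. The reason strings, the FailedTranscript fields as modeled here, and the fetch_all helper are assumptions made for illustration; only the (video_id, reason, message) shape comes from the notes above, and the real category names are not listed in this release.

```python
from dataclasses import dataclass

# Hypothetical transient categories; the real names are not given in the release notes.
TRANSIENT_REASONS = {"network_error", "rate_limited"}


@dataclass(frozen=True)
class FailedTranscript:
    video_id: str
    reason: str
    message: str


def fetch_all(video_ids, fetch_one, failure_cache):
    """fetch_one(video_id) returns either a transcript or a FailedTranscript."""
    transcripts, failures = [], []
    for video_id in video_ids:
        if video_id in failure_cache:
            # Known permanent failure: skip the network call entirely.
            failures.append(failure_cache[video_id])
            continue
        result = fetch_one(video_id)
        if isinstance(result, FailedTranscript):
            failures.append(result)
        else:
            transcripts.append(result)

    # Recovery pass: give transient failures one more chance before finalizing them.
    final_failures = []
    for failure in failures:
        if failure.reason in TRANSIENT_REASONS:
            retried = fetch_one(failure.video_id)
            if not isinstance(retried, FailedTranscript):
                transcripts.append(retried)
                continue
            failure = retried
        final_failures.append(failure)

    # Cache only permanent failures so transient ones can recover in a later run.
    for failure in final_failures:
        if failure.reason not in TRANSIENT_REASONS:
            failure_cache[failure.video_id] = failure
    return transcripts, final_failures
```

Returning both lists as a tuple mirrors the list[VideoTranscript] and list[FailedTranscript] return shape mentioned under Fixed.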

Fixed

  • Improved transcript result validation to guarantee successful transcript payloads are present before processing.
  • Fixed transcript fetch return typing to consistently return list[VideoTranscript] and list[FailedTranscript] tuples.

v2.2

04 Mar 17:03
c95eea4

Introduction

This update introduces channel tab selection (videos, shorts, streams) for both CLI and Python API, adds verbose logging support, improves transcript language handling, enhances logging clarity, and fixes session management and data consistency issues for better reliability and stability.

Added

  • Added tab option for both CLI and Python API to fetch from different tabs for a channel. (videos, shorts, streams)
  • Verbose logging mode via --verbose CLI flag.
  • Comprehensive debug and info logs for core operations.
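
To illustrate what tab selection means in practice, the sketch below builds the public YouTube URL for each channel tab. The channel_tab_url helper is hypothetical and is not part of ytfetcher; how the library resolves tabs internally is not described in these notes.

```python
VALID_TABS = ("videos", "shorts", "streams")


def channel_tab_url(handle: str, tab: str = "videos") -> str:
    """Build the URL of one tab on a channel page (hypothetical helper)."""
    if tab not in VALID_TABS:
        raise ValueError(f"tab must be one of {VALID_TABS}, got {tab!r}")
    # Accept handles written with or without the leading @.
    return f"https://www.youtube.com/@{handle.lstrip('@')}/{tab}"
```

Validating the tab name up front gives a clear error instead of a failed fetch against a page that does not exist.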

Changed

  • Updated transcript languages behavior for better UX and less friction.
  • Improved log messages and levels for better clarity.
  • Refactored filtering logic into a separate method.

Fixed

  • Fixed session resource leak by closing requests.session properly in TranscriptFetcher.
  • Improved CLI error handling for graceful exits on exceptions.
  • Fixed users being forced to fetch only English transcripts when they did not set a languages parameter.
  • Fixed potential data loss in YoutubeDLFetcher.

v2.1

15 Feb 12:03

Description

This PR introduces YTFetcher v2.1. It improves performance with SQLite-backed caching, adds an --all argument that fetches every video from a channel or playlist, and exposes a utility method that converts ChannelData into Python dict rows so you can easily feed fetched data into your ML and RAG pipelines.

What's Changed?

Added

  • Added convert_to_rows utility method for converting ChannelData objects to Python dicts, making it easy to feed data to ML and RAG pipelines.
  • Added built-in cache strategy for fetching transcripts.
  • Added CLI argument for channel fetcher and playlist fetcher; --all argument now fetches ALL videos from a channel or playlist.
  • Added necessary tests for PreviewRenderer class.
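
The idea behind a rows utility can be sketched as follows. The Video dataclass and the to_rows function are hypothetical stand-ins, not the library's ChannelData model or its actual utility; they only show what flattening nested fetch results into one dict per video looks like.

```python
from dataclasses import dataclass, field


@dataclass
class Video:
    """Hypothetical stand-in for one fetched video, not ytfetcher's ChannelData."""
    video_id: str
    title: str
    transcript: list = field(default_factory=list)  # segments like {"text": ..., "start": ...}


def to_rows(videos):
    """Flatten each video into one plain dict, joining transcript segments into a single text."""
    return [
        {
            "video_id": video.video_id,
            "title": video.title,
            "text": " ".join(segment["text"] for segment in video.transcript),
        }
        for video in videos
    ]
```

Flat rows like these can be passed to a DataFrame constructor or an embedding step without further unpacking.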

Changed

  • Changed the max_results parameter to accept None, which fetches all videos from a channel when set explicitly.
  • Removed timeout parameter from HTTPConfig class.
  • Removed httpx library since it is unused.

Fixed

  • Fixed PreviewRenderer failing when metadata values are None.
  • Fixed PyPI installing mkdocs as a direct dependency, which is unnecessary.
  • Fixed TranscriptFetcher docstrings by @zhanglinqian

Summary

  • Implement persistent SQLite caching for transcript fetching with configurable cache paths
  • Add channel_data_to_rows utility for converting ChannelData to flat dictionaries for ML/RAG pipelines
  • Extend CLI with cache management commands (--no-cache, --cache-path, cache --clean)
  • Add comprehensive tests for PreviewRenderer, cache functionality, and CLI arguments
  • Fix PreviewRenderer to gracefully handle None metadata values
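
A persistent transcript cache of the kind summarized above could look roughly like this with the standard sqlite3 module. The TranscriptCache class, its table layout, and the fetch_cached helper are assumptions for illustration and do not reflect ytfetcher's actual schema.

```python
import json
import sqlite3


class TranscriptCache:
    """Hypothetical SQLite-backed cache keyed by video_id."""

    def __init__(self, path=":memory:"):
        # Pass a file path for persistence; ":memory:" keeps the example self-contained.
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS transcripts ("
            "video_id TEXT PRIMARY KEY, payload TEXT NOT NULL)"
        )

    def get(self, video_id):
        row = self.conn.execute(
            "SELECT payload FROM transcripts WHERE video_id = ?", (video_id,)
        ).fetchone()
        return json.loads(row[0]) if row else None

    def put(self, video_id, transcript):
        self.conn.execute(
            "INSERT OR REPLACE INTO transcripts VALUES (?, ?)",
            (video_id, json.dumps(transcript)),
        )
        self.conn.commit()

    def clean(self):
        self.conn.execute("DELETE FROM transcripts")
        self.conn.commit()


def fetch_cached(video_id, fetch, cache):
    """Return the cached transcript if present, otherwise fetch and store it."""
    cached = cache.get(video_id)
    if cached is not None:
        return cached
    transcript = fetch(video_id)
    cache.put(video_id, transcript)
    return transcript
```

With a cache like this, a second run over the same channel only pays the network cost for videos it has not seen before.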

Contributors Of This Release

Thanks to everyone who contributed to YTFetcher.

@kaya70875 @zhanglinqian

v2.0

31 Jan 10:37

YTFetcher 2.0 Release

This release introduces major architectural improvements, performance enhancements, and important fixes that required breaking changes.

Version 2.0 establishes a cleaner and more predictable API while resolving critical data alignment issues present in previous releases.


⚠️ Important Notice

All versions prior to 2.0 contain a critical issue where metadata, transcripts, and comments may become misaligned. This issue could not be safely fixed without breaking API changes, and therefore the correction is released as part of this major version.

Users are strongly advised to upgrade.


Added

  • Introduced a new FetchOptions data class for defining fetcher options such as languages and filters.
  • Added a --sort argument for selecting top or new comments in the CLI.
  • Added from_search support to both Python API and CLI, allowing fetching directly from a YouTube-style search query.
  • Added --quiet CLI flag.
  • Added pre-fetch filtering support.

Changed

  • Removed deprecated Exporter class.
  • Eliminated network requests during object initialization.
  • YTFetcher now initializes the appropriate BaseYoutubeDLFetcher within class methods.
  • TranscriptFetcher now creates a Session per thread for improved thread safety.
  • TranscriptFetcher now returns VideoTranscript instead of ChannelData.
  • Exported files no longer include None values, reducing noise and file size.
  • CLI arguments redesigned for improved usability.
  • Python API is now silent by default; logging appears only in CLI or when verbose mode is enabled.
  • ytfetcher is now fully synchronous, simplifying usage and architecture.
  • CLI arguments simplified to channel, playlist, video and search instead of the from_ prefixes.
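
The session-per-thread pattern mentioned above can be sketched with threading.local and ThreadPoolExecutor. FakeSession is a hypothetical stand-in for an HTTP session, and the helper names are not taken from ytfetcher.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class FakeSession:
    """Hypothetical stand-in for an HTTP session object."""

    def get(self, video_id):
        return f"transcript:{video_id}"


_local = threading.local()


def get_session():
    # Each worker thread lazily creates its own session and reuses it afterwards.
    if not hasattr(_local, "session"):
        _local.session = FakeSession()
    return _local.session


def fetch_transcript(video_id):
    # Returning the id with the payload keeps results tied to their video.
    return video_id, get_session().get(video_id)


def fetch_many(video_ids, max_workers=4):
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in input order, regardless of completion order.
        return list(pool.map(fetch_transcript, video_ids))
```

Because no session object is shared between threads, workers never contend over connection state.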

Fixed

  • Fixed critical bug where metadata, transcripts, and comments were not aligned.
  • Fixed HTTPConfig header validation logic.
  • Improved VideoListFetcher performance via ThreadPoolExecutor.
  • Fixed issue where CommentFetcher did not correctly fetch top comments.
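
The release notes do not say how the alignment bug was fixed. As an illustration of the general problem, the sketch below joins metadata, transcripts, and comments by video_id instead of by list position, so a missing transcript cannot shift every later item. All names and record shapes here are hypothetical.

```python
def align_by_video_id(metadata, transcripts, comments):
    """Merge three independently fetched lists into one record per video."""
    transcript_by_id = {t["video_id"]: t["segments"] for t in transcripts}

    comments_by_id = {}
    for comment in comments:
        comments_by_id.setdefault(comment["video_id"], []).append(comment["text"])

    return [
        {
            **meta,
            # A video without a transcript gets None rather than a neighbor's data.
            "transcript": transcript_by_id.get(meta["video_id"]),
            "comments": comments_by_id.get(meta["video_id"], []),
        }
        for meta in metadata
    ]
```

Pairing by position, for example with zip, silently misaligns everything after the first video whose transcript is unavailable.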

Breaking Changes

  • Transcript fetching now returns VideoTranscript objects.
  • CLI argument structure has changed.
  • Initialization behavior changed to remove implicit network operations.
  • Export behavior changed to omit None values.

Users upgrading from previous versions may need to update integrations accordingly.


Summary

YTFetcher 2.0 provides a cleaner API, improved performance, thread safety, and resolves long-standing data consistency issues. This release sets a stable foundation for future development.

Update

Don't forget to update ytfetcher with:

pip install ytfetcher --upgrade

v.1.5.3

08 Jan 17:27

This release focuses on usability and developer experience, making it easier to inspect, debug, and trust fetched YouTube data before exporting it.

What’s new

Rich CLI Preview

  • Added a PreviewRenderer powered by rich for beautifully formatted terminal output.

  • Preview transcripts, metadata, and comments before exporting.

  • Designed for fast iteration and dataset inspection.

Improved CLI Output Handling

  • Clean separation between preview output and raw data dumping.

  • --stdout now prints structured data only, avoiding mixed console noise.

  • Better behavior when working with large datasets and pipelines.
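
One common way to keep structured output free of console noise is to send human-readable messages to stderr and data to stdout. The sketch below assumes that approach; emit_results and its parameters are hypothetical, and the release notes do not state that ytfetcher implements --stdout this way.

```python
import json
import sys


def emit_results(rows, to_stdout=False, out=None, err=None):
    """Write progress text to stderr and, if requested, JSON data to stdout."""
    out = out if out is not None else sys.stdout
    err = err if err is not None else sys.stderr

    # Status messages never touch the data stream.
    print(f"Fetched {len(rows)} videos", file=err)

    if to_stdout:
        json.dump(rows, out)
        out.write("\n")
```

With this split, piping the command into another tool passes along only valid JSON while the status line still appears in the terminal.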

Documentation Improvements

  • Added a live CLI demo (asciinema) to the README.

  • Improved onboarding for new users with clearer examples and flow.

Upgrade

Don't forget to upgrade ytfetcher

pip install --upgrade ytfetcher

v1.5

31 Dec 16:48

This release introduces major architectural improvements, including a new comment extraction feature and a flexible exporter system.

New Features

Comment Fetching Engine: High-performance extraction of YouTube comments.

  • Fetch comments alongside transcripts with fetch_with_comments.

  • Extract standalone comment datasets with fetch_comments.

  • Docker Support: Added Dockerfile and docker-compose.yml for a standardized development environment.

  • Pydantic v2 Integration: Robust data validation and automatic type-casting for all metadata.

Changes & Improvements

  • Refactored Exporters: Transitioned to specialized subclasses (JSONExporter, TXTExporter, CSVExporter) for better control over output formats.

  • Enhanced Mapping: Leveraged Pydantic Aliases to handle inconsistent YouTube metadata fields (e.g., _time_text to time_text).

Deprecation Notice
The monolithic Exporter class is now deprecated and replaced by format-specific subclasses. It will be removed in a future version of ytfetcher.

Examples

Fetch comments with or without transcripts

To fetch comments for every video along with its transcript and metadata, use the --comments argument in the CLI.

ytfetcher from_channel -c TEDx -m 20 --comments 10 -f csv

If you only need comments with metadata, use the --comments-only argument.

ytfetcher from_channel -c TEDx -m 20 --comments-only 10 -f json

Using New Exporters

You can now import each exporter and use it on its own for better control over the exported data.

import asyncio

from ytfetcher import YTFetcher
from ytfetcher.services import JSONExporter, TXTExporter

fetcher = YTFetcher.from_channel("TheOffice", max_results=5)

async def get_channel_data_with_comments():
    # Fetch transcripts and metadata plus up to 2 comments per video.
    channel_data = await fetcher.fetch_with_comments(max_comments=2)

    # Each exporter writes only the metadata fields it is given.
    TXTExporter(channel_data=channel_data, allowed_metadata_list=['title', 'description']).write()
    JSONExporter(channel_data=channel_data, allowed_metadata_list=['title', 'view_count']).write()

asyncio.run(get_channel_data_with_comments())

Upgrade

Don't forget to upgrade ytfetcher

pip install --upgrade ytfetcher

v1.4.1

10 Nov 11:59

What's New

Fix

  • Fixed an issue where running the CLI with from_video_ids would fail with a KeyError: 'url' if the yt_dlp result for a video did not include a url key by @vabe44

Upgrade

Don't forget to upgrade ytfetcher

pip install --upgrade ytfetcher

Contribute

Found a bug or have an idea? Open an issue

v1.4

26 Oct 12:26

What's New

Fetching Only Manually Created Transcripts

You can now fetch only manually created transcripts with ytfetcher, which gives you more precise transcripts. (Closes #2)

This feature is useful for channels like TEDx, which have many manually created transcripts.

Here is how you can use it:

Python API

fetcher = YTFetcher.from_channel(channel_handle="TEDx", manually_created=True)

CLI Usage

ytfetcher from_channel -c TEDx -m 50 --manually-created

Fixed Transcript Cleaner Method

The transcript cleaner method wasn't removing >> signs correctly from transcripts. This is now fixed.
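
For illustration, a cleaner that strips >> markers might look like the sketch below. The clean_transcript_text function and its exact rules are assumptions; the actual cleaner in ytfetcher is not shown in these notes.

```python
import re

# Matches a run of two or more ">" characters plus any surrounding whitespace.
_SPEAKER_MARK = re.compile(r"\s*>>+\s*")


def clean_transcript_text(text: str) -> str:
    """Remove >> markers and collapse the leftover whitespace (hypothetical helper)."""
    without_marks = _SPEAKER_MARK.sub(" ", text)
    return re.sub(r"\s+", " ", without_marks).strip()
```

Replacing each marker with a single space, then collapsing whitespace, avoids gluing neighboring words together when a marker sits between them.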

Upgrade

Don't forget to upgrade ytfetcher

pip install --upgrade ytfetcher

Contribute

Found a bug or have an idea? Open an issue

v1.3.1

18 Oct 11:08

What's New

Fetch From Playlist ID

You can now bulk-fetch videos from a playlist ID using the from_playlist_id method in both the CLI and the Python API:

Note: YTFetcher fetches up to 50 videos by default.
To fetch all videos from a playlist with more than 50 videos, set max_results to the total number of videos in that playlist.

We are planning to find a better solution for this in the future.

Using CLI

ytfetcher from_playlist_id -p playlistid123 -m 20

Using Python API

fetcher = YTFetcher.from_playlist_id(playlist_id="playlistid123")

Better Exporter

  • Exporter now exports all available data by default.
  • Added --metadata argument for CLI to choose desired metadata.
  • You can also exclude timings from transcripts using --no-timing argument in CLI.

This update improves the exporter, especially on the CLI side. Here are a couple of examples of how you can use these new arguments:

Filtering Metadata

ytfetcher from_playlist_id -p playlistid123 --metadata title description

Excluding Timings From Transcripts

ytfetcher from_channel -c TheOffice --no-timings

Accepting Full URL

ytfetcher can now accept full URLs for the channel_handle and playlist_id parameters.

You can now input full url's like:

Full URLs are not yet supported for the from_video_ids method, so you still have to provide exact video IDs for it.

Trim @ Character

If you accidentally write the channel handle with @ included, ytfetcher now strips it for you. So now you can also write:

ytfetcher from_channel -c @TheOffice -m 20
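
The input normalization described in this release (accepting full URLs and trimming a leading @) can be sketched with urllib.parse. Both helper functions are hypothetical, and the URL shapes they handle are common public YouTube forms assumed for illustration, not a list of what ytfetcher accepts.

```python
from urllib.parse import parse_qs, urlparse


def normalize_channel_handle(value: str) -> str:
    """Accept 'TheOffice', '@TheOffice', or a channel URL and return the bare handle."""
    value = value.strip()
    if value.startswith(("http://", "https://")):
        # A channel URL path looks like "/@TheOffice" or "/@TheOffice/videos".
        value = urlparse(value).path.strip("/").split("/")[0]
    return value.lstrip("@")


def normalize_playlist_id(value: str) -> str:
    """Accept a bare playlist ID or a URL carrying it in the 'list' query parameter."""
    value = value.strip()
    if value.startswith(("http://", "https://")):
        ids = parse_qs(urlparse(value).query).get("list")
        if not ids:
            raise ValueError(f"no playlist ID found in {value!r}")
        return ids[0]
    return value
```

Normalizing once at the boundary means the rest of the code only ever deals with bare handles and IDs.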

Upgrade

Don't forget to upgrade ytfetcher

pip install --upgrade ytfetcher

Contribute

Found a bug or have an idea? Open an issue