19 Apr 14:00

76f2e76

v2.3.1 (HOTFIX) Latest

YTFetcher now handles network outages and YouTube-related exceptions better: it retries with tenacity and then runs an additional recovery pass, so users do not lose their progress on long-running tasks.

Added

Added retry logic for _fetch_single using tenacity.

Changed

Changed retryable exceptions.

Fixed

Fixed general exceptions hiding video_id information.
Fixed ytfetcher retrying IPBlocked exceptions, which caused unnecessary resource usage.
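
The retry behavior described in this release can be pictured roughly as follows. This is a minimal standard-library sketch, not ytfetcher's code: the release itself uses tenacity, and the hand-rolled loop below only stands in for it. The names TransientError, IPBlockedError, fetch_with_retry, and fetch_all are hypothetical.

```python
import time


class TransientError(Exception):
    """Hypothetical stand-in for a temporary network or YouTube failure."""


class IPBlockedError(Exception):
    """Hypothetical stand-in for an IP block; retrying cannot help."""


def fetch_with_retry(fetch, video_id, attempts=3, base_delay=0.0):
    """Call fetch(video_id), retrying only on TransientError."""
    for attempt in range(1, attempts + 1):
        try:
            return fetch(video_id)
        except TransientError:
            if attempt == attempts:
                raise
            time.sleep(base_delay * 2 ** (attempt - 1))  # exponential backoff


def fetch_all(fetch, video_ids):
    """Run a first pass over every ID, then one recovery pass over transient failures."""
    results, pending, failed = {}, [], {}
    for vid in video_ids:
        try:
            results[vid] = fetch_with_retry(fetch, vid)
        except TransientError:
            pending.append(vid)  # worth another try in the recovery pass
        except Exception as exc:  # e.g. IPBlockedError: never retried
            failed[vid] = f"{type(exc).__name__} for video_id={vid}: {exc}"
    for vid in pending:  # recovery pass; earlier results are kept
        try:
            results[vid] = fetch_with_retry(fetch, vid)
        except Exception as exc:
            failed[vid] = f"{type(exc).__name__} for video_id={vid}: {exc}"
    return results, failed
```

Keeping the video ID in every failure message and skipping retries for blocked requests mirrors the two fixes listed above; the backoff values are placeholders.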

14 Apr 19:50

kaya70875

v2.3

917db15

v2.3

What's Changed

This update improves transcript fetching reliability, error handling, and logging. It adds structured failure reporting, automatic retries for temporary transcript errors, and smarter caching that stores only permanent failures. Validation has been tightened, CLI logging now uses built-in logging with colorful output, and dependencies were updated, including yt-dlp to 2026.03.17. It also improves type consistency and replaces broad exception handling with pydantic.ValidationError for cleaner, safer error management.

Added

Added new exceptions for the _youtube_dl file and its classes.
Added new exceptions and improved error handling for TranscriptFetcher.
Simplified logging by removing the custom log method and using built-in logging for colorful CLI logs.
Removed --quiet argument from CLI.
Updated yt-dlp to latest version 2026.03.17.
Added YTFetcher.get_failed_transcripts() to expose structured failures (video_id, reason, message) after fetch calls.
Added explicit transient failure categories used by the retry and caching pipeline for transcript fetching.

Changed

Used ValidationError from pydantic instead of using general Exception class.
Transcript fetching now performs an automatic retry pass for transient failures before marking them as final failures.
Cache behavior now stores only permanent transcript failures, so transient failures can recover in future runs.

Fixed

Improved transcript result validation to guarantee successful transcript payloads are present before processing.
Fixed transcript fetch return typing to consistently return list[VideoTranscript] and list[FailedTranscript] tuples.
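
To illustrate how structured failures and "cache only permanent failures" fit together, here is a minimal sketch. The field names video_id, reason, and message come from the notes above; the reason codes, the is_transient property, and failures_to_cache are assumptions made for illustration, not the library's actual definitions.

```python
from dataclasses import dataclass

# Hypothetical reason codes; the real transient categories are not listed in these notes.
TRANSIENT_REASONS = {"network_error", "rate_limited"}


@dataclass(frozen=True)
class FailedTranscript:
    video_id: str
    reason: str
    message: str

    @property
    def is_transient(self) -> bool:
        return self.reason in TRANSIENT_REASONS


def failures_to_cache(failures):
    """Keep only permanent failures so transient ones are retried on the next run."""
    return {f.video_id: f for f in failures if not f.is_transient}
```

With this split, a video whose transcripts are disabled is remembered and skipped, while a video that failed on a dropped connection is fetched again next time.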

04 Mar 17:03

kaya70875

v2.2

c95eea4

v2.2

Introduction

This update introduces channel tab selection (videos, shorts, streams) for both CLI and Python API, adds verbose logging support, improves transcript language handling, enhances logging clarity, and fixes session management and data consistency issues for better reliability and stability.

Added

Added tab option for both CLI and Python API to fetch from different tabs for a channel. (videos, shorts, streams)
Verbose logging mode via --verbose CLI flag.
Comprehensive debug and info logs for core operations.

Changed

Updated transcript languages behavior for better UX and less friction.
Improved log messages and levels for better clarity.
Refactored filtering logic into a separate method.

Fixed

Fixed session resource leak by closing requests.session properly in TranscriptFetcher.
Improved CLI error handling for graceful exits on exceptions.
Fixed users being forced to fetch only English transcripts when they did not set a languages parameter.
Fixed potential data loss in YoutubeDLFetcher.

15 Feb 12:03

kaya70875

v2.1

12c603b

v2.1

Description

This PR introduces YTFetcher v2.1, which adds performance optimization through a sqlite3-backed cache, improves the user experience with an --all argument that fetches all videos from a channel or playlist, and exposes a utility method for converting ChannelData to Python dict rows so you can easily feed fetched data into your ML and RAG pipelines.

What's Changed?

Added

Added convert_to_rows utility method for converting ChannelData objects to Python dicts, making it easy to feed data into ML and RAG pipelines.
Added built-in cache strategy for fetching transcripts.
Added CLI argument for channel fetcher and playlist fetcher; --all argument now fetches ALL videos from a channel or playlist.
Added necessary tests for PreviewRenderer class.
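
The idea behind the row conversion listed above can be sketched as follows. This is not the library's implementation: the Video class and to_rows function are hypothetical stand-ins, and the real ChannelData fields may differ.

```python
from dataclasses import dataclass, field


@dataclass
class Video:  # hypothetical stand-in for one entry of ChannelData
    video_id: str
    title: str
    transcript: list = field(default_factory=list)  # [{"text": ..., "start": ...}]


def to_rows(videos):
    """Flatten each video into one plain dict with its transcript joined as text."""
    return [
        {
            "video_id": v.video_id,
            "title": v.title,
            "text": " ".join(seg["text"] for seg in v.transcript),
        }
        for v in videos
    ]
```

Flat dicts like these can be passed directly to tools that expect a list of records, such as a DataFrame constructor or a document loader.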

Changed

Changed the max_results parameter to accept None; when it is explicitly set to None, all videos from a channel are fetched.
Removed timeout parameter from HTTPConfig class.
Removed httpx library since it is unused.

Fixed

Fixed PreviewRenderer failing when metadata values are None.
Fixed PyPI installs downloading mkdocs as a direct dependency, which is unnecessary.
Fixed TranscriptFetcher docstrings by @zhanglinqian

Summary

Implement persistent SQLite caching for transcript fetching with configurable cache paths
Add channel_data_to_rows utility for converting ChannelData to flat dictionaries for ML/RAG pipelines
Extend CLI with cache management commands (--no-cache, --cache-path, cache --clean)
Add comprehensive tests for PreviewRenderer, cache functionality, and CLI arguments
Fix PreviewRenderer to gracefully handle None metadata values
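
For readers curious what a persistent SQLite transcript cache can look like, here is a minimal sketch using only the standard library. The TranscriptCache class, its table layout, and its method names are assumptions for illustration; ytfetcher's actual schema and API are not described in these notes.

```python
import json
import sqlite3


class TranscriptCache:
    """Tiny key-value cache: video_id -> JSON-encoded transcript."""

    def __init__(self, path=":memory:"):
        # Pass a file path instead of ":memory:" to persist between runs.
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS transcripts "
            "(video_id TEXT PRIMARY KEY, payload TEXT NOT NULL)"
        )

    def get(self, video_id):
        row = self.conn.execute(
            "SELECT payload FROM transcripts WHERE video_id = ?", (video_id,)
        ).fetchone()
        return json.loads(row[0]) if row else None

    def put(self, video_id, transcript):
        self.conn.execute(
            "INSERT OR REPLACE INTO transcripts VALUES (?, ?)",
            (video_id, json.dumps(transcript)),
        )
        self.conn.commit()

    def clean(self):
        """Remove every cached entry."""
        self.conn.execute("DELETE FROM transcripts")
        self.conn.commit()
```

A fetcher would call get before making a network request and put after a successful one, so repeated runs skip videos that were already fetched.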

Contributors Of This Release

Thanks to everyone who contributed to YTFetcher.

@kaya70875 @zhanglinqian

31 Jan 10:37

kaya70875

v2.0

c6fb7b5

v2.0

YTFetcher 2.0 Release

This release introduces major architectural improvements, performance enhancements, and important fixes that required breaking changes.

Version 2.0 establishes a cleaner and more predictable API while resolving critical data alignment issues present in previous releases.

⚠️ Important Notice

All versions prior to 2.0 contain a critical issue where metadata, transcripts, and comments may become misaligned. This issue could not be safely fixed without breaking API changes, and therefore the correction is released as part of this major version.

Users are strongly advised to upgrade.

Added

Introduced a new FetchOptions data class for defining fetcher options such as languages and filters.
Added a --sort argument for selecting top or new comments in the CLI.
Added from_search support to both Python API and CLI, allowing fetching directly from a YouTube-style search query.
Added --quiet CLI flag.
Added pre-fetch filtering support.

Changed

Removed deprecated Exporter class.
Eliminated network requests during object initialization.
YTFetcher now initializes the appropriate BaseYoutubeDLFetcher within class methods.
TranscriptFetcher now creates a Session per thread for improved thread safety.
TranscriptFetcher now returns VideoTranscript instead of ChannelData.
Exported files no longer include None values, reducing noise and file size.
CLI arguments redesigned for improved usability.
Python API is now silent by default; logging appears only in CLI or when verbose mode is enabled.
ytfetcher is now fully synchronous, simplifying usage and architecture.
CLI arguments simplified to channel, playlist, video and search instead of the from_ prefixes.

Fixed

Fixed critical bug where metadata, transcripts, and comments were not aligned.
Fixed HTTPConfig header validation logic.
Improved VideoListFetcher performance via ThreadPoolExecutor.
Fixed issue where CommentFetcher did not correctly fetch top comments.
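
The "Session per thread" change combined with a thread pool follows a common pattern, sketched below with the standard library only. FakeSession is a hypothetical stand-in for an HTTP session such as requests.Session, and get_session, fetch, and fetch_many are illustrative names rather than ytfetcher functions.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

_local = threading.local()


class FakeSession:
    """Hypothetical stand-in for an HTTP session object."""

    def get(self, video_id):
        return f"transcript for {video_id}"


def get_session():
    # Create the session lazily, once per worker thread, then reuse it.
    if not hasattr(_local, "session"):
        _local.session = FakeSession()
    return _local.session


def fetch(video_id):
    return video_id, get_session().get(video_id)


def fetch_many(video_ids, workers=4):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in the same order as the input IDs.
        return list(pool.map(fetch, video_ids))
```

Because each thread only ever touches its own session, no session object is shared between concurrent requests.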

Breaking Changes

Transcript fetching now returns VideoTranscript objects.
CLI argument structure has changed.
Initialization behavior changed to remove implicit network operations.
Export behavior changed to omit None values.

Users upgrading from previous versions may need to update integrations accordingly.
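
As an illustration of the export change, omitting None values from nested data can be done with a small recursive helper. This is a sketch of the general technique under the assumption that exported records are plain dicts and lists; drop_none is a hypothetical name, not a ytfetcher function.

```python
def drop_none(value):
    """Recursively remove dict keys whose value is None."""
    if isinstance(value, dict):
        return {k: drop_none(v) for k, v in value.items() if v is not None}
    if isinstance(value, list):
        return [drop_none(v) for v in value]
    return value
```

Integrations that previously read a key and expected None should switch to a lookup with a default, for example record.get("description").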

Summary

YTFetcher 2.0 provides a cleaner API, improved performance, thread safety, and resolves long-standing data consistency issues. This release sets a stable foundation for future development.

Update

Don't forget to update ytfetcher with:

pip install ytfetcher --upgrade

08 Jan 17:27

kaya70875

v1.5.3

15d61a2

v1.5.3

This release focuses on usability and developer experience, making it easier to inspect, debug, and trust fetched YouTube data before exporting it.

What’s new

Rich CLI Preview

Added a PreviewRenderer powered by rich for beautifully formatted terminal output.
Preview transcripts, metadata, and comments before exporting.
Designed for fast iteration and dataset inspection.

Improved CLI Output Handling

Clean separation between preview output and raw data dumping.
--stdout now prints structured data only, avoiding mixed console noise.
Better behavior when working with large datasets and pipelines.

Documentation Improvements

Added a live CLI demo (asciinema) to the README.
Improved onboarding for new users with clearer examples and flow.

Upgrade

Don't forget to upgrade ytfetcher

pip install --upgrade ytfetcher

31 Dec 16:48

kaya70875

v1.5

f47b7df

v1.5

This release introduces major architectural improvements, including a new comment extraction feature and a flexible exporter system.

New Features

Comment Fetching Engine: High-performance extraction of YouTube comments.

Fetch comments alongside transcripts with fetch_with_comments.
Extract standalone comment datasets with fetch_comments.
Docker Support: Added Dockerfile and docker-compose.yml for a standardized development environment.
Pydantic v2 Integration: Robust data validation and automatic type-casting for all metadata.

Changes & Improvements

Refactored Exporters: Transitioned to specialized subclasses (JSONExporter, TXTExporter, CSVExporter) for better control over output formats.
Enhanced Mapping: Leveraged Pydantic Aliases to handle inconsistent YouTube metadata fields (e.g., _time_text to time_text).

Deprecation Notice
The monolithic Exporter class is now deprecated and replaced by format-specific subclasses. It will be removed in a future version of ytfetcher.

Examples

Fetch comments with or without transcripts

To fetch comments for every video along with its transcript and metadata, use the --comments argument in the CLI.

ytfetcher from_channel -c TEDx -m 20 --comments 10 -f csv

If you only need comments with metadata, use the --comments-only argument.

ytfetcher from_channel -c TEDx -m 20 --comments-only 10 -f json

Using New Exporters

You can now import individual exporters and use them separately for better control over your data.

import asyncio

from ytfetcher import YTFetcher
from ytfetcher.services import JSONExporter, TXTExporter

fetcher = YTFetcher.from_channel("TheOffice", max_results=5)

async def get_channel_data_with_comments():
    # Fetch transcripts, metadata, and up to 2 comments per video.
    channel_data = await fetcher.fetch_with_comments(max_comments=2)

    # Each exporter writes only the metadata fields listed for it.
    TXTExporter(channel_data=channel_data, allowed_metadata_list=['title', 'description']).write()
    JSONExporter(channel_data=channel_data, allowed_metadata_list=['title', 'view_count']).write()

asyncio.run(get_channel_data_with_comments())

Upgrade

Don't forget to upgrade ytfetcher

pip install --upgrade ytfetcher

10 Nov 11:59

kaya70875

v1.4.1

648c267

v1.4.1

What's New

Fix

Fixed an issue where running the CLI with from_video_ids would fail with a KeyError: 'url' if the yt_dlp result for a video did not include a url key by @vabe44

Upgrade

Don't forget to upgrade ytfetcher

pip install --upgrade ytfetcher

Contribute

Found a bug or have an idea? Open an issue

Contributors

vabe44

26 Oct 12:26

kaya70875

v1.4

85cbcbc

v1.4

What's New

Fetching Only Manually Created Transcripts

You can now fetch only manually created transcripts with ytfetcher, which gives you more precise transcripts. (Closes #2)

This is useful for channels like TEDx, which have many manually created transcripts.

Here is how you can use it:

Python API

fetcher = YTFetcher.from_channel(channel_handle="TEDx", manually_created=True)

CLI Usage

ytfetcher from_channel -c TEDx -m 50 --manually-created

Fixed Transcript Cleaner Method

The transcript cleaner method wasn't removing >> signs correctly from transcripts. This is now fixed.
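
For context, ">>" is commonly used in captions to mark a change of speaker. A cleaner along these lines could be sketched as follows; clean_text and the pattern are illustrative assumptions, not the library's actual cleaner.

```python
import re

# Matches one or more ">" pairs together with any whitespace around them.
_SPEAKER_MARK = re.compile(r"\s*>>+\s*")


def clean_text(text: str) -> str:
    """Remove '>>' markers and collapse the whitespace left behind."""
    without_marks = _SPEAKER_MARK.sub(" ", text)
    return re.sub(r"\s+", " ", without_marks).strip()
```

Replacing each marker with a single space rather than an empty string keeps neighboring words from being glued together.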

Upgrade

Don't forget to upgrade ytfetcher

pip install --upgrade ytfetcher

Contribute

Found a bug or have an idea? Open an issue

18 Oct 11:08

kaya70875

v1.3.1

c4f9b76

v1.3.1

What's New

Fetch From Playlist ID

You can now fetch videos in bulk from a playlist ID using the from_playlist_id method in both the CLI and the Python API:

Note: YTFetcher fetches up to 50 videos by default.
To fetch all videos from a playlist with more than 50 videos, set max_results to the total number of videos in that playlist.

We are planning to find a better solution for this in the future.

Using CLI

ytfetcher from_playlist_id -p playlistid123 -m 20

Using Python API

fetcher = YTFetcher.from_playlist_id(playlist_id="playlistid123")

Better Exporter

Exporter now exports all available data by default.
Added --metadata argument for CLI to choose desired metadata.
You can also exclude timings from transcripts using --no-timing argument in CLI.

This update improves the exporter, especially on the CLI side. Here are a couple of examples of how you can use these new arguments:

Filtering Metadata

ytfetcher from_playlist_id -p playlistid123 --metadata title description

Excluding Timings From Transcripts

ytfetcher from_channel -c TheOffice --no-timings

Accepting Full URL

ytfetcher can now accept full URLs for the channel_handle and playlist_id parameters.

You can now input full URLs like:

https://www.youtube.com/@TheOffice for channel_handle
https://www.youtube.com/playlist?list=PLuvRKGApO-zoF2WBPN2kW188YLke0Igv8 for playlist_id

Full URLs are not supported for the from_video_ids method right now, so you still have to provide exact video IDs for it.

Trim @ Character

If you accidentally write the channel handle with @ included, ytfetcher now fixes that for you. So now you can also write:

ytfetcher from_channel -c @TheOffice -m 20
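
The input handling described in the last two sections (full URLs and a leading @) amounts to a normalization step before fetching. Here is a minimal sketch with the standard library; normalize_channel_handle and normalize_playlist_id are hypothetical helper names, and ytfetcher's real parsing may differ.

```python
from urllib.parse import parse_qs, urlparse


def normalize_channel_handle(value: str) -> str:
    """Accept 'TheOffice', '@TheOffice', or a full channel URL."""
    value = value.strip()
    if value.startswith(("http://", "https://")):
        segments = [s for s in urlparse(value).path.split("/") if s]
        # Prefer the '@handle' path segment; fall back to the last segment.
        handles = [s for s in segments if s.startswith("@")]
        value = handles[0] if handles else (segments[-1] if segments else "")
    return value.lstrip("@")


def normalize_playlist_id(value: str) -> str:
    """Accept a bare playlist ID or a full URL carrying a 'list' parameter."""
    value = value.strip()
    if value.startswith(("http://", "https://")):
        ids = parse_qs(urlparse(value).query).get("list")
        if not ids:
            raise ValueError(f"no 'list' parameter in URL: {value}")
        return ids[0]
    return value
```

Normalizing once at the entry point means the rest of the code can assume a bare handle or ID regardless of what the user typed.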

Upgrade

Don't forget to upgrade ytfetcher

pip install --upgrade ytfetcher

Contribute

Found a bug or have an idea? Open an issue

Releases: kaya70875/ytfetcher
