Releases: kaya70875/ytfetcher
v2.3.1 (HOTFIX)
YTFetcher now handles network outages and YouTube-related exceptions better: it retries with tenacity and adds a recovery pass, so users do not lose their progress on long-running tasks.
Added
- Added retry logic for _fetch_single using tenacity.
Changed
- Changed retryable exceptions.
Fixed
- Fixed general exceptions hiding video_id information.
- Fixed ytfetcher retrying IPBlocked exceptions, which caused unnecessary resource usage.
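The retry behaviour in this release can be sketched with the standard library alone. This is not YTFetcher's implementation (which uses tenacity): the exception classes, fetch_with_retry, and its parameters are hypothetical names chosen for illustration. The sketch retries only transient errors, fails fast on an IP block, and keeps the video_id attached to whatever error is finally raised.

```python
import time


class TransientFetchError(Exception):
    """Hypothetical error that is worth retrying (e.g. a network outage)."""


class IPBlockedError(Exception):
    """Hypothetical stand-in for an IP block; retrying only wastes resources."""


class VideoFetchError(Exception):
    """Wraps a failure while keeping the video_id visible to the caller."""

    def __init__(self, video_id, cause):
        super().__init__(f"{video_id}: {cause!r}")
        self.video_id = video_id
        self.cause = cause


def fetch_with_retry(fetch, video_id, attempts=3, delay=0.0):
    """Call fetch(video_id), retrying only TransientFetchError."""
    for attempt in range(1, attempts + 1):
        try:
            return fetch(video_id)
        except TransientFetchError as exc:
            if attempt == attempts:
                raise VideoFetchError(video_id, exc) from exc
            time.sleep(delay * attempt)  # simple linear backoff
        except IPBlockedError as exc:
            # Not retryable: surface it immediately with the video_id attached.
            raise VideoFetchError(video_id, exc) from exc


calls = {"flaky": 0, "blocked": 0}


def flaky(video_id):
    # Fails twice, then succeeds on the third call.
    calls["flaky"] += 1
    if calls["flaky"] < 3:
        raise TransientFetchError("connection reset")
    return f"transcript for {video_id}"


def blocked(video_id):
    calls["blocked"] += 1
    raise IPBlockedError("ip blocked")


result = fetch_with_retry(flaky, "abc123")
try:
    fetch_with_retry(blocked, "xyz789")
except VideoFetchError as err:
    blocked_error = err
```

In this sketch the flaky fetcher is called three times before it succeeds, while the blocked one is called exactly once.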
v2.3
What's Changed
This update improves transcript fetching reliability, error handling, and logging. It adds structured failure reporting, automatic retries for temporary transcript errors, and smarter caching that stores only permanent failures. Validation has been tightened, CLI logging now uses built-in logging with colorful output, and dependencies were updated, including yt-dlp to 2026.03.17. It also improves type consistency and replaces broad exception handling with pydantic.ValidationError for cleaner, safer error management.
Added
- Added new exceptions for the _youtube_dl file and its classes.
- Added new exceptions and improved error handling for TranscriptFetcher.
- Simplified logging by removing the custom log method and using built-in logging for colorful CLI logs.
- Removed --quiet argument from CLI.
- Updated yt-dlp to latest version 2026.03.17.
- Added YTFetcher.get_failed_transcripts() to expose structured failures (video_id, reason, message) after fetch calls.
- Added explicit transient failure categories used by the retry and caching pipeline for transcript fetching.
Changed
- Used ValidationError from pydantic instead of the general Exception class.
- Transcript fetching now performs an automatic retry pass for transient failures before marking them as final failures.
- Cache behavior now stores only permanent transcript failures, so transient failures can recover in future runs.
Fixed
- Improved transcript result validation to guarantee successful transcript payloads are present before processing.
- Fixed transcript fetch return typing to consistently return list[VideoTranscript] and list[FailedTranscript] tuples.
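The retry pass and the permanent-only cache described in this release can be illustrated with a small, self-contained sketch. It is an assumption-laden model, not the library's code: the reason strings in TRANSIENT_REASONS, the fetch_all helper, and the dict used as a cache are all hypothetical. Only the FailedTranscript fields (video_id, reason, message) come from the notes above.

```python
from dataclasses import dataclass

# Hypothetical failure categories; the real category names are not given here.
TRANSIENT_REASONS = {"network_error", "rate_limited"}


@dataclass
class FailedTranscript:
    video_id: str
    reason: str
    message: str


def fetch_all(video_ids, fetch_one, failure_cache):
    """Fetch transcripts, retry transient failures once, cache permanent ones."""
    successes, failures = [], []
    for video_id in video_ids:
        outcome = fetch_one(video_id)
        if isinstance(outcome, FailedTranscript):
            failures.append(outcome)
        else:
            successes.append(outcome)

    # Retry pass: only transient failures get a second attempt.
    final_failures = []
    for failure in failures:
        if failure.reason in TRANSIENT_REASONS:
            outcome = fetch_one(failure.video_id)
            if not isinstance(outcome, FailedTranscript):
                successes.append(outcome)
                continue
            failure = outcome
        final_failures.append(failure)

    # Cache only permanent failures so transient ones can recover next run.
    for failure in final_failures:
        if failure.reason not in TRANSIENT_REASONS:
            failure_cache[failure.video_id] = failure
    return successes, final_failures


attempts = {}


def fake_fetch(video_id):
    # Simulated fetcher: "flaky" fails once, "down" always fails transiently,
    # "gone" fails permanently, anything else succeeds.
    attempts[video_id] = attempts.get(video_id, 0) + 1
    if video_id == "gone":
        return FailedTranscript(video_id, "transcripts_disabled", "no transcript")
    if video_id == "down" or (video_id == "flaky" and attempts[video_id] == 1):
        return FailedTranscript(video_id, "network_error", "timed out")
    return f"transcript for {video_id}"


cache = {}
transcripts, failed = fetch_all(["ok", "flaky", "down", "gone"], fake_fetch, cache)
```

Here the flaky video recovers on the retry pass, both remaining failures are reported, and only the permanent one is written to the cache.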
v2.2
Introduction
This update introduces channel tab selection (videos, shorts, streams) for both CLI and Python API, adds verbose logging support, improves transcript language handling, enhances logging clarity, and fixes session management and data consistency issues for better reliability and stability.
Added
- Added tab option for both CLI and Python API to fetch from different tabs for a channel (videos, shorts, streams).
- Verbose logging mode via --verbose CLI flag.
- Comprehensive debug and info logs for core operations.
Changed
- Updated transcript languages behavior for better UX and less friction.
- Improved log messages and levels for better clarity.
- Refactored filtering logic into a separate method.
Fixed
- Fixed session resource leak by closing requests.session properly in TranscriptFetcher.
- Improved CLI error handling for graceful exits on exceptions.
- Fixed users being forced to fetch only English transcripts when they did not set a languages parameter.
- Fixed potential data loss in YoutubeDLFetcher.
v2.1
Description
This PR introduces YTFetcher v2.1, which adds a sqlite3-backed cache for better performance, an --all argument that fetches all the videos from a channel or playlist, and a utility method that converts ChannelData to Python dict rows so fetched data can be fed easily into your ML and RAG pipelines.
What's Changed?
Added
- Added convert_to_rows utility method for converting ChannelData objects to Python dicts, to easily feed data to ML and RAG pipelines.
- Added built-in cache strategy for fetching transcripts.
- Added CLI argument for channel fetcher and playlist fetcher; --all argument now fetches ALL videos from a channel or playlist.
- Added necessary tests for PreviewRenderer class.
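To make the idea of converting fetched data to flat dict rows concrete, here is a minimal sketch. The Video dataclass and to_rows function are hypothetical stand-ins written for this example; they do not reproduce the library's ChannelData model or the signature of its utility.

```python
from dataclasses import dataclass, field


@dataclass
class Video:
    """Hypothetical stand-in for one fetched video; not the library's ChannelData."""
    video_id: str
    title: str
    transcript: list = field(default_factory=list)  # items like {"text": ..., "start": ...}


def to_rows(videos):
    """Flatten each video into one dict with the transcript joined as plain text."""
    rows = []
    for video in videos:
        rows.append({
            "video_id": video.video_id,
            "title": video.title,
            "text": " ".join(segment["text"] for segment in video.transcript),
        })
    return rows


videos = [
    Video("abc123", "Intro", [
        {"text": "Hello", "start": 0.0, "duration": 1.5},
        {"text": "world", "start": 1.5, "duration": 1.0},
    ]),
    Video("xyz789", "Empty"),
]
rows = to_rows(videos)
```

A list of flat dicts like this can be loaded into most dataframe or embedding pipelines without further reshaping.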
Changed
- Changed max_results parameter to be optionally None, which fetches all videos from a channel if explicitly set.
- Removed timeout parameter from HTTPConfig class.
- Removed httpx library since it is unused.
Fixed
- Fixed PreviewRenderer failing if metadata values are None.
- Fixed PyPI installs pulling in mkdocs as a direct dependency, which is unnecessary.
- Fixed TranscriptFetcher docstrings by @zhanglinqian
Summary
- Implement persistent SQLite caching for transcript fetching with configurable cache paths
- Add channel_data_to_rows utility for converting ChannelData to flat dictionaries for ML/RAG pipelines
- Extend CLI with cache management commands (--no-cache, --cache-path, cache --clean)
- Add comprehensive tests for PreviewRenderer, cache functionality, and CLI arguments
- Fix PreviewRenderer to gracefully handle None metadata values
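A persistent transcript cache of the kind summarized above can be sketched with the standard sqlite3 module. The table layout, the TranscriptCache class, and cached_fetch are assumptions made for illustration; the library's actual schema and cache API are not described in these notes. The sketch uses an in-memory database so it runs without touching disk; passing a file path would make it persistent.

```python
import json
import sqlite3


class TranscriptCache:
    """Minimal cache keyed by video_id (illustrative schema only)."""

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS transcripts ("
            "video_id TEXT PRIMARY KEY, payload TEXT NOT NULL)"
        )

    def get(self, video_id):
        row = self.conn.execute(
            "SELECT payload FROM transcripts WHERE video_id = ?", (video_id,)
        ).fetchone()
        return json.loads(row[0]) if row else None

    def put(self, video_id, transcript):
        self.conn.execute(
            "INSERT OR REPLACE INTO transcripts (video_id, payload) VALUES (?, ?)",
            (video_id, json.dumps(transcript)),
        )
        self.conn.commit()

    def clear(self):
        # Rough equivalent of a "clean the cache" command.
        self.conn.execute("DELETE FROM transcripts")
        self.conn.commit()

    def close(self):
        self.conn.close()


def cached_fetch(cache, video_id, fetch):
    """Return the cached transcript, calling fetch only on a cache miss."""
    hit = cache.get(video_id)
    if hit is not None:
        return hit
    transcript = fetch(video_id)
    cache.put(video_id, transcript)
    return transcript


network_calls = []


def fake_fetch(video_id):
    # Stand-in for a network request; records each call.
    network_calls.append(video_id)
    return [{"text": "Hello", "start": 0.0}]


cache = TranscriptCache()
first = cached_fetch(cache, "abc123", fake_fetch)
second = cached_fetch(cache, "abc123", fake_fetch)
cache.clear()
after_clear = cache.get("abc123")
cache.close()
```

The second lookup is served from the cache, so the simulated network is hit only once.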
Contributors Of This Release
Thanks to everyone who contributed to YTFetcher.
v2.0
YTFetcher 2.0 Release
This release introduces major architectural improvements, performance enhancements, and important fixes that required breaking changes.
Version 2.0 establishes a cleaner and more predictable API while resolving critical data alignment issues present in previous releases.
⚠️ Important Notice
All versions prior to 2.0 contain a critical issue where metadata, transcripts, and comments may become misaligned. This issue could not be safely fixed without breaking API changes, and therefore the correction is released as part of this major version.
Users are strongly advised to upgrade.
Added
- Introduced a new FetchOptions data class for defining fetcher options such as languages and filters.
- Added a --sort argument for selecting top or new comments in the CLI.
- Added from_search support to both Python API and CLI, allowing fetching directly from a YouTube-style search query.
- Added --quiet CLI flag.
- Added pre-fetch filtering support.
Changed
- Removed deprecated Exporter class.
- Eliminated network requests during object initialization.
- YTFetcher now initializes the appropriate BaseYoutubeDLFetcher within class methods.
- TranscriptFetcher now creates a Session per thread for improved thread safety.
- TranscriptFetcher now returns VideoTranscript instead of ChannelData.
- Exported files no longer include None values, reducing noise and file size.
- CLI arguments redesigned for improved usability.
- Python API is now silent by default; logging appears only in CLI or when verbose mode is enabled.
- ytfetcher is now fully synchronous, simplifying usage and architecture.
- CLI arguments simplified with channel, playlist, video and search instead of from_ prefixes.
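The "Session per thread" change can be illustrated with threading.local and a thread pool. This is a generic sketch of the pattern, not the library's code: FakeSession stands in for a real HTTP session so the example runs offline, and get_session and fetch are hypothetical helper names.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class FakeSession:
    """Stand-in for an HTTP session object; records which thread created it."""

    def __init__(self):
        self.owner = threading.get_ident()

    def get(self, video_id):
        # A real session would perform a network request here.
        return f"transcript for {video_id}"


_local = threading.local()


def get_session():
    """Return this thread's session, creating it on first use."""
    if not hasattr(_local, "session"):
        _local.session = FakeSession()
    return _local.session


def fetch(video_id):
    session = get_session()
    # The session is only ever used by the thread that created it.
    used_by_owner = session.owner == threading.get_ident()
    return used_by_owner, session.get(video_id)


video_ids = [f"video{i}" for i in range(8)]
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(fetch, video_ids))
```

Because each worker thread lazily builds its own session, no session object is shared across threads, which is the property the change is after.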
Fixed
- Fixed critical bug where metadata, transcripts, and comments were not aligned.
- Fixed HTTPConfig header validation logic.
- Improved VideoListFetcher performance via ThreadPoolExecutor.
- Fixed issue where CommentFetcher did not correctly fetch top comments.
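These notes do not say how the alignment bug was fixed. One generic way to avoid this class of bug is to join results on video_id rather than on list position, sketched below. The function name and the dict shapes are assumptions for illustration only.

```python
def align_by_video_id(metadata, transcripts, comments):
    """Join three result lists on video_id instead of relying on list order."""
    transcripts_by_id = {t["video_id"]: t["segments"] for t in transcripts}
    comments_by_id = {}
    for comment in comments:
        comments_by_id.setdefault(comment["video_id"], []).append(comment["text"])

    aligned = []
    for meta in metadata:
        video_id = meta["video_id"]
        aligned.append({
            "video_id": video_id,
            "title": meta["title"],
            "segments": transcripts_by_id.get(video_id),  # None if missing
            "comments": comments_by_id.get(video_id, []),
        })
    return aligned


metadata = [
    {"video_id": "a", "title": "First"},
    {"video_id": "b", "title": "Second"},
]
# Transcripts arrive in a different order, and one video has no comments.
transcripts = [
    {"video_id": "b", "segments": ["second text"]},
    {"video_id": "a", "segments": ["first text"]},
]
comments = [{"video_id": "b", "text": "nice"}]

aligned = align_by_video_id(metadata, transcripts, comments)
```

Zipping the three lists by position would have paired the first video with the second video's transcript; the keyed join does not.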
Breaking Changes
- Transcript fetching now returns VideoTranscript objects.
- CLI argument structure has changed.
- Initialization behavior changed to remove implicit network operations.
- Export behavior changed to omit None values.
Users upgrading from previous versions may need to update integrations accordingly.
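For integrations that post-process exported files, the effect of omitting None values can be pictured with a small sketch. The drop_none helper and the sample record are hypothetical; they show the general technique rather than the exporter's actual code.

```python
import json


def drop_none(value):
    """Recursively remove dict keys whose value is None."""
    if isinstance(value, dict):
        return {k: drop_none(v) for k, v in value.items() if v is not None}
    if isinstance(value, list):
        return [drop_none(item) for item in value]
    return value


video = {
    "video_id": "abc123",
    "title": "Intro",
    "description": None,
    "comments": [{"text": "nice", "like_count": None}],
}
cleaned = drop_none(video)
exported = json.dumps(cleaned, sort_keys=True)
```

Code that reads exported files should therefore treat a missing key the same way it previously treated a null value.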
Summary
YTFetcher 2.0 provides a cleaner API, improved performance, thread safety, and resolves long-standing data consistency issues. This release sets a stable foundation for future development.
Update
Don't forget to update ytfetcher with:
pip install ytfetcher --upgrade
v.1.5.3
This release focuses on usability and developer experience, making it easier to inspect, debug, and trust fetched YouTube data before exporting it.
What’s new
Rich CLI Preview
- Added a PreviewRenderer powered by rich for beautifully formatted terminal output.
- Preview transcripts, metadata, and comments before exporting.
- Designed for fast iteration and dataset inspection.
Improved CLI Output Handling
- Clean separation between preview output and raw data dumping.
- --stdout now prints structured data only, avoiding mixed console noise.
- Better behavior when working with large datasets and pipelines.
Documentation Improvements
- Added a live CLI demo (asciinema) to the README.
- Improved onboarding for new users with clearer examples and flow.
Upgrade
Don't forget to upgrade ytfetcher
pip install --upgrade ytfetcher
v1.5
This release introduces major architectural improvements, including a new comment extraction feature and a flexible exporter system.
New Features
Comment Fetching Engine: High-performance extraction of YouTube comments.
- Fetch comments alongside transcripts with fetch_with_comments.
- Extract standalone comment datasets with fetch_comments.
- Docker Support: Added Dockerfile and docker-compose.yml for a standardized development environment.
- Pydantic v2 Integration: Robust data validation and automatic type-casting for all metadata.
Changes & Improvements
- Refactored Exporters: Transitioned to specialized subclasses (JSONExporter, TXTExporter, CSVExporter) for better control over output formats.
- Enhanced Mapping: Leveraged Pydantic Aliases to handle inconsistent YouTube metadata fields (e.g., _time_text to time_text).
Deprecation Notice
The monolithic Exporter class is now deprecated and replaced by format-specific subclasses. It will be removed in a future version of ytfetcher.
Examples
Fetch comments with or without transcripts
To fetch comments for every video along with transcripts and metadata, use the --comments argument in the CLI.
ytfetcher from_channel -c TEDx -m 20 --comments 10 -f csv
If you only need comments with metadata, use the --comments-only argument.
ytfetcher from_channel -c TEDx -m 20 --comments-only 10 -f json
Using New Exporters
You can now import exporters individually and use each one on its own for better control over data.
from ytfetcher import YTFetcher
from ytfetcher.services import JSONExporter, TXTExporter

fetcher = YTFetcher.from_channel("TheOffice", max_results=5)

async def get_channel_data_with_comments():
    # Fetch transcripts and metadata plus up to 2 comments per video.
    channel_data = await fetcher.fetch_with_comments(max_comments=2)
    # Each exporter writes only the metadata fields listed for it.
    txt_exporter = TXTExporter(channel_data=channel_data, allowed_metadata_list=['title', 'description']).write()
    json_exporter = JSONExporter(channel_data=channel_data, allowed_metadata_list=['title', 'view_count']).write()
Upgrade
Don't forget to upgrade ytfetcher
pip install --upgrade ytfetcher
v1.4.1
What's New
Fix
- Fixed an issue where running the CLI with from_video_ids would fail with a KeyError: 'url' if the yt_dlp result for a video did not include a url key by @vabe44
Upgrade
Don't forget to upgrade ytfetcher
pip install --upgrade ytfetcher
Contribute
Found a bug or have an idea? Open an issue
v1.4
What's New
Fetching Only Manually Created Transcripts
You can now fetch only manually created transcripts with ytfetcher, which gives you more precise transcripts. (Closes #2)
This feature is useful for channels like TEDx, which have more manually created transcripts.
Here is how you can use it:
Python API
fetcher = YTFetcher.from_channel(channel_handle="TEDx", manually_created=True)
CLI Usage
ytfetcher from_channel -c TEDx -m 50 --manually-created
Fixed Transcript Cleaner Method
Transcript cleaner method wasn't removing >> signs correctly from transcripts. We fixed that.
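As a rough illustration of this kind of cleanup (not the library's actual cleaner, whose code is not shown here), the following sketch strips ">>" markers with a regular expression and collapses the whitespace they leave behind. The function name clean_transcript_text is hypothetical.

```python
import re


def clean_transcript_text(text):
    """Remove '>>' markers and collapse leftover whitespace."""
    without_markers = re.sub(r">{2,}", " ", text)
    return re.sub(r"\s+", " ", without_markers).strip()


cleaned = clean_transcript_text(">> Hello there.  >>How are you?")
untouched = clean_transcript_text("5 > 3 is true")
```

The pattern matches runs of two or more ">" characters, so a single ">" in ordinary text is left alone.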
Upgrade
Don't forget to upgrade ytfetcher
pip install --upgrade ytfetcher
Contribute
Found a bug or have an idea? Open an issue
v1.3.1
What's New
Fetch From Playlist ID
You can now fetch videos in bulk from a playlist ID using the from_playlist_id method in both the CLI and the Python API:
Note: YTFetcher fetches up to 50 videos by default.
To fetch all videos from a playlist with more than 50 videos, set max_results to the total number of videos in that playlist.
We are planning to find a better solution for this in the future.
Using CLI
ytfetcher from_playlist_id -p playlistid123 -m 20
Using Python API
fetcher = YTFetcher.from_playlist_id(playlist_id="playlistid123")
Better Exporter
- Exporter now exports all available data by default.
- Added --metadata argument for CLI to choose desired metadata.
- You can also exclude timings from transcripts using --no-timing argument in CLI.
This update improves the exporter, especially on the CLI side. Here are a couple of examples of how you can use these new arguments:
Filtering Metadata
ytfetcher from_playlist_id -p playlistid123 --metadata title description
Excluding Timings From Transcripts
ytfetcher from_channel -c TheOffice --no-timings
Accepting Full URL
ytfetcher can now accept full URLs for the channel_handle and playlist_id parameters.
You can now pass full URLs like:
- https://www.youtube.com/@TheOffice for channel_handle
- https://www.youtube.com/playlist?list=PLuvRKGApO-zoF2WBPN2kW188YLke0Igv8 for playlist_id
Full URLs are not supported for the from_video_ids method right now, so you still have to provide exact video IDs for it.
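A minimal sketch of how full URLs could be reduced to a handle or playlist ID, using only urllib.parse. The two function names are hypothetical and this is not the library's parsing code; it also strips a leading @ from a handle, since the channel URL form carries one.

```python
from urllib.parse import parse_qs, urlparse


def normalize_channel_handle(value):
    """Accept 'TheOffice', '@TheOffice', or a full channel URL."""
    if value.startswith(("http://", "https://")):
        # The handle is the last path segment, e.g. '/@TheOffice'.
        value = urlparse(value).path.rstrip("/").rsplit("/", 1)[-1]
    return value.lstrip("@")


def normalize_playlist_id(value):
    """Accept a bare playlist ID or a full playlist URL with a 'list' parameter."""
    if value.startswith(("http://", "https://")):
        query = parse_qs(urlparse(value).query)
        if "list" not in query:
            raise ValueError(f"no playlist id in URL: {value}")
        return query["list"][0]
    return value


handle_from_url = normalize_channel_handle("https://www.youtube.com/@TheOffice")
handle_from_at = normalize_channel_handle("@TheOffice")
playlist_from_url = normalize_playlist_id(
    "https://www.youtube.com/playlist?list=PLuvRKGApO-zoF2WBPN2kW188YLke0Igv8"
)
playlist_plain = normalize_playlist_id("playlistid123")
```

Bare values pass through unchanged, so callers can hand either form to the same function.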
Trim @ Character
If you accidentally write the channel handle with @ included, ytfetcher now fixes that for you. So now you can also write:
ytfetcher from_channel -c @TheOffice -m 20
Upgrade
Don't forget to upgrade ytfetcher
pip install --upgrade ytfetcher
Contribute
Found a bug or have an idea? Open an issue