This project enables users to:
- Convert audio and video files into transcribed text using OpenAI Whisper.
- Generate synthetic speech using gTTS (Google Text-to-Speech) with male and female voices.
- Create videos with generated audio overlaid on an image background.
- Automatic Speech Recognition (ASR): Converts spoken content from audio/video into text.
- Text-to-Speech (TTS): Generates male and female voices from the text.
- Video Generation: Combines generated speech with an image to create a video. (work in progress)
```
project-folder/
│── data_file/          # Folder for input audio/video files
│── transcript_file/    # Folder for storing transcriptions and generated files
│── project_file.py     # Main script for processing audio/video files
│── audio.py            # Script for generating audio from text
│── requirements.txt    # Dependencies list
│── README.md           # Project documentation
```
This project utilizes diverse audio and video data from multiple sources:
- Kaggle Dataset: Pre-existing audio datasets from Kaggle.
- Self-Recorded Videos: Custom video recordings for personalized content.
- YouTube Videos: Extracted audio and video for transcription and analysis.
- AI-Generated Videos: Videos generated using artificial intelligence tools.
- Text-to-Speech (TTS) Audio: Synthetic audio generated via TTS for experimentation.
```
git clone https://github.com/AritraOfficial/Media-Transcriber.git
cd Media-Transcriber
python -m venv venv
source venv/bin/activate   # On macOS/Linux
venv\Scripts\activate      # On Windows
pip install -r requirements.txt
```
FFmpeg is required for processing audio/video. Install it from:
- Windows: Download FFmpeg
- Linux/macOS:
```
sudo apt install ffmpeg   # Ubuntu/Debian
brew install ffmpeg       # macOS (Homebrew)
```
```
ffmpeg -version   # Check if FFmpeg is correctly installed
```
Modify the script to choose between a male or female voice.
```
python audio.py
python video.py
```
This will:
- Overlay the generated speech onto `background.jpg`.
- Create a video file with the generated audio.
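The TTS and video steps above can be sketched as follows. This is a minimal illustration, assuming gTTS and MoviePy are installed (`pip install gTTS moviepy`); the `speech.mp3`/`output.mp4` file names and the voice mapping are assumptions for the sketch, not fixed by the project. Note that gTTS has no explicit gender switch — varying the Google Translate host (`tld`) is a common approximation for different voices.

```python
def voice_params(voice: str) -> dict:
    """Map a 'male'/'female' choice to gTTS parameters.

    Hypothetical mapping: gTTS exposes no gender setting, so different
    accent hosts (tld) are used here to approximate distinct voices.
    """
    if voice not in ("male", "female"):
        raise ValueError("voice must be 'male' or 'female'")
    return {"lang": "en", "tld": "co.uk" if voice == "male" else "com"}


def make_video(text: str, voice: str = "female") -> None:
    """Synthesize speech from `text` and overlay it on background.jpg."""
    from gtts import gTTS                                    # deferred: optional deps
    from moviepy.editor import AudioFileClip, ImageClip

    gTTS(text=text, **voice_params(voice)).save("speech.mp3")
    audio = AudioFileClip("speech.mp3")
    # Still image shown for the full duration of the generated audio.
    clip = ImageClip("background.jpg").set_duration(audio.duration).set_audio(audio)
    clip.write_videofile("output.mp4", fps=24)


if __name__ == "__main__":
    make_video("Hello from Media-Transcriber", voice="female")
```

The heavy imports are deferred into `make_video` so the helper can be used (and tested) without gTTS or MoviePy installed.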
```
python project_file.py
```
This will:
- Scan `data_file/` for audio/video files.
- Use OpenAI Whisper to transcribe them.
- Save transcriptions in the `transcript_file/` folder.
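The transcription loop above can be sketched as follows. This is a hedged outline, assuming `openai-whisper` is installed and FFmpeg is on PATH; the extension list, the `"base"` model size, and the `.txt` output naming are illustrative choices, not mandated by the project.

```python
from pathlib import Path

# Assumed set of media extensions to pick up from data_file/.
MEDIA_EXTS = {".mp3", ".wav", ".m4a", ".mp4", ".mkv", ".avi"}


def find_media_files(folder: str) -> list:
    """Return audio/video files in `folder`, sorted by name."""
    return sorted(p for p in Path(folder).iterdir()
                  if p.suffix.lower() in MEDIA_EXTS)


def transcribe_all(src: str = "data_file", dst: str = "transcript_file") -> None:
    """Transcribe every media file in `src` and write .txt files to `dst`."""
    import whisper                       # deferred: downloads model weights
    model = whisper.load_model("base")   # larger models trade speed for accuracy
    Path(dst).mkdir(exist_ok=True)
    for media in find_media_files(src):
        result = model.transcribe(str(media))
        (Path(dst) / (media.stem + ".txt")).write_text(result["text"])


if __name__ == "__main__":
    transcribe_all()
```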
- Add FFmpeg to System PATH:
- Open Environment Variables:
- Press Win + R, type sysdm.cpl, and hit Enter.
- Go to the Advanced tab → Click Environment Variables.
- Find the "Path" Variable:
- In System Variables, scroll down to Path → Click Edit.
- Click New, then add the path to FFmpeg's bin folder:
```
C:\ffmpeg_location\bin
```
- Click OK and close all windows.
- Restart your terminal after installation.
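After updating PATH, a quick way to confirm FFmpeg is reachable from Python is `shutil.which`, which is standard library; this small check is a convenience sketch, not part of the project's scripts.

```python
import shutil


def ffmpeg_available() -> bool:
    """True if an `ffmpeg` executable is on the current PATH."""
    return shutil.which("ffmpeg") is not None


if __name__ == "__main__":
    print("FFmpeg found" if ffmpeg_available() else "FFmpeg NOT on PATH")
```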
- Manually Add FFmpeg Path in VS Code:
- Open VS Code.
- Open the Command Palette by pressing `Ctrl + Shift + P` and go to Settings.
- Search for `terminal.integrated.env.windows` in the search bar.
- Click "Edit in settings.json" to modify the configuration.
- Add the following lines inside the JSON file:
```json
{
    "terminal.integrated.env.windows": {
        "Path": "C:\\ffmpeg\\bin;${env:Path}"
    }
}
```
Run:
```
pip install -r requirements.txt
```
This project is open-source and available under the MIT License.
- OpenAI Whisper for speech recognition.
- gTTS for text-to-speech conversion.
- MoviePy for video creation.
For issues or contributions, open a GitHub issue or reach out via Gmail.