This project is an AI-powered tool designed to extract, summarize, and answer questions based on YouTube video transcripts. By leveraging advanced technologies like LangChain, FAISS, and Retrieval-Augmented Generation (RAG), the tool provides concise summaries and precise answers to user queries. The system is built with a user-friendly interface using Gradio, making it accessible to a wide range of users.
- Video Transcript Extraction: Automatically fetches transcripts from YouTube videos.
- Summarization: Generates concise summaries of video content.
- Question Answering (QA): Answers specific user queries based on the video transcript.
- RAG Architecture: Combines retrieval-based methods with generative AI for accurate and context-aware responses.
- FAISS Integration: Utilizes Facebook AI Similarity Search (FAISS) for efficient vector storage and similarity search.
- User-Friendly Interface: Built with Gradio for an interactive and intuitive user experience.
The tool uses the youtube-transcript-api to fetch transcripts from YouTube videos. It supports both manually created and auto-generated transcripts, prioritizing the former for better accuracy.
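A fetch along these lines is one way to sketch this step. The helper names are illustrative, and the `list_transcripts` / `find_manually_created_transcript` calls assume the class-method API of pre-1.0 youtube-transcript-api:

```python
from urllib.parse import urlparse, parse_qs

def extract_video_id(url: str) -> str:
    """Pull the video ID out of common YouTube URL forms."""
    parsed = urlparse(url)
    if parsed.hostname == "youtu.be":
        # Short links carry the ID in the path: https://youtu.be/<id>
        return parsed.path.lstrip("/")
    # Standard watch URLs keep the ID in the ?v= query parameter.
    return parse_qs(parsed.query)["v"][0]

def fetch_transcript(url: str, languages=("en",)) -> str:
    """Fetch a transcript, preferring manually created over auto-generated."""
    # Imported here so the pure URL helper above works without the dependency.
    from youtube_transcript_api import YouTubeTranscriptApi

    transcripts = YouTubeTranscriptApi.list_transcripts(extract_video_id(url))
    try:
        transcript = transcripts.find_manually_created_transcript(list(languages))
    except Exception:
        transcript = transcripts.find_generated_transcript(list(languages))
    return " ".join(entry["text"] for entry in transcript.fetch())
```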
The fetched transcript is processed and split into manageable chunks using the RecursiveCharacterTextSplitter from LangChain. This ensures that the text is optimized for embedding and retrieval.
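The idea behind the splitter's chunk_size/chunk_overlap parameters can be sketched in plain Python. This is a simplified stand-in, not the library's implementation:

```python
def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 100):
    """Split text into overlapping windows. A simplified version of what
    RecursiveCharacterTextSplitter does; the real splitter also tries to
    break on paragraph and sentence boundaries before falling back to
    raw character counts."""
    chunks = []
    step = chunk_size - chunk_overlap
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
    return chunks
```

In the project itself this corresponds to something like `RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100).split_text(transcript)`.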
The processed text chunks are converted into embeddings using the GoogleGenerativeAIEmbeddings model. These embeddings are stored in a FAISS index, enabling efficient similarity search.
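What the FAISS index provides is nearest-neighbour search over those embedding vectors. A minimal cosine-similarity version in pure NumPy (standing in for a FAISS flat inner-product index over normalized vectors, not the actual FAISS code path) looks like:

```python
import numpy as np

def build_index(embeddings: np.ndarray) -> np.ndarray:
    """L2-normalize rows so that inner product equals cosine similarity,
    mirroring an inner-product index over normalized vectors."""
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    return embeddings / norms

def search(index: np.ndarray, query: np.ndarray, k: int = 3):
    """Return indices of the k stored vectors most similar to the query."""
    q = query / np.linalg.norm(query)
    scores = index @ q
    return np.argsort(scores)[::-1][:k]
```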
The system employs a RAG architecture to combine retrieval-based methods with generative AI. When a user asks a question:
- Relevant transcript chunks are retrieved from the FAISS index based on the query.
- The retrieved context is passed to a large language model (LLM) to generate a context-aware response.
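Put together, the QA step reduces to "retrieve, stuff into a prompt, generate". A minimal sketch, in which the prompt wording and the `retrieve`/`llm` callables are illustrative rather than the project's exact components:

```python
def answer_question(question: str, retrieve, llm, k: int = 3) -> str:
    """Minimal RAG loop: fetch the top-k relevant chunks, then ask the
    LLM to answer strictly from that context."""
    context_chunks = retrieve(question, k)
    prompt = (
        "Answer the question using only the transcript excerpts below.\n\n"
        + "\n\n".join(context_chunks)
        + f"\n\nQuestion: {question}\nAnswer:"
    )
    return llm(prompt)
```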
The tool uses a predefined prompt template and an LLM to generate concise summaries of the video content. This makes it easier for users to grasp the main points of lengthy videos.
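The exact template lives in the code; its general shape is roughly the following, with illustrative wording and `str.format` standing in for LangChain's PromptTemplate:

```python
# Illustrative template; the project's actual wording may differ.
SUMMARY_TEMPLATE = (
    "You are an assistant that summarizes YouTube videos.\n"
    "Summarize the following transcript in a few concise bullet points:\n\n"
    "{transcript}\n\nSummary:"
)

def build_summary_prompt(transcript: str) -> str:
    """Fill the template; the result is what gets sent to the LLM."""
    return SUMMARY_TEMPLATE.format(transcript=transcript)
```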
- LangChain: For building LLM-powered applications.
- FAISS: For vector storage and similarity search.
- Gradio: For creating an interactive user interface.
- Google Generative AI: For embeddings and language model capabilities.
- YouTube Transcript API: For fetching video transcripts.
- Clone the repository:
git clone https://github.com/your-repo/yt_summarizer.git
cd yt_summarizer
- Create a virtual environment and activate it:
python3 -m venv my_env
source my_env/bin/activate
- Install the required dependencies:
pip install -r requirements.txt
- Launch the application:
python ytbot_gemini.py
- Open the Gradio interface in your browser.
- Enter the YouTube video URL and choose to either summarize the video or ask a question about it.
- Build the Docker image (if not already built):
docker build -t yt_summarizer .
- Run the Docker container:
docker run -p 7860:7860 yt_summarizer
- Open your browser and navigate to http://localhost:7860.
Before running the application, ensure you create a .env file in the project root directory with the following content:
GOOGLE_API_KEY=your-google-api-key
Replace your-google-api-key with your actual Google API key for the Gemini model.
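The application can load this file with python-dotenv's `load_dotenv()`. A dependency-free sketch of what that does (a minimal version that ignores quoting and `export` syntax) is:

```python
import os

def load_env_file(path: str = ".env") -> None:
    """Read KEY=VALUE lines into os.environ, skipping blank lines and
    comments. A minimal stand-in for python-dotenv's load_dotenv(),
    which likewise does not override variables already set."""
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            os.environ.setdefault(key.strip(), value.strip())
```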
- Summarization:
- Input: YouTube video URL.
- Output: A concise summary of the video content.
- Question Answering:
- Input: YouTube video URL and a specific question.
- Output: A detailed answer based on the video transcript.
YouTube Video --> Transcript Extraction --> Text Processing --> Embedding --> FAISS Index --> RAG --> Summary/Answer
- Saves time by automating transcript analysis.
- Provides accurate and context-aware answers to user queries.
- Makes video content more accessible and insightful.
- Support for multilingual transcripts and queries.
- Integration with other video platforms.
- Advanced analytics and visualization for video content.
