# 📄 RAG-based Question Answering & Summarization App

A Retrieval-Augmented Generation (RAG) application built with LangChain, Hugging Face LLMs, FAISS, and Streamlit.
Users can upload PDFs or text files, or paste web URLs and YouTube links; the app cleans and chunks the content, then answers questions and generates summaries over it.

## 📸 Application Preview

*(screenshot: RAG App UI)*

## 🚀 Features

- 📄 Upload PDF documents
- 📝 Upload text (`.txt`) files
- 🌐 Paste website URLs
- 🎥 Paste YouTube video links (automatic transcript extraction)
- 🧹 Automatic text cleaning & normalization
- ✂️ Smart chunking using LangChain splitters
- 🧠 Hugging Face embeddings
- 🔍 FAISS vector store for similarity search
- ❓ Context-aware question answering
- 📘 Large-document summarization
- 🎨 Styled Streamlit UI
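The chunking step can be illustrated with a minimal, dependency-free sketch. This is a simplified stand-in for LangChain's splitters, not the project's actual `chunk` logic; the function name and the `chunk_size`/`overlap` parameters are illustrative:

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into overlapping chunks, preferring to break on whitespace."""
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + chunk_size, len(text))
        # Try to break at the last space inside the window so words stay intact.
        if end < len(text):
            space = text.rfind(" ", start, end)
            if space > start:
                end = space
        chunks.append(text[start:end].strip())
        if end == len(text):
            break
        start = max(end - overlap, start + 1)  # step forward, keeping some overlap
    return chunks
```

The overlap keeps a sentence that straddles a chunk boundary visible in both chunks, which helps retrieval later.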

## 🏗️ Project Structure

```
Multi-Source-RAG-App/
│
├── app.py                     # Streamlit application
├── qna.py                     # Q&A and summarization chains
├── process.py                 # File processing logic
│
├── loader/
│   └── loader.py              # PDF / Text / Web / YouTube loaders & cleaners
│
├── embeddings/
│   └── embedding.py           # Embedding model loader
│
├── requirements.txt
├── README.md
└── .gitignore
```

## 🧠 Tech Stack

- Python
- Streamlit
- LangChain
- Hugging Face Inference API
- FAISS
- Sentence Transformers
- YouTube Transcript API

## ⚙️ Setup

### 1️⃣ Clone the repository

```bash
git clone https://github.com/pranotosh2/Multi-Source-RAG-App.git
cd Multi-Source-RAG-App
```

### 2️⃣ Create a virtual environment

```bash
conda create -n rag python=3.11
conda activate rag
```

### 3️⃣ Install dependencies

```bash
pip install -r requirements.txt
```

## 🔑 Environment Variables

Create a `.env` file in the project root:

```
HUGGINGFACEHUB_API_TOKEN=hf_xxxxxxxxxxxxxxxxxxxxx
```
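Projects like this typically load the `.env` file with `python-dotenv` (`load_dotenv()`) at startup. As a hedged illustration of what that does, here is a stdlib-only parser; `load_env_file` is a hypothetical helper, not part of this repo:

```python
import os

def load_env_file(path: str = ".env") -> None:
    """Minimal .env parser: put KEY=VALUE pairs into os.environ."""
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue  # skip blanks, comments, and malformed lines
            key, _, value = line.partition("=")
            # setdefault: real environment variables win over .env entries
            os.environ.setdefault(key.strip(), value.strip())
```

LangChain's Hugging Face integrations read `HUGGINGFACEHUB_API_TOKEN` from the environment once it is set.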

## ▶️ Run the Application

```bash
streamlit run app.py
```

## 🧩 How It Works

1. Load content (PDF / Text / Web / YouTube)
2. Clean and normalize the text
3. Split the text into chunks
4. Generate embeddings for each chunk
5. Store the vectors in FAISS
6. Retrieve the chunks most relevant to the query
7. Generate an answer or summary with the LLM
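Steps 4–6 can be sketched end to end with a toy, dependency-free model. The bag-of-words "embedding" and linear scan below stand in for the real Hugging Face embeddings and the FAISS index; all names here are illustrative:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding' -- stands in for a real embedding model."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query (FAISS stand-in)."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

chunks = [
    "FAISS stores vectors",
    "Streamlit renders the UI",
    "Transcripts come from YouTube",
]
print(retrieve("vectors stored", chunks, k=1))
```

In the real app, the retrieved chunks are then stuffed into the LLM prompt as context for the answer or summary (step 7).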

## 👨‍💻 Author

Developed as a portfolio-ready NLP & RAG project using modern LLM tooling.
