A Streamlit-based Retrieval-Augmented Generation (RAG) chatbot that answers questions from websites, PDFs, or raw text using embeddings, FAISS vector search, and a Groq-powered LLM.
- Website-based Q&A
- PDF-based Q&A
- Text-based Q&A
- Chat-style UI with history
- New Chat (session reset)
- Apply button (runs only on click)
- Context-only answers (reduces hallucinations)
- Deployment-ready (Streamlit Cloud / AWS)
- User selects Website / PDF / Text
- Content is ingested and split into chunks
- Chunks are embedded and stored in FAISS
- User asks a question
- Relevant chunks are retrieved
- LLM generates an answer strictly from context
- Answer is shown in Streamlit UI
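The flow above can be sketched end-to-end. This is a minimal illustrative stand-in, not the project's actual code: a toy bag-of-words embedding replaces the real embedding model, a brute-force cosine search replaces FAISS, and the LLM call is represented only by the context-restricted prompt it would receive.

```python
import math
from collections import Counter

def chunk(text, size=40):
    """Split raw text into overlapping word chunks (stand-in for a real splitter)."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size // 2)]

def embed(text):
    """Toy bag-of-words vector; the real pipeline uses a sentence-embedding model."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(question, chunks, k=2):
    """Brute-force nearest neighbours; FAISS does the same lookup at scale."""
    q = embed(question)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

def build_prompt(question, context_chunks):
    """Context-only prompt so the LLM answers strictly from retrieved text."""
    context = "\n".join(context_chunks)
    return (
        "Answer ONLY from the context below. If the answer is not in the "
        f"context, say you don't know.\n\nContext:\n{context}\n\nQuestion: {question}"
    )

doc = ("FAISS is a library for efficient similarity search. " * 5
       + "Groq serves large language models with low latency. " * 5)
chunks = chunk(doc)
top = retrieve("What is FAISS used for?", chunks)
prompt = build_prompt("What is FAISS used for?", top)
```

In the real app, `prompt` would be sent to the Groq-hosted LLM and the response rendered in the Streamlit chat UI.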
```
chatbot/
│
├── app.py                        # Streamlit UI (frontend)
├── main.py                       # Backend orchestrator
├── requirements.txt
├── README.md
├── .env
├── .gitignore
│
├── embeddings/                   # FAISS vector database (auto-created)
├── logs/                         # Application logs
├── uploaded_pdfs/                # Uploaded PDF files (optional use)
│
└── src/
    ├── __pycache__/
    │
    ├── components/
    │   ├── __init__.py
    │   └── ragchatbot.py         # RAG + Groq LLM logic
    │
    ├── datatransformer/
    │   ├── __init__.py
    │   ├── webdatatransfer.py    # Website text extraction
    │   ├── textdatatransfer.py   # Text & PDF text splitting
    │   └── pdfdatatransfer.py    # (Optional PDF logic)
    │
    └── utils/
        ├── __init__.py
        ├── dataembedding.py      # Embedding creation
        └── dataingestion.py      # Website ingestion logic
```
```
git clone https://github.com/kumar-kiran-24/chatbot
cd chatbot
pip install -r requirements.txt
streamlit run app.py
```

Create a `.env` file in the project root containing your Groq API key:

```
GROQ_API_KEY=your_groq_api_key_here
```
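The key can be loaded at startup before the Groq client is created. A minimal sketch using only the standard library (the project may instead rely on `python-dotenv`'s `load_dotenv()`):

```python
import os

def load_env(path=".env"):
    """Parse simple KEY=value lines from a .env file into os.environ.

    Existing environment variables are not overwritten, so values set in
    the deployment environment (e.g. Streamlit Cloud secrets) take priority.
    """
    try:
        with open(path) as f:
            for line in f:
                line = line.strip()
                if line and not line.startswith("#") and "=" in line:
                    key, _, value = line.partition("=")
                    os.environ.setdefault(key.strip(), value.strip())
    except FileNotFoundError:
        pass  # fall back to variables already set in the environment

load_env()
api_key = os.getenv("GROQ_API_KEY")  # None if the key is not configured
```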