Ask complex questions, get expert answers. DocuSage is an AI-powered chat agent that learns any technical documentation you provide and becomes a specialized assistant, helping you debug code, understand concepts, and build faster.
Have you ever been stuck trying to find a specific piece of information in dense, multi-page documentation? DocuSage solves this by leveraging a powerful AI architecture called Retrieval-Augmented Generation (RAG).
You provide a URL to a documentation website. DocuSage reads, processes, and indexes that content into a specialized knowledge base. You can then ask questions in natural language, and the AI will synthesize answers grounded in the provided documentation to help you with your development tasks. The inspiration for building this came when I was stuck with Qiskit errors!
- Web Crawler: Intelligently scrapes and cleans content from a starting URL and relevant sub-pages.
- AI-Powered Search: Uses vector embeddings to search for information by semantic meaning, not just keywords.
- Expert Q&A: Leverages Google's Gemini LLM to understand your questions, correct syntax errors, and provide expert-level answers.
- Trustworthy & Transparent: The "Show Sources" feature displays the exact text excerpts the AI used to generate its answer.
- Streaming Responses: Answers appear word-by-word for a fast, interactive experience.
- Modern UI: A clean, dark-themed, and easy-to-use interface built with Gradio.
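The streaming behavior above can be sketched with a plain Python generator: the backend yields a progressively longer answer, and chat UIs such as Gradio re-render on each yield, which produces the word-by-word effect. This is an illustrative sketch, not DocuSage's actual handler.

```python
from typing import Iterator

def stream_answer(answer: str) -> Iterator[str]:
    """Yield the answer one word at a time, as a growing prefix.

    Chat UIs that accept a generator re-render the message bubble on
    every yield, so the user sees the answer appear word-by-word.
    """
    shown = []
    for word in answer.split():
        shown.append(word)
        yield " ".join(shown)

# Example: collect the streamed prefixes
chunks = list(stream_answer("RAG grounds answers in retrieved text"))
print(chunks[-1])  # the final chunk is the complete answer
```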
DocuSage is built on a Retrieval-Augmented Generation (RAG) pipeline:
- Scrape & Crawl: The user provides a URL. The application scrapes the content and follows same-domain links to build a comprehensive text corpus.
- Chunk: The collected text is broken down into smaller, manageable chunks.
- Embed & Index: Each chunk is converted into a numerical representation (an embedding) using a `SentenceTransformer` model. These embeddings are stored in a `ChromaDB` vector database.
- Retrieve: When a user asks a question, it's also converted into an embedding. The vector database performs a similarity search to find the most relevant text chunks from the documentation.
- Generate: The user's question and the retrieved chunks are passed to the Gemini LLM with a specialized prompt. The LLM then generates a high-quality, context-aware answer.
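The chunk, embed, and retrieve steps above can be sketched end-to-end in a few lines. This toy version uses a bag-of-words counter as a stand-in for real `SentenceTransformer` embeddings and a brute-force cosine search instead of `ChromaDB`; the function names and parameters are illustrative, not DocuSage's actual code.

```python
import math
from collections import Counter

def chunk_text(text: str, size: int = 40, overlap: int = 10) -> list[str]:
    """Split text into overlapping character windows (the 'Chunk' step)."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

def embed(text: str) -> Counter:
    """Toy 'embedding': a word-count vector standing in for a real model."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query (the 'Retrieve' step)."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

docs = [
    "Qiskit circuits are built from quantum gates",
    "ChromaDB stores embeddings for similarity search",
    "Gradio builds simple web interfaces in Python",
]
print(retrieve("where are embeddings stored for search", docs, k=1))
```

In the real pipeline, the top-k chunks returned here are what get pasted into the Gemini prompt in the Generate step.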
- Backend: Python
- AI & ML:
  - LLM: Google Gemini 1.5 Flash
  - Embeddings: `sentence-transformers`
  - Vector Database: `ChromaDB`
  - Framework: `langchain`
- Web Scraping: `requests`, `trafilatura`, `BeautifulSoup4`
- UI: `Gradio`
- Clone the repository:

  ```bash
  git clone https://github.com/Bhavya445/docu-sage.git
  cd DocuSage
  ```

- Create and activate a virtual environment:

  ```bash
  python3 -m venv venv
  source venv/bin/activate
  ```

- Install the required libraries:

  ```bash
  pip install -r requirements.txt
  ```

- Set up your Google AI API Key:
  - Get your key from Google AI Studio.
  - Set it as an environment variable in your terminal:

    ```bash
    export GOOGLE_API_KEY='YOUR_API_KEY'
    ```
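A quick way to confirm the key is visible to the app's process is a small check like the one below. The helper is hypothetical, not part of DocuSage.

```python
import os

def api_key_configured(var: str = "GOOGLE_API_KEY") -> bool:
    """Return True if the API key environment variable is set and non-empty."""
    return bool(os.environ.get(var, "").strip())

print("API key configured:", api_key_configured())
```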
With your environment set up and the API key exported, start the application:
```bash
python app.py
```

Open your web browser and navigate to the local URL provided in the terminal (usually http://127.0.0.1:7860).

