Bhavya445/docu-sage

🧠 DocuSage: Your Documentation Expert

Ask complex questions, get expert answers. DocuSage is an AI-powered chat agent that learns any technical documentation you provide and becomes a specialized assistant, helping you debug code, understand concepts, and build faster.


Flowchart

DocuSage Interface Screenshot

📖 Description

Have you ever been stuck trying to find a specific piece of information in dense, multi-page documentation? DocuSage solves this by leveraging a powerful AI architecture called Retrieval-Augmented Generation (RAG).

You provide a URL to a documentation website. DocuSage reads, processes, and indexes that content into a specialized knowledge base. You can then ask questions in natural language, and the AI synthesizes answers grounded in the provided documentation to help you with your development tasks. The inspiration for building it came when I was stuck on Qiskit errors!

✨ Features

  • Web Crawler: Intelligently scrapes and cleans content from a starting URL and relevant sub-pages.
  • AI-Powered Search: Uses vector embeddings to search for information by semantic meaning, not just keywords.
  • Expert Q&A: Leverages Google's Gemini LLM to understand your questions, correct syntax errors, and provide expert-level answers.
  • Trustworthy & Transparent: The "Show Sources" feature displays the exact text excerpts the AI used to generate its answer.
  • Streaming Responses: Answers appear word-by-word for a fast, interactive experience.
  • Modern UI: A clean, dark-themed, and easy-to-use interface built with Gradio.
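
As a rough illustration of the streaming behaviour, the sketch below yields an answer one word at a time. It is not taken from the repository: `stream_words` and its `delay` parameter are hypothetical names. Gradio treats a generator function as a streaming handler and refreshes the output on every `yield`, so a function shaped like this could be plugged into a chat UI.

```python
import time


def stream_words(answer, delay=0.0):
    """Yield the text accumulated so far, growing by one word per step.

    Hypothetical helper for illustration; the real app may stream
    tokens straight from the LLM instead.
    """
    shown = []
    for word in answer.split():
        shown.append(word)
        if delay:
            time.sleep(delay)  # optional pause to make the effect visible
        yield " ".join(shown)


if __name__ == "__main__":
    for partial in stream_words("Use the transpile function"):
        print(partial)
```

Yielding the cumulative text, rather than only the newest word, matches how a UI that replaces the whole message on each update would expect its input.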

⚙️ How It Works (The RAG Pipeline)

DocuSage is built on a Retrieval-Augmented Generation (RAG) pipeline:

  1. Scrape & Crawl: The user provides a URL. The application scrapes the content and follows same-domain links to build a comprehensive text corpus.
  2. Chunk: The collected text is broken down into smaller, manageable chunks.
  3. Embed & Index: Each chunk is converted into a numerical representation (an embedding) using a SentenceTransformer model. These embeddings are stored in a ChromaDB vector database.
  4. Retrieve: When a user asks a question, it's also converted into an embedding. The vector database performs a similarity search to find the most relevant text chunks from the documentation.
  5. Generate: The user's question and the retrieved chunks are passed to the Gemini LLM with a specialized prompt. The LLM then generates a high-quality, context-aware answer.
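
The crawl, chunk, and retrieve steps above can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the project's code: all function names are hypothetical, `toy_embed` is a bag-of-words stand-in for a SentenceTransformer model, and a plain Python list stands in for the ChromaDB index. The chunk size and overlap values are also assumptions.

```python
import math
from collections import Counter
from urllib.parse import urljoin, urlparse


def same_domain_links(base_url, hrefs):
    """Resolve hrefs against base_url and keep unique links on the same host."""
    host = urlparse(base_url).netloc
    seen, kept = set(), []
    for href in hrefs:
        absolute = urljoin(base_url, href).split("#")[0]  # drop fragments
        if urlparse(absolute).netloc == host and absolute not in seen:
            seen.add(absolute)
            kept.append(absolute)
    return kept


def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into character windows that overlap by `overlap` characters."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - overlap
    stop = max(len(text) - overlap, 1)
    return [text[i:i + chunk_size] for i in range(0, stop, step)]


def toy_embed(text):
    """Stand-in embedding: word counts (an assumption, not a real model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(question, chunks, k=2):
    """Return the k chunks most similar to the question."""
    q = toy_embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, toy_embed(c)), reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    docs = [
        "Install the package with pip",
        "Use transpile to optimize a circuit",
        "The license is Apache",
    ]
    print(retrieve("how do I transpile a circuit", docs, k=1))
```

In a real pipeline the retrieved chunks would then be placed into the LLM prompt alongside the question. Overlapping chunks reduce the chance that a sentence relevant to the question is cut in half at a chunk boundary.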

🛠️ Tech Stack

  • Backend: Python
  • AI & ML:
    • LLM: Google Gemini 1.5 Flash
    • Embeddings: sentence-transformers
    • Vector Database: ChromaDB
    • Framework: langchain
  • Web Scraping: requests, trafilatura, BeautifulSoup4
  • UI: Gradio

🚀 Setup and Installation

  1. Clone the repository:

    git clone https://github.com/Bhavya445/docu-sage.git
    cd docu-sage
  2. Create and activate a virtual environment:

    python3 -m venv venv
    source venv/bin/activate
  3. Install the required libraries:

    pip install -r requirements.txt
  4. Set up your Google AI API Key:

    • Get your key from Google AI Studio.
    • Set it as an environment variable in your terminal:
    export GOOGLE_API_KEY='YOUR_API_KEY'
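
To confirm the key is visible to Python before launching the app, a small check like the one below can help. It is a sketch, not part of the repository: `require_api_key` is a hypothetical helper, and it only assumes the `GOOGLE_API_KEY` variable name used in the step above.

```python
import os


def require_api_key(name="GOOGLE_API_KEY", env=None):
    """Return the API key from the environment, or raise if it is missing.

    `env` defaults to os.environ; passing a dict makes the check testable.
    """
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it before starting the app")
    return key


if __name__ == "__main__":
    try:
        require_api_key()
        print("API key found")
    except RuntimeError as err:
        print(err)
```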

▶️ How to Run

With your environment set up and the API key exported, start the application:

python app.py

Open your web browser and navigate to the local URL printed in the terminal (usually http://127.0.0.1:7860).
