
DocuTalk: Local-First RAG Assistant for Technical PDFs

🚀 What is DocuTalk?

DocuTalk is a local-first RAG assistant built to answer questions over technical PDF documentation with high precision and low hallucination risk.

Instead of relying on paid external APIs, it uses Ollama + DeepSeek (or similar models) for generation and Ollama embeddings for retrieval.
This makes the project cost-efficient, privacy-friendly, and production-minded.

✨ Why this project stands out

  • Grounded answers: responses are based on retrieved document chunks, not pure model memory.
  • Local inference: full pipeline can run on your own machine.
  • Lower operational cost: no mandatory token billing from external providers.
  • Modular architecture: easy to swap models, embedding backends, and vector stores.
  • Analytics-ready: tracks latency, token estimates, and feedback for continuous improvement.
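
The grounding idea above can be sketched in a few lines: the model is instructed to answer only from the retrieved chunks, which is what keeps hallucination risk low. The function name and prompt wording here are illustrative, not DocuTalk's actual template:

```python
# Minimal sketch of grounded prompting: retrieved chunks are injected into
# the prompt and the model is told to answer ONLY from them.
# (Illustrative template, not the project's real one.)
def build_grounded_prompt(question: str, chunks: list[str]) -> str:
    context = "\n\n".join(f"[chunk {i + 1}]\n{c}" for i, c in enumerate(chunks))
    return (
        "Answer the question using ONLY the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )

prompt = build_grounded_prompt(
    "What port does the service use?",
    ["The service listens on port 8080 by default."],
)
```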

๐Ÿ› ๏ธ Tech Stack

  • Core: Python 3.10+
  • LLM Orchestration: LangChain
  • Local LLM Runtime: Ollama
  • Generation Model: DeepSeek (deepseek-r1, deepseek-coder) or similar local model
  • Interface: Streamlit
  • Vector Store: FAISS (Facebook AI Similarity Search) - chosen for fast nearest-neighbor search over dense vectors.
  • Embeddings: Ollama embeddings model (for example, nomic-embed-text or mxbai-embed-large)
  • Data Analysis: Pandas (for conversation logging and performance metrics)

🤖 Local Model Strategy

To keep costs low and improve privacy, DocuTalk is configured to run fully local when possible:

  • LLM responses: served by Ollama using DeepSeek (or equivalent).
  • Embeddings: generated by an Ollama embedding model for the FAISS index.
  • No dependency on ChatGPT API: optional cloud providers can still be added later if needed.
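
As a setup sketch, pulling one generation model and one embedding model through the Ollama CLI is enough for fully local inference (the model tags below are examples; any compatible pair works):

```shell
# Pull a local generation model and an embedding model for the FAISS index.
ollama pull deepseek-r1
ollama pull nomic-embed-text
```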

🧮 Mathematical Concept

The retrieval system is based on Cosine Similarity between high-dimensional vectors. Given a query vector $A$ and a document vector $B$, relevance is calculated as:

$$\text{similarity} = \cos(\theta) = \frac{A \cdot B}{\|A\| \, \|B\|}$$
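
The formula maps directly to code. A minimal pure-Python version (FAISS does this at scale over the whole index):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """cos(theta) = (A . B) / (|A| |B|) for two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Parallel vectors score ~1.0 (maximally relevant);
# orthogonal vectors score 0.0 (unrelated).
```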

🔮 Future Improvements (GraphRAG)

Currently, retrieval is based on vector similarity over document chunks. The next roadmap step is to implement a Knowledge Graph approach (using Neo4j or NetworkX) to map relationships between entities in the document, allowing for multi-hop reasoning, leveraging my background in Graph Theory.
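
To illustrate what multi-hop reasoning buys over flat chunk retrieval, here is a toy sketch using a plain adjacency dict instead of Neo4j/NetworkX; the entity names are made up for illustration:

```python
from collections import deque

# Toy entity graph: an edge links entities that are related in the document.
# All entities here are hypothetical examples.
graph = {
    "AuthService": ["TokenStore", "UserDB"],
    "TokenStore": ["Redis"],
    "UserDB": ["Postgres"],
    "Redis": [],
    "Postgres": [],
}

def multi_hop(start: str, max_hops: int) -> set[str]:
    """Collect every entity reachable from `start` within `max_hops` edges (BFS)."""
    seen, queue = {start}, deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, depth + 1))
    return seen
```

A question about "AuthService" can now surface "Redis" after two hops, a connection a single vector-similarity lookup would likely miss.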

📊 Analytics

The application logs user interactions to a CSV file to monitor:

  • Response latency.
  • Token usage.
  • User feedback loops.
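
A minimal stdlib-only sketch of such a logger; the file name and column schema are illustrative, not DocuTalk's actual format:

```python
import csv
from datetime import datetime, timezone
from pathlib import Path

# Hypothetical log file and columns for illustration.
LOG_PATH = Path("interactions.csv")
FIELDS = ["timestamp", "latency_s", "token_estimate", "feedback"]

def log_interaction(latency_s: float, token_estimate: int, feedback: str) -> None:
    """Append one interaction row to the CSV, writing the header on first use."""
    new_file = not LOG_PATH.exists()
    with LOG_PATH.open("a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if new_file:
            writer.writeheader()
        writer.writerow({
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "latency_s": latency_s,
            "token_estimate": token_estimate,
            "feedback": feedback,
        })
```

Logging to CSV keeps the analytics pipeline trivially consumable by Pandas for latency and feedback analysis.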