DocuTalk is a local-first RAG assistant built to answer questions over technical PDF documentation with high precision and low hallucination risk.
Instead of relying on paid external APIs, it uses Ollama + DeepSeek (or similar models) for generation and Ollama embeddings for retrieval.
This makes the project cost-efficient, privacy-friendly, and production-minded.
- Grounded answers: responses are based on retrieved document chunks, not pure model memory.
- Local inference: full pipeline can run on your own machine.
- Lower operational cost: no mandatory token billing from external providers.
- Modular architecture: easy to swap models, embedding backends, and vector stores.
- Analytics-ready: tracks latency, token estimates, and feedback for continuous improvement.
- Core: Python 3.10+
- LLM Orchestration: LangChain
- Local LLM Runtime: Ollama
- Generation Model: DeepSeek (`deepseek-r1`, `deepseek-coder`) or a similar local model
- Interface: Streamlit
- Vector Store: FAISS (Facebook AI Similarity Search), chosen for efficient dense-vector similarity search
- Embeddings: Ollama embeddings model (for example, `nomic-embed-text` or `mxbai-embed-large`)
- Data Analysis: Pandas (for conversation logging and performance metrics)
To keep costs low and improve privacy, DocuTalk is configured to run fully local when possible:
- LLM responses: served by Ollama using DeepSeek (or equivalent).
- Embeddings: generated by an Ollama embedding model for the FAISS index.
- No dependency on ChatGPT API: optional cloud providers can still be added later if needed.
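The local generation step can be sketched as follows. This is a minimal illustration of the grounding pattern, not the project's actual code: the chunk texts and the question are made-up placeholders, and `deepseek-r1` is one example model tag for Ollama's local `/api/generate` endpoint.

```python
import json

# Hypothetical chunks retrieved from the FAISS index for a user query.
retrieved_chunks = [
    "DocuTalk indexes PDF documentation into FAISS.",
    "Embeddings are produced by a local Ollama embedding model.",
]
question = "Where are document embeddings stored?"

# Grounding pattern: the prompt instructs the model to answer only
# from the retrieved context, which reduces hallucination risk.
context = "\n\n".join(retrieved_chunks)
prompt = (
    "Answer the question using ONLY the context below. "
    "If the answer is not in the context, say you don't know.\n\n"
    f"Context:\n{context}\n\nQuestion: {question}"
)

# JSON body for Ollama's local generation endpoint (POST /api/generate).
payload = json.dumps({"model": "deepseek-r1", "prompt": prompt, "stream": False})
```

Because the prompt carries the retrieved context and an explicit refusal instruction, answers stay tied to the documents rather than to model memory.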
The retrieval system is based on cosine similarity between high-dimensional vectors. Given a query vector **q** and a chunk vector **d**, the score is cos(θ) = (q · d) / (‖q‖ ‖d‖), and the top-k highest-scoring chunks are passed to the generator as context.
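The scoring and ranking step can be illustrated in a few lines. The 3-dimensional vectors below are toy values for readability (real Ollama embeddings have hundreds of dimensions), and FAISS performs the same ranking at scale with optimized index structures.

```python
import math

def cosine(a, b):
    # cos(theta) = (a . b) / (||a|| * ||b||)
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

query = [1.0, 0.0, 1.0]
chunks = {
    "chunk_a": [1.0, 0.0, 1.0],  # same direction as query -> score 1.0
    "chunk_b": [0.0, 1.0, 0.0],  # orthogonal to query -> score 0.0
    "chunk_c": [1.0, 1.0, 0.0],
}

# Rank chunk IDs by similarity to the query, highest first.
ranked = sorted(chunks, key=lambda k: cosine(query, chunks[k]), reverse=True)
# -> ["chunk_a", "chunk_c", "chunk_b"]
```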
Currently, retrieval is based on vector-similarity search over chunks. The next roadmap step is to implement a Knowledge Graph approach (using Neo4j or NetworkX) to map relationships between entities in the document, allowing for multi-hop reasoning and leveraging my background in Graph Theory.
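The multi-hop idea on the roadmap can be sketched with a plain breadth-first search; the entity graph below is hypothetical, and the eventual implementation would use Neo4j or NetworkX rather than a raw dict.

```python
from collections import deque

# Toy entity graph extracted from a document; edges mean "is related to".
graph = {
    "DocuTalk": ["FAISS", "Ollama"],
    "Ollama": ["DeepSeek"],
    "FAISS": [],
    "DeepSeek": [],
}

def multi_hop_path(graph, start, goal):
    """BFS for a chain of related entities linking start to goal,
    the kind of hop sequence single-chunk retrieval cannot follow."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in graph.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # no chain of relations connects the two entities
```

A query like "which model answers DocuTalk's questions?" would traverse DocuTalk → Ollama → DeepSeek, combining facts that live in different chunks.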
The application logs user interactions to a CSV file to monitor:
- Response latency.
- Token usage.
- User feedback loops.
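A minimal sketch of that logging loop, using the stdlib `csv` module (the project analyzes the resulting file with Pandas); the column names, the whitespace-based token estimate, and the in-memory buffer standing in for the CSV file are all illustrative assumptions.

```python
import csv
import io
import time

def log_interaction(writer, question, answer, latency_s, feedback):
    # Rough token estimate via word count; real counts are tokenizer-specific.
    token_estimate = len(question.split()) + len(answer.split())
    writer.writerow(
        [time.strftime("%Y-%m-%d %H:%M:%S"), question, latency_s, token_estimate, feedback]
    )

buf = io.StringIO()  # stands in for the CSV log file on disk
writer = csv.writer(buf)
writer.writerow(["timestamp", "question", "latency_s", "token_estimate", "feedback"])
log_interaction(writer, "What is FAISS?", "A vector similarity search library.", 0.84, "thumbs_up")
```

Loading the file with `pandas.read_csv` then gives latency percentiles and feedback rates for continuous improvement.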