Local RAG-based assistant for analyzing construction documents (contracts, technical specs, RFIs, drawing notes) using a local LLM via Ollama.
This application allows users to upload PDF/TXT documents and perform:
- General summaries
- Risk & obligation analysis
- Custom Q&A grounded in document excerpts
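Each analysis mode is just a different instruction wrapped around the same retrieved excerpts. A minimal sketch of what those templates could look like (the wording is illustrative, not the app's exact prompts):

```python
# Hypothetical prompt templates: one instruction per analysis mode,
# all grounded in the same retrieved excerpts.
MODE_PROMPTS = {
    "summary": "Summarize the key points of the excerpts below.",
    "risk": (
        "List the risks and obligations in the excerpts below, "
        "quoting the relevant clause for each."
    ),
    "qa": "Answer the question using only the excerpts below.",
}

def make_prompt(mode: str, context: str, question: str = "") -> str:
    suffix = f"\n\nQuestion: {question}" if mode == "qa" else ""
    return f"{MODE_PROMPTS[mode]}\n\nExcerpts:\n{context}{suffix}"
```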
Runs fully locally using:
- Streamlit (UI)
- Ollama (local LLM + embeddings)
- LangChain (RAG pipeline)
- uv (environment & dependency management)
Key features:
- Multi-document upload
- Automatic text extraction & chunking
- Local embedding + vector search
- RAG prompt construction
- Structured answers & retrieved source transparency
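Under the hood this follows the standard LangChain RAG pattern. A minimal sketch, assuming the `langchain-ollama` and `faiss-cpu` integrations and the models listed in the deployment notes (the repo's actual module layout may differ):

```python
# Minimal RAG pipeline sketch: chunk -> embed -> retrieve -> prompt.
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import FAISS
from langchain_ollama import OllamaEmbeddings, ChatOllama

def answer(question: str, raw_texts: list[str]) -> str:
    # Split the extracted document text into overlapping chunks.
    splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=150)
    chunks = splitter.create_documents(raw_texts)

    # Embed the chunks locally and index them for similarity search.
    embeddings = OllamaEmbeddings(model="mxbai-embed-large")
    store = FAISS.from_documents(chunks, embeddings)

    # Retrieve the most relevant excerpts and ground the prompt in them.
    excerpts = store.similarity_search(question, k=4)
    context = "\n\n".join(doc.page_content for doc in excerpts)
    prompt = (
        "Answer using only the excerpts below, and say which excerpt you used.\n\n"
        f"Excerpts:\n{context}\n\nQuestion: {question}"
    )
    return ChatOllama(model="llama3.1:8b", temperature=0).invoke(prompt).content
```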
See INSTALLATION.md for full setup instructions.
Although this prototype is designed to run locally, the application can be deployed in a production-style environment with minimal changes. Since the system relies on Ollama (local LLM runtime), deployment requires a server capable of running both the Streamlit interface and the Ollama backend.
A typical setup:
- Single Linux VPS or on-prem server, always on
- Ollama installed on the server (runs the LLM and embedding model)
- Streamlit application running as a service
- Reverse proxy (Nginx) to expose the app via HTTPS on a custom domain
- Optional: Docker + docker-compose for containerized deployment (Ollama + app as separate services)
Deployment steps:
- Install Python, uv, Git, and Ollama on the server
- Clone this repository and install dependencies
- Pull the required Ollama models (`llama3.1:8b`, `mxbai-embed-large`)
- Run the Streamlit app and bind it to `0.0.0.0`
- Configure Nginx as a reverse proxy and enable HTTPS
- Optionally containerize everything with Docker for reproducibility
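Because the app depends on the Ollama daemon being up with the right models, a startup check that fails fast can save debugging time. A sketch using the official `ollama` Python client (a recent version; an extra dependency, not necessarily part of this repo):

```python
# Fail fast at startup if Ollama is unreachable or a required model is missing.
import sys
import ollama

REQUIRED = ("llama3.1:8b", "mxbai-embed-large")

def check_ollama() -> None:
    try:
        models = [m.model for m in ollama.list().models]
    except Exception as exc:
        sys.exit(f"Ollama is not reachable: {exc}")
    # Tags may be normalized (e.g. ':latest' appended), so match on prefix.
    missing = [r for r in REQUIRED if not any(m.startswith(r) for m in models)]
    if missing:
        sys.exit(f"Missing Ollama models: {', '.join(missing)}")
```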
Notes:
- Streamlit Cloud and other serverless hosting platforms cannot run Ollama, because Ollama requires dedicated CPU/GPU resources and persistent model storage.
- For production reliability, running the app in Docker or under a process manager (e.g., systemd, pm2, supervisor) is recommended.
- The system still runs fully offline even in production, since all models and embeddings are stored on the server.
MIT License
The current prototype uses a pure RAG (Retrieval-Augmented Generation) approach:
the LLM itself is not modified or retrained. All reasoning is grounded in the text extracted from
the documents that the user uploads during each session.
Because the application is intentionally general-purpose, it can be used for contracts,
specifications, RFIs, drawing notes, or any other construction-related documentation.
However, in real organizational workflows, certain teams receive repetitive patterns of documents
and would benefit from a system that gradually adapts to their domain.
Below are potential future extensions in that direction.
Rather than indexing only the documents uploaded during the session, the system could maintain a long-term, curated vector store containing organizational knowledge (see the sketch after this list), such as:
- past contract documents
- previous tender packages
- internal templates and procedures
- technical standards and compliance rules
- approved interpretations of contractual clauses
- successful responses and analyses from earlier projects
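A minimal sketch of how such a store could be kept on disk with FAISS and merged with each session's uploads (the paths and function names are assumptions, not the current implementation):

```python
# Sketch: a long-term FAISS index on disk, merged with session documents.
import os
from langchain_community.vectorstores import FAISS
from langchain_ollama import OllamaEmbeddings

INDEX_DIR = "org_knowledge_index"  # hypothetical location of the curated store
embeddings = OllamaEmbeddings(model="mxbai-embed-large")

def open_store(session_chunks):
    if os.path.isdir(INDEX_DIR):
        # Load the curated organizational index from disk...
        store = FAISS.load_local(
            INDEX_DIR, embeddings, allow_dangerous_deserialization=True
        )
        # ...and add this session's chunks so retrieval spans both sources.
        store.add_documents(session_chunks)
        return store
    return FAISS.from_documents(session_chunks, embeddings)

def persist_approved(store, approved_chunks):
    # Only curated, approved material is written back to the long-term index.
    store.add_documents(approved_chunks)
    store.save_local(INDEX_DIR)
```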
This enables the assistant to produce answers that reflect not only the current documents, but also the accumulated experience of the organization.
Example — Bidding Office:
A bidding office (ufficio gare) is an ideal case because it receives tender documents that often
follow similar structures (BoQs, specifications, instructions to bidders, contract drafts).
A persistent vector store allows the system to:
- reuse knowledge from previous bids,
- maintain consistency in how risks and compliance points are analyzed,
- and speed up repetitive tasks (e.g., summarizing “Condizioni speciali”, payment terms, warranties).
Users may correct or approve the model’s responses.
These approved “gold samples” could then be stored and reused as few-shot examples for future analyses.
This form of adaptation does not retrain the LLM; instead, it enriches the prompting layer.
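A rough sketch of that prompting layer (the JSONL storage and the simple "first few samples" selection are assumptions; a real system would likely pick samples by similarity to the current question):

```python
# Sketch: inject user-approved "gold samples" as few-shot examples.
import json

def load_gold_samples(path: str = "gold_samples.jsonl") -> list[dict]:
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f]

def build_prompt(question: str, context: str, samples: list[dict]) -> str:
    # Render each approved Q&A pair as a worked example before the real task.
    shots = "\n\n".join(
        f"Question: {s['question']}\nApproved answer: {s['answer']}"
        for s in samples[:3]  # a handful keeps the prompt within context limits
    )
    return (
        "Follow the style and rigour of the approved examples below.\n\n"
        f"{shots}\n\nExcerpts:\n{context}\n\nQuestion: {question}"
    )
```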
Possible use cases:
- Bidding office: preferred structure for technical offers or clarifications
- Legal team: approved interpretations of contract clauses
- QA/QC: standardized responses to nonconformance issues
- Designers: common notes, coordination rules, or project-specific conventions
This approach improves consistency without requiring GPUs or heavy training.
A more ambitious evolution would be to fine-tune the underlying LLM (e.g., via LoRA adapters; a configuration sketch follows below) with domain-specific data such as:
- historical tenders with high-quality responses
- technical specifications frequently analyzed
- internal guidelines and formal templates
Fine-tuning allows the model to internalize:
- specific terminology used by the company
- preferred writing style
- internal standards or engineering logic
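For orientation only, attaching LoRA adapters with Hugging Face's `peft` library looks roughly like this (the base model name and hyperparameters are placeholders, and none of this is wired into the app):

```python
# Sketch: attach LoRA adapters to a base model with Hugging Face peft.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B")
config = LoraConfig(
    r=16,                                 # adapter rank
    lora_alpha=32,                        # scaling factor
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, config)
model.print_trainable_parameters()  # only the small adapter weights train
```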
However, this requires:
- GPUs or significant CPU time
- dataset preparation and curation
- evaluation and versioning of the model
- strong controls around confidentiality
For these reasons, full fine-tuning is considered out of scope for this MVP, but represents a realistic long-term evolution for organizations with high document volume — especially bidding offices that handle large, repetitive tender packages.
These improvements do not change the core nature of the application (a general-purpose document analyzer), but illustrate how it could evolve into a company-aware assistant capable of adapting to specific workflows — such as bidding offices, legal departments, design coordination teams, or construction management units.