Version: 1.0.0
Status: Development (SPE Hackathon Project)
An AI-powered wellbore analysis system leveraging Retrieval-Augmented Generation (RAG) to enable intelligent document analysis, real-time chat interactions, and data extraction from technical wellbore documents.
The Wellbore Data Agent is a full-stack application designed to help petroleum engineers and analysts extract valuable insights from technical wellbore documents. By combining modern AI, vector databases, and a responsive web interface, the system enables users to:
- Upload and process PDF documents containing wellbore data and technical information
- Query documents using natural language through an intelligent AI agent
- Extract insights including summaries, tables, and calculated analyses
- Perform nodal analysis calculations on wellbore data
- Interact in real time via WebSocket for responsive, streaming conversations
```
wellbore-data-agent/
├── backend/                      # FastAPI backend service
│   ├── app/
│   │   ├── main.py               # FastAPI application entry point
│   │   ├── agents/               # AI agent logic (LangGraph-based)
│   │   │   ├── agent_graph.py
│   │   │   ├── extraction_graph.py
│   │   │   ├── summarization_graph.py
│   │   │   └── langgraph_agent.py
│   │   ├── api/                  # API endpoints and middleware
│   │   │   ├── routes/           # API route handlers
│   │   │   ├── middleware/       # CORS, error handling
│   │   │   └── deps.py           # Dependency injection
│   │   ├── core/                 # Core configuration
│   │   ├── db/                   # Database management
│   │   ├── models/               # Pydantic data models
│   │   ├── rag/                  # RAG pipeline components
│   │   │   ├── chunking.py       # Document chunking strategies
│   │   │   ├── embeddings.py     # Embedding generation
│   │   │   ├── retriever.py      # Document retrieval
│   │   │   └── vector_store_manager.py
│   │   ├── services/             # Business logic services
│   │   │   ├── llm_service.py
│   │   │   ├── document_service.py
│   │   │   └── conversation_service.py
│   │   ├── utils/                # Utility functions
│   │   └── validation/           # Request/response validation
│   ├── data/                     # Data directory
│   │   ├── raw/                  # Raw uploaded documents
│   │   ├── processed/            # Processed documents
│   │   ├── uploads/              # Temporary upload storage
│   │   └── vector_db/            # Chroma vector database
│   ├── scripts/                  # Setup and utility scripts
│   ├── requirements.txt          # Python dependencies
│   └── README.md                 # Backend documentation
│
├── frontend/                     # React + Vite frontend application
│   ├── src/
│   │   ├── App.tsx               # Root component
│   │   ├── main.tsx              # Entry point
│   │   ├── routes.tsx            # Router configuration
│   │   ├── components/           # Reusable React components
│   │   ├── pages/                # Page components
│   │   ├── services/             # API client services
│   │   ├── store/                # Redux state management
│   │   ├── types/                # TypeScript type definitions
│   │   ├── layout/               # Layout components
│   │   └── context/              # React context hooks
│   ├── public/                   # Static assets
│   └── package.json              # Node.js dependencies
│
├── docs/                         # Documentation
│   ├── architecture.md
│   ├── api.md
│   ├── agent-workflow.md
│   └── deployment.md
│
├── docker-compose.yml            # Multi-container orchestration
└── README.md                     # This file
```
Backend Stack:
- Framework: FastAPI (Python)
- AI/ML:
  - LangGraph for agent orchestration
  - LangChain for LLM interactions
  - Ollama for local LLM inference
- Vector Database: Chroma (persistent vector storage)
- Embeddings: Sentence Transformers
- Real-time Communication: WebSocket support via FastAPI
- Document Processing: PDF extraction using PDFMiner, pdfplumber, PyMuPDF
- Async Runtime: Uvicorn with async/await support
Frontend Stack:
- Framework: React 19 with TypeScript
- Build Tool: Vite
- UI Components: Material-UI (MUI), custom Radix UI components
- State Management: Redux Toolkit
- Styling: Tailwind CSS
- HTTP Client: Axios
- Real-time: WebSocket integration for live chat
- Markdown: React-Markdown for rendered content
Infrastructure:
- Containerization: Docker & Docker Compose
- Communication: Backend (port 8000) ↔ Frontend (port 5173)
- External: Ollama LLM service (port 11434)
Document Management:
- Upload PDFs: Drag-and-drop or file-selection interface (an illustrative API call is sketched after this list)
- Automatic Processing: Documents are chunked and embedded into the vector store
- Metadata Tracking: Tracks page count, word count, chunk count, and upload timestamps
- Document Retrieval: List all documents with detailed metadata
- Document Deletion: Remove documents and associated data
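For orientation, an upload over HTTP might look like the sketch below. It uses httpx (already in the backend dependency list), but the route name `/documents/upload` and the response fields are assumptions, not confirmed endpoints.

```python
# Hypothetical upload call -- the route name and response shape are assumed.
import httpx

with open("well_report.pdf", "rb") as f:
    resp = httpx.post(
        "http://localhost:8000/documents/upload",  # assumed route
        files={"file": ("well_report.pdf", f, "application/pdf")},
        timeout=60.0,  # chunking and embedding on upload can take a while
    )
resp.raise_for_status()
print(resp.json())  # e.g. page count, word count, chunk count, timestamp
```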
Chat & Query Interface:
- Three Chat Endpoints:
  - /chat/ - Simple query endpoint
  - /chat/ask - Question answering with confidence scores and source citations
  - /chat/stream - Streaming responses for real-time interaction
- WebSocket Interface (/ws/) with three message types (see the client sketch after this list):
  - question: Get answers to queries about documents
  - summarize: Generate document summaries
  - extract_tables: Extract tables based on natural language queries
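A minimal client sketch using the `websockets` package from the dependency list. Only the `/ws/` route and the three message types are documented, so the JSON envelope (`type`/`payload`) is an assumption about the wire format.

```python
# Sketch of a /ws/ client; the {"type": ..., "payload": ...} envelope is
# an assumption -- only the route and message types come from the docs.
import asyncio
import json

import websockets

async def ask(question: str) -> None:
    async with websockets.connect("ws://localhost:8000/ws/") as ws:
        await ws.send(json.dumps({"type": "question", "payload": question}))
        async for message in ws:  # consume streamed responses until close
            print(json.loads(message))

asyncio.run(ask("What is the total measured depth reported for the well?"))
```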
AI Agent System:
- Built on LangGraph for agentic workflows (a minimal routing sketch follows this list)
- Tool-based architecture with specialized agents:
  - Extraction Agent: Extracts structured data from documents
  - Summarization Agent: Generates concise summaries
  - Analysis Agent: Performs calculations and analysis
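The project's actual graphs live in `backend/app/agents/`; the toy sketch below only illustrates the LangGraph pattern of routing a query to a specialized node. The state fields, node names, and keyword heuristic are invented for the example, not taken from `agent_graph.py`.

```python
# Toy LangGraph routing between two "agents"; all names are illustrative.
from typing import TypedDict

from langgraph.graph import END, START, StateGraph

class AgentState(TypedDict):
    query: str
    result: str

def summarize(state: AgentState) -> dict:
    return {"result": f"(summary of) {state['query']}"}

def extract(state: AgentState) -> dict:
    return {"result": f"(extracted data for) {state['query']}"}

def route(state: AgentState) -> str:
    # Crude keyword routing; the real agent would let the LLM pick a tool.
    return "summarize" if "summar" in state["query"].lower() else "extract"

graph = StateGraph(AgentState)
graph.add_node("summarize", summarize)
graph.add_node("extract", extract)
graph.add_conditional_edges(START, route)
graph.add_edge("summarize", END)
graph.add_edge("extract", END)

app = graph.compile()
print(app.invoke({"query": "Summarize the drilling report", "result": ""}))
```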
RAG Pipeline:
- Document Chunking: Intelligent splitting with overlap for context preservation
- Embedding Generation: Dense embeddings using sentence-transformers
- Vector Search: Semantic similarity search via Chroma
- Context Retrieval: Top-K document chunk retrieval for LLM context (sketched below)
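The `rag/` modules wrap this flow; the sketch below shows the same chunk-embed-retrieve loop directly against chromadb and sentence-transformers (both in the dependency list). The model id, collection name, and storage path are placeholder choices, not the project's configuration.

```python
# Chunk -> embed -> retrieve sketch; model id and collection name are
# assumptions, not read from the project's config.
import chromadb
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model
client = chromadb.PersistentClient(path="backend/data/vector_db")
collection = client.get_or_create_collection("wellbore_docs")  # assumed name

def index_chunks(doc_id: str, chunks: list[str]) -> None:
    # One dense embedding per chunk; ids must be unique in the collection.
    embeddings = model.encode(chunks).tolist()
    ids = [f"{doc_id}-{i}" for i in range(len(chunks))]
    collection.add(ids=ids, documents=chunks, embeddings=embeddings)

def retrieve(query: str, k: int = 4) -> list[str]:
    # Top-K semantic similarity search used to build the LLM context.
    hits = collection.query(
        query_embeddings=model.encode([query]).tolist(), n_results=k
    )
    return hits["documents"][0]
```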
Nodal Analysis:
- Calculation framework for wellbore nodal analysis
- Currently ships mocked calculations behind an extensible architecture (an illustrative inflow calculation follows)
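Because the calculations are mocked today, here is only an illustration of the kind of function the framework could host: Vogel's inflow performance relationship, a standard IPR correlation. Neither the code nor the numbers come from the project.

```python
# Illustrative only -- the project mocks its calculations today.
def vogel_ipr(q_max: float, p_res: float, p_wf: float) -> float:
    """Vogel IPR: inflow rate (STB/d) for a solution-gas-drive well.

    q_max: absolute open flow potential (STB/d), p_res: reservoir
    pressure (psia), p_wf: flowing bottomhole pressure (psia).
    """
    ratio = p_wf / p_res
    return q_max * (1.0 - 0.2 * ratio - 0.8 * ratio**2)

print(vogel_ipr(q_max=1500.0, p_res=3000.0, p_wf=1800.0))  # ~888 STB/d
```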
Health Monitoring:
- /health endpoint to check system status (a quick probe is shown below)
- Validates LLM service connectivity
- Monitors vector store health
- Reports detailed service status
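Probing the endpoint is a one-liner; the response payload is whatever detailed status the service reports and is not specified here.

```python
# Quick health probe; the response schema is whatever /health returns.
import httpx

print(httpx.get("http://localhost:8000/health").json())
```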
Key Backend Dependencies:
- LLM & AI: langchain, langgraph, langchain-ollama, ollama
- Vector DB: chromadb, langchain-chroma
- Web: fastapi, uvicorn, python-socketio, websockets
- PDF Processing: pdfplumber, pdfminer.six, PyPDF2, PyMuPDF, camelot
- ML: torch, transformers, sentence-transformers, scikit-learn
- Data: pandas, pydantic, sqlalchemy
- Utilities: python-dotenv, tenacity, httpx
Key Frontend Dependencies:
- React Ecosystem: react, react-dom, react-router-dom
- State: redux, @reduxjs/toolkit, react-redux
- UI: @mui/material, tailwindcss, lucide-react, react-icons
- Utilities: axios, marked, react-markdown, dompurify
- Forms: react-dropzone (for file uploads)
Prerequisites:
- Docker & Docker Compose

or, for manual setup:

- Python 3.10+
- Node.js 18+
- Ollama (for local LLM inference)
Quick Start with Docker:

1. Clone the repository:
   ```bash
   git clone <repository-url>
   cd wellbore-data-agent
   ```

2. Start the services:
   ```bash
   docker-compose up --build
   ```

3. Access the application:
   - Frontend: http://localhost:5173
   - Backend API: http://localhost:8000
   - API Documentation: http://localhost:8000/docs
Manual Setup - Backend:

1. Install Python dependencies:
   ```bash
   cd backend
   pip install -r requirements.txt
   ```

2. Configure the environment:
   ```bash
   cp .env.example .env
   # Edit .env with your settings:
   # - OLLAMA_BASE_URL (default: http://localhost:11434)
   # - OLLAMA_MODEL (default: llama2)
   ```

3. Start the Ollama service (if using a local LLM):
   ```bash
   ollama serve
   ```

4. Run the application:
   ```bash
   uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
   ```
Manual Setup - Frontend:

1. Install Node dependencies:
   ```bash
   cd frontend
   npm install
   ```

2. Start the development server:
   ```bash
   npm run dev
   ```

3. Access the application at http://localhost:5173
```
User Input
    ↓
Frontend (React)
    ↓
WebSocket/HTTP to Backend
    ↓
FastAPI Router
    ↓
LangGraph Agent
    ├─ Document Retrieval (Vector Store)
    ├─ LLM Service (Ollama)
    └─ Tool Execution (Extraction, Summarization, Analysis)
    ↓
Response
    ↓
Frontend Display
```
All Rights Reserved.
This project was originally developed for the SPE (Society of Petroleum Engineers) Hackathon and is still under review. You are welcome to view the code, explore the architecture, and reference the approach for educational or evaluative purposes.
However, reuse, redistribution, or commercial use of the project is not permitted at this time without prior permission from the author.
Developed for the SPE (Society of Petroleum Engineers) Hackathon