An agentic AI system built with LangGraph that autonomously researches Honeywell's competitors in the pressure transmitter market. The LLM decides which tools to call, what to search, and when to stop. Results are stored in Neo4j (structured graph) and ChromaDB (evidence chunks for human in the loop verification).
The system uses a LangGraph StateGraph with two nodes (agent and tools) in a loop:
agentnode calls the LLM with bound tools. LLM decides which tools to call.should_continuechecks if LLM returned tool calls. If yes → go to tools, if no → end.toolsnode executes the tool calls, results go back to agent, repeat until done.
Tools (LLM chooses which to call):
| Tool | Purpose |
|---|---|
search_web |
Tavily web search |
extract_page_content |
Tavily page extraction + stores chunks in ChromaDB |
save_competitor |
Saves company with evidence link |
save_product |
Saves product + specs with evidence link |
research_customer_segments |
Finds customer groups in industry (LLM generates queries, stores evidence) |
map_segments_to_products |
Maps which products serve which customer segments |
research_industry_needs |
Searches 8+ sources, generates comprehensive needs report |
map_needs_from_report |
Extracts needs from report and maps to products |
generate_house_of_quality |
Creates QFD matrix mapping customer needs to specs |
get_current_progress |
Returns current research status |
finish_research |
Signals completion |
Data Storage:
| Store | Purpose |
|---|---|
| ChromaDB | Raw text chunks from web pages (evidence for verification) |
| Neo4j | Structured knowledge graph (Companies, Products, Specifications, CustomerNeeds, CustomerSegments) |
Agent Loop:
- LLM receives the conversation history and decides which tools to call
- If LLM returns tool calls → execute them, add results to conversation, go back to step 1
- If LLM returns no tool calls (or calls
finish_research) → end - Final data written to Neo4j
Research Strategy (Four Phases):
- Phase 1: Find competitors and their products with specs
- Phase 2: Research customer segments (who buys pressure transmitters) and map products to segments
- Phase 3: Generate comprehensive industry needs report (from 8+ sources), then map needs to product specs
- Phase 4: Build House of Quality (QFD) matrix mapping customer needs to specifications
Graph Structure:
Honeywell ─COMPETES_WITH→ Competitor ─OFFERS_PRODUCT→ Product ─HAS_SPEC→ Specification
│
├─ADDRESSES_NEED→ CustomerNeed
│
└─ADDRESSES_CUSTOMER_SEGMENT→ CustomerSegment
# Using conda (recommended)
conda create -n ci_db python=3.11 -y
conda activate ci_db
# OR using venv
python -m venv .venv
source .venv/bin/activate # Windows: .venv\Scripts\activatepip install -r requirements.txtCreate a .env file in the project root:
OPENAI_API_KEY=sk-...
TAVILY_API_KEY=tvly-...
NEO4J_URI=bolt://localhost:7687
NEO4J_USER=neo4j
NEO4J_PASSWORD=your_passwordpython main.py --iterations 20 --industry "oil and gas"python main.py --streamlit
# Or directly:
streamlit run streamlit_app.pyEvidence verification is done through the Streamlit dashboard's "✅ Verify Data" tab, which:
- Shows all relationships from Neo4j
- Retrieves original source evidence from ChromaDB
- Displays the exact text and source URL for human verification
| Tab | Description |
|---|---|
| 📊 Knowledge Graph | Interactive visualization of the Neo4j graph |
| 🔄 Pipeline Architecture | Shows how LangGraph agent works |
| 📚 Ontology | Spec definitions and normalization rules |
| 📋 Specification Table | All products and their specs in a table |
| 🔍 Compare Products | Side-by-side product comparison |
| ✅ Verify Data | Human verification with ChromaDB evidence |
| 🎯 Customer Needs | Industry needs report and product mappings |
| 👥 Customer Segments | Customer groups with evidence and product mappings |
| 🏠 House of Quality | QFD matrix with relationship weights and competitive scores |
| 📈 Evaluation | Accuracy metrics comparing LLM extractions to source content |