Competitive Intelligence Database

An agentic AI system built with LangGraph that autonomously researches Honeywell's competitors in the pressure transmitter market. The LLM decides which tools to call, what to search, and when to stop. Results are stored in Neo4j (structured graph) and ChromaDB (evidence chunks for human in the loop verification).

Architecture

The system uses a LangGraph StateGraph with two nodes (agent and tools) in a loop:

agent node calls the LLM with bound tools. LLM decides which tools to call.
should_continue checks if LLM returned tool calls. If yes → go to tools, if no → end.
tools node executes the tool calls, results go back to agent, repeat until done.

Tools (LLM chooses which to call):

Tool	Purpose
`search_web`	Tavily web search
`extract_page_content`	Tavily page extraction + stores chunks in ChromaDB
`save_competitor`	Saves company with evidence link
`save_product`	Saves product + specs with evidence link
`research_customer_segments`	Finds customer groups in industry (LLM generates queries, stores evidence)
`map_segments_to_products`	Maps which products serve which customer segments
`research_industry_needs`	Searches 8+ sources, generates comprehensive needs report
`map_needs_from_report`	Extracts needs from report and maps to products
`generate_house_of_quality`	Creates QFD matrix mapping customer needs to specs
`get_current_progress`	Returns current research status
`finish_research`	Signals completion

Data Storage:

Store	Purpose
ChromaDB	Raw text chunks from web pages (evidence for verification)
Neo4j	Structured knowledge graph (Companies, Products, Specifications, CustomerNeeds, CustomerSegments)

How It Works

Agent Loop:

LLM receives the conversation history and decides which tools to call
If LLM returns tool calls → execute them, add results to conversation, go back to step 1
If LLM returns no tool calls (or calls finish_research) → end
Final data written to Neo4j

Research Strategy (Four Phases):

Phase 1: Find competitors and their products with specs
Phase 2: Research customer segments (who buys pressure transmitters) and map products to segments
Phase 3: Generate comprehensive industry needs report (from 8+ sources), then map needs to product specs
Phase 4: Build House of Quality (QFD) matrix mapping customer needs to specifications

Graph Structure:

Honeywell ─COMPETES_WITH→ Competitor ─OFFERS_PRODUCT→ Product ─HAS_SPEC→ Specification
                                                          │
                                                          ├─ADDRESSES_NEED→ CustomerNeed
                                                          │
                                                          └─ADDRESSES_CUSTOMER_SEGMENT→ CustomerSegment

Setup

1. Create Environment

# Using conda (recommended)
conda create -n ci_db python=3.11 -y
conda activate ci_db

# OR using venv
python -m venv .venv
source .venv/bin/activate   # Windows: .venv\Scripts\activate

2. Install Dependencies

pip install -r requirements.txt

3. Configure Environment Variables

Create a .env file in the project root:

OPENAI_API_KEY=sk-...
TAVILY_API_KEY=tvly-...
NEO4J_URI=bolt://localhost:7687
NEO4J_USER=neo4j
NEO4J_PASSWORD=your_password

4. Start Neo4j

Usage

Run the Pipeline

python main.py --iterations 20 --industry "oil and gas"

Launch Dashboard

python main.py --streamlit

# Or directly:
streamlit run streamlit_app.py

Verify Evidence

Evidence verification is done through the Streamlit dashboard's "✅ Verify Data" tab, which:

Shows all relationships from Neo4j
Retrieves original source evidence from ChromaDB
Displays the exact text and source URL for human verification

Streamlit Dashboard Features

Tab	Description
📊 Knowledge Graph	Interactive visualization of the Neo4j graph
🔄 Pipeline Architecture	Shows how LangGraph agent works
📚 Ontology	Spec definitions and normalization rules
📋 Specification Table	All products and their specs in a table
🔍 Compare Products	Side-by-side product comparison
✅ Verify Data	Human verification with ChromaDB evidence
🎯 Customer Needs	Industry needs report and product mappings
👥 Customer Segments	Customer groups with evidence and product mappings
🏠 House of Quality	QFD matrix with relationship weights and competitive scores
📈 Evaluation	Accuracy metrics comparing LLM extractions to source content

Name		Name	Last commit message	Last commit date
Latest commit History 24 Commits
lib		lib
src		src
.dockerignore		.dockerignore
.env.example		.env.example
.gitignore		.gitignore
README.md		README.md
customer_segments.json		customer_segments.json
house_of_quality.json		house_of_quality.json
industry_report.json		industry_report.json
main.py		main.py
requirements.txt		requirements.txt
streamlit_app.py		streamlit_app.py
verify_evidence.py		verify_evidence.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Competitive Intelligence Database

Architecture

How It Works

Setup

1. Create Environment

2. Install Dependencies

3. Configure Environment Variables

4. Start Neo4j

Usage

Run the Pipeline

Launch Dashboard

Verify Evidence

Streamlit Dashboard Features

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Competitive Intelligence Database

Architecture

How It Works

Setup

1. Create Environment

2. Install Dependencies

3. Configure Environment Variables

4. Start Neo4j

Usage

Run the Pipeline

Launch Dashboard

Verify Evidence

Streamlit Dashboard Features

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages