Panda_Dive - Deep Domain Research Tool
A powerful multi-agent deep research tool built with LangGraph and LangChain. Panda_Dive orchestrates multiple researcher agents to comprehensively explore any domain, synthesize findings, and generate detailed reports with retrieval quality safeguards.
- Features
- Recent Updates
- Showcase
- Architecture
- Installation
- Quick Start
- Human-in-the-loop Steering
- Configuration
- How It Works
- Documentation
- Evaluation
- Development
- Project Structure
- Contributing
- License
- Support
- Supervisory Agent: Intelligently delegates research tasks to multiple specialized researcher agents
- Concurrent Execution: Run up to 20 research tasks in parallel for maximum efficiency
- Dynamic Task Delegation: The supervisor adapts based on research progress and findings
Panda_Dive supports multiple LLM providers out of the box:
- OpenAI (GPT-4, GPT-3.5)
- Anthropic (Claude 3.5, Claude 3)
- DeepSeek (DeepSeek V3)
- Google (VertexAI, GenAI)
- Groq (Llama, Mixtral)
- AWS Bedrock
Configure different models for different research stages:
- Research queries
- Information compression
- Summarization
- Final report generation
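A per-stage model split might look like the sketch below, using the `Configuration` object shown in the Quick Start. The `research_model` and `final_report_model` fields appear later in this README; the compression and summarization field names here are assumptions for illustration only.

```python
from Panda_Dive import Configuration

# Hedged sketch: research_model and final_report_model are documented in the
# Quick Start; compression_model and summarization_model are assumed names.
config = Configuration(
    research_model="openai:gpt-4o",
    compression_model="openai:gpt-4o-mini",    # assumed field name
    summarization_model="openai:gpt-4o-mini",  # assumed field name
    final_report_model="anthropic:claude-3-5-sonnet-latest",
)
```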
- Automatic Truncation: Intelligently handles token limit errors
- Retry Logic: Robust retry mechanism for failed tool calls
- Context Optimization: Compresses research findings to stay within limits
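The truncation-and-retry behavior can be pictured as a loop that drops the oldest conversation turn whenever the provider rejects a request for length. This is a runnable sketch with a stand-in error type, not the actual Panda_Dive implementation:

```python
class TokenLimitError(Exception):
    """Stand-in for a provider's context-length error."""

def call_with_truncation(call, messages, max_retries=3):
    """Retry an LLM call, dropping the oldest non-system turn on token-limit errors."""
    for _ in range(max_retries):
        try:
            return call(messages)
        except TokenLimitError:
            # Keep the system prompt (index 0), drop the next-oldest message.
            messages = messages[:1] + messages[2:]
    raise RuntimeError("still over the token limit after retries")
```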
- MCP Integration: Extend tools via Model Context Protocol
- LangSmith Tracing: Full observability and debugging support
- Multiple Search APIs: Tavily, DuckDuckGo, Exa, ArXiv (DuckDuckGo is now the default - privacy-friendly and no API key required)
- Query Rewriting: Expand queries to improve recall (supports both Tavily and DuckDuckGo)
- Relevance Scoring: Score each result on a 0.0-1.0 scale
- Reranking: Prioritize higher-quality sources before synthesis
- Robust Error Handling: Graceful handling of connection issues for DuckDuckGo searches
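The scoring and reranking steps above can be sketched as a weighted sort. In Panda_Dive the 0.0-1.0 relevance score comes from an LLM call; the keyword scorer below is only a stand-in so the control flow runs:

```python
def rerank(results, score_fn, source_weights=None, top_k=10):
    """Return the top_k results ordered by source-weighted relevance."""
    source_weights = source_weights or {}
    scored = []
    for r in results:
        relevance = score_fn(r)                        # 0.0-1.0, LLM-assigned in practice
        weight = source_weights.get(r["source"], 1.0)  # e.g. boost peer-reviewed sources
        scored.append((relevance * weight, r))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [r for _, r in scored[:top_k]]

# Toy usage with a keyword scorer standing in for the LLM.
docs = [
    {"source": "blog", "text": "quantum computing hype"},
    {"source": "arxiv", "text": "quantum error correction results"},
]
score = lambda r: 1.0 if "error correction" in r["text"] else 0.4
best = rerank(docs, score, source_weights={"arxiv": 1.2}, top_k=1)
```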
- Per-Round Checkpoint: Optional pause after each supervisor tool-execution round
- Two Resume Commands: `/continue` keeps direction, `/steer <instruction>` injects new guidance
- Brief Replacement (No Duplication): Steering updates replace the previous brief message in `supervisor_messages` instead of appending duplicate briefs
- Audit Fields in State: `steering_history`, `steering_last_command`, `steering_warnings`
- Persistent Research Memory: Optionally store reusable facts and episodic summaries across runs
- Prompt Injection: Retrieve relevant memory and inject it into the supervisor brief before research begins
- Namespace Isolation: Partition memory by `owner`, `thread_id`, and `topic_hash`
- SQLite by Default: Persist memory locally with configurable retrieval and ranking controls
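One way to picture how a namespace template might resolve into a store namespace (purely illustrative; the actual resolution logic is internal to Panda_Dive):

```python
# Hypothetical resolution of a namespace template into a tuple-style
# store namespace, using the {owner} and {topic_hash} placeholders.
template = "memory.owner.{owner}.{topic_hash}"
namespace = tuple(template.format(owner="alice", topic_hash="a1b2").split("."))
```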
- Added a polished local frontend demo view for Panda_Dive research workflows
- Added a complete sample context research report for quick output reference
- Updated README with visual showcase and direct links to example assets
- Added langmem-backed long-term memory with persistence, retrieval, and prompt injection support
- Sample report: example/context research report.md
Preview topic: Systematic Investigation of Context in LLM-based Agent Systems
The sample report demonstrates:
- Conceptual overview and context taxonomy
- Design patterns (dispatcher, state channels, event sourcing)
- Multi-agent context lifecycle, trade-offs, and failure modes
- Open challenges and research directions for 2025-2026
Panda_Dive uses a sophisticated multi-agent graph architecture with three hierarchical layers: Main Graph (entry point), Supervisor Subgraph (orchestration), and Researcher Subgraph (execution).
Entry point handling user interaction, research brief generation, and final report synthesis:
graph TD
START([START]) --> CLARIFY[clarify_with_user]
CLARIFY --"need_clarification=True"--> USER["🔄 Return to User<br/>with question"]
USER -->|User response| CLARIFY
CLARIFY --"need_clarification=False"--> BRIEF[write_research_brief]
BRIEF --> SUPERVISOR["🧩 research_supervisor<br/>Subgraph Entry"]
SUPERVISOR -->|All research<br/>completed| REPORT[final_report_generation]
REPORT --> END([END])
style START fill:#e1f5ff,stroke:#333,stroke-width:2px
style END fill:#e1f5ff,stroke:#333,stroke-width:2px
style CLARIFY fill:#fff3cd,stroke:#333
style BRIEF fill:#d4edda,stroke:#333
style SUPERVISOR fill:#f8dce0,stroke:#333,stroke-width:3px
style REPORT fill:#cce5ff,stroke:#333
style USER fill:#fff3cd,stroke:#666,stroke-dasharray: 5 5
Orchestrates parallel research by dynamically spawning researcher subgraphs:
graph TB
subgraph SUPERVISOR["🧩 Supervisor Subgraph"]
START_S([START]) --> S[supervisor<br/>Lead Researcher]
S --> ST{supervisor_tools<br/>Tool Router}
%% Tool executions
ST -->|think_tool| THINK["💭 Strategic Reflection"]
THINK --> S
ST -->|ConductResearch| SPAWN["🚀 Dynamic Subgraph Spawning"]
%% Dynamic spawning detail
subgraph DYNAMIC["🔄 Dynamic Concurrency Control"]
SPAWN --> CHECK{"Within<br/>max_concurrent<br/>limit?"}
CHECK -->|Yes| RESEARCHER["🧩 researcher_subgraph<br/>(Instance N)"]
CHECK -->|No| OVERFLOW["⚠️ Overflow:<br/>Skip with error"]
RESEARCHER -->|async gather| COLLECT["📊 Collect Results"]
OVERFLOW --> COLLECT
end
COLLECT --> UPDATE["📝 Update State:<br/>• notes<br/>• raw_notes"]
UPDATE --> S
ST -->|ResearchComplete| DONE_S[Done]
%% Loop conditions
ST -.->|Iterations <<br/>max_researcher<br/>_iterations| S
end
style START_S fill:#e1f5ff
style DONE_S fill:#d4edda
style S fill:#f8dce0,stroke:#333,stroke-width:3px
style ST fill:#fff3cd,stroke:#333
style SPAWN fill:#d4edda,stroke:#333,stroke-width:2px
style DYNAMIC fill:#f0f8ff,stroke:#666,stroke-dasharray: 3 3
style RESEARCHER fill:#cce5ff,stroke:#333
Executes individual research tasks with the 6-step retrieval quality loop:
graph TB
subgraph RESEARCHER["🧩 Researcher Subgraph"]
START_R([START]) --> R[researcher<br/>Research Assistant]
R --> RT{researcher_tools<br/>Tool Router}
%% Tool executions
RT -->|think_tool| THINK_R["💭 Strategic<br/>Reflection"]
THINK_R --> R
RT -->|Search Tool| RQL["🎯 Retrieval Quality Loop"]
%% Retrieval Quality Loop detail
subgraph RQL_DETAIL["🔄 Query → Results → Score → Rerank"]
RQL --> REWRITE["1️⃣ Query Rewriting<br/>Generate N variants"]
REWRITE --> SEARCH["2️⃣ Search Execution<br/>tavily/duckduckgo"]
SEARCH --> PARSE["3️⃣ Result Parsing<br/>→ Structured dicts"]
PARSE --> SCORE["4️⃣ Relevance Scoring<br/>LLM: 0.0-1.0"]
SCORE --> RERANK["5️⃣ Reranking<br/>+ Source weight"]
RERANK --> FORMAT["6️⃣ Format Results<br/>For researcher"]
%% State tracking
STATE["📊 State Tracking:<br/>• rewritten_queries<br/>• relevance_scores<br/>• reranked_results<br/>• quality_notes"]
end
FORMAT --> UPDATE_R["📝 Update State"]
UPDATE_R --> R
RT -->|MCP Tools| MCP["🔧 MCP Tools<br/>(Dynamic Loading)"]
MCP --> R
RT -->|ResearchComplete| COMPRESS[compress_research]
COMPRESS --> DONE_R[Done]
%% Loop conditions
RT -.->|tool_calls <<br/>max_react<br/>_tool_calls| R
end
style START_R fill:#e1f5ff
style DONE_R fill:#d4edda
style R fill:#cce5ff,stroke:#333,stroke-width:3px
style RT fill:#fff3cd,stroke:#333
style RQL fill:#f8dce0,stroke:#333,stroke-width:2px
style RQL_DETAIL fill:#fff5f5,stroke:#666,stroke-dasharray: 3 3
style RESEARCHER fill:#cce5ff,stroke:#333
style STATE fill:#f0f8ff,stroke:#999
| Layer | Components | Key Features |
|---|---|---|
| Main Graph | clarify_with_user, write_research_brief, research_supervisor, final_report_generation | User interaction, clarification loop, brief generation, report synthesis |
| Supervisor Subgraph | supervisor, supervisor_tools, Dynamic Spawning | Parallel research orchestration, concurrency control (max_concurrent_research_units), async subgraph spawning |
| Researcher Subgraph | researcher, researcher_tools, Retrieval Quality Loop, compress_research | Individual research execution, 6-step retrieval quality (rewrite → search → parse → score → rerank → format), MCP integration |
User Query
→ Main Graph (Clarification → Brief)
→ Supervisor Subgraph (Parallel delegation)
→ Researcher Subgraph Instance 1 (Quality Loop)
→ Researcher Subgraph Instance 2 (Quality Loop)
→ Researcher Subgraph Instance N (Quality Loop)
→ Main Graph (Synthesis → Report)
→ User
Each researcher subgraph executes the full retrieval quality loop: Query Rewriting → Search Execution → Result Parsing → Relevance Scoring → Reranking → Result Formatting, with all metrics tracked in state for observability.
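The supervisor's parallel fan-out under a concurrency cap can be sketched with `asyncio.gather`, mirroring `max_concurrent_research_units`. Function names are illustrative, not the real Panda_Dive internals:

```python
import asyncio

async def run_researcher(task: str) -> str:
    await asyncio.sleep(0)  # stands in for a researcher subgraph run
    return f"findings for: {task}"

async def delegate(tasks: list[str], max_concurrent: int = 4) -> list[str]:
    # Tasks beyond the cap overflow; Panda_Dive skips them with an error.
    accepted = tasks[:max_concurrent]
    return await asyncio.gather(*(run_researcher(t) for t in accepted))

notes = asyncio.run(delegate(["topic A", "topic B", "topic C"], max_concurrent=2))
```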
- Python 3.11 or higher
- API keys for your chosen LLM provider(s)
- (Optional) Tavily API key if using Tavily search (DuckDuckGo requires no API key)
# Clone the repository
git clone https://github.com/123yongming/Panda_Dive.git
cd Panda_Dive

# Create virtual environment with uv
uv venv
source .venv/bin/activate
# Install dependencies
uv sync

# Create virtual environment
python -m venv venv
.\venv\Scripts\Activate
# Install uv and dependencies
pip install uv
uv pip install -r pyproject.toml

# Create virtual environment
python -m venv .venv
# Activate (Linux/macOS: source .venv/bin/activate, Windows: .venv\Scripts\activate)
source .venv/bin/activate
# Install in editable mode
pip install -e .

Copy the example environment file and configure your API keys:
# Linux/macOS
cp .env.example .env
# Windows
copy .env.example .env

Edit `.env` with your credentials:
OPENAI_API_KEY=your_openai_key
ANTHROPIC_API_KEY=your_anthropic_key
GOOGLE_API_KEY=your_google_key
TAVILY_API_KEY=your_tavily_key
LANGSMITH_API_KEY=your_langsmith_key
LANGSMITH_PROJECT=panda_dive

import asyncio
from Panda_Dive import Configuration, deep_researcher
from langchain_core.messages import HumanMessage
# Configure the researcher (DuckDuckGo is default - no API key needed!)
config = Configuration(
max_researcher_iterations=6,
max_concurrent_research_units=4,
allow_clarification=True,
research_model="openai:gpt-4o",
final_report_model="openai:gpt-4o",
)
# Start research
topic = "What are the latest developments in quantum computing?"
async def main() -> None:
result = await deep_researcher.ainvoke(
{"messages": [HumanMessage(content=topic)]},
config={"configurable": config.model_dump()},
)
print(result["messages"][-1].content)
asyncio.run(main())

config = Configuration(
memory_enabled=True,
memory_backend="sqlite",
memory_sqlite_path=".memory/memory.sqlite3",
memory_namespace_template="memory.owner.{owner}",
memory_retrieval_top_k=8,
)

When memory is enabled, Panda_Dive persists extracted memory items and retrieves relevant context for later research runs.
You can also run Panda_Dive as a LangGraph development server:
uvx --refresh --from "langgraph-cli[inmem]" --with-editable . --python 3.11 langgraph dev --allow-blocking --host 0.0.0.0 --port 2026

This will start the development server on http://localhost:2026 with in-memory storage, allowing you to interact with the deep researcher through the LangSmith UI.
Steering is opt-in and requires a checkpointer (so the graph can interrupt and resume safely).
import asyncio
from langchain_core.messages import HumanMessage
from langgraph.checkpoint.memory import InMemorySaver
from langgraph.types import Command
from Panda_Dive import Configuration, build_deep_researcher
async def main() -> None:
graph = build_deep_researcher(checkpointer=InMemorySaver())
config = {
"configurable": {
"thread_id": "steering-demo",
**Configuration(
enable_steering=True,
allow_clarification=False,
steering_command_prefix="/steer",
steering_continue_command="/continue",
).model_dump(),
}
}
# Start a run. The graph may interrupt at a steering checkpoint.
await graph.ainvoke(
{"messages": [HumanMessage(content="Research AI coding agents in 2026")]},
config=config,
)
# Resume with a steering directive.
await graph.ainvoke(
Command(resume="/steer prioritize official docs and benchmark-backed claims"),
config=config,
)
# Or resume without changing direction.
await graph.ainvoke(Command(resume="/continue"), config=config)
asyncio.run(main())

- Start dev server:
uvx --refresh --from "langgraph-cli[inmem]" --with-editable . --python 3.11 langgraph dev --allow-blocking --host 0.0.0.0 --port 2026

- In LangSmith Studio, set configurable values:
  - `enable_steering=true`
  - `steering_command_prefix=/steer`
  - `steering_continue_command=/continue`
  - `thread_id=<any-stable-id>`
- Start a run, then resume from interrupt with `/steer focus on official sources and recent papers` or `/continue`
- Validate state fields after resume:
  - `research_brief` updated when using `/steer`
  - `steering_history`, `steering_last_command`, `steering_warnings` populated
  - `supervisor_messages` keeps a single brief entry (old brief is replaced, not appended)
| Parameter | Type | Default | Description |
|---|---|---|---|
| `search_api` | str | `"duckduckgo"` | Search API to use: duckduckgo (default), tavily, exa, arxiv, or none |
| `max_researcher_iterations` | int | 6 | Maximum iterations per researcher (1-10) |
| `max_react_tool_calls` | int | 6 | Maximum tool calls per ReAct loop (1-30) |
| `max_concurrent_research_units` | int | 4 | Parallel research tasks (1-20) |
| `allow_clarification` | bool | True | Ask clarifying questions before research |
| `enable_steering` | bool | False | Enable per-round human steering checkpoints in the supervisor loop |
| `steering_command_prefix` | str | `"/steer"` | Command prefix for steering directives |
| `steering_continue_command` | str | `"/continue"` | Command to continue without modifying the brief |
| `memory_enabled` | bool | False | Enable long-term memory extraction, retrieval, and injection |
| `memory_backend` | str | `"sqlite"` | Memory backend: sqlite or langgraph_store |
| `memory_sqlite_path` | str | `".memory/memory.sqlite3"` | SQLite path for persisted memory data |
| `memory_namespace_template` | str | `"memory.owner.{owner}"` | Namespace template supporting {owner}, {thread_id}, and {topic_hash} |
| `memory_retrieval_top_k` | int | 8 | Number of memory facts retrieved before prompt injection |
| `model` | str | `"openai:gpt-4o"` | Default model for research |
| `query_variants` | int | 3 | Number of query variants for retrieval quality |
| `relevance_threshold` | float | 0.7 | Minimum relevance score threshold |
| `rerank_top_k` | int | 10 | Number of documents kept after reranking |
| `rerank_weight_source` | str | `"auto"` | Source weighting strategy for reranking |
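For example, the retrieval-quality parameters can be combined in a single `Configuration` (a sketch using only fields listed in the table above):

```python
from Panda_Dive import Configuration

config = Configuration(
    search_api="duckduckgo",
    query_variants=3,            # N rewrites per query
    relevance_threshold=0.7,     # drop results scored below this
    rerank_top_k=10,             # keep the 10 best after reranking
    rerank_weight_source="auto",
)
```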
- Clarification (Optional)
- Asks clarifying questions to understand research scope
- User can confirm or modify the research brief
- Research Brief Generation
- Creates a structured brief based on the topic
- Identifies key areas to investigate
- Supervised Research
- Supervisor delegates specific research tasks
- Multiple researcher agents work in parallel
- Each researcher explores their assigned subtopic
- Steering Checkpoint (Optional)
  - Interrupts after each supervisor tool round when `enable_steering=True`
  - Accepts `/steer <instruction>` or `/continue`
  - If steered, updates `research_brief` and replaces the old brief message in supervisor context
- Research Synthesis
- Compresses individual findings to fit context
- Synthesizes cross-cutting insights
- Final Report
- Generates comprehensive, well-structured report
- Includes citations and sources
- docs/README.md - Documentation index
- docs/architecture.md - Architecture overview and component boundaries
- docs/evaluation-guide.md - Evaluation workflow and benchmark card generation
- docs/contributing.md - Contribution path and quality gates
- docs/future-direction.md - 2026-2028 direction and phased roadmap
Panda_Dive includes a comprehensive evaluation framework using LangSmith to benchmark the deep research system against the "Deep Research Bench" dataset.
Before running evaluations, ensure these environment variables are set:
| Variable | Required | Description |
|---|---|---|
| `LANGSMITH_API_KEY` | Yes | LangSmith API key for evaluation tracking |
| `OPENAI_API_KEY` | No* | OpenAI API key (if using OpenAI models) |
| `ANTHROPIC_API_KEY` | No* | Anthropic API key (if using Claude models) |
| `DEEPSEEK_API_KEY` | No* | DeepSeek API key (if using DeepSeek models) |
*Required only if using the respective provider's models.
Run a quick smoke test on 2 examples to validate the setup:
# Basic smoke test (2 examples, default settings)
python tests/run_evaluate.py --smoke --dataset-name "deep_research_bench"
This evaluation measures both intended parallelism (tool-call count) and observed parallelism (span overlap) for the supervisor.
# Create the dataset (one-time setup)
python tests/create_supervisor_parallelism_dataset.py \
--dataset-name "Panda_Dive: Supervisor Parallelism" \
--source tests/prompt/supervisor_parallelism.jsonl
# Run the evaluation
python tests/run_evaluate.py \
--dataset-name "Panda_Dive: Supervisor Parallelism" \
--max-concurrency 1 \
  --experiment-prefix "supervisor-parallel"

Metrics produced:
- `tool_call_count_match`: Whether actual tool calls match the reference count
- `parallel_overlap_ms`: Total overlap time (ms) across trace spans
Run a full evaluation on the entire dataset:
# Full evaluation (all dataset examples)
python tests/run_evaluate.py --full
| Flag | Default | Description |
|---|---|---|
| `--smoke` | - | Run smoke test (2 examples) |
| `--full` | - | Run full evaluation (all examples) |
| `--dataset-name` | "Deep Research Bench" | Dataset name in LangSmith |
| `--max-examples` | 2 (smoke) / all (full) | Maximum examples to evaluate |
| `--experiment-prefix` | Auto-generated | Prefix for experiment name |
| `--max-concurrency` | 2 | Maximum concurrent evaluations (max: 5) |
| `--timeout-seconds` | 1800 | Per-example timeout (seconds) |
| `--model` | From env/config | Model to use for evaluation |
- Run a smoke test first to validate setup
- Monitor LangSmith during the run
- Start with lower concurrency to control costs
After evaluation, export results to JSONL format:
# Export results using experiment project name
python tests/extract_langsmith_data.py \
--project-name "deep-research-eval-smoke-20250204-120000" \
--model-name "gpt-4o" \
--output-dir tests/expt_results/
# Force overwrite if file exists
python tests/extract_langsmith_data.py \
--project-name "your-experiment-name" \
--model-name "claude-3-5-sonnet" \
  --force

| Flag | Required | Default | Description |
|---|---|---|---|
| `--project-name` | Yes | - | LangSmith project name containing the experiment runs |
| `--model-name` | Yes | - | Model name (used for output filename) |
| `--dataset-name` | No | "Deep Research Bench" | Dataset name for validation |
| `--output-dir` | No | tests/expt_results/ | Output directory for JSONL file |
| `--force` | No | false | Overwrite existing file if it exists |
# Run all tests
python -m pytest
# Run with verbose output
python -m pytest -v
# Run with coverage
python -m pytest --cov=Panda_Dive
# Run specific test
python -m pytest src/test_api.py::test_function_name

# Check code style
ruff check .
# Auto-fix issues
ruff check --fix .
# Type checking
mypy src/Panda_Dive/

- Python 3.10+ type hints (e.g., `list[str]`, not `List[str]`)
- Google-style docstrings
- Async/await patterns for all graph nodes
- Proper error handling and logging
See AGENTS.md for detailed development guidelines.
Panda_Dive/
├── docs/
│ └── retrieval-quality-loop.md # Retrieval quality loop report
├── src/
│ └── Panda_Dive/
│ ├── __init__.py # Package exports
│ ├── deepresearcher.py # Main graph orchestration
│ ├── configuration.py # Pydantic configuration models
│ ├── state.py # TypedDict state definitions
│ ├── prompts.py # System prompts for LLMs
│ └── utils.py # Tool wrappers and helpers
├── pyproject.toml # Project configuration
├── .env.example # Environment variables template
├── AGENTS.md # Agent development guidelines
└── README.md # This file
We welcome contributions! Here's how to get started:
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Make your changes following our code style guidelines
- Run tests and linting (`pytest` and `ruff`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
- Follow PEP 8 and our ruff configuration
- Add tests for new features
- Update documentation as needed
- Ensure type hints are complete
This project is licensed under the MIT License.
Built with:
- LangGraph - Graph-based orchestration
- LangChain - LLM application framework
- Pydantic - Data validation
- 📖 Read the AGENTS.md for development guidelines
- 🐛 Report issues on GitHub Issues
- 💬 Explore more projects by PonyPan
Made with ❤️ by PonyPan
