
🐼 Panda_Dive


Deep Domain Research Tool


A powerful multi-agent deep research tool built with LangGraph and LangChain. Panda_Dive orchestrates multiple researcher agents to comprehensively explore any domain, synthesize findings, and generate detailed reports with retrieval quality safeguards.

✨ Features

🤖 Multi-Agent Research System

  • Supervisory Agent: Intelligently delegates research tasks to multiple specialized researcher agents
  • Concurrent Execution: Run up to 20 research tasks in parallel for maximum efficiency
  • Dynamic Task Delegation: The supervisor adapts based on research progress and findings

🧠 Flexible LLM Support

Panda_Dive supports multiple LLM providers out of the box:

  • OpenAI (GPT-4, GPT-3.5)
  • Anthropic (Claude 3.5, Claude 3)
  • DeepSeek (DeepSeek V3)
  • Google (VertexAI, GenAI)
  • Groq (Llama, Mixtral)
  • AWS Bedrock

Configure different models for different research stages:

  • Research queries
  • Information compression
  • Summarization
  • Final report generation
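
As a sketch, stage-specific models could be set like this. research_model and final_report_model are documented options (they also appear in the Quick Start); the compression and summarization field names below are assumptions and should be checked against configuration.py:

```python
from Panda_Dive import Configuration

# research_model and final_report_model are documented options; the
# compression and summarization field names are assumed, not verified.
config = Configuration(
    research_model="openai:gpt-4o",            # research queries
    compression_model="openai:gpt-4o-mini",    # information compression (assumed name)
    summarization_model="openai:gpt-4o-mini",  # summarization (assumed name)
    final_report_model="openai:gpt-4o",        # final report generation
)
```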

📊 Smart Token Management

  • Automatic Truncation: Intelligently handles token limit errors
  • Retry Logic: Robust retry mechanism for failed tool calls
  • Context Optimization: Compresses research findings to stay within limits
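
The truncate-and-retry pattern behind these safeguards can be sketched as follows; this is a minimal illustration of the idea, not Panda_Dive's actual implementation:

```python
# Minimal illustration of truncate-and-retry on token-limit errors;
# not Panda_Dive's actual implementation.

class TokenLimitError(Exception):
    pass

def call_model(prompt: str, limit: int = 100) -> str:
    # Stand-in for an LLM call that rejects over-long prompts.
    if len(prompt) > limit:
        raise TokenLimitError(f"{len(prompt)} chars exceeds limit {limit}")
    return f"ok ({len(prompt)} chars)"

def call_with_truncation(prompt: str, max_retries: int = 3) -> str:
    for _ in range(max_retries):
        try:
            return call_model(prompt)
        except TokenLimitError:
            # Drop the oldest half of the context and retry.
            prompt = prompt[len(prompt) // 2:]
    raise RuntimeError("still over the token limit after retries")
```

A 350-character prompt here succeeds on the third attempt, after two rounds of truncation.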

🔧 Extensibility

  • MCP Integration: Extend tools via Model Context Protocol
  • LangSmith Tracing: Full observability and debugging support
  • Multiple Search APIs: Tavily, DuckDuckGo, Exa, ArXiv (DuckDuckGo is now the default - privacy-friendly and no API key required)

🎯 Retrieval Quality Loop

  • Query Rewriting: Expand queries to improve recall (supports both Tavily and DuckDuckGo)
  • Relevance Scoring: Score each result on a 0.0-1.0 scale
  • Reranking: Prioritize higher-quality sources before synthesis
  • Robust Error Handling: Graceful handling of connection issues for DuckDuckGo searches
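
The shape of the loop can be sketched in library-agnostic Python; the helpers below are illustrative stand-ins for LLM-based rewriting and scoring, not Panda_Dive's internals:

```python
# Library-agnostic sketch of the retrieval quality loop; the helpers are
# stand-ins for LLM-based rewriting/scoring, not Panda_Dive's internals.

def rewrite_query(query: str, n_variants: int = 3) -> list[str]:
    # Step 1: real variants would come from an LLM rewrite prompt.
    return [query] + [f"{query} variant {i}" for i in range(1, n_variants)]

def search(query: str) -> list[dict]:
    # Step 2: real results would come from DuckDuckGo or Tavily.
    return [{"query": query, "url": "https://example.com/" + str(abs(hash(query)) % 100)}]

def score_relevance(result: dict, brief: str) -> float:
    # Step 4: real scoring asks an LLM for a 0.0-1.0 relevance score.
    return 1.0 if brief.split()[0].lower() in result["query"].lower() else 0.3

def retrieval_quality_loop(query: str, brief: str,
                           threshold: float = 0.7, top_k: int = 10) -> list[dict]:
    results = [r for v in rewrite_query(query) for r in search(v)]  # steps 1-3
    scored = [(score_relevance(r, brief), r) for r in results]      # step 4
    kept = sorted(((s, r) for s, r in scored if s >= threshold),
                  key=lambda sr: sr[0], reverse=True)               # step 5 (rerank)
    return [r | {"score": s} for s, r in kept[:top_k]]              # step 6 (format)
```

With query "quantum computing" and brief "quantum error correction", all three variants survive the threshold, since each contains the brief's lead term.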

🧭 Human-in-the-Loop Steering (HITL)

  • Per-Round Checkpoint: Optional pause after each supervisor tool-execution round
  • Two Resume Commands: /continue keeps direction, /steer <instruction> injects new guidance
  • Brief Replacement (No Duplication): Steering updates replace the previous brief message in supervisor_messages instead of appending duplicate briefs
  • Audit Fields in State: steering_history, steering_last_command, steering_warnings

💾 Long-Term Memory

  • Persistent Research Memory: Optionally store reusable facts and episodic summaries across runs
  • Prompt Injection: Retrieve relevant memory and inject it into the supervisor brief before research begins
  • Namespace Isolation: Partition memory by owner, thread_id, and topic_hash
  • SQLite by Default: Persist memory locally with configurable retrieval and ranking controls

🆕 Recent Updates

  • Added a polished local frontend demo view for Panda_Dive research workflows
  • Added a complete sample context research report for quick output reference
  • Updated README with visual showcase and direct links to example assets
  • Added langmem-backed long-term memory with persistence, retrieval, and prompt injection support

🖼️ Showcase

Frontend Effect Demo

Panda_Dive frontend demo

Panda_Dive Research Output Example

Preview topic: Systematic Investigation of Context in LLM-based Agent Systems

The sample report demonstrates:

  • Conceptual overview and context taxonomy
  • Design patterns (dispatcher, state channels, event sourcing)
  • Multi-agent context lifecycle, trade-offs, and failure modes
  • Open challenges and research directions for 2025-2026

🏗️ Architecture

Panda_Dive uses a sophisticated multi-agent graph architecture with three hierarchical layers: Main Graph (entry point), Supervisor Subgraph (orchestration), and Researcher Subgraph (execution).

Main Graph

Entry point handling user interaction, research brief generation, and final report synthesis:

graph TD
    START([START]) --> CLARIFY[clarify_with_user]
    
    CLARIFY --"need_clarification=True"--> USER["🔄 Return to User<br/>with question"]
    USER -->|User response| CLARIFY
    
    CLARIFY --"need_clarification=False"--> BRIEF[write_research_brief]
    BRIEF --> SUPERVISOR["🧩 research_supervisor<br/>Subgraph Entry"]
    
    SUPERVISOR -->|All research<br/>completed| REPORT[final_report_generation]
    REPORT --> END([END])
    
    style START fill:#e1f5ff,stroke:#333,stroke-width:2px
    style END fill:#e1f5ff,stroke:#333,stroke-width:2px
    style CLARIFY fill:#fff3cd,stroke:#333
    style BRIEF fill:#d4edda,stroke:#333
    style SUPERVISOR fill:#f8dce0,stroke:#333,stroke-width:3px
    style REPORT fill:#cce5ff,stroke:#333
    style USER fill:#fff3cd,stroke:#666,stroke-dasharray: 5 5

Supervisor Subgraph

Orchestrates parallel research by dynamically spawning researcher subgraphs:

graph TB
    subgraph SUPERVISOR["🧩 Supervisor Subgraph"]
        START_S([START]) --> S[supervisor<br/>Lead Researcher]
        
        S --> ST{supervisor_tools<br/>Tool Router}
        
        %% Tool executions
        ST -->|think_tool| THINK["💭 Strategic Reflection"]
        THINK --> S
        
        ST -->|ConductResearch| SPAWN["🚀 Dynamic Subgraph Spawning"]
        
        %% Dynamic spawning detail
        subgraph DYNAMIC["🔄 Dynamic Concurrency Control"]
            SPAWN --> CHECK{"Within<br/>max_concurrent<br/>limit?"}
            CHECK -->|Yes| RESEARCHER["🧩 researcher_subgraph<br/>(Instance N)"]
            CHECK -->|No| OVERFLOW["⚠️ Overflow:<br/>Skip with error"]
            RESEARCHER -->|async gather| COLLECT["📊 Collect Results"]
            OVERFLOW --> COLLECT
        end
        
        COLLECT --> UPDATE["📝 Update State:<br/>• notes<br/>• raw_notes"]
        UPDATE --> S
        
        ST -->|ResearchComplete| DONE_S[Done]
        
        %% Loop conditions
        ST -.->|Iterations <<br/>max_researcher<br/>_iterations| S
    end
    
    style START_S fill:#e1f5ff
    style DONE_S fill:#d4edda
    style S fill:#f8dce0,stroke:#333,stroke-width:3px
    style ST fill:#fff3cd,stroke:#333
    style SPAWN fill:#d4edda,stroke:#333,stroke-width:2px
    style DYNAMIC fill:#f0f8ff,stroke:#666,stroke-dasharray: 3 3
    style RESEARCHER fill:#cce5ff,stroke:#333

Researcher Subgraph

Executes individual research tasks with the 6-step retrieval quality loop:

graph TB
    subgraph RESEARCHER["🧩 Researcher Subgraph"]
        START_R([START]) --> R[researcher<br/>Research Assistant]
        
        R --> RT{researcher_tools<br/>Tool Router}
        
        %% Tool executions
        RT -->|think_tool| THINK_R["💭 Strategic<br/>Reflection"]
        THINK_R --> R
        
        RT -->|Search Tool| RQL["🎯 Retrieval Quality Loop"]
        
        %% Retrieval Quality Loop detail
        subgraph RQL_DETAIL["🔄 Query → Results → Score → Rerank"]
            RQL --> REWRITE["1️⃣ Query Rewriting<br/>Generate N variants"]
            REWRITE --> SEARCH["2️⃣ Search Execution<br/>tavily/duckduckgo"]
            SEARCH --> PARSE["3️⃣ Result Parsing<br/>→ Structured dicts"]
            PARSE --> SCORE["4️⃣ Relevance Scoring<br/>LLM: 0.0-1.0"]
            SCORE --> RERANK["5️⃣ Reranking<br/>+ Source weight"]
            RERANK --> FORMAT["6️⃣ Format Results<br/>For researcher"]
            
            %% State tracking
            STATE["📊 State Tracking:<br/>• rewritten_queries<br/>• relevance_scores<br/>• reranked_results<br/>• quality_notes"]
        end
        
        FORMAT --> UPDATE_R["📝 Update State"]
        UPDATE_R --> R
        
        RT -->|MCP Tools| MCP["🔧 MCP Tools<br/>(Dynamic Loading)"]
        MCP --> R
        
        RT -->|ResearchComplete| COMPRESS[compress_research]
        
        COMPRESS --> DONE_R[Done]
        
        %% Loop conditions
        RT -.->|tool_calls <<br/>max_react<br/>_tool_calls| R
    end
    
    style START_R fill:#e1f5ff
    style DONE_R fill:#d4edda
    style R fill:#cce5ff,stroke:#333,stroke-width:3px
    style RT fill:#fff3cd,stroke:#333
    style RQL fill:#f8dce0,stroke:#333,stroke-width:2px
    style RQL_DETAIL fill:#fff5f5,stroke:#666,stroke-dasharray: 3 3
    style RESEARCHER fill:#cce5ff,stroke:#333
    style STATE fill:#f0f8ff,stroke:#999

Architecture Highlights

  • Main Graph: components clarify_with_user, write_research_brief, research_supervisor, final_report_generation. Handles user interaction, the clarification loop, brief generation, and report synthesis.
  • Supervisor Subgraph: components supervisor, supervisor_tools, dynamic spawning. Handles parallel research orchestration, concurrency control (max_concurrent_research_units), and async subgraph spawning.
  • Researcher Subgraph: components researcher, researcher_tools, the retrieval quality loop, compress_research. Handles individual research execution, the 6-step retrieval quality loop (rewrite → search → parse → score → rerank → format), and MCP integration.

Data Flow

User Query 
  → Main Graph (Clarification → Brief)
  → Supervisor Subgraph (Parallel delegation)
    → Researcher Subgraph Instance 1 (Quality Loop)
    → Researcher Subgraph Instance 2 (Quality Loop)
    → Researcher Subgraph Instance N (Quality Loop)
  → Main Graph (Synthesis → Report)
  → User

Each researcher subgraph executes the full retrieval quality loop: Query Rewriting → Search Execution → Result Parsing → Relevance Scoring → Reranking → Result Formatting, with all metrics tracked in state for observability.


📦 Installation

Prerequisites

  • Python 3.11 or higher
  • API keys for your chosen LLM provider(s)
  • (Optional) Tavily API key if using Tavily search (DuckDuckGo requires no API key)

Install from source

# Clone the repository
git clone https://github.com/123yongming/Panda_Dive.git
cd Panda_Dive

Linux/macOS

# Create virtual environment with uv
uv venv
source .venv/bin/activate

# Install dependencies
uv sync

Windows

# Create virtual environment
python -m venv venv
.\venv\Scripts\Activate

# Install uv and dependencies
pip install uv
uv pip install -r pyproject.toml

Alternative: Using pip directly

# Create virtual environment
python -m venv .venv

# Activate (Linux/macOS: source .venv/bin/activate, Windows: .venv\Scripts\activate)
source .venv/bin/activate

# Install in editable mode
pip install -e .

Configuration

Copy the example environment file and configure your API keys:

# Linux/macOS
cp .env.example .env

# Windows
copy .env.example .env

Edit .env with your credentials:

OPENAI_API_KEY=your_openai_key
ANTHROPIC_API_KEY=your_anthropic_key
GOOGLE_API_KEY=your_google_key
TAVILY_API_KEY=your_tavily_key
LANGSMITH_API_KEY=your_langsmith_key
LANGSMITH_PROJECT=panda_dive

🚀 Quick Start

Basic Usage

import asyncio

from Panda_Dive import Configuration, deep_researcher
from langchain_core.messages import HumanMessage

# Configure the researcher (DuckDuckGo is default - no API key needed!)
config = Configuration(
    max_researcher_iterations=6,
    max_concurrent_research_units=4,
    allow_clarification=True,
    research_model="openai:gpt-4o",
    final_report_model="openai:gpt-4o",
)

# Start research
topic = "What are the latest developments in quantum computing?"

async def main() -> None:
    result = await deep_researcher.ainvoke(
        {"messages": [HumanMessage(content=topic)]},
        config={"configurable": config.model_dump()},
    )
    print(result["messages"][-1].content)

asyncio.run(main())

Enable Long-Term Memory

config = Configuration(
    memory_enabled=True,
    memory_backend="sqlite",
    memory_sqlite_path=".memory/memory.sqlite3",
    memory_namespace_template="memory.owner.{owner}",
    memory_retrieval_top_k=8,
)

When memory is enabled, Panda_Dive persists extracted memory items and retrieves relevant context for later research runs.

Running with LangSmith

You can also run Panda_Dive as a LangGraph development server:

uvx --refresh --from "langgraph-cli[inmem]" --with-editable . --python 3.11 langgraph dev --allow-blocking --host 0.0.0.0 --port 2026

This will start the development server on http://localhost:2026 with in-memory storage, allowing you to interact with the deep researcher through the LangSmith UI.


🧭 Human-in-the-loop Steering

Steering is opt-in and requires a checkpointer (so the graph can interrupt and resume safely).

Python API Example

import asyncio

from langchain_core.messages import HumanMessage
from langgraph.checkpoint.memory import InMemorySaver
from langgraph.types import Command

from Panda_Dive import Configuration, build_deep_researcher


async def main() -> None:
    graph = build_deep_researcher(checkpointer=InMemorySaver())
    config = {
        "configurable": {
            "thread_id": "steering-demo",
            **Configuration(
                enable_steering=True,
                allow_clarification=False,
                steering_command_prefix="/steer",
                steering_continue_command="/continue",
            ).model_dump(),
        }
    }

    # Start a run. The graph may interrupt at a steering checkpoint.
    await graph.ainvoke(
        {"messages": [HumanMessage(content="Research AI coding agents in 2026")]},
        config=config,
    )

    # Resume with a steering directive.
    await graph.ainvoke(
        Command(resume="/steer prioritize official docs and benchmark-backed claims"),
        config=config,
    )

    # Or resume without changing direction.
    await graph.ainvoke(Command(resume="/continue"), config=config)


asyncio.run(main())

LangSmith Studio Quick Test

  1. Start the dev server:
uvx --refresh --from "langgraph-cli[inmem]" --with-editable . --python 3.11 langgraph dev --allow-blocking --host 0.0.0.0 --port 2026
  2. In LangSmith Studio, set configurable values:
    • enable_steering=true
    • steering_command_prefix=/steer
    • steering_continue_command=/continue
    • thread_id=<any-stable-id>
  3. Start a run, then resume from the interrupt with:
    • /steer focus on official sources and recent papers
    • or /continue
  4. Validate state fields after resume:
    • research_brief is updated when using /steer
    • steering_history, steering_last_command, and steering_warnings are populated
    • supervisor_messages keeps a single brief entry (the old brief is replaced, not appended)

⚙️ Configuration

Key Options

  • search_api (str, default "duckduckgo"): Search API to use: duckduckgo, tavily, exa, arxiv, or none
  • max_researcher_iterations (int, default 6): Maximum iterations per researcher (1-10)
  • max_react_tool_calls (int, default 6): Maximum tool calls per ReAct loop (1-30)
  • max_concurrent_research_units (int, default 4): Parallel research tasks (1-20)
  • allow_clarification (bool, default True): Ask clarifying questions before research
  • enable_steering (bool, default False): Enable per-round human steering checkpoints in the supervisor loop
  • steering_command_prefix (str, default "/steer"): Command prefix for steering directives
  • steering_continue_command (str, default "/continue"): Command to continue without modifying the brief
  • memory_enabled (bool, default False): Enable long-term memory extraction, retrieval, and injection
  • memory_backend (str, default "sqlite"): Memory backend: sqlite or langgraph_store
  • memory_sqlite_path (str, default ".memory/memory.sqlite3"): SQLite path for persisted memory data
  • memory_namespace_template (str, default "memory.owner.{owner}"): Namespace template supporting {owner}, {thread_id}, and {topic_hash}
  • memory_retrieval_top_k (int, default 8): Number of memory facts retrieved before prompt injection
  • model (str, default "openai:gpt-4o"): Default model for research
  • query_variants (int, default 3): Number of query variants for retrieval quality
  • relevance_threshold (float, default 0.7): Minimum relevance score threshold
  • rerank_top_k (int, default 10): Number of documents kept after reranking
  • rerank_weight_source (str, default "auto"): Source weighting strategy for reranking
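
For example, the retrieval quality options combine like this (a sketch using only the parameters documented above):

```python
from Panda_Dive import Configuration

# Tighten the retrieval quality loop relative to the defaults.
config = Configuration(
    search_api="tavily",      # switch from the DuckDuckGo default (needs TAVILY_API_KEY)
    query_variants=5,         # generate more query rewrites for broader recall
    relevance_threshold=0.8,  # filter low-relevance results more aggressively
    rerank_top_k=8,           # keep fewer, higher-quality sources after reranking
)
```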

🔍 How It Works

Research Process

  1. Clarification (Optional)

    • Asks clarifying questions to understand research scope
    • User can confirm or modify the research brief
  2. Research Brief Generation

    • Creates a structured brief based on the topic
    • Identifies key areas to investigate
  3. Supervised Research

    • Supervisor delegates specific research tasks
    • Multiple researcher agents work in parallel
    • Each researcher explores their assigned subtopic
  4. Steering Checkpoint (Optional)

    • Interrupts after each supervisor tool round when enable_steering=True
    • Accepts /steer <instruction> or /continue
    • If steered, updates research_brief and replaces old brief message in supervisor context
  5. Research Synthesis

    • Compresses individual findings to fit context
    • Synthesizes cross-cutting insights
  6. Final Report

    • Generates comprehensive, well-structured report
    • Includes citations and sources

📚 Documentation

🧪 Evaluation

Panda_Dive includes a comprehensive evaluation framework using LangSmith to benchmark the deep research system against the "Deep Research Bench" dataset.

Environment Variables

Before running evaluations, ensure these environment variables are set:

  • LANGSMITH_API_KEY (required): LangSmith API key for evaluation tracking
  • OPENAI_API_KEY (optional*): OpenAI API key (if using OpenAI models)
  • ANTHROPIC_API_KEY (optional*): Anthropic API key (if using Claude models)
  • DEEPSEEK_API_KEY (optional*): DeepSeek API key (if using DeepSeek models)

*Required only if using the respective provider's models.

Smoke Test (Quick Validation)

Run a quick smoke test on 2 examples to validate the setup:

# Basic smoke test (2 examples, default settings)
python tests/run_evaluate.py --smoke --dataset-name "deep_research_bench"

Supervisor Parallelism Evaluation

This evaluation measures both intended parallelism (tool-call count) and observed parallelism (span overlap) for the supervisor.

# Create the dataset (one-time setup)
python tests/create_supervisor_parallelism_dataset.py \
  --dataset-name "Panda_Dive: Supervisor Parallelism" \
  --source tests/prompt/supervisor_parallelism.jsonl

# Run the evaluation
python tests/run_evaluate.py \
  --dataset-name "Panda_Dive: Supervisor Parallelism" \
  --max-concurrency 1 \
  --experiment-prefix "supervisor-parallel"

Metrics produced:

  • tool_call_count_match: Whether actual tool calls match the reference count
  • parallel_overlap_ms: Total overlap time (ms) across trace spans

Full Evaluation

Run a full evaluation on the entire dataset (⚠️ Warning: Expensive!):

# Full evaluation (all dataset examples)
python tests/run_evaluate.py --full

Configuration Options

  • --smoke: Run smoke test (2 examples)
  • --full: Run full evaluation (all examples)
  • --dataset-name (default "Deep Research Bench"): Dataset name in LangSmith
  • --max-examples (default 2 for smoke, all for full): Maximum examples to evaluate
  • --experiment-prefix (default auto-generated): Prefix for the experiment name
  • --max-concurrency (default 2): Maximum concurrent evaluations (max: 5)
  • --timeout-seconds (default 1800): Per-example timeout in seconds
  • --model (default from env/config): Model to use for evaluation

Cost Warning

⚠️ Full evaluation runs can be expensive! A full run on the "Deep Research Bench" dataset can cost $50-200+ depending on the model used. Always:

  1. Run a smoke test first to validate setup
  2. Monitor LangSmith during the run
  3. Start with lower concurrency to control costs

Exporting Results

After evaluation, export results to JSONL format:

# Export results using experiment project name
python tests/extract_langsmith_data.py \
  --project-name "deep-research-eval-smoke-20250204-120000" \
  --model-name "gpt-4o" \
  --output-dir tests/expt_results/

# Force overwrite if file exists
python tests/extract_langsmith_data.py \
  --project-name "your-experiment-name" \
  --model-name "claude-3-5-sonnet" \
  --force

Export Options

  • --project-name (required): LangSmith project name containing the experiment runs
  • --model-name (required): Model name, used for the output filename
  • --dataset-name (default "Deep Research Bench"): Dataset name for validation
  • --output-dir (default tests/expt_results/): Output directory for the JSONL file
  • --force (default false): Overwrite the output file if it exists

🧪 Development

Running Tests

# Run all tests
python -m pytest

# Run with verbose output
python -m pytest -v

# Run with coverage
python -m pytest --cov=Panda_Dive

# Run specific test
python -m pytest src/test_api.py::test_function_name

Linting and Formatting

# Check code style
ruff check .

# Auto-fix issues
ruff check --fix .

# Type checking
mypy src/Panda_Dive/

Code Style Guidelines

  • Python 3.10+ type hints (e.g., list[str], not List[str])
  • Google-style docstrings
  • Async/await patterns for all graph nodes
  • Proper error handling and logging
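
A minimal node in this style might look like the following; the node name and state shape are hypothetical, not taken from Panda_Dive:

```python
import asyncio
from typing import Any


async def rank_results(state: dict[str, Any]) -> dict[str, Any]:
    """Sort search results by relevance score, highest first.

    Args:
        state: Graph state containing a "results" list of dicts, each
            with an optional "score" field.

    Returns:
        A state update with "results" sorted in descending score order.
    """
    results = state.get("results", [])
    ranked = sorted(results, key=lambda r: r.get("score", 0.0), reverse=True)
    return {"results": ranked}


# Quick check outside a graph:
update = asyncio.run(rank_results({"results": [{"url": "a", "score": 0.4},
                                               {"url": "b", "score": 0.9}]}))
```

Note the modern built-in generics (dict[str, Any] rather than Dict), the Google-style docstring, and the async signature expected of graph nodes.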

See AGENTS.md for detailed development guidelines.


📂 Project Structure

Panda_Dive/
├── docs/
│   └── retrieval-quality-loop.md  # Retrieval quality loop report
├── src/
│   └── Panda_Dive/
│       ├── __init__.py           # Package exports
│       ├── deepresearcher.py     # Main graph orchestration
│       ├── configuration.py       # Pydantic configuration models
│       ├── state.py               # TypedDict state definitions
│       ├── prompts.py             # System prompts for LLMs
│       └── utils.py               # Tool wrappers and helpers
├── pyproject.toml                # Project configuration
├── .env.example                  # Environment variables template
├── AGENTS.md                     # Agent development guidelines
└── README.md                     # This file

🤝 Contributing

We welcome contributions! Here's how to get started:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Make your changes following our code style guidelines
  4. Run tests and linting (pytest and ruff)
  5. Commit your changes (git commit -m 'Add amazing feature')
  6. Push to the branch (git push origin feature/amazing-feature)
  7. Open a Pull Request

Development Workflow

  • Follow PEP 8 and our ruff configuration
  • Add tests for new features
  • Update documentation as needed
  • Ensure type hints are complete

📄 License

This project is licensed under the MIT License.


🙏 Acknowledgments

Built with:


📞 Support


Made with ❤️ by PonyPan
