Panda_Dive - Deep Domain Research Tool
A powerful multi-agent deep research tool built with LangGraph and LangChain. Panda_Dive orchestrates multiple researcher agents to comprehensively explore any domain, synthesize findings, and generate detailed reports with retrieval quality safeguards.
- Features
- Recent Updates
- Showcase
- Architecture
- Installation
- Quick Start
- Human-in-the-loop Steering
- Configuration
- How It Works
- Documentation
- Evaluation
- Development
- Project Structure
- Contributing
- License
- Support
- Supervisory Agent: Intelligently delegates research tasks to multiple specialized researcher agents
- Concurrent Execution: Run up to 20 research tasks in parallel for maximum efficiency
- Dynamic Task Delegation: The supervisor adapts based on research progress and findings
Panda_Dive supports multiple LLM providers out of the box:
- OpenAI (GPT-4, GPT-3.5)
- Anthropic (Claude 3.5, Claude 3)
- DeepSeek (DeepSeek V3)
- Google (VertexAI, GenAI)
- Groq (Llama, Mixtral)
- AWS Bedrock
Configure different models for different research stages:
- Research queries
- Information compression
- Summarization
- Final report generation
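A per-stage model split might look like the sketch below, using the `Configuration` object shown in the Quick Start. The `research_model` and `final_report_model` fields appear later in this README; the compression and summarization field names here are assumptions for illustration only.

```python
from Panda_Dive import Configuration

# Hedged sketch: research_model and final_report_model are documented in the
# Quick Start; compression_model and summarization_model are assumed names.
config = Configuration(
    research_model="openai:gpt-4o",
    compression_model="openai:gpt-4o-mini",    # assumed field name
    summarization_model="openai:gpt-4o-mini",  # assumed field name
    final_report_model="anthropic:claude-3-5-sonnet-latest",
)
```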
- Automatic Truncation: Intelligently handles token limit errors
- Retry Logic: Robust retry mechanism for failed tool calls
- Context Optimization: Compresses research findings to stay within limits
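The truncation-and-retry behavior can be pictured as a loop that drops the oldest conversation turn whenever the provider rejects a request for length. This is a runnable sketch with a stand-in error type, not the actual Panda_Dive implementation:

```python
class TokenLimitError(Exception):
    """Stand-in for a provider's context-length error."""

def call_with_truncation(call, messages, max_retries=3):
    """Retry an LLM call, dropping the oldest non-system turn on token-limit errors."""
    for _ in range(max_retries):
        try:
            return call(messages)
        except TokenLimitError:
            # Keep the system prompt (index 0), drop the next-oldest message.
            messages = messages[:1] + messages[2:]
    raise RuntimeError("still over the token limit after retries")
```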
- MCP Integration: Extend tools via Model Context Protocol
- LangSmith Tracing: Full observability and debugging support
- Multiple Search APIs: Tavily, DuckDuckGo, Exa, ArXiv (DuckDuckGo is now the default - privacy-friendly and no API key required)
- Query Rewriting: Expand queries to improve recall (supports both Tavily and DuckDuckGo)
- Relevance Scoring: Score each result on a 0.0-1.0 scale
- Reranking: Prioritize higher-quality sources before synthesis
- Robust Error Handling: Graceful handling of connection issues for DuckDuckGo searches
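The scoring and reranking steps above can be sketched as a weighted sort. In Panda_Dive the 0.0-1.0 relevance score comes from an LLM call; the keyword scorer below is only a stand-in so the control flow runs:

```python
def rerank(results, score_fn, source_weights=None, top_k=10):
    """Return the top_k results ordered by source-weighted relevance."""
    source_weights = source_weights or {}
    scored = []
    for r in results:
        relevance = score_fn(r)                        # 0.0-1.0, LLM-assigned in practice
        weight = source_weights.get(r["source"], 1.0)  # e.g. boost peer-reviewed sources
        scored.append((relevance * weight, r))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [r for _, r in scored[:top_k]]

# Toy usage with a keyword scorer standing in for the LLM.
docs = [
    {"source": "blog", "text": "quantum computing hype"},
    {"source": "arxiv", "text": "quantum error correction results"},
]
score = lambda r: 1.0 if "error correction" in r["text"] else 0.4
best = rerank(docs, score, source_weights={"arxiv": 1.2}, top_k=1)
```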
- Per-Round Checkpoint: Optional pause after each supervisor tool-execution round
- Two Resume Commands: `/continue` keeps direction, `/steer <instruction>` injects new guidance
- Brief Replacement (No Duplication): Steering updates replace the previous brief message in `supervisor_messages` instead of appending duplicate briefs
- Audit Fields in State: `steering_history`, `steering_last_command`, `steering_warnings`
- Persistent Research Memory: Optionally store reusable facts and episodic summaries across runs
- Prompt Injection: Retrieve relevant memory and inject it into the supervisor brief before research begins
- Namespace Isolation: Partition memory by `owner`, `thread_id`, and `topic_hash`
- SQLite by Default: Persist memory locally with configurable retrieval and ranking controls
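One way to picture how a namespace template might resolve into a store namespace (purely illustrative; the actual resolution logic is internal to Panda_Dive):

```python
# Hypothetical resolution of a namespace template into a tuple-style
# store namespace, using the {owner} and {topic_hash} placeholders.
template = "memory.owner.{owner}.{topic_hash}"
namespace = tuple(template.format(owner="alice", topic_hash="a1b2").split("."))
```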
- Added a polished local frontend demo view for Panda_Dive research workflows
- Added a complete sample context research report for quick output reference
- Updated README with visual showcase and direct links to example assets
- Added langmem-backed long-term memory with persistence, retrieval, and prompt injection support
- Sample report: example/context research report.md
Preview topic: Systematic Investigation of Context in LLM-based Agent Systems
The sample report demonstrates:
- Conceptual overview and context taxonomy
- Design patterns (dispatcher, state channels, event sourcing)
- Multi-agent context lifecycle, trade-offs, and failure modes
- Open challenges and research directions for 2025-2026
Panda_Dive uses a sophisticated multi-agent graph architecture with three hierarchical layers: Main Graph (entry point), Supervisor Subgraph (orchestration), and Researcher Subgraph (execution).
Entry point handling user interaction, research brief generation, and final report synthesis:
graph TD
START([START]) --> CLARIFY[clarify_with_user]
CLARIFY --"need_clarification=True"--> USER["🔄 Return to User<br/>with question"]
USER -->|User response| CLARIFY
CLARIFY --"need_clarification=False"--> BRIEF[write_research_brief]
BRIEF --> SUPERVISOR["🧩 research_supervisor<br/>Subgraph Entry"]
SUPERVISOR -->|All research<br/>completed| REPORT[final_report_generation]
REPORT --> END([END])
style START fill:#e1f5ff,stroke:#333,stroke-width:2px
style END fill:#e1f5ff,stroke:#333,stroke-width:2px
style CLARIFY fill:#fff3cd,stroke:#333
style BRIEF fill:#d4edda,stroke:#333
style SUPERVISOR fill:#f8dce0,stroke:#333,stroke-width:3px
style REPORT fill:#cce5ff,stroke:#333
style USER fill:#fff3cd,stroke:#666,stroke-dasharray: 5 5
Orchestrates parallel research by dynamically spawning researcher subgraphs:
graph TB
subgraph SUPERVISOR["🧩 Supervisor Subgraph"]
START_S([START]) --> S[supervisor<br/>Lead Researcher]
S --> ST{supervisor_tools<br/>Tool Router}
%% Tool executions
ST -->|think_tool| THINK["💭 Strategic Reflection"]
THINK --> S
ST -->|ConductResearch| SPAWN["🚀 Dynamic Subgraph Spawning"]
%% Dynamic spawning detail
subgraph DYNAMIC["🔄 Dynamic Concurrency Control"]
SPAWN --> CHECK{"Within<br/>max_concurrent<br/>limit?"}
CHECK -->|Yes| RESEARCHER["🧩 researcher_subgraph<br/>(Instance N)"]
CHECK -->|No| OVERFLOW["⚠️ Overflow:<br/>Skip with error"]
RESEARCHER -->|async gather| COLLECT["📊 Collect Results"]
OVERFLOW --> COLLECT
end
COLLECT --> UPDATE["📝 Update State:<br/>• notes<br/>• raw_notes"]
UPDATE --> S
ST -->|ResearchComplete| DONE_S[Done]
%% Loop conditions
ST -.->|Iterations <<br/>max_researcher<br/>_iterations| S
end
style START_S fill:#e1f5ff
style DONE_S fill:#d4edda
style S fill:#f8dce0,stroke:#333,stroke-width:3px
style ST fill:#fff3cd,stroke:#333
style SPAWN fill:#d4edda,stroke:#333,stroke-width:2px
style DYNAMIC fill:#f0f8ff,stroke:#666,stroke-dasharray: 3 3
style RESEARCHER fill:#cce5ff,stroke:#333
Executes individual research tasks with the 6-step retrieval quality loop:
graph TB
subgraph RESEARCHER["🧩 Researcher Subgraph"]
START_R([START]) --> R[researcher<br/>Research Assistant]
R --> RT{researcher_tools<br/>Tool Router}
%% Tool executions
RT -->|think_tool| THINK_R["💭 Strategic<br/>Reflection"]
THINK_R --> R
RT -->|Search Tool| RQL["🎯 Retrieval Quality Loop"]
%% Retrieval Quality Loop detail
subgraph RQL_DETAIL["🔄 Query → Results → Score → Rerank"]
RQL --> REWRITE["1️⃣ Query Rewriting<br/>Generate N variants"]
REWRITE --> SEARCH["2️⃣ Search Execution<br/>tavily/duckduckgo"]
SEARCH --> PARSE["3️⃣ Result Parsing<br/>→ Structured dicts"]
PARSE --> SCORE["4️⃣ Relevance Scoring<br/>LLM: 0.0-1.0"]
SCORE --> RERANK["5️⃣ Reranking<br/>+ Source weight"]
RERANK --> FORMAT["6️⃣ Format Results<br/>For researcher"]
%% State tracking
STATE["📊 State Tracking:<br/>• rewritten_queries<br/>• relevance_scores<br/>• reranked_results<br/>• quality_notes"]
end
FORMAT --> UPDATE_R["📝 Update State"]
UPDATE_R --> R
RT -->|MCP Tools| MCP["🔧 MCP Tools<br/>(Dynamic Loading)"]
MCP --> R
RT -->|ResearchComplete| COMPRESS[compress_research]
COMPRESS --> DONE_R[Done]
%% Loop conditions
RT -.->|tool_calls <<br/>max_react<br/>_tool_calls| R
end
style START_R fill:#e1f5ff
style DONE_R fill:#d4edda
style R fill:#cce5ff,stroke:#333,stroke-width:3px
style RT fill:#fff3cd,stroke:#333
style RQL fill:#f8dce0,stroke:#333,stroke-width:2px
style RQL_DETAIL fill:#fff5f5,stroke:#666,stroke-dasharray: 3 3
style RESEARCHER fill:#cce5ff,stroke:#333
style STATE fill:#f0f8ff,stroke:#999
| Layer | Components | Key Features |
|---|---|---|
| Main Graph | clarify_with_user, write_research_brief, research_supervisor, final_report_generation | User interaction, clarification loop, brief generation, report synthesis |
| Supervisor Subgraph | supervisor, supervisor_tools, Dynamic Spawning | Parallel research orchestration, concurrency control (max_concurrent_research_units), async subgraph spawning |
| Researcher Subgraph | researcher, researcher_tools, Retrieval Quality Loop, compress_research | Individual research execution, 6-step retrieval quality (rewrite → search → parse → score → rerank → format), MCP integration |
User Query
→ Main Graph (Clarification → Brief)
→ Supervisor Subgraph (Parallel delegation)
→ Researcher Subgraph Instance 1 (Quality Loop)
→ Researcher Subgraph Instance 2 (Quality Loop)
→ Researcher Subgraph Instance N (Quality Loop)
→ Main Graph (Synthesis → Report)
→ User
Each researcher subgraph executes the full retrieval quality loop: Query Rewriting → Search Execution → Result Parsing → Relevance Scoring → Reranking → Result Formatting, with all metrics tracked in state for observability.
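The supervisor's parallel fan-out under a concurrency cap can be sketched with `asyncio.gather`, mirroring `max_concurrent_research_units`. Function names are illustrative, not the real Panda_Dive internals:

```python
import asyncio

async def run_researcher(task: str) -> str:
    await asyncio.sleep(0)  # stands in for a researcher subgraph run
    return f"findings for: {task}"

async def delegate(tasks: list[str], max_concurrent: int = 4) -> list[str]:
    # Tasks beyond the cap overflow; Panda_Dive skips them with an error.
    accepted = tasks[:max_concurrent]
    return await asyncio.gather(*(run_researcher(t) for t in accepted))

notes = asyncio.run(delegate(["topic A", "topic B", "topic C"], max_concurrent=2))
```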
- Python 3.11 or higher
- API keys for your chosen LLM provider(s)
- (Optional) Tavily API key if using Tavily search (DuckDuckGo requires no API key)
# Clone the repository
git clone https://github.com/123yongming/Panda_Dive.git
cd Panda_Dive

# Create virtual environment with uv
uv venv
source .venv/bin/activate
# Install dependencies
uv sync

# Create virtual environment
python -m venv venv
.\venv\Scripts\Activate
# Install uv and dependencies
pip install uv
uv pip install -r pyproject.toml

# Create virtual environment
python -m venv .venv
# Activate (Linux/macOS: source .venv/bin/activate, Windows: .venv\Scripts\activate)
source .venv/bin/activate
# Install in editable mode
pip install -e .

Copy the example environment file and configure your API keys:
# Linux/macOS
cp .env.example .env
# Windows
copy .env.example .env

Edit `.env` with your credentials:
OPENAI_API_KEY=your_openai_key
ANTHROPIC_API_KEY=your_anthropic_key
GOOGLE_API_KEY=your_google_key
TAVILY_API_KEY=your_tavily_key
LANGSMITH_API_KEY=your_langsmith_key
LANGSMITH_PROJECT=panda_dive

import asyncio
from Panda_Dive import Configuration, deep_researcher
from langchain_core.messages import HumanMessage
# Configure the researcher (DuckDuckGo is default - no API key needed!)
config = Configuration(
max_researcher_iterations=6,
max_concurrent_research_units=4,
allow_clarification=True,
research_model="openai:gpt-4o",
final_report_model="openai:gpt-4o",
)
# Start research
topic = "What are the latest developments in quantum computing?"
async def main() -> None:
result = await deep_researcher.ainvoke(
{"messages": [HumanMessage(content=topic)]},
config={"configurable": config.model_dump()},
)
print(result["messages"][-1].content)
asyncio.run(main())

config = Configuration(
memory_enabled=True,
memory_backend="sqlite",
memory_sqlite_path=".memory/memory.sqlite3",
memory_namespace_template="memory.owner.{owner}",
memory_retrieval_top_k=8,
)

When memory is enabled, Panda_Dive persists extracted memory items and retrieves relevant context for later research runs.
You can also run Panda_Dive as a LangGraph development server:
uvx --refresh --from "langgraph-cli[inmem]" --with-editable . --python 3.11 langgraph dev --allow-blocking --host 0.0.0.0 --port 2026

This will start the development server on http://localhost:2026 with in-memory storage, allowing you to interact with the deep researcher through the LangSmith UI.
Steering is opt-in and requires a checkpointer (so the graph can interrupt and resume safely).
import asyncio
from langchain_core.messages import HumanMessage
from langgraph.checkpoint.memory import InMemorySaver
from langgraph.types import Command
from Panda_Dive import Configuration, build_deep_researcher
async def main() -> None:
graph = build_deep_researcher(checkpointer=InMemorySaver())
config = {
"configurable": {
"thread_id": "steering-demo",
**Configuration(
enable_steering=True,
allow_clarification=False,
steering_command_prefix="/steer",
steering_continue_command="/continue",
).model_dump(),
}
}
# Start a run. The graph may interrupt at a steering checkpoint.
await graph.ainvoke(
{"messages": [HumanMessage(content="Research AI coding agents in 2026")]},
config=config,
)
# Resume with a steering directive.
await graph.ainvoke(
Command(resume="/steer prioritize official docs and benchmark-backed claims"),
config=config,
)
# Or resume without changing direction.
await graph.ainvoke(Command(resume="/continue"), config=config)
asyncio.run(main())

- Start dev server:
uvx --refresh --from "langgraph-cli[inmem]" --with-editable . --python 3.11 langgraph dev --allow-blocking --host 0.0.0.0 --port 2026

- In LangSmith Studio, set configurable values:
  - `enable_steering=true`
  - `steering_command_prefix=/steer`
  - `steering_continue_command=/continue`
  - `thread_id=<any-stable-id>`
- Start a run, then resume from interrupt with `/steer focus on official sources and recent papers` or `/continue`
- Validate state fields after resume:
  - `research_brief` updated when using `/steer`
  - `steering_history`, `steering_last_command`, `steering_warnings` populated
  - `supervisor_messages` keeps a single brief entry (old brief is replaced, not appended)
| Parameter | Type | Default | Description |
|---|---|---|---|
| `search_api` | str | `"duckduckgo"` | Search API to use: duckduckgo (default), tavily, exa, arxiv, or none |
| `max_researcher_iterations` | int | 6 | Maximum iterations per researcher (1-10) |
| `max_react_tool_calls` | int | 6 | Maximum tool calls per ReAct loop (1-30) |
| `max_concurrent_research_units` | int | 4 | Parallel research tasks (1-20) |
| `allow_clarification` | bool | True | Ask clarifying questions before research |
| `enable_steering` | bool | False | Enable per-round human steering checkpoints in the supervisor loop |
| `steering_command_prefix` | str | `"/steer"` | Command prefix for steering directives |
| `steering_continue_command` | str | `"/continue"` | Command to continue without modifying the brief |
| `memory_enabled` | bool | False | Enable long-term memory extraction, retrieval, and injection |
| `memory_backend` | str | `"sqlite"` | Memory backend: sqlite or langgraph_store |
| `memory_sqlite_path` | str | `".memory/memory.sqlite3"` | SQLite path for persisted memory data |
| `memory_namespace_template` | str | `"memory.owner.{owner}"` | Namespace template supporting {owner}, {thread_id}, and {topic_hash} |
| `memory_retrieval_top_k` | int | 8 | Number of memory facts retrieved before prompt injection |
| `model` | str | `"openai:gpt-4o"` | Default model for research |
| `query_variants` | int | 3 | Number of query variants for retrieval quality |
| `relevance_threshold` | float | 0.7 | Minimum relevance score threshold |
| `rerank_top_k` | int | 10 | Number of documents kept after reranking |
| `rerank_weight_source` | str | `"auto"` | Source weighting strategy for reranking |
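For example, the retrieval-quality parameters can be combined in a single `Configuration` (a sketch using only fields listed in the table above):

```python
from Panda_Dive import Configuration

config = Configuration(
    search_api="duckduckgo",
    query_variants=3,            # N rewrites per query
    relevance_threshold=0.7,     # drop results scored below this
    rerank_top_k=10,             # keep the 10 best after reranking
    rerank_weight_source="auto",
)
```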
- Clarification (Optional)
- Asks clarifying questions to understand research scope
- User can confirm or modify the research brief
- Research Brief Generation
- Creates a structured brief based on the topic
- Identifies key areas to investigate
- Supervised Research
- Supervisor delegates specific research tasks
- Multiple researcher agents work in parallel
- Each researcher explores their assigned subtopic
- Steering Checkpoint (Optional)
  - Interrupts after each supervisor tool round when `enable_steering=True`
  - Accepts `/steer <instruction>` or `/continue`
  - If steered, updates `research_brief` and replaces the old brief message in supervisor context
- Research Synthesis
- Compresses individual findings to fit context
- Synthesizes cross-cutting insights
- Final Report
- Generates comprehensive, well-structured report
- Includes citations and sources
- docs/README.md - Documentation index
- docs/architecture.md - Architecture overview and component boundaries
- docs/evaluation-guide.md - Evaluation workflow and benchmark card generation
- docs/contributing.md - Contribution path and quality gates
- docs/future-direction.md - 2026-2028 direction and phased roadmap
Panda_Dive includes a comprehensive evaluation framework using LangSmith to benchmark the deep research system against the "Deep Research Bench" dataset.
Before running evaluations, ensure these environment variables are set:
| Variable | Required | Description |
|---|---|---|
| `LANGSMITH_API_KEY` | Yes | LangSmith API key for evaluation tracking |
| `OPENAI_API_KEY` | No* | OpenAI API key (if using OpenAI models) |
| `ANTHROPIC_API_KEY` | No* | Anthropic API key (if using Claude models) |
| `DEEPSEEK_API_KEY` | No* | DeepSeek API key (if using DeepSeek models) |
*Required only if using the respective provider's models.
Run a quick smoke test on 2 examples to validate the setup:
# Basic smoke test (2 examples, default settings)
python tests/run_evaluate.py --smoke --dataset-name "deep_research_bench"
This evaluation measures both intended parallelism (tool-call count) and observed parallelism (span overlap) for the supervisor.
# Create the dataset (one-time setup)
python tests/create_supervisor_parallelism_dataset.py \
--dataset-name "Panda_Dive: Supervisor Parallelism" \
--source tests/prompt/supervisor_parallelism.jsonl
# Run the evaluation
python tests/run_evaluate.py \
--dataset-name "Panda_Dive: Supervisor Parallelism" \
--max-concurrency 1 \
  --experiment-prefix "supervisor-parallel"

Metrics produced:
- `tool_call_count_match`: Whether actual tool calls match the reference count
- `parallel_overlap_ms`: Total overlap time (ms) across trace spans
Run a full evaluation on the entire dataset:
# Full evaluation (all dataset examples)
python tests/run_evaluate.py --full
| Flag | Default | Description |
|---|---|---|
| `--smoke` | - | Run smoke test (2 examples) |
| `--full` | - | Run full evaluation (all examples) |
| `--dataset-name` | "Deep Research Bench" | Dataset name in LangSmith |
| `--max-examples` | 2 (smoke) / all (full) | Maximum examples to evaluate |
| `--experiment-prefix` | Auto-generated | Prefix for experiment name |
| `--max-concurrency` | 2 | Maximum concurrent evaluations (max: 5) |
| `--timeout-seconds` | 1800 | Per-example timeout (seconds) |
| `--model` | From env/config | Model to use for evaluation |
- Run a smoke test first to validate setup
- Monitor LangSmith during the run
- Start with lower concurrency to control costs
After evaluation, export results to JSONL format:
# Export results using experiment project name
python tests/extract_langsmith_data.py \
--project-name "deep-research-eval-smoke-20250204-120000" \
--model-name "gpt-4o" \
--output-dir tests/expt_results/
# Force overwrite if file exists
python tests/extract_langsmith_data.py \
--project-name "your-experiment-name" \
--model-name "claude-3-5-sonnet" \
  --force

| Flag | Required | Default | Description |
|---|---|---|---|
| `--project-name` | Yes | - | LangSmith project name containing the experiment runs |
| `--model-name` | Yes | - | Model name (used for output filename) |
| `--dataset-name` | No | "Deep Research Bench" | Dataset name for validation |
| `--output-dir` | No | tests/expt_results/ | Output directory for JSONL file |
| `--force` | No | false | Overwrite existing file if it exists |
# Run all tests
python -m pytest
# Run with verbose output
python -m pytest -v
# Run with coverage
python -m pytest --cov=Panda_Dive
# Run specific test
python -m pytest src/test_api.py::test_function_name

# Check code style
ruff check .
# Auto-fix issues
ruff check --fix .
# Type checking
mypy src/Panda_Dive/

- Python 3.10+ type hints (e.g., `list[str]`, not `List[str]`)
- Google-style docstrings
- Async/await patterns for all graph nodes
- Proper error handling and logging
See AGENTS.md for detailed development guidelines.
Panda_Dive/
├── docs/
│ └── retrieval-quality-loop.md # Retrieval quality loop report
├── src/
│ └── Panda_Dive/
│ ├── __init__.py # Package exports
│ ├── deepresearcher.py # Main graph orchestration
│ ├── configuration.py # Pydantic configuration models
│ ├── state.py # TypedDict state definitions
│ ├── prompts.py # System prompts for LLMs
│ └── utils.py # Tool wrappers and helpers
├── pyproject.toml # Project configuration
├── .env.example # Environment variables template
├── AGENTS.md # Agent development guidelines
└── README.md # This file
We welcome contributions! Here's how to get started:
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Make your changes following our code style guidelines
- Run tests and linting (`pytest` and `ruff`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
- Follow PEP 8 and our ruff configuration
- Add tests for new features
- Update documentation as needed
- Ensure type hints are complete
This project is licensed under the MIT License.
Built with:
- LangGraph - Graph-based orchestration
- LangChain - LLM application framework
- Pydantic - Data validation
- 📖 Read the AGENTS.md for development guidelines
- 🐛 Report issues on GitHub Issues
- 💬 Explore more projects by PonyPan
Made with ❤️ by PonyPan
