
feat: deep-research-agent — LangGraph + BMasterAI telemetry #50

Merged
travis-burmaster merged 2 commits into main from feat/deep-research-agent
Mar 14, 2026

Conversation

@ellucas-creator
Collaborator

Deep Research Agent

A multi-step web research agent built with LangGraph and instrumented with BMasterAI logging and telemetry. Inspired by langchain-ai/deepagents.

Architecture

planner → web_searcher → analyzer → reflector
                                         ↓
                             needs_more_research?
                              /               \
                            yes (≤2x)         no
                             ↓                ↓
                        web_searcher      synthesizer → END
                       (follow-ups)

BMasterAI Integration

| Feature | Where used |
| --- | --- |
| configure_logging | Startup |
| monitor.track_agent_start/stop | Every node |
| monitor.track_llm_call | Every Claude API call |
| monitor.track_task_duration | Per-node timing |
| bm.log_event(EventType.LLM_CALL) | Before each LLM call |
| bm.log_event(EventType.TOOL_USE) | Each Tavily search |
| bm.log_event(EventType.DECISION_POINT) | Planner + reflector routing |
| bm.log_reasoning_chain | Planner sub-question breakdown |
| monitor.get_agent_dashboard() | Final telemetry printout |
| logs/research.jsonl | Structured JSON for analytics |
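The logs/research.jsonl file is JSON Lines: one JSON object per line. A minimal sketch of consuming it downstream; the event field names in the test data are illustrative assumptions, not the BMasterAI event schema.

```python
# Minimal sketch of consuming logs/research.jsonl for analytics.
# JSON Lines = one JSON object per line; blank lines are skipped.
import json

def load_events(path: str) -> list[dict]:
    """Parse a .jsonl file into a list of event dicts."""
    events = []
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:
                events.append(json.loads(line))
    return events
```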

Usage

pip install -r requirements.txt
cp .env.example .env  # add ANTHROPIC_API_KEY + TAVILY_API_KEY
python main.py "What is the current state of AI agents in enterprise software?"

… by deepagents)

- Planner decomposes research topic into 3-5 sub-questions with reasoning chain logging
- Web Searcher runs Tavily searches per sub-question, logs each tool use
- Analyzer synthesizes results into findings, logs LLM calls + token usage
- Reflector evaluates quality (1-10 score), conditionally loops for more research (max 2x)
- Synthesizer produces structured report (Executive Summary, Key Findings, Analysis, Limitations)
- Full BMasterAI instrumentation: agent lifecycle, LLM calls, tool use, decision points, reasoning chains
- Telemetry written to logs/research.jsonl for downstream analytics
- Inspired by langchain-ai/deepagents deep_research example
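The features above all read and write a shared state object (the review below notes state.py uses TypedDict). An illustrative guess at its shape; every field name here is an assumption, not the actual state.py definition.

```python
# Illustrative guess at the shared research state passed between nodes.
# Field names are assumptions; total=False lets nodes fill fields gradually.
from typing import TypedDict

class ResearchState(TypedDict, total=False):
    topic: str
    sub_questions: list[str]     # planner output (3-5 questions)
    search_results: list[dict]   # accumulated Tavily results
    sources: list[str]
    findings: str                # analyzer synthesis
    quality_score: int           # reflector's 1-10 rating
    reflection_loops: int        # capped at 2 follow-up rounds
```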
@claude

claude bot commented Mar 14, 2026

Claude finished @ellucas-creator's task


Code Review: Deep Research Agent

I've completed a comprehensive review of this LangGraph + BMasterAI integration. Here's my detailed feedback:

Review Checklist

  • Read and understand the repository structure and CLAUDE.md guidelines
  • Review main.py - entry point and overall architecture
  • Review agents.py - core agent implementation and BMasterAI integration
  • Review graph.py - LangGraph workflow definition
  • Review state.py - state management and data flow
  • Review configuration files (.env.example, requirements.txt, .gitignore)
  • Review README.md - documentation completeness
  • Analyze security concerns and best practices
  • Assess performance considerations and potential optimizations
  • Evaluate error handling and robustness
  • Check for missing test coverage
  • Provide comprehensive review feedback

Strengths

Architecture & Design

  • Excellent LangGraph workflow design with clear separation of concerns
  • Well-structured state management using TypedDict for type safety
  • Clean conditional routing logic for reflection loops with max iteration limit
  • Comprehensive BMasterAI telemetry integration throughout the pipeline

Code Quality

  • Good docstrings and inline documentation
  • Consistent error handling patterns across agents
  • Proper environment variable validation with clear error messages
  • Clean JSON parsing with graceful fallbacks

Security

  • API keys properly handled via environment variables
  • No hardcoded secrets in code
  • Appropriate .gitignore excluding sensitive files

🔍 Issues & Recommendations

High Priority

  1. Missing Test Coverage - agents.py:1-386

    • No unit tests for any agent functions
    • No integration tests for the LangGraph pipeline
    • No tests for error handling scenarios
    • Recommendation: Add test files covering agent logic, API failures, and edge cases
  2. Insufficient Input Validation - main.py:92-99

    # Current: basic strip() only
    topic = input("Research question: ").strip()
    if not topic:
        topic = "What is the current state of multi-agent AI systems in 2026?"
    • No validation for topic length or content
    • Recommendation: Add length limits and content validation
  3. Resource Leak Risk - agents.py:28-29

    monitor = get_monitor()
    monitor.start_monitoring()
    • Monitor started but never explicitly stopped
    • Recommendation: Add proper cleanup in main.py or use context manager

Medium Priority

  1. Error Information Loss - agents.py:110-112

    except Exception as e:
        sub_questions = [state["topic"]]
        reasoning = f"Parsing failed ({e}), falling back to direct search"
    • Exception details lost in user-facing output
    • Recommendation: Log full exception details via BMasterAI for debugging
  2. Inefficient API Usage - agents.py:162-186

    for q in questions:
        # Sequential API calls
        result = tavily.search(query=q, ...)
    • Sequential Tavily calls could be parallelized
    • Recommendation: Use asyncio or concurrent.futures for parallel searches
  3. Memory Growth - agents.py:158-160

    all_results = list(state.get("search_results", []))
    all_sources = list(state.get("sources", []))
    • Results accumulate across reflection loops without cleanup
    • Recommendation: Consider result size limits or pruning strategy
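The parallelization suggested in item 2 could be sketched with a thread pool, which suits I/O-bound API calls. This uses a stand-in search function where `tavily.search(query=q, ...)` would go; it is a sketch of the recommendation, not the repository's code.

```python
# Sketch of the parallel-search recommendation, with a stand-in search
# function in place of the real tavily.search(query=q, ...) call.
from concurrent.futures import ThreadPoolExecutor

def search_stub(query: str) -> dict:
    # Placeholder for an I/O-bound API call.
    return {"query": query, "results": []}

def parallel_search(queries: list[str], max_workers: int = 4) -> list[dict]:
    # pool.map() preserves input order, so results line up with sub-questions.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(search_stub, queries))
```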

Low Priority

  1. Magic Numbers - agents.py:38

    MAX_REFLECTION_LOOPS = 2
    • Hardcoded configuration values
    • Recommendation: Move to environment variables or config file
  2. Type Safety Gap - agents.py:57

    tokens_used=getattr(response, "usage_metadata", {}).get("total_tokens", 0) if hasattr(response, "usage_metadata") else 0
    • Complex conditional logic for token extraction
    • Recommendation: Create helper function with proper typing
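The helper recommended in item 2 might look like the following sketch. It assumes that `usage_metadata`, when present on the response object, behaves like a dict; the function name is hypothetical.

```python
# One possible shape for the recommended helper: a typed accessor that
# replaces the inline conditional for token extraction. Assumes
# usage_metadata, when present, behaves like a dict (an assumption).
def extract_total_tokens(response: object) -> int:
    usage = getattr(response, "usage_metadata", None)
    if isinstance(usage, dict):
        return int(usage.get("total_tokens", 0))
    return 0
```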

Performance Considerations

  • API Rate Limiting: No rate limiting for Tavily or Anthropic APIs
  • Memory Usage: Large search results accumulate in state without bounds
  • Latency: Sequential operations could benefit from parallelization
  • Token Efficiency: No token usage optimization for LLM calls

🛡️ Security Assessment

Good Practices

  • Environment variables for API keys
  • No sensitive data in version control
  • Proper .gitignore configuration

⚠️ Areas for Improvement

  • No input sanitization for research topics
  • BMasterAI logs might contain sensitive research queries
  • No rate limiting protection against abuse

📚 Documentation Quality

Excellent

  • Comprehensive README with usage examples
  • Clear architecture diagrams
  • Well-documented BMasterAI integration table
  • Good inline documentation

Minor Improvements

  • Add API rate limits to documentation
  • Include troubleshooting section
  • Document environment variable options

🚀 Recommended Next Steps

  1. Add comprehensive test suite (unit + integration)
  2. Implement proper resource cleanup in main.py
  3. Add input validation and sanitization
  4. Consider parallelizing Tavily searches for performance
  5. Add configuration management for magic numbers
  6. Implement error logging improvements

📊 Overall Assessment

This is a well-architected example that demonstrates excellent LangGraph usage and comprehensive BMasterAI integration. The code follows good Python practices with clear separation of concerns. However, it lacks test coverage and could benefit from better error handling and performance optimizations.

Recommendation: ✅ Approve with suggested improvements - This is ready for merge as an example, but consider the test coverage and input validation improvements for production use.


travis-burmaster merged commit b02369e into main on Mar 14, 2026
1 check passed
travis-burmaster deleted the feat/deep-research-agent branch on March 14, 2026 at 22:15