
feat: claude-web-computer-agent — web search + computer use with BMasterAI telemetry#51

Merged
travis-burmaster merged 3 commits into main from feat/claude-web-computer-agent
Mar 15, 2026

Conversation

@travis-burmaster
Owner

Summary

  • Adds examples/claude-web-computer-agent/ — a bare-metal Claude tool-use agent combining web search (Tavily) and computer use (screenshot/click/type/key/scroll), fully instrumented with BMasterAI logging and monitoring
  • No frameworks — just the raw Anthropic SDK tool_use / tool_result cycle, making this the foundational reference before the LangGraph examples
  • Screenshots are returned as multimodal image blocks so Claude can actually see the screen
  • BMasterAI telemetry on every LLM call, tool dispatch, decision point, and error path — output to logs/agent.log, logs/agent.jsonl, and logs/reasoning/
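The raw tool_use / tool_result cycle described above can be sketched roughly as follows. This is an illustrative skeleton, not the example's actual code: the model id is a placeholder, the tool schemas are elided, and the real agent.py wraps every step in BMasterAI logging.

```python
# Sketch of the bare-metal Anthropic tool-use loop (names illustrative).
def run_agent(client, run_tool, user_prompt,
              model="claude-3-5-sonnet-20241022",  # placeholder model id
              tools=(), max_turns=20):
    """Drive the tool-use loop until Claude stops requesting tools."""
    messages = [{"role": "user", "content": user_prompt}]
    for _ in range(max_turns):  # turn limit guards against runaway loops
        response = client.messages.create(
            model=model, max_tokens=4096,
            tools=list(tools),  # web_search / computer_use schemas
            messages=messages,
        )
        messages.append({"role": "assistant", "content": response.content})
        if response.stop_reason != "tool_use":
            # Final answer: concatenate the text blocks.
            return "".join(b.text for b in response.content if b.type == "text")
        # Execute every tool_use block and echo back matching tool_results.
        results = [
            {"type": "tool_result", "tool_use_id": b.id,
             "content": run_tool(b.name, b.input)}
            for b in response.content if b.type == "tool_use"
        ]
        messages.append({"role": "user", "content": results})
    return "(max turns reached)"
```

Each `tool_result` must carry the `tool_use_id` of the request it answers, and all results for a turn go back in a single user message.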

Files

| File | Purpose |
| --- | --- |
| tools.py | web_search and computer_use schemas + dispatch |
| agent.py | WebComputerAgent class — tool-use loop with full BMasterAI instrumentation |
| main.py | CLI entry point with env checks and interactive fallback |
| requirements.txt | bmasterai>=0.2.3, anthropic>=0.40.0, tavily-python |
| .env.example | ANTHROPIC_API_KEY + TAVILY_API_KEY |

Test plan

  • pip install -r requirements.txt installs cleanly
  • python main.py "Search for latest Anthropic news" runs without errors
  • python main.py "Take a screenshot" returns a computer_use result
  • logs/agent.jsonl is written with structured telemetry after each run
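For the screenshot case in the test plan, the capture comes back to Claude as a multimodal image block rather than text. A minimal sketch of that wrapping (the helper name is hypothetical; the field names follow the Anthropic Messages API image content block):

```python
import base64

def screenshot_tool_result(tool_use_id: str, png_bytes: bytes) -> dict:
    """Wrap raw PNG bytes (e.g. scrot's stdout) as a multimodal
    tool_result so Claude can actually see the screen."""
    return {
        "type": "tool_result",
        "tool_use_id": tool_use_id,
        "content": [{
            "type": "image",
            "source": {
                "type": "base64",
                "media_type": "image/png",
                "data": base64.b64encode(png_bytes).decode("ascii"),
            },
        }],
    }
```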

Generated with Claude (Cowork mode)

…terAI telemetry

Adds a bare-metal Claude tool-use agent combining web search (Tavily)
and computer use (screenshot/click/type/key/scroll) — no frameworks,
just the Anthropic SDK — fully instrumented with BMasterAI logging
and monitoring.

What it demonstrates:
- The raw Anthropic tool_use / tool_result message cycle
- Multimodal tool_result (screenshots sent back as image blocks)
- BMasterAI telemetry on every LLM call, tool dispatch, and decision point
- Structured JSONL telemetry output for downstream analytics

Stack: Anthropic SDK, Tavily, xdotool + scrot, BMasterAI

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@claude

claude bot commented Mar 15, 2026

Claude finished @travis-burmaster's task.


Code Review Complete ✅

Todo List

  • Read repository guidelines (CLAUDE.md) - No CLAUDE.md found
  • Review all 7 new files in the PR
  • Analyze code quality and best practices
  • Check for potential bugs and security issues
  • Evaluate performance considerations
  • Assess test coverage
  • Provide comprehensive feedback

Overall Assessment

This is a well-structured and thoroughly documented example implementation that demonstrates the core tool-use pattern for Claude agents. The code quality is high with comprehensive BMasterAI telemetry integration.

✅ Strengths

Code Quality & Architecture

  • Clean separation of concerns: Tools, agent logic, and CLI are properly separated
  • Comprehensive documentation: Excellent README with clear examples and setup instructions
  • Robust error handling: Graceful degradation when dependencies are missing
  • Defensive programming: Proper validation of required parameters before tool execution
  • Consistent code style: Well-formatted with clear docstrings

Security & Best Practices

  • Environment variable handling: Proper use of environment variables for API keys
  • No hardcoded secrets: Template provided in .env.example
  • Input validation: Parameters validated before tool execution
  • Subprocess security: Uses check=True and proper argument handling

Performance Considerations

  • Efficient token usage: Truncates long content (snippets to 500 chars, logs to 200 chars)
  • Resource management: Proper cleanup in finally blocks
  • Caching: Base64 images stripped from logs to reduce memory usage
  • Timeout handling: MAX_TURNS prevents infinite loops

⚠️ Areas for Improvement

Security Issues

  1. Command Injection Risk (tools.py:164, 170, 176, 186)

    # Current - vulnerable to injection
    subprocess.run(["xdotool", "type", "--clearmodifiers", text], check=True)
    
    # Safer approach
    subprocess.run(["xdotool", "type", "--clearmodifiers", "--", text], check=True)

    Impact: Malicious input could execute arbitrary commands if text contains shell metacharacters.

  2. Path Traversal (tools.py:148-149)

    # Consider validating output path for screenshot
    result = subprocess.run(["scrot", "-", "--quality", "80"], ...)

    Recommendation: While current usage is safe, consider validating any file paths in future extensions.
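Applying the `--` fix from item 1 while keeping argument-array execution can be centralized in one place. A sketch under the assumption that tools.py currently inlines these calls (the helper name is hypothetical):

```python
def xdotool_argv(subcommand: str, text: str, *flags: str) -> list[str]:
    """Build an xdotool argv with an end-of-options '--' sentinel so
    user-supplied text (e.g. '--delay' or '-x') is always treated as
    data, never parsed as a CLI flag."""
    return ["xdotool", subcommand, *flags, "--", text]

# Usage (sketch):
#   subprocess.run(xdotool_argv("type", user_text, "--clearmodifiers"),
#                  check=True, timeout=10)
```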

Error Handling & Reliability

  1. Incomplete Error Context (agent.py:171-172)

    except Exception as e:
        # Missing context about which operation failed
        self.monitor.track_error(AGENT_ID, type(e).__name__)

    Recommendation: Include more context (turn number, operation type, input parameters).

  2. Resource Cleanup (tools.py:147-159)

    # No cleanup if scrot process hangs
    result = subprocess.run(["scrot", "-", "--quality", "80"], capture_output=True, check=True)

    Recommendation: Add timeout parameter to prevent hanging processes.
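The timeout recommendation above could be applied with a small wrapper; `subprocess.run` kills the child process itself when `timeout` expires. This is a sketch, not the example's code, and the helper name is illustrative:

```python
import subprocess

def run_with_timeout(cmd: list[str], timeout: float = 10.0) -> bytes:
    """Run a command with a hard timeout. In tools.py this would wrap
    e.g. ["scrot", "-", "--quality", "80"]."""
    try:
        result = subprocess.run(cmd, capture_output=True, check=True,
                                timeout=timeout)
    except subprocess.TimeoutExpired as e:
        # subprocess.run has already killed the child; re-raise as a
        # tool-level error the agent can report back to Claude.
        raise RuntimeError(f"{cmd[0]} timed out after {timeout}s") from e
    return result.stdout
```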

Code Quality Enhancements

  1. Type Hints (tools.py:139, agent.py:189)

    # Current
    def _run_computer_use(action: str, x: int = None, ...)
    
    # Better (requires: from typing import Optional)
    def _run_computer_use(action: str, x: Optional[int] = None, ...)
  2. Constants (agent.py:37-39)

    # Consider making these configurable
    DEFAULT_MODEL = "claude-opus-4-6"
    MAX_TURNS = 20
    MAX_TOKENS = 4096
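Making those constants configurable is a one-line change each via environment overrides. The AGENT_* variable names and the fallback model id below are illustrative, not taken from the example:

```python
import os

# Environment-overridable defaults, as suggested above.
DEFAULT_MODEL = os.environ.get("AGENT_MODEL", "claude-3-5-sonnet-20241022")
MAX_TURNS = int(os.environ.get("AGENT_MAX_TURNS", "20"))
MAX_TOKENS = int(os.environ.get("AGENT_MAX_TOKENS", "4096"))
```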

Test Coverage

  1. Missing Tests: No test files included
    • Recommendation: Add unit tests for:
      • Tool dispatch functions
      • Error handling scenarios
      • Environment variable validation
      • BMasterAI telemetry integration
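Tests for the tool dispatcher could look roughly like this. The `dispatch_tool` stand-in below only mirrors the tools.py pattern; its signature is an assumption, not the actual API:

```python
# Sketch of pytest-style unit tests for tool dispatch (names assumed).
def dispatch_tool(name: str, args: dict) -> str:
    """Stand-in dispatcher mirroring the tools.py pattern: unknown
    tools return an error string rather than raising, so the error
    flows back to Claude as a tool_result."""
    handlers = {"web_search": lambda a: f"searched: {a['query']}"}
    if name not in handlers:
        return f"error: unknown tool {name}"
    return handlers[name](args)

def test_known_tool_dispatches():
    assert dispatch_tool("web_search", {"query": "x"}) == "searched: x"

def test_unknown_tool_returns_error_not_raises():
    assert dispatch_tool("no_such_tool", {}).startswith("error:")
```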

Documentation

  1. API Model Reference (agent.py:37, main.py:70)
    DEFAULT_MODEL = "claude-opus-4-6"  # This model may not exist
    Recommendation: Use a stable model reference like claude-3-5-sonnet-20241022 or make model configurable.

🔧 Specific Recommendations

Immediate (Security)

  1. Add -- separator to xdotool commands to prevent command injection
  2. Add subprocess timeouts to prevent hanging processes

Short Term (Reliability)

  1. Enhance error messages with operation context
  2. Add input validation for tool parameters
  3. Make model and limits configurable via environment variables

Long Term (Maintainability)

  1. Add comprehensive test suite
  2. Consider adding retry logic for failed tool operations
  3. Add configuration validation on startup

📊 Performance Analysis

  • Token efficiency: ✅ Good truncation strategies
  • Memory usage: ✅ Base64 data properly stripped from logs
  • Network efficiency: ✅ Reasonable API call patterns
  • Resource limits: ✅ MAX_TURNS prevents runaway execution

🏆 Best Practices Followed

  • ✅ Comprehensive telemetry and monitoring
  • ✅ Clear error messages and logging
  • ✅ Modular, testable design
  • ✅ Proper dependency management
  • ✅ Excellent documentation and examples

Recommendation: APPROVE with suggestions

This is a solid foundational example that demonstrates best practices for Claude tool-use agents. The security issues are minor and easily addressable. The comprehensive BMasterAI integration makes this an excellent reference implementation.


Lists the new example at the top of the 2026 section as the
foundational tool-use reference. Also removes stale `NEW` badge
from deep-research-agent since it's no longer the newest entry.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

tools.py:
- Add -- end-of-options sentinel to xdotool type and key calls to
  prevent text/key names from being interpreted as CLI flags
- Add timeout=10s to all subprocess calls to prevent hangs if
  the display is unresponsive
- Fix Optional[int] / Optional[str] type hints on _run_computer_use

agent.py:
- Include error_type, message_count, and traceback (limit 5 frames)
  in the exception handler metadata for better observability

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
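The enriched handler metadata this commit describes might be built like so (helper and field names are illustrative, not the commit's actual code):

```python
import traceback

def error_metadata(e: Exception, turn: int, operation: str,
                   message_count: int) -> dict:
    """Build richer telemetry metadata for the exception handler."""
    return {
        "error_type": type(e).__name__,
        "error": str(e),
        "turn": turn,
        "operation": operation,
        "message_count": message_count,
        # Keep only the last 5 stack frames so log entries stay bounded.
        "traceback": traceback.format_exception(
            type(e), e, e.__traceback__)[-5:],
    }
```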
@claude

claude bot commented Mar 15, 2026

Claude finished @travis-burmaster's task.


Claude Code Review Complete

Todo List

  • Read repository guidelines (CLAUDE.md) - No CLAUDE.md found
  • Review all 8 new files in the PR
  • Analyze code quality and best practices
  • Check for potential bugs and security issues
  • Evaluate performance considerations
  • Assess test coverage
  • Provide comprehensive feedback

Overall Assessment

This is a well-structured and thoroughly documented example implementation that demonstrates the core tool-use pattern for Claude agents. The code quality is high with comprehensive BMasterAI telemetry integration.

✅ Strengths

Code Quality & Architecture

  • Clean separation of concerns: Tools, agent logic, and CLI are properly separated
  • Comprehensive documentation: Excellent README with clear examples and setup instructions
  • Robust error handling: Graceful degradation when dependencies are missing
  • Defensive programming: Proper validation of required parameters before tool execution
  • Consistent code style: Well-formatted with clear docstrings

Security & Best Practices

  • Environment variable handling: Proper use of environment variables for API keys
  • No hardcoded secrets: Template provided in .env.example
  • Input validation: Parameters validated before tool execution
  • Subprocess security: Uses check=True and proper argument handling

Performance Considerations

  • Efficient token usage: Truncates long content (snippets to 500 chars, logs to 200 chars)
  • Resource management: Proper cleanup in finally blocks
  • Caching: Base64 images stripped from logs to reduce memory usage
  • Timeout handling: MAX_TURNS prevents infinite loops

⚠️ Areas for Improvement

Security Issues

  1. Command Injection Risk (tools.py:177, 188)

    # Current - vulnerable to injection
    subprocess.run(["xdotool", "type", "--clearmodifiers", "--", text], check=True)

    Status: ✅ Already properly protected with "--" separator

  2. Subprocess Timeout Protection (tools.py:153, 169, 179, 189, 201)

    # Good: timeout parameter already added
    timeout=_SUBPROCESS_TIMEOUT,

    Status: ✅ Already implemented properly

Error Handling & Reliability

  1. Model Reference (agent.py:37, main.py:70)

    DEFAULT_MODEL = "claude-opus-4-6"  # May not exist

    Recommendation: Consider using a stable model reference like claude-3-5-sonnet-20241022 or make model configurable via environment variable.

  2. Exception Context (agent.py:178)

    self.monitor.track_error(AGENT_ID, type(e).__name__)

    Recommendation: Could include more context like turn number and operation type.

Code Quality Enhancements

  1. Type Hints (tools.py:141)

    def _run_computer_use(action: str, x: Optional[int] = None, ...)

    Status: ✅ Type hints are properly used throughout

  2. Constants Configuration (agent.py:37-39)

    # Consider making these configurable
    DEFAULT_MODEL = "claude-opus-4-6"
    MAX_TURNS = 20
    MAX_TOKENS = 4096

    Recommendation: Make these configurable via environment variables or constructor parameters.

Test Coverage

  1. Missing Tests: No test files included
    • Recommendation: Add unit tests for:
      • Tool dispatch functions
      • Error handling scenarios
      • Environment variable validation
      • BMasterAI telemetry integration

🔧 Specific Recommendations

Short Term (Reliability)

  1. Use a stable model reference or make model configurable
  2. Add configuration validation on startup
  3. Consider making MAX_TURNS and MAX_TOKENS configurable

Long Term (Maintainability)

  1. Add comprehensive test suite
  2. Consider adding retry logic for failed tool operations
  3. Add integration tests with mock services

📊 Performance Analysis

  • Token efficiency: ✅ Good truncation strategies (tools.py:128, agent.py:266)
  • Memory usage: ✅ Base64 data properly stripped from logs (agent.py:338)
  • Network efficiency: ✅ Reasonable API call patterns
  • Resource limits: ✅ MAX_TURNS prevents runaway execution
  • Subprocess management: ✅ Proper timeouts and error handling

🛡️ Security Analysis

  • API key management: ✅ Environment variables, no hardcoded secrets
  • Subprocess execution: ✅ Proper use of check=True and -- separators
  • Input validation: ✅ Required parameters validated before execution
  • Path traversal: ✅ No user-controllable file paths
  • Command injection: ✅ Properly protected with argument arrays and separators

🏆 Best Practices Followed

  • ✅ Comprehensive telemetry and monitoring
  • ✅ Clear error messages and logging
  • ✅ Modular, testable design
  • ✅ Proper dependency management
  • ✅ Excellent documentation and examples
  • ✅ Graceful degradation for missing dependencies
  • ✅ Resource cleanup and timeout handling

Recommendation: APPROVE

This is an excellent foundational example that demonstrates best practices for Claude tool-use agents. The implementation is secure, well-documented, and follows good software engineering practices. The comprehensive BMasterAI integration makes this a valuable reference implementation.

Minor improvements around model configuration and test coverage would enhance the codebase, but the current implementation is production-ready and serves as an excellent learning resource.


@travis-burmaster travis-burmaster merged commit 1253c4a into main Mar 15, 2026
1 check passed