feat: gemini-web-computer-agent — native function-calling loop with BMasterAI telemetry #52

Merged
travis-burmaster merged 2 commits into main from feat/gemini-web-computer-agent
Mar 16, 2026

Conversation

@travis-burmaster
Owner

Summary

  • Adds examples/gemini-web-computer-agent/ — a bare-metal Gemini function-calling agent combining web search (Tavily) and computer use (screenshot/click/type/key/scroll), fully instrumented with BMasterAI logging and monitoring
  • Uses the Google GenAI SDK directly — no LangGraph, no framework — making this the Gemini counterpart to claude-web-computer-agent
  • Cross-platform computer use: Linux (xdotool + scrot) and macOS (cliclick + screencapture) with platform-aware error messages and install hints
  • Screenshots returned as multimodal Part.from_bytes image parts so Gemini can see the screen
  • BMasterAI telemetry on every LLM call, tool dispatch, decision point, and error path
  • Root README updated with the new entry at the top of the 2026 section

Files

| File | Purpose |
| --- | --- |
| tools.py | web_search and computer_use schemas + cross-platform dispatch |
| agent.py | WebComputerAgent class: Gemini function-call loop with full BMasterAI instrumentation |
| main.py | CLI entry point: checks GEMINI_API_KEY, interactive fallback |
| requirements.txt | google-genai>=0.5.0, bmasterai>=0.2.3, tavily-python |
| .env.example | GEMINI_API_KEY + TAVILY_API_KEY |

Test plan

  • pip install -r requirements.txt installs cleanly
  • python main.py "Search for latest Gemini news" runs without errors
  • python main.py "Take a screenshot" returns a computer_use result
  • logs/agent.jsonl is written with structured telemetry after each run
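One way to check the last item is to tally the records in logs/agent.jsonl after a run. The sketch below is only illustrative: the `event_type` field name and the `count_events` helper are assumptions, since the PR does not spell out the telemetry schema.

```python
import json
from collections import Counter


def count_events(lines, key="event_type"):
    """Tally JSONL records by `key` (a hypothetical field name).

    Blank lines are skipped; malformed lines are tallied under "<malformed>".
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            counts["<malformed>"] += 1
            continue
        if isinstance(record, dict):
            counts[str(record.get(key, "<missing>"))] += 1
    return counts
```

Usage would be along the lines of `with open("logs/agent.jsonl") as f: print(count_events(f))`; a run that produced no records, or only malformed ones, would suggest the telemetry is not being written as expected.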

Generated with Claude (Cowork mode)

…MasterAI telemetry

Adds examples/gemini-web-computer-agent/ — a bare-metal Gemini agent
combining web search (Tavily) and computer use (screenshot/click/type/
key/scroll) using the Google GenAI SDK, fully instrumented with
BMasterAI logging and monitoring.

Cross-platform computer use: Linux (xdotool + scrot) and macOS
(cliclick + screencapture) with platform-aware error messages.

Also updates root README to list the new example at the top of the
2026 section.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@claude

claude bot commented Mar 16, 2026

Claude finished @travis-burmaster's task


Pull Request Review

Comprehensive review of the Gemini Web Computer Agent implementation covering code quality, security, performance, and architecture.

Todo List

  • Read repository guidelines (CLAUDE.md)
  • Review all changed files
  • Analyze code architecture and patterns
  • Check security practices
  • Review error handling and edge cases
  • Assess performance considerations
  • Check test coverage
  • Provide comprehensive feedback

Summary

This PR adds a well-architected Gemini agent that mirrors the existing Claude agent pattern. The implementation follows good practices with comprehensive BMasterAI instrumentation, cross-platform support, and proper error handling. However, there are several areas for improvement, particularly around security and robustness.

✅ Strengths

Architecture & Design:

  • Clean separation of concerns: agent.py (core loop), tools.py (tool implementations), main.py (CLI)
  • Follows the established pattern from claude-web-computer-agent for consistency
  • Proper use of Google GenAI SDK with function declarations and multimodal responses
  • Comprehensive BMasterAI telemetry throughout the entire agent lifecycle

Code Quality:

  • Good documentation and clear function signatures
  • Appropriate error handling with specific exception types
  • Cross-platform computer use support (Linux/macOS)
  • Sensible defaults and configuration options

BMasterAI Integration:

  • Excellent instrumentation coverage: agent start/stop, LLM calls, tool usage, errors
  • Proper use of event types and metadata logging
  • Performance tracking with duration measurements
  • Structured JSONL output for analytics

⚠️ Areas for Improvement

Security Concerns

1. Command Injection Vulnerability (tools.py:188-228)

subprocess.run(["xdotool", "type", "--clearmodifiers", "--", text], ...)

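Because the command is passed as an argument list, subprocess.run does not invoke a shell, so shell metacharacters in text reach xdotool literally, and the -- stops option parsing. The remaining risk is non-string or unbounded input being typed into whatever window has focus. Consider validating the input:

# Validate type and length before typing
if not isinstance(text, str) or len(text) > 1000:
    return {"error": "Invalid text input"}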
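A slightly fuller sketch of that check, for illustration only. The helper names `build_type_command` and `type_text` and the 1000-character cap are assumptions, not code from tools.py. With the argument-list form no shell parses the text, so the check mainly guards type, size, and NUL bytes (which subprocess rejects).

```python
import subprocess

MAX_TEXT_LEN = 1000  # assumed cap, not taken from the PR


def build_type_command(text):
    """Validate text and return the xdotool argv, or raise ValueError."""
    if not isinstance(text, str):
        raise ValueError("text must be a string")
    if not text or len(text) > MAX_TEXT_LEN:
        raise ValueError(f"text must be 1..{MAX_TEXT_LEN} characters")
    if "\x00" in text:
        raise ValueError("text must not contain NUL bytes")
    # List form: no shell is spawned, so ';', '|' and '$()' stay literal.
    return ["xdotool", "type", "--clearmodifiers", "--", text]


def type_text(text, timeout=10):
    """Hypothetical wrapper that returns a dict result like the tool layer."""
    try:
        cmd = build_type_command(text)
    except ValueError as exc:
        return {"error": str(exc)}
    try:
        subprocess.run(cmd, check=True, timeout=timeout, capture_output=True)
    except (OSError, subprocess.SubprocessError) as exc:
        return {"error": f"xdotool failed: {exc}"}
    return {"ok": True}
```

Splitting validation from execution keeps the validation testable without xdotool installed.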

2. API Key Exposure Risk (agent.py:75-76)
The code correctly reads from environment variables, but consider adding validation:

if not api_key or len(api_key) < 10:
    raise ValueError("Invalid GEMINI_API_KEY")
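As a sketch, the check could live in a small loader that never echoes the key value in its error message. The `load_api_key` name and the minimum length of 10 are illustrative assumptions; Google does not document a key length that this relies on.

```python
import os


def load_api_key(env=None, name="GEMINI_API_KEY", min_len=10):
    """Return the key from the environment; raise ValueError without echoing it."""
    env = os.environ if env is None else env
    key = (env.get(name) or "").strip()
    if len(key) < min_len:
        # Report only the variable name, never the (possibly partial) secret.
        raise ValueError(f"{name} is missing or too short")
    return key
```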

3. Subprocess Timeout Handling (tools.py:139)
10-second timeout is reasonable, but consider making it configurable and adding resource limits.
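One possible shape for a configurable timeout, sketched under assumptions: the `COMPUTER_USE_TIMEOUT` environment variable and the `get_timeout` and `run_tool` helpers are hypothetical names, not part of the PR. Resource limits are left out of this sketch.

```python
import os
import subprocess

DEFAULT_TIMEOUT_S = 10.0  # matches the 10-second value noted above


def get_timeout(env=None, name="COMPUTER_USE_TIMEOUT"):
    """Read the timeout in seconds from an env var; fall back on bad values."""
    env = os.environ if env is None else env
    try:
        value = float(env.get(name, DEFAULT_TIMEOUT_S))
    except (TypeError, ValueError):
        return DEFAULT_TIMEOUT_S
    # Rejects zero, negatives, NaN and infinity.
    return value if 0 < value < float("inf") else DEFAULT_TIMEOUT_S


def run_tool(cmd, timeout=None):
    """Run an argv list with the configured timeout and return a dict result."""
    timeout = get_timeout() if timeout is None else timeout
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return {"error": f"timed out after {timeout}s"}
    except OSError as exc:
        return {"error": str(exc)}
    return {"returncode": proc.returncode, "stdout": proc.stdout}
```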

Robustness Issues

4. Screenshot Error Handling (agent.py:323-346)
Base64 decoding could fail silently. The current error handling is good, but could be more specific:

except (base64.binascii.Error, ValueError) as e:
    # More specific error handling
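A minimal sketch of what the more specific handling might look like, assuming the screenshot arrives as a base64 string. The `decode_screenshot` helper is hypothetical. Note that `binascii.Error` is a subclass of `ValueError`, so listing both is for readability rather than necessity.

```python
import base64
import binascii


def decode_screenshot(b64_data):
    """Decode base64 screenshot data; return (bytes, None) or (None, error)."""
    if not isinstance(b64_data, (str, bytes)):
        return None, "screenshot payload is not base64 text"
    try:
        # validate=True rejects non-alphabet characters instead of skipping them.
        raw = base64.b64decode(b64_data, validate=True)
    except (binascii.Error, ValueError) as exc:
        return None, f"invalid base64 screenshot: {exc}"
    if not raw:
        return None, "screenshot payload is empty"
    return raw, None
```

Returning an explicit error string lets the agent loop log the failure and send a text-only tool result instead of dropping the image silently.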

5. Platform Detection (tools.py:149)
Uses sys.platform == "darwin" which is correct, but could benefit from explicit platform validation:

SUPPORTED_PLATFORMS = {"linux", "darwin"}
if sys.platform not in SUPPORTED_PLATFORMS:
    return {"error": f"Unsupported platform: {sys.platform}"}

Performance Considerations

6. Token Limit Management (agent.py:41)
MAX_TOKENS = 4096 seems low for multimodal content. Consider:

  • Dynamic token estimation for images
  • Context window management for long conversations
  • Warning when approaching limits

7. Image Compression (tools.py:167)
Screenshot quality is hardcoded at 80%. Consider:

  • Making quality configurable
  • Adaptive compression based on image size
  • PNG vs JPEG based on content type

Code Quality Issues

8. Magic Numbers (agent.py:40)

MAX_TURNS = 20
MAX_TOKENS = 4096

Consider making these configurable via environment variables or constructor parameters.
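For example, the limits could be grouped in a small config object that reads optional overrides from the environment. This is a sketch only: `AgentLimits`, `AGENT_MAX_TURNS`, and `AGENT_MAX_TOKENS` are hypothetical names, and the defaults simply mirror the constants quoted above.

```python
import os
from dataclasses import dataclass


def _env_int(env, name, default):
    """Parse a positive int from env; fall back to default on bad values."""
    try:
        value = int(env.get(name, default))
    except (TypeError, ValueError):
        return default
    return value if value > 0 else default


@dataclass(frozen=True)
class AgentLimits:
    max_turns: int = 20
    max_tokens: int = 4096

    @classmethod
    def from_env(cls, env=None):
        env = os.environ if env is None else env
        return cls(
            max_turns=_env_int(env, "AGENT_MAX_TURNS", cls.max_turns),
            max_tokens=_env_int(env, "AGENT_MAX_TOKENS", cls.max_tokens),
        )
```

A constructor could then accept `limits=AgentLimits.from_env()` so tests can pass explicit values without touching the environment.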

9. Error Message Consistency (tools.py:240-254)
Install commands are platform-specific but could be more helpful:

if sys.platform == "darwin":
    install_cmd = f"brew install {tool} || 'Install Homebrew first: /bin/bash -c \"$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)\"'"

Missing Test Coverage

The implementation lacks tests, which is concerning for a tool that executes system commands and handles external APIs. Recommend adding:

  1. Unit tests for tool functions:

    • Mock subprocess calls
    • Test error conditions
    • Validate input sanitization
  2. Integration tests:

    • Mock API responses
    • Test agent loop logic
    • Validate telemetry output
  3. Security tests:

    • Command injection attempts
    • Malformed inputs
    • Resource exhaustion scenarios
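As an illustration of the unit-test suggestion, the sketch below mocks subprocess.run so no real xdotool call is made. The `type_text` function here is a stand-in written for this example; the actual helper in tools.py may have a different name and signature.

```python
import subprocess
import unittest
from unittest import mock


def type_text(text):
    """Stand-in for a tools.py helper; the real function may differ."""
    try:
        subprocess.run(
            ["xdotool", "type", "--clearmodifiers", "--", text],
            check=True,
            timeout=10,
        )
    except FileNotFoundError:
        return {"error": "xdotool not installed"}
    except subprocess.TimeoutExpired:
        return {"error": "xdotool timed out"}
    return {"ok": True}


class TypeTextTests(unittest.TestCase):
    @mock.patch("subprocess.run")
    def test_text_is_a_single_argv_item(self, run):
        self.assertEqual(type_text("hi; reboot"), {"ok": True})
        argv = run.call_args.args[0]
        # The metacharacters stay inside one argument; no shell sees them.
        self.assertEqual(argv[-1], "hi; reboot")
        self.assertNotIn("shell", run.call_args.kwargs)

    @mock.patch("subprocess.run", side_effect=FileNotFoundError)
    def test_missing_binary(self, _run):
        self.assertIn("not installed", type_text("x")["error"])

    @mock.patch("subprocess.run")
    def test_timeout(self, run):
        run.side_effect = subprocess.TimeoutExpired(cmd="xdotool", timeout=10)
        self.assertIn("timed out", type_text("x")["error"])
```

These can be run with `python -m unittest` from the example directory once saved to a test module.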

Recommendations

Immediate (Pre-merge):

  1. Add input validation for text parameter in computer_use tool
  2. Make timeouts and limits configurable
  3. Add platform validation with helpful error messages
  4. Consider adding a simple test suite

Future Improvements:

  1. Implement context window management
  2. Add configurable image compression
  3. Enhanced error reporting with suggestions
  4. Rate limiting for API calls
  5. Memory usage monitoring for large screenshots

Model Configuration Issue

Minor: Line 39 uses "gemini-3-flash-preview" - verify this is the correct model identifier for the Google GenAI SDK.

Documentation

The README is excellent and follows the established pattern. Code comments are clear and helpful.

Verdict

This is a solid implementation that demonstrates good engineering practices and proper BMasterAI integration. The security concerns are manageable and the architecture is clean. With the recommended security improvements, this would be ready for merge.

Recommendation: ✅ Approve with minor security fixes


Replace three comments that still said "Claude" with "Gemini":
  - Module docstring header
  - Section comment above tool schemas
  - ALL_TOOL_SCHEMAS list comment

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

@travis-burmaster travis-burmaster merged commit 8188340 into main Mar 16, 2026
1 check passed