feat: gemini-web-computer-agent — native function-calling loop with BMasterAI telemetry by travis-burmaster · Pull Request #52 · travis-burmaster/bmasterai

travis-burmaster · 2026-03-16T01:00:00Z

Summary

Adds examples/gemini-web-computer-agent/ — a bare-metal Gemini function-calling agent combining web search (Tavily) and computer use (screenshot/click/type/key/scroll), fully instrumented with BMasterAI logging and monitoring
Uses the Google GenAI SDK directly — no LangGraph, no framework — making this the Gemini counterpart to claude-web-computer-agent
Cross-platform computer use: Linux (xdotool + scrot) and macOS (cliclick + screencapture) with platform-aware error messages and install hints
Screenshots returned as multimodal Part.from_bytes image parts so Gemini can see the screen
BMasterAI telemetry on every LLM call, tool dispatch, decision point, and error path
Root README updated with the new entry at the top of the 2026 section

Files

File	Purpose
`tools.py`	`web_search` and `computer_use` schemas + cross-platform dispatch
`agent.py`	`WebComputerAgent` class — Gemini function-call loop with full BMasterAI instrumentation
`main.py`	CLI entry point — checks `GEMINI_API_KEY`, interactive fallback
`requirements.txt`	`google-genai>=0.5.0`, `bmasterai>=0.2.3`, `tavily-python`
`.env.example`	`GEMINI_API_KEY` + `TAVILY_API_KEY`

Test plan

pip install -r requirements.txt installs cleanly
python main.py "Search for latest Gemini news" runs without errors
python main.py "Take a screenshot" returns a computer_use result
logs/agent.jsonl is written with structured telemetry after each run

Generated with Claude (Cowork mode)

…MasterAI telemetry Adds examples/gemini-web-computer-agent/ — a bare-metal Gemini agent combining web search (Tavily) and computer use (screenshot/click/type/ key/scroll) using the Google GenAI SDK, fully instrumented with BMasterAI logging and monitoring. Cross-platform computer use: Linux (xdotool + scrot) and macOS (cliclick + screencapture) with platform-aware error messages. Also updates root README to list the new example at the top of the 2026 section. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

claude · 2026-03-16T01:00:16Z

Claude finished @travis-burmaster's task —— View job

Pull Request Review

Comprehensive review of the Gemini Web Computer Agent implementation covering code quality, security, performance, and architecture.

Todo List

Read repository guidelines (CLAUDE.md)
Review all changed files
Analyze code architecture and patterns
Check security practices
Review error handling and edge cases
Assess performance considerations
Check test coverage
Provide comprehensive feedback

Summary

This PR adds a well-architected Gemini agent that mirrors the existing Claude agent pattern. The implementation follows good practices with comprehensive BMasterAI instrumentation, cross-platform support, and proper error handling. However, there are several areas for improvement, particularly around security and robustness.

✅ Strengths

Architecture & Design:

Clean separation of concerns: agent.py (core loop), tools.py (tool implementations), main.py (CLI)
Follows the established pattern from claude-web-computer-agent for consistency
Proper use of Google GenAI SDK with function declarations and multimodal responses
Comprehensive BMasterAI telemetry throughout the entire agent lifecycle

Code Quality:

Good documentation and clear function signatures
Appropriate error handling with specific exception types
Cross-platform computer use support (Linux/macOS)
Sensible defaults and configuration options

BMasterAI Integration:

Excellent instrumentation coverage: agent start/stop, LLM calls, tool usage, errors
Proper use of event types and metadata logging
Performance tracking with duration measurements
Structured JSONL output for analytics

⚠️ Areas for Improvement

Security Concerns

1. Command Injection Vulnerability (tools.py:188-228)

subprocess.run(["xdotool", "type", "--clearmodifiers", "--", text], ...)

While the -- helps, there's still risk if text contains shell metacharacters. Consider input sanitization:

# Sanitize text input to prevent command injection
if not isinstance(text, str) or len(text) > 1000:
    return {"error": "Invalid text input"}

2. API Key Exposure Risk (agent.py:75-76)
The code correctly reads from environment variables, but consider adding validation:

if not api_key or len(api_key) < 10:
    raise ValueError("Invalid GEMINI_API_KEY")

3. Subprocess Timeout Handling (tools.py:139)
10-second timeout is reasonable, but consider making it configurable and adding resource limits.

Robustness Issues

4. Screenshot Error Handling (agent.py:323-346)
Base64 decoding could fail silently. The current error handling is good, but could be more specific:

except (base64.binascii.Error, ValueError) as e:
    # More specific error handling

5. Platform Detection (tools.py:149)
Uses sys.platform == "darwin" which is correct, but could benefit from explicit platform validation:

SUPPORTED_PLATFORMS = {"linux", "darwin"}
if sys.platform not in SUPPORTED_PLATFORMS:
    return {"error": f"Unsupported platform: {sys.platform}"}

Performance Considerations

6. Token Limit Management (agent.py:41)
MAX_TOKENS = 4096 seems low for multimodal content. Consider:

Dynamic token estimation for images
Context window management for long conversations
Warning when approaching limits

7. Image Compression (tools.py:167)
Screenshot quality is hardcoded at 80%. Consider:

Making quality configurable
Adaptive compression based on image size
PNG vs JPEG based on content type

Code Quality Issues

8. Magic Numbers (agent.py:40)

MAX_TURNS = 20
MAX_TOKENS = 4096

Consider making these configurable via environment variables or constructor parameters.

9. Error Message Consistency (tools.py:240-254)
Install commands are platform-specific but could be more helpful:

if sys.platform == "darwin":
    install_cmd = f"brew install {tool} || 'Install Homebrew first: /bin/bash -c \"$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)\"'"

Missing Test Coverage

The implementation lacks tests, which is concerning for a tool that executes system commands and handles external APIs. Recommend adding:

Unit tests for tool functions:
- Mock subprocess calls
- Test error conditions
- Validate input sanitization
Integration tests:
- Mock API responses
- Test agent loop logic
- Validate telemetry output
Security tests:
- Command injection attempts
- Malformed inputs
- Resource exhaustion scenarios

Recommendations

Immediate (Pre-merge):

Add input validation for text parameter in computer_use tool
Make timeouts and limits configurable
Add platform validation with helpful error messages
Consider adding a simple test suite

Future Improvements:

Implement context window management
Add configurable image compression
Enhanced error reporting with suggestions
Rate limiting for API calls
Memory usage monitoring for large screenshots

Model Configuration Issue

Minor: Line 39 uses "gemini-3-flash-preview" - verify this is the correct model identifier for the Google GenAI SDK.

Documentation

The README is excellent and follows the established pattern. Code comments are clear and helpful.

Verdict

This is a solid implementation that demonstrates good engineering practices and proper BMasterAI integration. The security concerns are manageable and the architecture is clean. With the recommended security improvements, this would be ready for merge.

Recommendation: ✅ Approve with minor security fixes

Replace three comments that still said "Claude" with "Gemini": - Module docstring header - Section comment above tool schemas - ALL_TOOL_SCHEMAS list comment Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

claude · 2026-03-16T12:52:21Z

Claude finished @travis-burmaster's task —— View job

Pull Request Review

Comprehensive review of the Gemini Web Computer Agent implementation covering code quality, security, performance, and architecture.

Todo List

Read repository guidelines (CLAUDE.md)
Review all changed files
Analyze code architecture and patterns
Check security practices
Review error handling and edge cases
Assess performance considerations
Check test coverage
Provide comprehensive feedback

Summary

This PR adds a well-architected Gemini agent that mirrors the existing Claude agent pattern. The implementation follows good practices with comprehensive BMasterAI instrumentation, cross-platform support, and proper error handling. However, there are several areas for improvement, particularly around security and robustness.

✅ Strengths

Architecture & Design:

Clean separation of concerns: agent.py (core loop), tools.py (tool implementations), main.py (CLI)
Follows the established pattern from claude-web-computer-agent for consistency
Proper use of Google GenAI SDK with function declarations and multimodal responses
Comprehensive BMasterAI telemetry throughout the entire agent lifecycle

Code Quality:

Good documentation and clear function signatures
Appropriate error handling with specific exception types
Cross-platform computer use support (Linux/macOS)
Sensible defaults and configuration options

BMasterAI Integration:

Excellent instrumentation coverage: agent start/stop, LLM calls, tool usage, errors
Proper use of event types and metadata logging
Performance tracking with duration measurements
Structured JSONL output for analytics

⚠️ Areas for Improvement

Security Concerns

1. Command Injection Vulnerability (tools.py:188-228)

subprocess.run(["xdotool", "type", "--clearmodifiers", "--", text], ...)

While the -- helps, there's still risk if text contains shell metacharacters. Consider input sanitization:

# Sanitize text input to prevent command injection
if not isinstance(text, str) or len(text) > 1000:
    return {"error": "Invalid text input"}

2. API Key Exposure Risk (agent.py:75-76)
The code correctly reads from environment variables, but consider adding validation:

if not api_key or len(api_key) < 10:
    raise ValueError("Invalid GEMINI_API_KEY")

3. Subprocess Timeout Handling (tools.py:139)
10-second timeout is reasonable, but consider making it configurable and adding resource limits.

Robustness Issues

4. Screenshot Error Handling (agent.py:323-346)
Base64 decoding could fail silently. The current error handling is good, but could be more specific:

except (base64.binascii.Error, ValueError) as e:
    # More specific error handling

5. Platform Detection (tools.py:149)
Uses sys.platform == "darwin" which is correct, but could benefit from explicit platform validation:

SUPPORTED_PLATFORMS = {"linux", "darwin"}
if sys.platform not in SUPPORTED_PLATFORMS:
    return {"error": f"Unsupported platform: {sys.platform}"}

Performance Considerations

6. Token Limit Management (agent.py:41)
MAX_TOKENS = 4096 seems low for multimodal content. Consider:

Dynamic token estimation for images
Context window management for long conversations
Warning when approaching limits

7. Image Compression (tools.py:167)
Screenshot quality is hardcoded at 80%. Consider:

Making quality configurable
Adaptive compression based on image size
PNG vs JPEG based on content type

Code Quality Issues

8. Magic Numbers (agent.py:40)

MAX_TURNS = 20
MAX_TOKENS = 4096

Consider making these configurable via environment variables or constructor parameters.

9. Error Message Consistency (tools.py:240-254)
Install commands are platform-specific but could be more helpful:

if sys.platform == "darwin":
    install_cmd = f"brew install {tool} || 'Install Homebrew first: /bin/bash -c \"$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)\"'"

Missing Test Coverage

The implementation lacks tests, which is concerning for a tool that executes system commands and handles external APIs. Recommend adding:

Unit tests for tool functions:
- Mock subprocess calls
- Test error conditions
- Validate input sanitization
Integration tests:
- Mock API responses
- Test agent loop logic
- Validate telemetry output
Security tests:
- Command injection attempts
- Malformed inputs
- Resource exhaustion scenarios

Recommendations

Immediate (Pre-merge):

Add input validation for text parameter in computer_use tool
Make timeouts and limits configurable
Add platform validation with helpful error messages
Consider adding a simple test suite

Future Improvements:

Implement context window management
Add configurable image compression
Enhanced error reporting with suggestions
Rate limiting for API calls
Memory usage monitoring for large screenshots

Model Configuration Issue

Minor: Line 39 uses "gemini-3-flash-preview" - verify this is the correct model identifier for the Google GenAI SDK.

Documentation

The README is excellent and follows the established pattern. Code comments are clear and helpful.

Verdict

This is a solid implementation that demonstrates good engineering practices and proper BMasterAI integration. The security concerns are manageable and the architecture is clean. With the recommended security improvements, this would be ready for merge.

Recommendation: ✅ Approve with minor security fixes

fix: remove stale Claude references in tools.py

50483ec

Replace three comments that still said "Claude" with "Gemini": - Module docstring header - Section comment above tool schemas - ALL_TOOL_SCHEMAS list comment Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

travis-burmaster merged commit 8188340 into main Mar 16, 2026
1 check passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: gemini-web-computer-agent — native function-calling loop with BMasterAI telemetry#52

feat: gemini-web-computer-agent — native function-calling loop with BMasterAI telemetry#52
travis-burmaster merged 2 commits intomainfrom
feat/gemini-web-computer-agent

travis-burmaster commented Mar 16, 2026

Uh oh!

claude bot commented Mar 16, 2026 •

edited

Loading

Uh oh!

claude bot commented Mar 16, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

travis-burmaster commented Mar 16, 2026

Summary

Files

Test plan

Uh oh!

claude bot commented Mar 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Pull Request Review

Todo List

Summary

✅ Strengths

⚠️ Areas for Improvement

Security Concerns

Robustness Issues

Performance Considerations

Code Quality Issues

Missing Test Coverage

Recommendations

Immediate (Pre-merge):

Future Improvements:

Model Configuration Issue

Documentation

Verdict

Uh oh!

claude bot commented Mar 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Pull Request Review

Todo List

Summary

✅ Strengths

⚠️ Areas for Improvement

Security Concerns

Robustness Issues

Performance Considerations

Code Quality Issues

Missing Test Coverage

Recommendations

Immediate (Pre-merge):

Future Improvements:

Model Configuration Issue

Documentation

Verdict

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

claude bot commented Mar 16, 2026 •

edited

Loading

claude bot commented Mar 16, 2026 •

edited

Loading