SYLVA is a vision-first autonomous navigator designed for the Gemini Live Agent Challenge. Unlike traditional automation tools that depend on brittle DOM selectors, SYLVA perceives the web through raw pixels, using Gemini's visual reasoning to drive interaction with sub-5s latency.
The following diagram illustrates the interaction between the Local Sentinel (Driver), the Reasoning Core (Bridge), and Google's AI models.
graph TD
subgraph "Local Environment (User Machine)"
Driver["SYLVA Driver (Sentinel)"]
Browser["Target Browser (Playwright)"]
Overlay["Integrated UI Overlay"]
end
subgraph "Google Cloud (Reasoning Core)"
Bridge["SYLVA Bridge (FastAPI/Cloud Run)"]
Gemini["Gemini 2.5 Flash Lite (Vertex AI)"]
end
%% Interaction Flow
Browser -- "Raw Screenshots & Telemetry" --> Driver
Driver -- "Optimized JPEG & State" --> Bridge
Bridge -- "Visual Grounding Prompt" --> Gemini
Gemini -- "Structured Action JSON" --> Bridge
Bridge -- "SmartActionChip" --> Driver
Driver -- "Direct Logic Execution" --> Browser
Driver -- "Real-time Logs" --> Overlay
%% Styling
style Gemini fill:#4285F4,stroke:#333,stroke-width:2px,color:#fff
style Bridge fill:#34A853,stroke:#333,stroke-width:2px,color:#fff
style Driver fill:#FBBC05,stroke:#333,stroke-width:2px,color:#000
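The flow in the diagram can be read as a single perceive-reason-act cycle. The sketch below is a minimal illustration, not SYLVA's actual code: the function names (`run_step`, `capture`, `reason`, `execute`, `log`) and the JSON fields (`action`, `x`, `y`) are assumptions made for this example, since the real Structured Action JSON schema is not specified here.

```python
import json
from typing import Callable


def run_step(
    capture: Callable[[], bytes],
    reason: Callable[[bytes], str],
    execute: Callable[[dict], None],
    log: Callable[[str], None],
) -> dict:
    """Run one hypothetical perceive-reason-act cycle."""
    screenshot = capture()      # Browser -> Driver: screenshot bytes
    raw = reason(screenshot)    # Driver -> Bridge -> Gemini -> Bridge -> Driver
    action = json.loads(raw)    # structured action as JSON (schema assumed)
    execute(action)             # Driver -> Browser
    log(f"executed {action['action']} at ({action['x']}, {action['y']})")
    return action


# Stubs standing in for the browser, the Bridge, and the overlay.
logs: list[str] = []
result = run_step(
    capture=lambda: b"\xff\xd8 fake jpeg bytes",
    reason=lambda img: '{"action": "click", "x": 500, "y": 250}',
    execute=lambda action: None,
    log=logs.append,
)
```

Passing the four stages in as callables keeps the loop itself independent of Playwright, the network, and the model, so each stage can be swapped or stubbed in isolation.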
- Vision-First Constraint: Zero reliance on CSS selectors or element IDs. The system uses a normalized 0-1000 coordinate system derived from raw pixel analysis.
- Latency Optimization: Achieves a 4.09s P95 round-trip latency through multi-stage image optimization and the high-speed inference of Gemini 2.5 Flash Lite.
- Resilient Navigation: Capable of handling modern, dynamic web applications where DOM structures frequently change.
- Environment: Deployed on Google Cloud Run using `gcloud` and ADC (Application Default Credentials).
- Browser: Successfully tested on complex web workflows using Playwright.
- Security: Implements a zero-secret policy using Google Cloud IAM roles.
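
To make the normalized 0-1000 coordinate system from the list above concrete, here is a minimal sketch of how a normalized point might be mapped back to viewport pixels before a click. The helper name `to_pixels`, the rounding, and the clamping behavior are assumptions for illustration; the project's actual conversion may differ.

```python
def to_pixels(nx: int, ny: int, width: int, height: int, scale: int = 1000) -> tuple[int, int]:
    """Map a normalized (0..scale) point to pixel coordinates in a width x height viewport."""
    if not (0 <= nx <= scale and 0 <= ny <= scale):
        raise ValueError(f"normalized point ({nx}, {ny}) is outside 0..{scale}")
    # Scale to the viewport, then clamp so the far edge stays inside the image.
    px = min(width - 1, round(nx / scale * width))
    py = min(height - 1, round(ny / scale * height))
    return px, py


center = to_pixels(500, 500, 1280, 720)      # (640, 360)
corner = to_pixels(1000, 1000, 1280, 720)    # (1279, 719) after clamping
```

The resulting pair could then be passed to a pointer call such as Playwright's `page.mouse.click(x, y)`. Because the coordinates are normalized, the same model output stays valid if the screenshot is downscaled before upload.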
Note: This document and the associated project components were created for the purposes of entering the Google Gemini Live Agent Challenge hackathon.
