Heybro Architecture & Improvement Plan

Executive Summary

Heybro is a Chrome extension agent that automates browser tasks using a "Human-in-the-Loop" approach. It uses a background service worker for state management, a side panel for UI, and injected content scripts for DOM interaction. The core logic relies on a Planner (LLM) to generate steps and an Agent loop to execute them.

Current Status: Functional, but token-inefficient, prone to "element not found" errors on dynamic pages, and unable to sustain long-running tasks because of context window constraints and a hard-coded 50-step limit.

Goal: Evolve into a "Zero Point Failure" system that is token-efficient, robust to DOM changes, and capable of sustained automation.

1. System Components

1.1. Background Service (`background.js`)

Role: Central State Manager & Message Broker.
Responsibilities:
- Maintains the "Single Source of Truth" (Tabs, Task Context, Action History).
- Handles tab registry and updates.
- Holds state (currently in memory only; planned move to chrome.storage.local for persistence).
- Coordinates communication between Side Panel, Content Scripts, and Agent.

1.2. Agent Core (`lib/agent.js`)

Role: The "Brain" and Execution Loop.
Responsibilities:
- Loop: startAutoRun manages the step-by-step execution (currently limited to 50 steps).
- Planning: Calls lib/planner.js to get the next action.
- Execution: Dispatches commands to content.js via lib/tools.js.
- Recovery: Implements basic retry logic and "Zero Point Failure" element injection.
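
The loop described above might look roughly like the following. This is a minimal sketch, not the actual contents of lib/agent.js: `runLoop`, `planNext`, `executeCall`, and the shape of the plan object (`done`, `call`) are hypothetical names chosen for illustration. Only the 50-step limit comes from the description above.

```javascript
// Hypothetical sketch of the step loop. planNext and executeCall are injected
// stand-ins for lib/planner.js and lib/tools.js (assumed interfaces).
const MAX_STEPS = 50; // the hard limit mentioned above

async function runLoop(task, { planNext, executeCall, maxSteps = MAX_STEPS }) {
  const history = [];
  for (let step = 1; step <= maxSteps; step++) {
    const plan = await planNext(task, history);
    if (plan.done) {
      return { status: "done", steps: step - 1, history };
    }
    let result;
    try {
      result = await executeCall(plan.call);
    } catch (err) {
      // Record the failure so the planner can react on the next step.
      result = { ok: false, error: String(err.message || err) };
    }
    history.push({ step, call: plan.call, result });
  }
  return { status: "step_limit", steps: maxSteps, history };
}
```

Injecting the planner and executor keeps the loop testable without a browser, and recording failures in the history (instead of throwing) lets the next planning step see what went wrong.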

1.3. Planner (`lib/planner.js`)

Role: LLM Interface.
Responsibilities:
- Constructs the prompt (System Prompt + Context + History).
- Parses LLM responses into structured JSON (Thought, Call, Subtask Updates).
- Current Flaw: Sends all interactive elements and full history, leading to token bloat.
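
As an illustration of the parsing responsibility, the sketch below extracts a JSON object from a raw LLM reply and validates it. The field names (`thought`, `call.tool`, `call.args`, `subtaskUpdates`) and the function name `parsePlan` are assumptions made for this example; the real schema in lib/planner.js may differ.

```javascript
// Hypothetical parser: tolerates prose or code fences around the JSON payload
// by slicing from the first "{" to the last "}".
function parsePlan(raw) {
  const start = raw.indexOf("{");
  const end = raw.lastIndexOf("}");
  if (start === -1 || end <= start) {
    return { ok: false, error: "no JSON object found" };
  }
  let data;
  try {
    data = JSON.parse(raw.slice(start, end + 1));
  } catch (err) {
    return { ok: false, error: "invalid JSON: " + err.message };
  }
  if (typeof data.call?.tool !== "string") {
    return { ok: false, error: "missing call.tool" };
  }
  return {
    ok: true,
    thought: typeof data.thought === "string" ? data.thought : "",
    call: { tool: data.call.tool, args: data.call.args ?? {} },
    subtaskUpdates: Array.isArray(data.subtaskUpdates) ? data.subtaskUpdates : [],
  };
}
```

Returning an `{ ok: false, error }` object instead of throwing lets the agent treat a malformed reply like any other failed step and ask the planner again.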

1.4. Content Script (`content.js`)

Role: The "Hands and Eyes".
Modules:
- PageScanner: Traverses DOM to find interactive elements and creates a flat map.
- SmartLocator: Finds elements based on ID, selector, or fuzzy scoring (Text, Role, etc.).
- InteractionEngine: Performs actions (Click, Type, Scroll) with event simulation for modern frameworks (React/Vue).
- VisualCheck: Determines visibility and occlusion.

1.5. UI (`sidepanel.html` / `lib/ui.js`)

Role: User Interface.
Responsibilities:
- Displays chat, logs, and task status.
- Captures user instructions.

2. Data Flow

User Input: User types a command in Side Panel.
Initialization: agent.js starts, fetches activeTab.
Observation: Agent sends a simplify (or mapCompact) request to content.js to obtain the current map of interactive elements.
Planning:
- planner.js builds a prompt with:
  - Task: User instruction.
  - State: URL, Title.
  - Elements: List of interactive elements (compressed).
  - History: Recent actions and results.
- Sends to Gemini (LLM).
Decision: LLM returns a Plan (Thought + Tool Call).
Execution:
- agent.js sends Tool Call to content.js.
- SmartLocator resolves the target element.
- InteractionEngine performs the action.
Feedback: Result (Success/Error) is sent back to agent.js, recorded in History, and the loop repeats.
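
To make the Planning step concrete, here is a minimal sketch of assembling the four prompt parts listed above (Task, State, Elements, History). The function name `buildPrompt`, the input field names, and the line format are all assumptions; the actual prompt layout in planner.js is not shown in this document.

```javascript
// Hypothetical prompt builder: one line per element and per history entry.
function buildPrompt({ task, url, title, elements, history }) {
  const elementLines = elements.map((e) => `[${e.id}] ${e.role} "${e.text}"`);
  const historyLines = history.map(
    (h) => `${h.step}. ${h.tool} -> ${h.ok ? "ok" : "error: " + h.error}`
  );
  return [
    `Task: ${task}`,
    `State: ${url} | ${title}`,
    "Elements:",
    ...elementLines,
    "History:",
    ...(historyLines.length ? historyLines : ["(none)"]),
  ].join("\n");
}
```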

3. Improvement Plan (Target Architecture)

3.1. Token Efficiency Strategy

Problem: Sending 2000+ elements and full history exhausts context windows and increases cost/latency.

Solution:
- Dynamic Element Filtering: Only send elements relevant to the current viewport or high-level semantic containers where possible. (For now: aggressive compression of element attributes.)
- History Pruning: Implement a "Sliding Window" for the prompt. Keep the last N steps detailed, and summarize older steps.
- Prompt Optimization: Remove redundant instructions in the system prompt.
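
The History Pruning idea can be sketched as follows. This is an assumption-laden example: `pruneHistory` and the entry fields (`tool`, `ok`) are hypothetical, and the summary here is a cheap deterministic count, not an LLM-generated summary, which the plan may or may not intend.

```javascript
// Hypothetical sliding window: keep the last `keepLast` entries in full and
// collapse everything older into a one-line summary string.
function pruneHistory(history, keepLast = 10) {
  if (history.length <= keepLast) {
    return { summary: "", recent: history.slice() };
  }
  const cut = history.length - keepLast;
  const older = history.slice(0, cut);
  const failures = older.filter((h) => !h.ok).length;
  const tools = [...new Set(older.map((h) => h.tool))].join(", ");
  const summary = `${older.length} earlier steps (${failures} failed); tools used: ${tools}`;
  return { summary, recent: history.slice(cut) };
}
```

With this shape, the prompt cost of the history is bounded by `keepLast` detailed entries plus one summary line, regardless of how long the task runs.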

3.2. Robustness (Zero Point Failure)

Problem: "Element not found" due to dynamic IDs or DOM updates between Plan and Execute.

Solution:
- Signature-Based Re-binding: The Planner will return a "Signature" (Text, Role, Context) alongside the ID.
- SmartLocator Upgrade: If ID lookup fails, SmartLocator will use the Signature to find the best matching element in the current DOM (Semantic Matching).
- Self-Correction: If an action fails, the Agent will perform a "Refind" step (refresh DOM -> retry) before giving up.
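
A rough sketch of signature-based re-binding is shown below. It operates on plain objects standing in for the scanned element map, so it is not the real SmartLocator; the function names, the field names (`text`, `role`, `context`), the weights, and the `minScore` threshold are all illustrative assumptions that would need tuning.

```javascript
// Hypothetical scoring: exact text match weighs most, then role, then context.
function scoreCandidate(sig, el) {
  const norm = (s) => (s || "").trim().toLowerCase();
  const sigText = norm(sig.text);
  let score = 0;
  if (sigText && sigText === norm(el.text)) score += 3;
  else if (sigText && norm(el.text).includes(sigText)) score += 1;
  if (norm(sig.role) && norm(sig.role) === norm(el.role)) score += 2;
  if (norm(sig.context) && norm(sig.context) === norm(el.context)) score += 1;
  return score;
}

// Try the ID first; fall back to the best-scoring signature match.
// minScore = 4 means role and context alone (max 3) never qualify.
function resolveElement(elements, id, sig, minScore = 4) {
  const byId = elements.find((el) => el.id === id);
  if (byId) return byId;
  let best = null;
  let bestScore = 0;
  for (const el of elements) {
    const s = scoreCandidate(sig, el);
    if (s > bestScore) {
      best = el;
      bestScore = s;
    }
  }
  return bestScore >= minScore ? best : null;
}
```

One simplification to note: this trusts an ID hit unconditionally. If IDs can be reassigned to different elements after a DOM update, the ID hit could also be checked against the signature before it is accepted.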

3.3. Long-Running Automations

Problem: 50-step hard limit and memory volatility.

Solution:
- Pagination: Reset the step counter if progress is being made.
- State Persistence: Periodically save taskContext and actionHistory to chrome.storage.local to allow recovery after browser restarts or crashes.
- Async Coordination: Ensure background.js acts as the stable anchor.
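
The "reset the step counter if progress is being made" idea could be sketched as a small step budget. Everything here is hypothetical: the name `createStepBudget`, the notion of what counts as progress (for example, a completed subtask), and the absolute `hardLimit` of 500, which is added only as a safety net and does not appear in the plan above.

```javascript
// Hypothetical step budget: stop after `softLimit` consecutive steps without
// progress, or after `hardLimit` total steps, whichever comes first.
function createStepBudget({ softLimit = 50, hardLimit = 500 } = {}) {
  let sinceProgress = 0;
  let total = 0;
  return {
    // Call once per executed step.
    record(madeProgress) {
      total += 1;
      sinceProgress = madeProgress ? 0 : sinceProgress + 1;
    },
    canContinue() {
      return sinceProgress < softLimit && total < hardLimit;
    },
  };
}
```

A loop would call `record(...)` after each step and check `canContinue()` before planning the next one, so a task that keeps completing subtasks is not cut off at step 50.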

4. Tech Stack

Runtime: Chrome Extension Manifest V3 (Service Worker).
AI: Google Gemini (via API).
Language: Vanilla JavaScript (ES Modules).
Styling: CSS Variables (Native look).

5. Execution Roadmap

Milestone 1: Token & Prompt Optimization

Task 1.1: Optimize compressElements in planner.js to reduce JSON size.
Task 1.2: Implement "History Summarization" in agent.js (keep last 10 detailed, summarize rest).
Task 1.3: Refine System Prompt to be more concise.
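
For Task 1.1, one possible direction is sketched below. This is not the existing compressElements in planner.js, whose current behavior is not shown here; the input fields (`id`, `role`, `text`, `placeholder`), the short output keys, and the 40-character cap are assumptions for illustration.

```javascript
// Hypothetical compression: short keys, omitted empty fields, truncated text.
function compressElements(elements, maxText = 40) {
  return elements.map((el) => {
    const out = { i: el.id };
    if (el.role) out.r = el.role;
    const text = (el.text || "").replace(/\s+/g, " ").trim();
    if (text) {
      out.t = text.length > maxText ? text.slice(0, maxText - 3) + "..." : text;
    }
    if (el.placeholder) out.p = el.placeholder;
    return out;
  });
}
```

If short keys are used, the system prompt needs a one-line legend (for example `i` = id, `r` = role, `t` = text, `p` = placeholder) so the model can interpret them.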

Milestone 2: Robustness (Zero Point Failure)

Task 2.1: Enhance SmartLocator.find in content.js to support robust signature matching (already partially present, needs tuning).
Task 2.2: Update planner.js to always generate robust element signatures.
Task 2.3: Implement "Stale Element Recovery" in agent.js (auto-refresh map if element missing).
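
Task 2.3 might be sketched as a thin wrapper around the executor. The names `executeWithRefind`, `execute`, `refreshMap`, and the `"element_not_found"` error code are hypothetical; the real error shape returned by content.js may differ.

```javascript
// Hypothetical check for the one failure kind that a rescan can fix.
function isStale(result) {
  return !result.ok && result.error === "element_not_found";
}

// Run the call; on a stale-element failure, refresh the element map and retry.
async function executeWithRefind(call, { execute, refreshMap, maxRetries = 1 }) {
  let result = await execute(call);
  for (let attempt = 0; attempt < maxRetries && isStale(result); attempt++) {
    await refreshMap(); // rescan the DOM so the element map is current
    result = await execute(call);
  }
  return result;
}
```

Retrying the same call only helps if the executor can re-resolve the target after the rescan, for instance through the signature matching from Milestone 2. Other failures are returned immediately so the planner can decide what to do.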

Milestone 3: Long-Running Support

Task 3.1: Remove/Soft-limit the 50-step cap in agent.js.
Task 3.2: Implement State Persistence in background.js.
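
A minimal sketch of Task 3.2 follows. The storage area is passed in as a parameter so the functions can be exercised outside the extension; in background.js the caller would pass `chrome.storage.local`, whose `set` and `get` return promises under Manifest V3. The key name `heybroState`, the `savedAt` field, and the function names are assumptions.

```javascript
const STATE_KEY = "heybroState"; // hypothetical storage key

// Save a snapshot of the task state. Values must be JSON-serializable.
async function saveState(storage, { taskContext, actionHistory }) {
  await storage.set({
    [STATE_KEY]: { taskContext, actionHistory, savedAt: Date.now() },
  });
}

// Load the last snapshot, or null if nothing has been saved yet.
async function loadState(storage) {
  const stored = await storage.get(STATE_KEY);
  return stored[STATE_KEY] ?? null;
}
```

Calling `loadState` when the service worker starts would let the agent resume from the last snapshot. How often to call `saveState` (every step, every N steps, or on a timer) is left open here.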

Milestone 4: Agent Coordination

Task 4.1: Ensure single-agent execution via background.js locking.
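
One way Task 4.1 could be approached is a simple ownership lock held by background.js. This sketch is hypothetical (`createAgentLock` and keying the lock by tab ID are assumptions), and it is in-memory only: because a Manifest V3 service worker can be terminated while idle, a real implementation would likely need to persist the owner as well.

```javascript
// Hypothetical single-owner lock: only one tab's agent may run at a time.
function createAgentLock() {
  let ownerTabId = null;
  return {
    // Returns true if the caller now holds (or already held) the lock.
    acquire(tabId) {
      if (ownerTabId !== null && ownerTabId !== tabId) return false;
      ownerTabId = tabId;
      return true;
    },
    // Only the current owner can release the lock.
    release(tabId) {
      if (ownerTabId === tabId) ownerTabId = null;
    },
    owner() {
      return ownerTabId;
    },
  };
}
```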
