Improve the Heybro Chrome extension architecture to handle long-running automations, optimize token usage, and ensure "Zero Point Failure" robustness.
Important
Breaking Change Potential: The "SmartLocator" changes might affect how elements are found. We will maintain backward compatibility with ID-based lookups but prioritize signatures when IDs fail.
- Optimize `compressElements`: Reduce the JSON footprint of element lists by using shorter keys and removing redundant attributes.
- Refine `buildSystemPrompt`: Shorten the static instructions.
- History Pruning: Update `callGemini` to accept a summarized history, or implement the summarization logic before calling it.
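
A minimal sketch of what the compression step could look like. The `PageElement` shape, the one-letter keys, and the 80-character text cap are assumptions for illustration; the real Heybro element records may carry different fields.

```typescript
// Hypothetical input shape; the actual element records may differ.
interface PageElement {
  id: number;
  tagName: string;
  role?: string;
  text?: string;
  attributes?: Record<string, string>;
}

// Compact form sent to the model: one-letter keys, empty fields omitted.
interface CompactElement {
  i: number; // id
  t: string; // tag
  r?: string; // role, only when it differs from the tag
  x?: string; // visible text, whitespace-collapsed and truncated
}

const MAX_TEXT_LENGTH = 80; // assumed cap, tune against real prompts

function compressElements(elements: PageElement[]): CompactElement[] {
  return elements.map((el) => {
    const compact: CompactElement = { i: el.id, t: el.tagName.toLowerCase() };
    // A role equal to the tag name (e.g. button/button) adds no information.
    if (el.role && el.role !== compact.t) compact.r = el.role;
    const text = (el.text ?? "").replace(/\s+/g, " ").trim();
    if (text) compact.x = text.slice(0, MAX_TEXT_LENGTH);
    // Raw attributes are dropped entirely in this sketch.
    return compact;
  });
}
```

If the shorter keys are adopted, the system prompt needs a one-line legend for them so the model can still interpret the list.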
- History Management: Implement a sliding window for `buildHistoryForPrompt`. Keep the last 10 actions in full detail, and summarize or drop older ones.
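
The sliding window could be sketched as follows. The `ActionRecord` shape and the "count per tool" summary are assumptions; the plan only fixes the window size of 10, and a model-generated summary would slot into the same place.

```typescript
// Hypothetical record shape for one executed action.
interface ActionRecord {
  step: number;
  tool: string;
  args: Record<string, unknown>;
  result: string;
}

const FULL_DETAIL_WINDOW = 10; // from the plan: last 10 actions stay verbatim

function buildHistoryForPrompt(
  history: ActionRecord[],
  windowSize: number = FULL_DETAIL_WINDOW
): string {
  const cutoff = Math.max(0, history.length - windowSize);
  const older = history.slice(0, cutoff);
  const recent = history.slice(cutoff);
  const lines: string[] = [];

  if (older.length > 0) {
    // Assumed summary strategy: collapse older steps into per-tool counts.
    const counts = new Map<string, number>();
    for (const action of older) {
      counts.set(action.tool, (counts.get(action.tool) ?? 0) + 1);
    }
    const summary = Array.from(counts.entries())
      .map(([tool, n]) => `${tool} x${n}`)
      .join(", ");
    lines.push(`Earlier (${older.length} steps, summarized): ${summary}`);
  }

  // Recent steps are kept in full detail.
  for (const action of recent) {
    lines.push(`${action.step}. ${action.tool}(${JSON.stringify(action.args)}) -> ${action.result}`);
  }
  return lines.join("\n");
}
```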
- Enhance `SmartLocator`: Improve the scoring algorithm to better utilize the "Signature" (text, role, context) when the ID is invalid.
- Add `rebind` capability: Allow the agent to re-scan and find the element if the initial lookup fails.
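
One possible shape for signature-based scoring and rebinding is sketched below. The weights, the acceptance threshold, and the token-overlap (Jaccard) similarity are all assumptions chosen for illustration, not the existing `SmartLocator` algorithm.

```typescript
// Signature fields named in the plan: text, role, context.
interface Signature {
  text: string;
  role: string;
  context: string;
}

// Hypothetical candidate produced by a fresh page scan.
interface Candidate extends Signature {
  id: number;
}

// Assumed weights and threshold; tune against real pages.
const WEIGHTS = { text: 0.5, role: 0.3, context: 0.2 };
const MIN_SCORE = 0.6;

function normalize(s: string): string {
  return s.toLowerCase().replace(/\s+/g, " ").trim();
}

// Jaccard similarity over whitespace-separated tokens, in [0, 1].
function similarity(a: string, b: string): number {
  const ta = new Set(normalize(a).split(" ").filter(Boolean));
  const tb = new Set(normalize(b).split(" ").filter(Boolean));
  if (ta.size === 0 && tb.size === 0) return 1;
  let shared = 0;
  ta.forEach((token) => {
    if (tb.has(token)) shared++;
  });
  return shared / (ta.size + tb.size - shared);
}

function scoreCandidate(sig: Signature, c: Candidate): number {
  const roleMatch = normalize(sig.role) === normalize(c.role) ? 1 : 0;
  return (
    WEIGHTS.text * similarity(sig.text, c.text) +
    WEIGHTS.role * roleMatch +
    WEIGHTS.context * similarity(sig.context, c.context)
  );
}

// Returns the best-scoring candidate, or null if nothing clears the threshold.
function rebind(sig: Signature, candidates: Candidate[]): Candidate | null {
  let best: Candidate | null = null;
  let bestScore = 0;
  for (const c of candidates) {
    const score = scoreCandidate(sig, c);
    if (score > bestScore) {
      best = c;
      bestScore = score;
    }
  }
  return bestScore >= MIN_SCORE ? best : null;
}
```

Returning `null` below the threshold matters: a weak match acted on silently is worse than handing the failure back to the Planner.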
- Retry Logic: If a tool execution fails with "Element not found", trigger a `simplify` (refresh) and retry the action using the signature before asking the Planner again.
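
The retry flow might look like the sketch below. The `RetryDeps` interface and the function name `executeWithRebind` are hypothetical; the dependencies are injected so the flow can be exercised without a browser.

```typescript
type ToolResult = { ok: true } | { ok: false; error: string };

// Hypothetical hooks into the existing agent code.
interface RetryDeps {
  runById: (elementId: number) => Promise<ToolResult>; // execute the tool on an element ID
  simplify: () => Promise<void>; // refresh the element list
  rebindBySignature: () => Promise<number | null>; // new ID from the stored signature, or null
}

async function executeWithRebind(elementId: number, deps: RetryDeps): Promise<ToolResult> {
  const first = await deps.runById(elementId);
  // Only "Element not found" failures are retried; other errors go straight back.
  if (first.ok || !first.error.includes("Element not found")) return first;

  await deps.simplify();
  const reboundId = await deps.rebindBySignature();
  // No confident match: return the original failure so the Planner can decide.
  if (reboundId === null) return first;

  // Single retry with the rebound ID; no further loops in this sketch.
  return deps.runById(reboundId);
}
```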
- State Persistence: Save `taskContext` and `actionHistory` to `chrome.storage.local` on every update. Load them on startup.
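
A sketch of the save/load pair, assuming a Manifest V3 extension where `chrome.storage.local.get` and `set` return promises. The `StorageArea` interface below is a hand-written subset of that API so the code can run against an in-memory fake; `AgentState`, `saveState`, and `loadState` are hypothetical names.

```typescript
// Minimal subset of the promise-based chrome.storage.local surface (assumption:
// the real object is passed in as `chrome.storage.local`).
interface StorageArea {
  get(keys: string[]): Promise<Record<string, unknown>>;
  set(items: Record<string, unknown>): Promise<void>;
}

// Hypothetical state shape; the real taskContext/actionHistory types may differ.
interface AgentState {
  taskContext: Record<string, unknown> | null;
  actionHistory: unknown[];
}

// Call after every state update.
async function saveState(storage: StorageArea, state: AgentState): Promise<void> {
  await storage.set({
    taskContext: state.taskContext,
    actionHistory: state.actionHistory,
  });
}

// Call once on startup; falls back to an empty state when nothing is stored.
async function loadState(storage: StorageArea): Promise<AgentState> {
  const stored = await storage.get(["taskContext", "actionHistory"]);
  return {
    taskContext: (stored.taskContext as AgentState["taskContext"]) ?? null,
    actionHistory: Array.isArray(stored.actionHistory) ? stored.actionHistory : [],
  };
}
```

Writing on every update is simple but can be chatty for long runs; debouncing the writes is an option if storage traffic becomes a concern.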
- Implement `isIgnoredUrl` to detect GTM and other service domains.
- Update `execute` and `probe` to filter out ignored frames and prioritize the main frame (frameId 0).
- Add a safety check in the main loop to detect if the agent is stuck on an ignored URL (like GTM) and provide a clear signal to the planner or auto-recover.
- (Optional) Add logic to `getPageState` to flag if the current context is a known service frame.
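
The ignored-URL check and frame prioritization described above could be sketched like this. The domain list is an assumption (only GTM is named in the plan), and `FrameInfo` and `selectFrames` are hypothetical names for whatever `execute` and `probe` use internally.

```typescript
// Assumed list of service domains; only GTM is named in the plan.
const IGNORED_HOST_SUFFIXES = [
  "googletagmanager.com",
  "google-analytics.com",
  "doubleclick.net",
];

function hostnameOf(url: string): string | null {
  try {
    return new URL(url).hostname;
  } catch {
    return null; // not a parsable absolute URL
  }
}

function isIgnoredUrl(url: string): boolean {
  const host = hostnameOf(url);
  // Unparsable URLs are not ignored in this sketch; callers can decide otherwise.
  if (host === null) return false;
  // Match the domain itself or any subdomain, but not look-alike hosts.
  return IGNORED_HOST_SUFFIXES.some(
    (suffix) => host === suffix || host.endsWith("." + suffix)
  );
}

// Hypothetical frame descriptor; frameId 0 is the main frame.
interface FrameInfo {
  frameId: number;
  url: string;
}

// Drops ignored frames and puts the main frame first.
function selectFrames(frames: FrameInfo[]): FrameInfo[] {
  return frames
    .filter((frame) => !isIgnoredUrl(frame.url))
    .sort((a, b) =>
      a.frameId === 0 ? -1 : b.frameId === 0 ? 1 : a.frameId - b.frameId
    );
}
```

The same `isIgnoredUrl` predicate can back the main-loop safety check and the optional `getPageState` flag, so all three agree on what counts as a service frame.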
- Step Limit: Change the hard 50-step limit to a soft limit that prompts the user or auto-continues based on settings.
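
A small sketch of the soft-limit decision. The settings shape, the mode names, and the choice to re-check at every multiple of the limit are assumptions.

```typescript
type LimitMode = "prompt" | "auto-continue";
type LimitDecision = "continue" | "ask-user";

// Hypothetical settings; softLimit defaults to the former hard limit of 50.
interface StepLimitSettings {
  softLimit: number;
  mode: LimitMode;
}

// Decides what the main loop should do after completing `step` steps.
function checkStepLimit(step: number, settings: StepLimitSettings): LimitDecision {
  // Only act at each multiple of the soft limit (50, 100, 150, ...).
  if (step === 0 || step % settings.softLimit !== 0) return "continue";
  return settings.mode === "auto-continue" ? "continue" : "ask-user";
}
```

With `mode: "prompt"`, the loop pauses at steps 50, 100, and so on until the user confirms; with `"auto-continue"` it never pauses.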
- We don't have a full test suite, so we will verify manually.
- Token Usage: Check the logs/console to see the size of the prompt being sent to Gemini.
- Robustness:
  - Go to a dynamic site (e.g., YouTube, Twitter).
  - Start a task.
  - Manually modify the DOM (delete an ID) while the agent is "thinking".
  - Verify the agent still finds the element using the signature.
- Long Run: Run a task that requires > 50 steps (e.g., "Scroll down 60 times") and confirm the agent prompts the user or auto-continues at the soft limit instead of stopping.