For years, the dream of a truly autonomous AI agent—one that could not only answer questions but actively do things on our behalf—has been on the horizon. With BeachdAI (“beach day” with AI at its core), that horizon is here. The core promise is in the name: an AI so capable that it can handle your complex digital life entirely on its own, while you enjoy a day at the beach…and maybe glance at your Pixel watch.
Yes, this guide is a celebration of a monumental achievement. Through our collaboration, we have done something extraordinary: we have built what is, to our knowledge, the world's first fully integrated, stable, and autonomous multi-agent framework that operates seamlessly across the three primary form factors of modern computing: desktop, mobile, and wearable. This is not a theoretical concept. It is a working reality.
Our journey began with ambitious multi-agent systems like DeliberAIte, a powerful standalone tool that we built to orchestrate debates between AI experts to solve complex, abstract problems. That project was a resounding success, but it was only the beginning. We took that powerful reasoning engine and fused it into the very core of BeachdAI, creating a hybrid system that can both perform concrete web tasks and engage in deep strategic thought.
BeachdAI is not a chatbot in a window. It is a team of specialized AI agents living in your browser, ready to act on your command. And, it’s neither a cloud-based service owned by a large corporation, nor a sketchy plug-in that sends your data through its own servers. It’s a local-first, privacy-centric browser extension that puts you in control. And now, that control extends from your desktop to the phone in your pocket and the watch on your wrist.
The development of BeachdAI has been a journey of relentless innovation. We should take immense pride in the fact that we have built a system that is, without exaggeration, on the absolute leading edge of personal AI.
- A World First in Cross-Platform Autonomy: BeachdAI is one of the first known autonomous browser agents to achieve stable operation as an Edge extension on a desktop, on Android, and with real-time communication to a Wear OS watch. This creates a single, unified agentic experience across a user's entire digital ecosystem. Edge is perhaps the only browser from a major manufacturer to currently allow user-developed extensions, and we overcame quirks and limited documentation to complete a working version.
- Hybrid Agent Architecture: The system seamlessly blends two distinct modes of operation: a Standard Flow for direct, task-oriented browser automation and the integrated DeliberAIte Flow, a multi-agent debate framework for deep, strategic problem-solving.
- The "Observer Effect" Stability Fix: We achieved remarkable stability in our most complex agentic flow (DeliberAIte) through a fascinating, emergent property of our own system—the very act of adding detailed logging for debugging introduced micro-delays that resolved a critical race condition, a real-world example of the observer effect in software.
- Mastery of Hostile Environments: We successfully navigated the notoriously difficult and brittle environments of mobile browser extensions and cross-device communication, solving critical bugs related to service worker lifecycle, component rendering, and data synchronization that stop most projects in their tracks.
This guide will walk you through the core concepts of the BeachdAI architecture, explore the code that powers its most innovative features across all three form factors, and provide a look at the exciting future of this groundbreaking project.
To understand BeachdAI, it's best to think of it as three distinct but deeply interconnected parts: The Brain, The Bridge, and The Glance.
- The Brain (The Browser Extension): The powerful multi-agent system that lives in the browser's service worker. This is where all planning, reasoning, and web execution happens.
- The Bridge (The Android Companion App): A specialized, native Android app whose sole purpose is to act as a secure and stable communication relay between the sandboxed browser environment and the rest of the Android operating system, including the watch.
- The Glance (The Wear OS App): A native watch app that provides an "at-a-glance" view of the agent's status and final results, delivering the promise of ambient, peripheral computing.
@startuml
skinparam rectangle {
BorderColor black
BackgroundColor #whitesmoke
}
skinparam component {
BorderColor black
BackgroundColor #lightgrey
}
rectangle "THE BRAIN\n(Browser Extension)" as Brain {
component "[Multi-Agent System]\n- Orchestrator & Agents\n- DeliberAIte Flow" as MAS
component "[React UI (Sidebar.tsx)]\n- Desktop & Mobile Views" as UI
component "[Content Script (content.ts)]\n- Interacts with web pages" as ContentScript
}
rectangle "THE BRIDGE\n(Android Companion App)" as Bridge {
component "[Local Web Server]\n- Receives agent status" as WebServer
component "[Foreground Service]\n- Stays alive in background\n- Relays data to watch" as Service
}
rectangle "THE GLANCE\n(Wear OS App)" as Glance {
component "[UI - Jetpack Compose]" as GlanceUI
component "[DataClient Listener]\n- Receives state\n- Updates UI\n- Vibrates on update" as DataListener
}
Brain -right-> Bridge : "HTTP POST"
Bridge -right-> Glance : "Data Sync"
@enduml
Our choice of technologies was crucial to our success and was made to prioritize stability, performance, and modernity.
- TypeScript (The Brain): For a system this complex, JavaScript's dynamic nature is a liability. TypeScript's strong typing is like a grammar checker for code, catching thousands of potential errors before they ever happen. It is the bedrock of the extension's reliability.
- React (The Brain's UI): The user interface in the extension's sidebar is built with React. Its component-based model and efficient state management allowed us to create a highly responsive and adaptive UI that works beautifully on both widescreen desktops and the narrow constraints of a mobile phone screen (Sidebar.tsx).
- Kotlin (The Bridge & The Glance): As the official, Google-endorsed language for modern Android development, Kotlin was the only logical choice. Its conciseness, safety features, and excellent performance are ideal for building robust services (WearableBridgeService.kt) and fluid user interfaces.
- Jetpack Compose (The Glance's UI): We used Jetpack Compose, Android's modern declarative UI toolkit, to build the watch app's interface. This allowed us to create a clean, elegant, and performant UI with far less code than traditional methods (MainActivity.kt on Wear OS).
BeachdAI is not a single AI; it's a coordinated team of specialized agents managed by a central Orchestrator. This multi-agent approach is what allows the system to handle such a wide variety of tasks with nuance and intelligence.
- Triage Agent: The first point of contact. It analyzes your goal and decides which of the four autonomous flows is most appropriate.
- Example: Sees "solve world hunger" and routes to the DeliberAIte Flow; sees "book a flight" and routes to the Standard Flow.
- Researcher: The fact-finder. This agent uses its built-in search tools to gather any necessary real-time information. Critically, it also decides if a browser is needed at all.
- Example: For "what's the weather?", it finds the answer and determines requires_browser: false. For "buy a stock if it drops below $50", it finds the current price and determines requires_browser: true.
- Planner: The strategist. It takes the user's goal and the Researcher's findings and creates a high-level, step-by-step plan. It is also responsible for adapting the plan when things go wrong.
- Example: Given a goal to "build a snake game," it creates a plan: ["Go to Github.com", "Create a new repository named 'snake-game'", "Go to Jules.Google.com", "Tell Jules to build the game in the new repo"]
- Manager: The hands-on operator. This is the core autonomous executor that navigates the web, observes the state of the page, and decides on the next micro-action. Its prime directive is to "never get stuck."
- Example: Following the plan, it sees a button with the text "Create repository" and decides on the action: CLICK with the appropriate selector.
- Presenter: The communications officer. It synthesizes all the work done into a clean, human-readable final report, formatted in rich HTML.
- Example: Takes a raw scratchpad of flight search data and presents it as: "I found a non-stop flight on Delta for $250, departing at 8:00 AM."
- Verifier: The security guard. Before the Manager navigates to any new URL, this agent inspects it for signs of phishing or malicious content.
- Example: Sees the URL http://g00gle.com and flags it as unsafe, stopping the Manager from proceeding.
- Teacher: When the user decides to “teach me something”, this agent observes the raw log of clicks and types and synthesizes it into a human-readable summary of the new skill.
- Example: Watches a user log into a site and summarizes it as: "I need to type the username into the 'email' field, type the password into the 'password' field, and then click the 'Log In' button."
- DeliberAIte Orchestrator: When a problem is too theoretical for a Standard Flow, this special agent takes over. This specialized agent follows a pattern similar to AutoGen: designing a custom team of expert AI personas and guiding them through a structured debate to produce a comprehensive strategic plan.
The true genius of BeachdAI lies in its ability to dynamically choose the right approach for any given task. We can think of these as four distinct "flows" of autonomy.
Not every task requires opening a web page. For innate knowledge or simple real-time questions, the system is smart enough to answer directly.
Example Goal: "What is the first law of thermodynamics?"
- The Triage Agent routes this to the Standard Flow.
- The Researcher is activated.
Code Deep Dive: getResearcherFacts in background.ts
The prompt gives the Researcher a critical choice.
// A snippet from the Researcher's prompt
4. **CRITICAL: Decide on Next Steps:**
* If the user's goal is **purely informational** (e.g., "what's the weather?", "who won the game?") and you have found a COMPLETE answer, set `requires_browser` to `false`.
* If the user's goal involves an **action** (e.g., "get a lyft", "buy a ticket", "post a tweet")... you MUST set `requires_browser` to `true`.
- The Researcher uses its internal knowledge (powered by Gemini) and determines requires_browser: false, providing the answer in its facts array.
- The Orchestrator sees this and sends the facts directly to the Presenter, which formats the final answer for the user without ever launching a browser tab. This is the most efficient path for simple queries.
This is the system's superpower, reserved for the most complex, abstract problems.
Example Goal: "Suggest solutions for a fair rebuild of Gaza that ensures all citizens of the region are treated equitably."
- Triage recognizes the complexity and routes this to the Deliberate_Flow.
- The DeliberAIte Orchestrator takes over. It uses an LLM to perform "meta-cognition"—designing the very team that will solve the problem. It might generate personas like a Professor of International Relations, a Post-Conflict Reconstruction Specialist, and a Human Rights Advocate.
- These agents then proceed through a structured debate: proposing, reviewing, and revising solutions.
- Finally, the Presenter synthesizes their collective wisdom into a comprehensive strategic document, which is presented to the user with a special call to action: "Attempt Autonomous Execution." This critical human-in-the-loop step ensures that the agent doesn't act on a high-level strategy without your explicit approval.
Code Deep Dive: The "Attempt Strategy" button in Sidebar.tsx
This UI component is what enables the crucial hand-off from strategic planning to autonomous action.
// A snippet from the renderTaskView function in Sidebar.tsx
{isDeliberatePlanPresented ? (
<>
<button onClick={handleAttemptStrategy} ...>
Attempt Autonomous Execution
</button>
<button onClick={handleNewTask} ...>
New Task
</button>
</>
) : ... }
Code Deep Dive: The Kotlin Bridge in WearableBridgeService.kt
This bridge is part of a BeachdAI companion app installed on an Android phone that’s also running the BeachdAI extension in Edge. Specifically, the companion contains a native Android service that runs a tiny local web server, listening for status updates from the browser extension. This is a brilliant and robust way to escape the browser sandbox.
// Snippet from the WebServer inner class
private inner class WebServer : NanoHTTPD(8080) {
override fun serve(session: IHTTPSession): Response {
if (session.method == Method.POST && session.uri == "/status") {
try {
// ... logic to parse the JSON body from the extension ...
val jsonBody = files["postData"]
if (jsonBody != null) {
serviceScope.launch {
// Truncate message if needed and sync to the watch
val finalMessage = processAndTruncateMessageIfNeeded(jsonBody)
syncStateToWatch(finalMessage)
}
return newFixedLengthResponse(Response.Status.OK, ...)
}
} catch (e: Exception) { ... }
}
return newFixedLengthResponse(Response.Status.NOT_FOUND, ...)
}
}
Code Deep Dive: The Kotlin Glance in MainActivity.kt (Wear OS)
The BeachdAI watch app uses the modern DataClient API to listen for these updates. When a new state arrives, it updates the UI, providing a seamless, real-time "glance" into the agent's work.
// Snippet from the Wear OS MainActivity
private val dataListener = DataClient.OnDataChangedListener { dataEvents ->
dataEvents.forEach { event ->
if (event.type == DataEvent.TYPE_CHANGED) {
val dataItem = event.dataItem
if (dataItem.uri.path == "/beachdai_task_update") {
val dataMap = DataMapItem.fromDataItem(dataItem).dataMap
val jsonState = dataMap.getString("state_json")
if (jsonState != null) {
// Parse the new state from JSON
val newState = gson.fromJson(jsonState, AgentTaskState::class.java)
// Trigger a vibration and update the UI
if (newState.status != taskState.value.status) {
triggerVibration()
}
taskState.value = newState
}
}
}
}
}
This is the core flow for concrete, actionable tasks that require interacting with the web.
Example Goal: "Plan a trip to Savannah for President’s Day week with a hotel and an OpenTable reservation."
- Triage selects the Standard Flow.
- Researcher determines this is an action that requires a browser.
- Planner creates a concrete, step-by-step plan. Its prompt is sophisticated enough to understand and attempt multiple goals in a long-running autonomous flow.
- The Manager then executes this plan step-by-step. It receives a snapshot of the current web page from content.ts and must decide on the single next best action.
Code Deep Dive: The Manager's Imperative in background.ts
The Manager's prompt is a masterclass in instruction engineering, filled with strict rules to guide its behavior.
// A snippet from the Manager's prompt
**Your Imperative Task:**
Decide on the single next action to advance the current step.
**Response Rules & Best Practices:**
1. **THINK FIRST:** You MUST provide a 'thought' explaining your reasoning for every single action. No exceptions.
2. **EXECUTE THE PLAN:** Your primary goal is to execute the current step of the plan.
...
8. **USE VALID SELECTORS:** Your 'selector' MUST be a valid CSS selector copied directly from the 'interactiveElements' in the snapshot... Do NOT invent selectors.
...
10. **NEVER GET STUCK:** If you are truly stuck (e.g., repeated failures, page not loading correctly), use 'HELP_REPLAN'. This is your primary escape hatch.
- This continuous loop of "Observe, Think, Act" continues until the plan is complete. And, if the system can’t fully complete the goal due to missing information (like the desired reservation time), it doesn’t just fail…
True autonomy isn't just about success; it's about resiliently handling what would cause some systems to quit, by acknowledging partial success and collaborating with the user.
The Secure Vault & The Bridge to the Watch
To perform useful tasks, an agent must be able to act on your behalf. The Secure Vault is our privacy-first solution. When you teach BeachdAI a task involving a password, it's encrypted with your master passphrase and stored securely on your local machine. This enables "beach day" autonomy. The state of this autonomy is then relayed across the entire framework.
Teaching the Agent: Human-in-the-Loop Collaboration
Although we have built and trained the system to never get stuck, it can decide that it’s accomplished only partial success. In those cases, the user is encouraged to "Take Over." The content.ts script immediately begins recording your actions. When you're done, you click "Go Autonomous." The Teacher agent then analyzes the raw log of your actions and summarizes the new skill, saving it to memory. This is not just debugging; it's creating a library of reusable skills tailored to your specific needs.
Building on this same framework, a “teach me something” link is available when tasks are not in progress, so the user can teach the system something from scratch. And finally, recently executed tasks are available at the user’s fingertips as a list of clickable links, in case it’s useful to rerun the same request at another time.
BeachdAI is more than a successful project; it is a powerful statement about the future of personal computing. It proves that world-class, autonomous AI is not something that has to live exclusively in the data centers of large corporations. It can live right here, in your browser, on your phone, and on your wrist—working for you.
The successful creation of this stable, cross-platform framework is a paradigm shift. It opens the door to a world where your personal agent can manage tasks for you anytime, anywhere, from any device.
The journey has been remarkable, filled with complex challenges that we overcame with creative engineering and relentless persistence. The destination is nothing short of revolutionary. We should all be incredibly proud of what we have built together.
To run the complete BeachdAI experience, you will need to set up all three components.
- The Brain (Browser Extension) Prerequisites:
- Node.js and npm
- Google Chrome or Microsoft Edge (desktop) Setup:
cd path/to/beachdai-extension
npm install
npm run build
This will create a dist folder. To install the extension:
- Open Edge/Chrome and navigate to edge://extensions or chrome://extensions.
- Enable "Developer mode".
- Click "Load unpacked" and select the dist folder.
- The Bridge (Android Companion App) Prerequisites:
- Android Studio
- An Android phone with Developer Mode and USB Debugging enabled. Setup:
- Open the companion app's project folder (/src/android/companion-app) in Android Studio.
- Allow Gradle to sync and download the necessary dependencies.
- Connect your Android phone to your computer via USB.
- Select your phone as the target device in Android Studio.
- Click the "Run" button to build and install the app on your phone. The app will start a foreground service to listen for the extension.
- The Glance (Wear OS App) Prerequisites:
- Android Studio
- A Wear OS device (emulator or physical watch) with Developer Mode and Debugging enabled. Setup:
- Open the watch app's project folder (/src/android/watch-app) in Android Studio.
- Allow Gradle to sync.
- In the Device Manager, ensure your Wear OS emulator is paired with the phone emulator/device running the Bridge app.
- Select your watch as the target device in Android Studio.
- Click "Run" to build and install the app on your watch. How to Contribute We welcome contributions from the community! Whether it's fixing bugs, improving agent prompts, or proposing new features, your input is valuable.
- Fork the repository.
- Create a new branch for your feature (git checkout -b feature/AmazingNewThing).
- Commit your changes (git commit -m 'Add some AmazingNewThing').
- Push to the branch (git push origin feature/AmazingNewThing).
- Open a Pull Request. Please write clear, concise commit messages and provide a detailed description in your pull request. License BeachdAI is licensed under the MIT License. See the LICENSE file for details.