VibeCat

Your Proactive Desktop Companion — an AI that sees your screen, suggests before you ask, and acts with your permission.

▶ Watch the 4-minute demo video | ⬇ Download DMG

The Problem

Every AI coding tool today is reactive — it sits quietly until you type a command, paste an error, or open a chat window. You break your flow to talk to the AI, and the AI breaks your flow to talk back.

"Every AI tool waits for your command. But what if one didn't?"

The context switch is the real cost. You're deep in a flow state, and suddenly you need to look something up, fix a typo in a terminal command, or find that one Stack Overflow answer. Traditional AI assistants can't help until you stop what you're doing and ask — by which point you've already lost minutes of mental re-entry time.

The Flip

"Meet VibeCat — a desktop companion that suggests before you ask."

VibeCat is a native macOS desktop companion that watches your screen, understands your context, and proactively suggests actions before you ask. Instead of waiting for commands, it watches your screen, recognizes opportunities, and speaks up first: "I notice a null check missing — want me to add it?" — then waits for your approval before touching anything. Like a senior colleague sitting next to you.

Core Flow: OBSERVE → SUGGEST → WAIT → ACT → FEEDBACK

OBSERVE — VibeCat continuously watches your screen via Gemini Live API (screenshots + accessibility tree)
SUGGEST — When it spots an opportunity, it speaks up: "I notice a bug in that function — want me to fix it?"
WAIT — It always waits for your confirmation before acting
ACT — Upon approval, it executes precise actions across your desktop apps
FEEDBACK — After acting, it verifies the result and reports back: "Done! The fix is applied. Looks good?"

What Makes VibeCat Different

Feature	Traditional UI Agents	VibeCat
Interaction model	Reactive — waits for commands	Proactive — suggests before you ask
Interface	Text-based CLI or scripted	Voice-first — natural conversation via Gemini Live API
Platform	Python + cross-platform wrappers	Native macOS Swift — first-class citizen
Architecture	Local-only execution	Cloud-assisted reasoning — Cloud Run + ADK
Error handling	Fails and reports	Self-healing — retries with alternative strategies
Verification	Basic state check	Triple-source grounding — Accessibility + CDP + Vision
Transparency	Silent processing	Real-time feedback — narrates every step

Demo Scenarios

VibeCat is optimized for three gold-tier surfaces:

Act 1. Proactive Music Suggestion (YouTube Music)

VibeCat greets you on session start: "Hey! How about some relaxing background music while you work?" → User: "Yes, play it." → VibeCat opens YouTube Music, searches for relaxing coding music, uses vision-based mouse control to find and click the play button → "Music is playing! Enjoy your coding session."

Act 2. Proactive Code Enhancement (Antigravity IDE)

VibeCat reads your code and spots an opportunity: "I can enhance the comments for this code — want me to?" → User: "Sure, go ahead" → VibeCat clicks Gemini Chat in the IDE, types the enhancement request, and Gemini improves documentation → "Done! The comments are enhanced with detailed documentation."

Act 3. Terminal Automation (Terminal)

VibeCat switches to Terminal and suggests a lint check: "Want me to run go vet to check for lint issues?" → User: "Do it" → VibeCat types go vet ./... and executes it → "Lint check passed! No issues found."

Architecture

Version Matrix

Component	In Repo	Latest Stable	Official Source	Notes
Go	1.26.1	1.26.1	`https://go.dev/dl/`	backend modules + e2e aligned
Swift tools-version	6.2	6.2.4	`https://swift.org/install/`	package format stays `6.2`; locally verified on Swift 6.2.4
Google GenAI SDK	v1.49.0	v1.49.0	`https://pkg.go.dev/google.golang.org/genai`	Gemini Live / vision / FC client
Google ADK for Go	v0.6.0	v0.6.0	`https://pkg.go.dev/google.golang.org/adk`	orchestrator framework
chromedp	v0.14.2	v0.14.2	`https://pkg.go.dev/github.com/chromedp/chromedp`	browser control via CDP
Docker Go builder	`golang:1.26.1-alpine`	Go 1.26.1 line	`https://go.dev/dl/`	both backend images
Distroless runtime	`gcr.io/distroless/static-debian12`	rolling Debian 12	`https://github.com/GoogleContainerTools/distroless`	runtime-only image

System Overview

flowchart LR
    subgraph Client["macOS Client (Swift 6.2)"]
        UI["Cat UI + Chat"]
        CAP["ScreenCaptureService"]
        AX["AccessibilityNavigator\n80+ key codes"]
        WORKER["NavigatorActionWorker"]
        OVR["NavigatorOverlayPanel"]
        GWCLIENT["GatewayClient\n/ws/live"]
    end

    subgraph Gateway["Realtime Gateway (Go 1.26.1 · Cloud Run)"]
        HTTP["/health /readyz /api/v1/auth/* /ws/live"]
        LIVE["Gemini Live session manager"]
        PROMPT["Proactive Companion prompt"]
        FC["5 Function Calling tools"]
        PFC["pendingFC queue\none-step-at-a-time"]
        VERIFY["Vision verification"]
        HEAL["Self-healing\nmax 2 retries"]
        CDP["chromedp Chrome controller"]
        STATE["ActionStateStore\nin-memory + Firestore"]
        METRICS["OTEL + replay metrics"]
    end

    subgraph ADK["ADK Orchestrator (Go 1.26.1 · Cloud Run)"]
        ROUTER["/navigator/escalate\n/navigator/background\n/memory/* /search /tool /analyze"]
        PROC["Navigator Processor"]
        SEARCH["Search Agent"]
        MEMORY["Memory Agent"]
        REPLAY["Replay + Firestore store"]
        GENAI["GenAI / ADK runtime"]
    end

    subgraph GCP["Google Cloud Platform"]
        LIVEAPI["Gemini Live API"]
        FIRE["Firestore"]
        OTEL["Cloud Logging / Trace / Monitoring"]
    end

    UI --> GWCLIENT
    CAP --> GWCLIENT
    GWCLIENT --> HTTP
    HTTP --> LIVE
    LIVE --> PROMPT
    LIVE --> LIVEAPI
    LIVEAPI --> FC
    FC --> PFC
    PFC --> WORKER
    WORKER --> AX
    PFC --> CDP
    HEAL --> PFC
    VERIFY --> ROUTER
    ROUTER --> PROC
    PROC --> SEARCH
    PROC --> MEMORY
    PROC --> REPLAY
    PROC --> GENAI
    STATE --> FIRE
    REPLAY --> FIRE
    METRICS --> OTEL
    GENAI --> OTEL
    OVR --> UI

What Each Service Owns

Service	Owns	Does Not Own
macOS Client	UI, microphone, screen capture, AX execution, local keyboard events, overlay feedback	Gemini credentials, tool planning, server-side memory
Realtime Gateway	live multimodal session, prompt, FC tool handling, task queueing, retries, verification orchestration, state leases	long-term memory strategy, desktop rendering, raw local OS control
ADK Orchestrator	screenshot reasoning, target escalation, visible-text extraction, async summary/research/memory/replay	direct desktop execution, step-by-step UI clicking

Component Overview

Layer	Technology	Role
macOS Client	Swift 6.2 / AppKit	Screen capture, AX executor (80+ key codes), overlay with grounding badges, voice transport, local action worker
Realtime Gateway	Go 1.26.1 / Cloud Run	WebSocket handler, Proactive Companion prompt, 5 FC tool handlers, pendingFC sequential execution, self-healing, vision verification, transparent feedback pipeline
Gemini Live API	Google GenAI SDK v1.49.0	Real-time multimodal conversation (voice + vision), function calling, session resumption, VAD
ADK Orchestrator	Go 1.26.1 / ADK v0.6.0 / Cloud Run	Confidence escalation, screenshot interpretation, visible-text extraction, async summary/memory/replay/research enrichment
Chrome Controller	chromedp v0.14.2	Click, Type, Navigate, Scroll, Screenshot, Close — lazy connect with graceful fallback

Client UI Panel Layout

All UI panels follow the cat sprite in real-time. The cat roams freely across all monitors inside a full-screen borderless CatPanel.

                 ┌─────────────────┐
                 │ CompanionChat   │  ← cat +70pt (on show)
                 │ (360×360, key)  │     interactive, movable
                 └─────────────────┘
                 ┌───────────┐
                 │  Speech   │  ← above cat (flips below if off-screen)
                 └─────┬─────┘
                       🐱          ← CatPanel sprite (follows mouse)
                 ┌─────┴─────┐                    ┌──────────────┐
                 │  Status   │  ← below cat        │ DecisionHUD  │ ← cat +50pt right
                 └───────────┘                    │ (280×160)    │    debug info
              ┌────────────────────┐              └──────────────┘
              │ NavigatorOverlay   │  ← cat -90pt (below status area)
              │ (260×52, grounding)│     follows cat in real-time
              └────────────────────┘

        ╔═══════════════╗
        ║TargetHighlight║  ← overlays AX target element (independent of cat)
        ╚═══════════════╝

Panel	Anchor	Follows Cat	Interactive
CatPanel	Full screen, sprite moves inside	Is the cat	No (click-through)
Speech Bubble	Above cat, flips below if clipped	Real-time	No
Status Bubble	Below cat, below window badge	Real-time	No
NavigatorOverlay	90pt below cat center	Real-time	No
DecisionOverlayHUD	50pt right of cat center	Real-time	No
CompanionChatPanel	70pt above cat on show()	On show only	Yes (typing, movable)
TargetHighlightOverlay	On AX target element + 6pt padding	Follows target	No

Realtime Gateway Architecture

flowchart TB
    IN["/ws/live client session"] --> AUTH["JWT auth + registry"]
    AUTH --> LIVE["Gemini Live manager"]
    LIVE --> PROMPT["Proactive Companion system prompt"]
    LIVE --> TOOLS["navigatorToolDeclarations()\n5 tools"]
    TOOLS --> PLAN["intent/risk/planning"]
    PLAN --> PFC["pendingFC state\ncurrent step + remaining steps"]
    PFC --> EXEC["client action dispatch"]
    EXEC --> TF["processingState messages\n7 stages"]
    EXEC --> VERIFY["requestScreenCapture + ADK verify"]
    VERIFY --> HEAL["retry / alternative grounding"]
    HEAL --> PFC
    PFC --> COMPLETE["async background dispatch"]
    PLAN --> STORE["ActionStateStore lease"]
    COMPLETE --> MET["metrics + replay"]

ADK Orchestrator Architecture

The README previously compressed ADK into one box. This is the actual shape of the service.

flowchart TB
    HTTP["HTTP handlers"] --> ESC["POST /navigator/escalate"]
    HTTP --> BG["POST /navigator/background"]
    HTTP --> MEM1["POST /memory/session-summary"]
    HTTP --> MEM2["POST /memory/context"]
    HTTP --> SEARCHR["POST /search"]
    HTTP --> TOOLR["POST /tool"]
    HTTP --> ANALYZE["POST /analyze"]

    ESC --> NPROC["Navigator Processor.ResolveTarget"]
    BG --> BPROC["Navigator Processor.ProcessBackground"]
    NPROC --> SHOT["base64 screenshot decode"]
    NPROC --> AXCTX["AX + focus + window context prompt"]
    NPROC --> VISION["GenAI vision model"]
    NPROC --> FALLBACK["heuristicEscalation fallback"]

    BPROC --> SUMMARY["summary generation"]
    BPROC --> ENRICH["search enrichment when needed"]
    BPROC --> MEMWRITE["memory write"]
    BPROC --> REPLAY["NavigatorReplay persistence"]

    SEARCHR --> SEARCHAGENT["Search Agent"]
    TOOLR --> TOOLAGENT["Tool Agent"]
    MEM1 --> MEMAGENT["Memory Agent"]
    MEM2 --> MEMAGENT

    SUMMARY --> FIRE["Firestore"]
    ENRICH --> FIRE
    MEMWRITE --> FIRE
    REPLAY --> FIRE

Orchestrator Endpoint Contracts

Endpoint	Sync/Async	Input	Output	Purpose
`POST /navigator/escalate`	sync	command + screenshot + AX/focus context	target descriptor / resolved text / confidence	low-confidence target resolution and visible text extraction
`POST /navigator/background`	async-style post-task	task outcome + attempts + context hashes	summary / replay label / research summary / tags	off-hot-path summarization and replay labeling
`POST /memory/session-summary`	sync	history	stored summary	session memory write
`POST /memory/context`	sync	user id	memory context	memory recall
`POST /search`	sync	query	summary + sources	research enrichment
`POST /tool`	sync	tool query	tool-specific result	non-live tool path
`POST /analyze`	sync	image + context	multimodal analysis	retained general analysis path

Gateway Endpoint Contracts

Endpoint	Auth	Purpose
`/health`	no	liveness + connection count
`/readyz`	no	readiness
`/api/v1/auth/register`	no	issue JWT
`/api/v1/auth/refresh`	no	refresh JWT
`/ws/live`	yes	main client transport for Live PM + navigator runtime

State and Persistence

State	Where	Why
`ActionStateStore`	gateway memory + Firestore	reconnect-safe ownership, stale socket rejection, active task continuity
`pendingFC*`	in-memory live session state	sequential multi-step execution
`pendingVisionVerification`	in-memory live session state	hold screenshot verification intent between step and ADK response
`NavigatorReplay`	orchestrator Firestore store	post-task regression and evidence trail
ADK session/memory services	orchestrator in-memory + Firestore-backed app store	summaries and memory context

Observable Contracts

These are the runtime signals that make VibeCat debuggable instead of opaque:

processingState messages for every navigator phase
requestScreenCapture when post-action verification needs fresh evidence
time_to_first_action_ms metric for hot-path responsiveness
replay labels and research tags from /navigator/background
action-state lease persistence for reconnect behavior

Proactive Companion System Prompt

The gateway injects a Proactive Companion system prompt into every Gemini Live session. This prompt defines VibeCat's core identity — an attentive colleague, not a passive tool:

Proactive observation: notices errors, long work sessions, inefficient commands, missing code
Natural suggestion: "You've been coding for a while — want me to play some music?"
Confirmation gate: always waits for user approval before acting
Friendly feedback: "Done! How does that look?" after every action
Emotion tags: [happy], [thinking], [concerned], etc. for character expression
Language matching: responds in the user's language (Korean, English, Japanese)

Navigator Tools (Function Calling)

VibeCat registers 5 tools with Gemini via navigatorToolDeclarations() for precise desktop control:

Tool	Parameters	Purpose	Example
`navigate_text_entry`	`text`, `target`, `submit`	Type text into focused field	Search queries, code snippets, form input
`navigate_hotkey`	`keys[]`, `target`	Send keyboard shortcuts	`Cmd+S`, `Space` (YouTube play/pause), `Cmd+I` (IDE inline)
`navigate_focus_app`	`app`	Switch to a specific application	Open Chrome, switch to Terminal, focus Antigravity
`navigate_open_url`	`url`	Open a URL in the default browser	YouTube links, documentation pages
`navigate_type_and_submit`	`text`, `submit`	Type text and press Enter	Terminal commands (`ls -la`), search submissions

Each tool has a dedicated handler in the gateway: handleNavigateHotkeyToolCall, handleNavigateFocusAppToolCall, handleNavigateOpenURLToolCall, handleNavigateTypeAndSubmitToolCall.

pendingFC: Sequential Multi-Step Execution

Complex actions (e.g., "open YouTube and search for music") require multiple steps. The pendingFC mechanism ensures steps execute one at a time:

Gateway receives Gemini's function call → queues all steps in pendingFCSteps
Sends only the first step to the client
Client executes and confirms → gateway sends next step via advancePendingFC()
Repeat until all steps complete or self-healing exhausts retries
clearPendingFC() resets all state (8 fields + retry counter) on completion

This prevents race conditions in multi-step workflows and enables per-step retry/verification.

Self-Healing Navigation

When an action fails, VibeCat retries with alternative grounding strategies (MaxLocalRetries: 1 default, 2 for vision steps):

First attempt — Execute via primary method (AX tree or hotkey)
Retry — Try alternative grounding source (CDP for browser elements, different AX path, or vision-based coordinate targeting)
Vision steps — Get an extra retry (up to 2) for coordinate-based targeting via ADK screenshot analysis
Each retry — incrementStepRetry() tracks count; vision verification confirms success before proceeding
Exhausted — Graceful fallback to guided mode or human explanation

Vision Verification

After executing risky or complex actions, VibeCat verifies success through ADK screenshot analysis:

Gateway requests a screenshot from the client (requestScreenCapture)
Screenshot is sent to ADK Orchestrator (POST /navigator/escalate)
ADK's Gemini-powered vision agent analyzes the post-action state
Result feeds back into the step pipeline — success advances, failure triggers self-healing retry

Grounding Sources

VibeCat uses triple-source grounding to prevent blind clicking:

Source	Badge	Technology	Use Case
Accessibility	AX (blue)	macOS Accessibility API	Native UI element discovery and manipulation
Vision	Vision (purple)	Gemini/ADK screenshot analysis	Visual verification and coordinate targeting
Hotkey	Hotkey (gray)	CGEvent key injection (80+ codes)	App-specific shortcuts (YouTube, IDE, Terminal)
System	System (teal)	Open URL, Focus App, chromedp CDP	Browser navigation, app switching, CDP actions

Transparent Feedback Pipeline

VibeCat narrates every processing step in real-time via processingState messages — no silent processing:

Stage	English	Korean	When
`analyzing_command`	Analyzing command...	명령 분석 중...	FC received from Gemini
`planning_steps`	Planning steps...	실행 계획 중...	Multi-step plan created
`executing_step`	Executing action...	실행 중...	Step sent to client
`verifying_result`	Verifying result...	결과 확인 중...	Post-action screenshot check
`retrying_step`	Retrying with alternative...	재시도 중...	Self-healing retry triggered
`completing`	Completing task...	작업 완료 중...	Final step succeeded
`observing_screen`	Observing screen...	화면 관찰 중...	Proactive screen analysis

The client displays these as real-time status bubbles in the navigator overlay panel.

Execution Contract

The full execution pipeline:

VibeCat proactively observes the user's screen via Gemini Live (screenshots + AX context)
Gemini identifies opportunities and suggests an action via voice
User confirms ("yeah, go ahead") or declines
Gateway receives Gemini's function call (one of 5 tools)
Transparent feedback: processingState streams to client at each stage
pendingFC queues multi-step plans and sends one step at a time
macOS client executes the step via AX, CDP, or keyboard
Self-healing: on failure, retries up to 2 times with alternative grounding source
Vision verification: ADK analyzes post-action screenshot to confirm success
On success → next step (if any) or completion with voice feedback
On persistent failure → graceful fallback to guided mode or human explanation
Completed tasks enqueue async summary/replay/memory work off the hot path

Quick Start

Prerequisites

macOS 15+ (Sequoia)
Xcode 16+ (for local client builds)
Go 1.26.1+
A Google Cloud project with:
- Gemini API key
- Cloud Run enabled
- Firestore database
- Secret Manager configured

Environment Setup

# Gateway
cp backend/realtime-gateway/.env.example backend/realtime-gateway/.env
# Edit .env → set GEMINI_API_KEY, GCP_PROJECT_ID, GATEWAY_AUTH_SECRET

# Orchestrator
cp backend/adk-orchestrator/.env.example backend/adk-orchestrator/.env
# Edit .env → set GEMINI_API_KEY, GCP_PROJECT_ID

Build & Test

# Swift client
cd VibeCat
swift build
swift test          # 131 tests

# Go gateway
cd backend/realtime-gateway
go build ./...
go test ./...       # all packages pass
go vet ./...

# Go orchestrator
cd backend/adk-orchestrator
go build ./...
go test ./...

Deploy to Cloud Run

# Build images, deploy orchestrator first, then deploy gateway
./infra/deploy.sh

Runtime Permissions

VibeCat requires three macOS permissions:

Permission	Required For
Screen Recording	Capturing screen content for Gemini vision analysis
Accessibility	Reading UI elements and executing navigation actions
Microphone	Voice input for Gemini Live API conversation

Project Structure

vibeCat/
├── VibeCat/                          # Swift package: Core + macOS app + tests
│   ├── Sources/Core/                 # UI-free models, localization, parsers
│   ├── Sources/VibeCat/              # AppKit app, AX navigator, overlay UI
│   └── Tests/VibeCatTests/           # 131 tests (20 test files)
├── backend/realtime-gateway/         # Go: WebSocket gateway, FC tools, self-healing
│   └── internal/
│       ├── ws/                       # Handler, navigator, metrics
│       ├── live/                     # Gemini Live session management
│       └── cdp/                      # Chrome DevTools Protocol controller
├── backend/adk-orchestrator/         # Go: ADK graph, escalation, memory/replay
├── tests/e2e/                        # Deployed smoke and live-path tests
├── infra/                            # GCP bootstrap, deploy scripts, observability
├── docs/                             # Architecture, status, evidence, research
└── Assets/                           # Sprites, tray icons, audio samples

Deployment

Service	Region	Technology	URL
Realtime Gateway	asia-northeast3	Go / Cloud Run	`realtime-gateway-163070481841.asia-northeast3.run.app`
ADK Orchestrator	asia-northeast3	Go / Cloud Run	`adk-orchestrator-163070481841.asia-northeast3.run.app`
Firestore	`(default)`	Native database	—

GCP Project: vibecat-489105 · Infrastructure: Cloud Run, Firestore, Secret Manager, Cloud Logging, Cloud Trace, Cloud Monitoring

Gold-Tier Surfaces

Submission-critical reliability is concentrated on three surfaces:

Surface	Capabilities	Key Shortcuts
Antigravity IDE	Code editing, inline fixes, symbol navigation	`Cmd+P` (file picker), `Cmd+Shift+O` (symbols), `Cmd+I` (inline prompt)
Terminal / iTerm2	Command execution, output interpretation	`Cmd+T` (new tab), type + Enter
Chrome	URL navigation, YouTube playback, search, form filling	`Space` (play/pause), `F` (fullscreen), `Shift+N` (next)

Safety Model

VibeCat uses safe-immediate execution with mandatory confirmation for proactive suggestions:

Proactive suggestions always wait for user confirmation before acting
Low-risk, well-targeted steps may execute immediately after confirmation
Ambiguous intent never auto-executes
Low-confidence targets downgrade to clarification or guided mode

Immediate (low-risk): focus changes, page navigation, search entry, tab switching, hotkeys

Confirmation required: passwords/tokens, deploy/publish/send, destructive shell commands, git push, bulk code insertion

Technology Stack

Technology	Version	Role
Gemini Live API	via GenAI SDK v1.49.0	Real-time multimodal conversation (voice + vision + FC)
Gemini Function Calling	5 tools registered	Structured tool invocation for desktop actions
ADK (Agent Development Kit)	Go / Cloud Run	Confidence escalation, vision verification, memory/replay
Google Cloud Run	asia-northeast3	Serverless backend hosting (gateway + orchestrator)
chromedp	v0.14.2	Go-native Chrome DevTools Protocol client
macOS Accessibility API	AppKit / AX	Native UI element discovery and action execution
Swift	6.2.4 verified (`swift-tools-version: 6.2`)	macOS client with 131 tests (20 test files)
Go	1.26.1	Backend services with full test/vet coverage
Firestore	Native mode	Action state persistence, session memory, replay fixtures
Cloud Logging / Trace	GCP	Navigator telemetry, self-healing metrics, processingState transitions

Observability

The navigator path emits proof-oriented telemetry:

Task acceptance, clarification prompts, replacement prompts
Time to first action (time_to_first_action_ms)
Guided-mode outcomes, verification failures
Input-field focus success/failure, wrong-target detections
Self-healing retry counts and outcomes
Vision verification results
processingState stage transitions

These feed Cloud Logging, Cloud Trace, and Cloud Monitoring. Completed task replays are persisted in Firestore for regression comparison.

Submission Assets

Asset	Location
Architecture diagram	This README (Mermaid) + `docs/FINAL_ARCHITECTURE.md`
Demo video script	`docs/challenge/video-1-demo/SCRIPT.md`
Devpost submission text	`docs/DEVPOST_SUBMISSION.md`
Current status	`docs/CURRENT_STATUS_20260316.md`
Deployment evidence	`docs/evidence/DEPLOYMENT_EVIDENCE.md`
GCP proof	`docs/deployment/PROOF_OF_GCP_DEPLOYMENT.md`
GDG profile	gdg.community.dev/u/m5n58q/
Auto deploy	`infra/deploy.sh`

Dev.to Blog Posts

18 published articles documenting the development journey:

submitting vibecat: what 3 weeks of building a desktop AI actually taught me — Submission day retrospective (series finale)
when your agent fails, does it just... stop? — Self-healing deep dive
the moment vibecat stopped waiting and started suggesting — Proactive Companion pivot
Full series: dev.to/combba

License

This project is submitted to the Gemini Live Agent Challenge (2026).

Name		Name	Last commit message	Last commit date
Latest commit History 213 Commits
.github		.github
.playwright-cli		.playwright-cli
Assets		Assets
VibeCat		VibeCat
backend		backend
demo		demo
docs		docs
infra		infra
scripts		scripts
tests/e2e		tests/e2e
.gitattributes		.gitattributes
.gitignore		.gitignore
LICENSE		LICENSE
Makefile		Makefile
README.md		README.md

Folders and files

Latest commit

History

Repository files navigation

VibeCat

The Problem

The Flip

Core Flow: OBSERVE → SUGGEST → WAIT → ACT → FEEDBACK

What Makes VibeCat Different

Demo Scenarios

Act 1. Proactive Music Suggestion (YouTube Music)

Act 2. Proactive Code Enhancement (Antigravity IDE)

Act 3. Terminal Automation (Terminal)

Architecture

Version Matrix

System Overview

What Each Service Owns

Component Overview

Client UI Panel Layout

Realtime Gateway Architecture

ADK Orchestrator Architecture

Orchestrator Endpoint Contracts

Gateway Endpoint Contracts

State and Persistence

Observable Contracts

Proactive Companion System Prompt

Navigator Tools (Function Calling)

pendingFC: Sequential Multi-Step Execution

Self-Healing Navigation

Vision Verification

Grounding Sources

Transparent Feedback Pipeline

Execution Contract

Quick Start

Prerequisites

Environment Setup

Build & Test

Deploy to Cloud Run

Runtime Permissions

Project Structure

Deployment

Gold-Tier Surfaces

Safety Model

Technology Stack

Observability

Submission Assets

Dev.to Blog Posts

License

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 1

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages