
AutoClawd — CLAUDE.md

AutoClawd is a macOS ambient AI agent. It runs as a floating pill widget and menu bar icon with an always-on microphone and a background pipeline that listens to conversations, transcribes them with local AI models, extracts tasks, and executes them autonomously via Claude Code — without the user ever typing a prompt.

Core Concept

  • Always-on mic → captures 30-second audio chunks continuously with live word-by-word streaming
  • Always-on screen → OCR + accessibility tree watching for context; detects app switches, URLs, text
  • Always-on intelligence → local Llama 3.2 cleans and analyzes every transcript
  • Zero-prompt execution → tasks are created and auto-run based on what was said/seen, not what was asked
  • FUCBC → Find Use-Case, Build Capability: watches workflows, builds executable SKILL.md automations
  • Agents → library of built capabilities, one click to run from the Agents panel
  • Capability toasts → glass notification in top-right when OCR detects an automatable workflow
  • World model → persistent per-project knowledge base built from every conversation

Build & Run

# Build (ad-hoc signed, no provisioning needed)
make

# Build + run immediately
make run

The Makefile copies the built bundle to build/AutoClawd.app. To install permanently: cp -r build/AutoClawd.app /Applications/.

The WhatsApp sidecar is a separate Node.js process:

cd WhatsAppSidecar && npm install && npm start

Architecture

Process Layout

AutoClawd.app (Swift/SwiftUI macOS app)
  +-- PillWindow            floating NSPanel widget (always on top)
  +-- MainPanelWindow       main dashboard (opens on pill tap or menu bar click)
  +-- ToastWindow           capability suggestion toasts (top-right glass card)
  +-- SetupWindow           first-run dependency setup
  +-- NSStatusBarButton     menu bar icon (primary entry point)

WhatsApp Sidecar (Node.js/Express on localhost:7891)
  +-- Baileys WA Web client -> buffers messages -> polled every 2s

AutoClawd MCP Server (Swift HTTP on localhost:7892)
  +-- screen context, cursor context, selection, audio transcript tools

Pipeline Flow

[Mic] -> AudioRecorder -> ChunkManager -> PipelineOrchestrator
                               |                   |
                    StreamingLocalTranscriber  +----v------------------+
                    (live partials via         | Stage 1: Cleaning     |  TranscriptCleaningService
                     SFSpeechRecognizer)       |  local Llama 3.2      |  merge chunks, denoise,
                                               |                       |  resolve speaker context
                                               +----------+-----------+
                                                          |  fires onTranscriptionCleaned for ALL sources
                                               +----------v-----------+
                                               | Stage 2: Analysis     |  TranscriptAnalysisService
                                               |  local Llama 3.2      |  project, priority, tags,
                                               |                       |  tasks, world model update
                                               +----------+-----------+
                                                          |
                                               +----------v-----------+
                                               | Stage 3: Task         |  TaskCreationService
                                               |  Creation             |  mode: auto / ask / user
                                               +----------+-----------+
                                                          |
                                               +----------v-----------+
                                               | Stage 4: Execution    |  TaskExecutionService
                                               |  Claude Code SDK      |  streamed output, auto tasks
                                               +----------------------+

[Screen] -> ScreenVisionAnalyzer -> OCR + AX tree
                                        |
                              CapabilityStore.suggest()
                                        |
                              AppState.detectedCapability
                                        |
                              ToastWindow -> CapabilityToastView (top-right glass card)

FUCBC — Capability Learning Loop

[LearnMode active]
  |
  Every 5s: LearnEvent captured
    (app name, window title, detected URLs, OCR snippet, SFSpeech partial)
  |
  User signals "build" (or enough context accumulated)
  |
  LearnModeService.buildCapability()
    |
    buildUserJourney()     <- transforms raw events into coherent narrative
      "T+0s -> User opened Threads - Screen: 'New Llama 3.3 from Meta'"
      "T+5s -> User navigated to youtube.com - User said: 'I should try this'"
    |
    buildFUCBCPrompt()     <- story + MCP tools + social media API examples
      instructs Claude Code to write executable SKILL.md
    |
    ClaudeCodeRunner.startSession(prompt, in: openClawDir)
      Claude writes: ~/.autoclawd/openclaw-skills/{slug}/SKILL.md
      Claude outputs: JSON manifest
    |
    CapabilityStore.save(capability)
      -> posts capabilityStoreDidChange notification
      -> AgentsView reloads grid

Pipeline Sources (PipelineSource enum)

Each transcript carries a source tag that controls which stages run:

  • .ambient — full pipeline (clean -> analyze -> task -> execute)
  • .transcription — clean only (merge/denoise; no task creation)
  • .whatsapp — full pipeline (same as ambient, with QA reply)
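The routing above can be sketched as a switch over the source tag. This is an illustrative sketch, not the app's actual code; the enum case names come from the list above, and the stage names mirror the pipeline diagram.

```swift
// Sketch: how a PipelineSource tag could gate which stages run.
enum PipelineSource { case ambient, transcription, whatsapp }
enum PipelineStage { case cleaning, analysis, taskCreation, execution }

func stages(for source: PipelineSource) -> [PipelineStage] {
    switch source {
    case .ambient, .whatsapp:
        // Full pipeline: clean -> analyze -> create tasks -> execute
        return [.cleaning, .analysis, .taskCreation, .execution]
    case .transcription:
        // Clean-only mode: merge/denoise, no task creation
        return [.cleaning]
    }
}
```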

Transcript Session State (AppState)

Three layers of live transcript text accumulate in the widget across ALL modes:

| Property | Source | Opacity |
| --- | --- | --- |
| liveTranscriptText | cleaned chunks from Ollama (all sources) | full |
| pendingRawSegment | committed audio not yet through Ollama | medium |
| latestTranscriptChunk | live SFSpeech streaming partial | faint/italic |

Session lifecycle:

  • onTranscriptionCleaned fires for ALL pipeline sources, appending to liveTranscriptText
  • After each chunk cycle, if Date().timeIntervalSince(lastSpeechTime) >= 10.0, session ends: clearSessionTranscript() is called
  • Mode changes do NOT clear the transcript — only silence-end and the manual Clear button do
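The silence-end check can be sketched as a small pure function. This is illustrative only; the real logic lives in AppState/ChunkManager, and only the 10-second threshold comes from the text above.

```swift
import Foundation

// Sketch: session ends after 10 s of silence since the last speech.
let silenceThreshold: TimeInterval = 10.0

func sessionShouldEnd(lastSpeechTime: Date, now: Date) -> Bool {
    now.timeIntervalSince(lastSpeechTime) >= silenceThreshold
}
```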

Local AI Model Usage

| Stage | Model | Provider | Purpose |
| --- | --- | --- | --- |
| Transcription (streaming) | SFSpeechRecognizer | Apple (local) | Live word-by-word partials |
| Transcription (committed) | Whisper | Groq (cloud) or Apple SFSpeech (local) | Final chunk text |
| Cleaning | Llama 3.2 3B | Ollama (local) | Merge, denoise, resolve context |
| Analysis | Llama 3.2 3B | Ollama (local) | Extract tasks, update world model |
| Task framing | Llama 3.2 3B | Ollama (local) | Clean task titles from README/CLAUDE.md |
| FUCBC story + prompt | Llama 3.2 3B | Ollama (local) | Build user journey narrative |
| Capability building | Claude Code | Anthropic API | Write executable SKILL.md |
| Execution | Claude Code | Anthropic API | Run tasks in project folders |

FUCBC — Capability System

Three-Tier Model

| Tier | Name | Definition | Examples |
| --- | --- | --- | --- |
| 1 | Skill | An atomic unit: often just Claude by itself, or a single CLI tool invocation | "Write a tweet thread", yt-dlp {url}, video2ai {file}, "Summarise PDF" |
| 2 | Capability | One or more Skills combined with tool access; built by FUCBC from observed usage | "Post to all platforms" (Threads + Twitter + Buffer), "Ingest reference video" (video2ai + Claude) |
| 3 | Workflow | AutoClawd + an ordered sequence of Skills and Capabilities + Claude Code -> delivers a real output | "Launch Video" (8 capabilities in sequence, 1-click from 10-step manual process) |

Key insight: A Skill is something Claude can do alone. A Capability is a Skill that also needs a specific external tool. A Workflow is what AutoClawd assembles from Capabilities (and Skills) automatically — then hands to the user as a repeatable 1-click automation.

Data Model

// Tier 1 — Skill (OpenClaw skill library)
// Lives in ~/.autoclawd/openclaw-skills/{slug}/SKILL.md
SKILL.md contents:
  name, description, invocation   // shell command or Claude prompt template
  inputs, outputs                  // what it takes, what it produces
  workflowTags: [String]          // ["video-production", "content-creation"]
  source: github | builtin | observed   // where this skill came from

// Tier 2 — Capability (CapabilityStore)
// Stored in ~/.autoclawd/capabilities/index.json
Capability                    // a FUCBC-built modular automation (Skill + tool access)
  +-- id, name, description, emoji, category
  +-- triggers: CapabilityTriggers
  |     +-- apps: [String]         // e.g. ["Threads", "Safari", "Twitter"]
  |     +-- urlPatterns: [String]  // e.g. ["threads.net", "x.com"]
  |     +-- keywords: [String]     // e.g. ["launch video", "post this"]
  |     +-- ocrPatterns: [String]  // e.g. ["New Post", "Create thread"]
  +-- subWorkflows: [SubWorkflow]  // the Skill-level steps inside this capability
  |     +-- { name, description, invocation: String? }  // shell cmd or skill slug
  +-- skillMDPath: String?         // path to ~/.autoclawd/openclaw-skills/{slug}/SKILL.md

// Tier 3 — Workflow (WorkflowStore) — UPCOMING
// Stored in ~/.autoclawd/workflows/index.json
Workflow
  +-- id, name, description
  +-- steps: [WorkflowStep]        // ordered Capability / Skill references
  +-- inputSpec: WorkflowInputSpec
  |     +-- references: [ReferenceField]   // "Target URL", "Reference videos"
  |     +-- contextField: String           // free text ("launch video for AI startup")
  |     +-- projectSelection: Bool
  +-- createdFrom: .observed | .manual | .prebuilt

Storage

~/.autoclawd/
  capabilities/
    index.json              <- all Capability records (CapabilityStore)
  openclaw-skills/
    {slug}/
      SKILL.md              <- executable skill written by Claude Code via FUCBC
  workflows/                <- UPCOMING: WorkflowRecord storage
    index.json

OCR Auto-Trigger (CapabilityStore.suggest)

Every OCR frame in ScreenVisionAnalyzer calls:

CapabilityStore.shared.suggest(screenText: ocr, app: appName, urls: urls)

Scoring per capability: +4 for an app match, +3 for a URL pattern match, +2 for an OCR pattern match, +1 for a keyword match. If the best score is >= 3, AppState.detectedCapability is set and the capability toast appears in the top-right corner.
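The scoring can be sketched as follows. This is a simplified illustration, not the actual CapabilityStore code: the `Triggers` struct is a cut-down stand-in for CapabilityTriggers, and only the weights (+4/+3/+2/+1) and the >= 3 threshold come from the description above.

```swift
import Foundation

// Sketch: per-capability trigger scoring against the current screen state.
struct Triggers {
    var apps: [String]
    var urlPatterns: [String]
    var keywords: [String]
    var ocrPatterns: [String]
}

func score(_ t: Triggers, screenText: String, app: String, urls: [String]) -> Int {
    var s = 0
    if t.apps.contains(app) { s += 4 }                                              // +4 app match
    if t.urlPatterns.contains(where: { p in urls.contains { $0.contains(p) } }) {
        s += 3                                                                      // +3 URL pattern
    }
    if t.ocrPatterns.contains(where: { screenText.contains($0) }) { s += 2 }        // +2 OCR pattern
    if t.keywords.contains(where: { screenText.localizedCaseInsensitiveContains($0) }) {
        s += 1                                                                      // +1 keyword
    }
    return s  // a best score >= 3 would trigger the capability toast
}
```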

Capability Toast (ToastWindow + CapabilityToastView)

Glass-styled notification in top-right corner:

  • Emoji + capability name + "Automate this?" subtitle
  • Tap -> AppState.executeCapability(cap) -> streams via Claude Code
  • X dismiss -> AppState.dismissDetectedCapability()
  • Auto-dismisses after 5 seconds
  • Wired via AppDelegate: appState.$detectedCapability Combine sink

Agents Panel (AgentsView)

"My Agents" — panel-level grid of all built capabilities:

  • Accessible via "Agents" tab in MainPanelView sidebar (bolt.fill icon)
  • 3-column LazyVGrid of AgentCard views
  • Each card: app icon strip, bold title, description, step count, category pill, Run button
  • Run -> AppState.executeCapability() -> streams via Claude Code
  • Auto-reloads when capabilityStoreDidChange fires
  • Empty state: CTA to switch to Learn Mode and record a session

Capability Execution (AppState.executeCapability)

func executeCapability(_ capability: Capability)
  1. Clears detectedCapability
  2. Resolves project (projects.first fallback)
  3. Builds prompt: uses skillMDPath if present ("Read and execute SKILL.md"), else constructs from sub-workflow invocations
  4. Creates PipelineTaskRecord (id: CAP-{8hex})
  5. Prepends to pipelineTasks (visible in LogsPipelineView)
  6. Starts claudeCodeRunner.startSession() -> streams output
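Step 3's prompt choice can be sketched as a small function. This is illustrative: the `SubWorkflowSketch` struct and the exact prompt wording are invented stand-ins; only the "SKILL.md if present, else sub-workflow invocations" branching comes from the steps above.

```swift
// Sketch: prefer the capability's SKILL.md when it has one, otherwise
// build the prompt from its sub-workflow invocations.
struct SubWorkflowSketch {
    var name: String
    var invocation: String?  // shell command or skill slug; nil = plain Claude step
}

func buildPrompt(skillMDPath: String?, subWorkflows: [SubWorkflowSketch]) -> String {
    if let path = skillMDPath {
        return "Read and execute the skill at \(path)"
    }
    let steps = subWorkflows
        .map { "- \($0.name): \($0.invocation ?? "use Claude directly")" }
        .joined(separator: "\n")
    return "Execute these steps in order:\n\(steps)"
}
```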

Key Files

Core App

| File | Purpose |
| --- | --- |
| App.swift | SwiftUI @main entry point (headless; no default window) |
| AppDelegate.swift | NSApplicationDelegate; creates all windows, menu bar, wires subscriptions |
| AppState.swift | Central ObservableObject: all shared state, service singletons |
| AppFonts.swift | Custom font registration and font accessors |
| AppTheme.swift | Appearance system: frosted/solid modes, color scheme, theme tokens |
| LiquidGlass.swift | Glass design system: LiquidGlassCard, GlassButton, GlassChip, Glass.textPrimary/Secondary/Tertiary |
| Logger.swift | Structured logging with subsystems: .pipeline, .system, .audio, .ui |

Pipeline

| File | Purpose |
| --- | --- |
| PipelineOrchestrator.swift | Routes transcripts through the 4-stage pipeline |
| PipelineModels.swift | Core value types: CleanedTranscript, TranscriptAnalysis, PipelineTaskRecord |
| PipelineStore.swift | Persistence layer for pipeline data |
| PipelineGroup.swift | Groups related pipeline records for display |
| ChunkManager.swift | Buffers audio chunks, manages session lifecycle, calls PipelineOrchestrator |
| StreamingLocalTranscriber.swift | Live word-by-word SFSpeechRecognizer streaming; fires onPartial callbacks |
| TranscriptCleaningService.swift | Stage 1: Ollama Llama 3.2 transcript cleaning |
| TranscriptAnalysisService.swift | Stage 2: Ollama Llama 3.2 analysis, task extraction, world model update |
| TaskCreationService.swift | Stage 3: structured task creation with mode assignment |
| TaskExecutionService.swift | Stage 4: streams Claude Code sessions for auto tasks |
| ClaudeCodeRunner.swift | Low-level Claude Code SDK streaming client |
| WorkflowRegistry.swift | Registered execution workflows (e.g. autoclawd-claude-code) |

FUCBC — Capability System

| File | Purpose |
| --- | --- |
| LearnModeModels.swift | LearnEvent, Capability, CapabilityCategory, CapabilityTriggers, SubWorkflow, LearnSession, CapabilityManifest |
| LearnModeService.swift | FUCBC service: 5s event timer, buildUserJourney(), buildFUCBCPrompt(), buildCapability() |
| CapabilityStore.swift | Persists capabilities to ~/.autoclawd/capabilities/index.json; suggest() for auto-trigger scoring |
| AICanvasView.swift | Canvas tab: Learn Mode node graph + capabilities grid subtabs |
| AgentsView.swift | "My Agents" panel: 3-col LazyVGrid of AgentCard; Run -> executeCapability() |

Audio & Transcription

| File | Purpose |
| --- | --- |
| AudioRecorder.swift | Always-on AVAudioEngine capture; engine stays hot between chunks |
| SystemAudioCapturer.swift | ScreenCaptureKit system audio + screen preview capture; SystemAudioMixer for thread-safe mixing |
| SpeechService.swift | Groq / Apple SFSpeech transcription of committed audio chunks |
| TranscriptionService.swift | Transcription orchestration |

Storage & Persistence

| File | Purpose |
| --- | --- |
| TranscriptStore.swift | SQLite transcript persistence |
| PipelineStore.swift | Pipeline record persistence |
| StructuredTodoStore.swift | Task queue with status history |
| QAStore.swift | Q&A session persistence |
| ExtractionStore.swift | Extraction result persistence |
| ContextCaptureStore.swift | Clipboard and screenshot context persistence |
| SessionStore.swift | Speaking session timeline persistence |
| ProjectStore.swift | Project list and metadata |
| SkillStore.swift | Built-in and custom skills persistence |
| FileStorageManager.swift | Attachment and file storage management |

World Model & Intelligence

| File | Purpose |
| --- | --- |
| WorldModelService.swift | Builds and updates per-project markdown world model |
| WorldModelGraph.swift | Graph data model parsed from world model markdown |
| WorldModelGraphParser.swift | Parses markdown world model into graph nodes/edges |
| WorldModelGraphLayout.swift | Force-directed layout for world model graph |
| WorldModelGraphView.swift | SwiftUI canvas graph visualization |
| ExtractionService.swift | Extracts structured facts, decisions, people from transcripts |
| ExtractionItem.swift | Extraction result value type |
| Episode.swift | A discrete event (song, place, person) captured in context |
| NowPlayingService.swift | ShazamKit song detection; creates Episodes |
| PeopleTaggingService.swift | Identifies and tags people mentioned in transcripts |
| Person.swift | Person value type |

Context Capture

| File | Purpose |
| --- | --- |
| ScreenshotService.swift | Periodic screen capture for ambient context |
| ScreenVisionAnalyzer.swift | OCR + accessibility tree analysis; feeds CapabilityStore.suggest() |
| ClipboardMonitor.swift | Monitors clipboard changes for context enrichment |
| LocationService.swift | Core Location; current place detection |
| PlaceDetail.swift | Place value type |

Q&A & Skills

| File | Purpose |
| --- | --- |
| QAService.swift | Handles AI search / Q&A queries against transcript context |
| QAView.swift | Q&A results UI |
| Skill.swift | Skill value type (built-in + custom) |
| SkillStore.swift | Skill persistence, seeding built-in skills on install |

Todo & Task Management

| File | Purpose |
| --- | --- |
| TodoService.swift | Todo list management |
| TodoFramingService.swift | Frames task titles using README/CLAUDE.md for context |
| StructuredTodoStore.swift | Persists structured task queue |

UI — Windows & Shell

| File | Purpose |
| --- | --- |
| PillView.swift | Floating widget SwiftUI view |
| PillWindow.swift | NSPanel wrapper with drag, snap-to-edge, height animation |
| PillMode.swift | PillMode enum (.ambient, .aiSearch, .learn) |
| MainPanelView.swift | Main dashboard shell; tabs: World / Agents / Canvas / Projects / Logs / Settings |
| MainPanelWindow.swift | NSWindow wrapper for dashboard |
| ToastView.swift | CapabilityToastView: glass-styled capability suggestion toast |
| ToastWindow.swift | Floating NSPanel for capability toasts (top-right corner) |
| SetupView.swift | First-run dependency setup UI |
| OnboardingView.swift | Onboarding flow (first launch) |

UI — Panel Views

| File | Purpose |
| --- | --- |
| LogsPipelineView.swift | Pipeline stage visualizer (column view) |
| AgentsView.swift | "My Agents" agents grid: 3-col capability cards, one-click run |
| AICanvasView.swift | Canvas + capabilities grid (Learn Mode / FUCBC) |
| SettingsConsolidatedView.swift | All settings UI |
| IntelligenceView.swift | Intelligence/context dashboard |
| IntelligenceConsolidatedView.swift | Consolidated intelligence panel |
| SkillsView.swift | Skills management UI |
| QAView.swift | Q&A results panel |
| SessionTimelineView.swift | Session history timeline |
| UserProfileChatView.swift | User profile and chat context view |
| TagView.swift | Tag display component |

UI — Widget Canvas

| File | Purpose |
| --- | --- |
| WidgetView.swift | Pill widget root view |
| WidgetCanvasViews.swift | Per-mode canvas content |
| WidgetPanelViews.swift | Expanded pill panel views |

Integrations & System

| File | Purpose |
| --- | --- |
| WhatsAppPoller.swift | Polls sidecar, filters to self-chat, routes to pipeline |
| WhatsAppService.swift | WhatsApp message handling and reply logic |
| WhatsAppSidecar.swift | Sidecar connection management |
| ShazamKitService.swift | ShazamKit audio fingerprinting for now-playing detection |
| MCPConfigManager.swift | MCP server configuration management |
| MCPServer.swift | Built-in HTTP MCP server (port 7892): screen/cursor/selection/transcript tools |
| GlobalHotkeyMonitor.swift | System-wide keyboard shortcut monitoring |
| HotWordDetector.swift | Real-time hotword detection in audio stream |
| HotWordConfig.swift | Hotword configuration |
| ClipboardMonitor.swift | Clipboard change monitoring |
| UserProfileService.swift | User profile and preferences |
| SettingsManager.swift | All user settings via UserDefaults + API keys |
| KeychainStorage.swift | API key storage (Keychain + env var fallback) |
| DependencyInstaller.swift | First-run Ollama/dependency setup |
| CleanupService.swift | Audio file retention cleanup |
| Attachment.swift | File attachment value type for tasks |

OpenClaw Skill Library

144+ skills live in ~/.autoclawd/openclaw-skills/. Each is a directory with SKILL.md:

~/.autoclawd/openclaw-skills/
  video2ai/        <- convert video -> frames + transcript + LLM analysis (Python CLI + web UI)
  yt-dlp/          <- download videos from YouTube, YC, etc.
  remotion/        <- React-based motion graphics / programmatic video
  ffmpeg/          <- video/audio processing, segment assembly
  github/          <- GitHub issues, PRs, releases
  slack/           <- send messages, post to channels
  discord/         <- send messages, webhooks
  gdrive/          <- upload files, get shareable links
  whatsapp/        <- send messages via WhatsApp
  canvas/          <- Canva automation (screenshot + text extraction)
  coding-agent/    <- Claude Code sub-agent for code tasks
  ... 130+ more

When FUCBC discovers a new tool it hasn't seen, it auto-creates a new skill directory and SKILL.md.
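A minimal SKILL.md might look like the following. This is a hypothetical example: the exact field layout, the slug, and the invocation line are invented for illustration; real skills follow the fields described in the Data Model section.

```markdown
# yt-dlp

description: Download a video from a URL for downstream processing.
invocation: yt-dlp -o "%(title)s.%(ext)s" {url}
inputs: a video URL
outputs: the downloaded video file in the working directory
workflowTags: [video-production]
source: builtin
```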


Pill Modes (PillMode enum)

  • .ambient — always-on mic -> full pipeline; shows three-layer session transcript
  • .aiSearch — hotword-triggered QA queries
  • .learn — FUCBC mode: watches screen+voice, builds capabilities

Panel Tabs (PanelTab enum)

  • .world — World model graph visualization
  • .agents — "My Agents" grid of built capabilities (AgentsView)
  • .canvas — Learn Mode canvas + capabilities subtab (AICanvasView)
  • .projects — Project list (ProjectsListView)
  • .logs — Pipeline stage visualizer (LogsPipelineView)
  • .settings — All settings (SettingsConsolidatedView)

Task Modes (TaskMode enum)

  • .auto — executed immediately without approval
  • .ask — shown to user for approval in LogsPipelineView
  • .user — created but not executed (manual)

API Keys & Environment

API keys are resolved in priority order:

  1. Environment variable (GROQ_API_KEY, ANTHROPIC_API_KEY)
  2. macOS Keychain (legacy fallback)

Set env vars in ~/.zshenv or pass them to the app via launchd/launchctl setenv.

Groq is optional — if absent, transcription falls back to Apple SFSpeechRecognizer (fully local). Anthropic key is required for Claude Code execution (Stage 4) and FUCBC capability building.
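The resolution order can be sketched as a small function. This is illustrative, not the KeychainStorage API: the environment dictionary and the Keychain lookup closure are injected so the priority logic is visible; only the "env var first, Keychain second" order comes from the list above.

```swift
import Foundation

// Sketch: resolve an API key from the environment first, then a
// Keychain lookup (stubbed here as a closure).
func resolveAPIKey(_ name: String,
                   env: [String: String],
                   keychain: (String) -> String?) -> String? {
    env[name] ?? keychain(name)
}
```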


WhatsApp Integration

  • Sidecar runs on localhost:7891
  • Only messages from the self-chat JID (myNumber@s.whatsapp.net) are processed
  • Group messages (JID ends with @g.us) are filtered at the sidecar level
  • Voice notes are transcribed then routed through the pipeline
  • Bot replies are sent back with "Dot: " prefix
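The filtering rules above can be sketched as a predicate. This is an illustrative sketch, not the sidecar or WhatsAppPoller code; the JID formats (self-chat `number@s.whatsapp.net`, groups ending in `@g.us`) are as documented above.

```swift
// Sketch: only the user's own self-chat passes; group chats are dropped.
func shouldProcess(jid: String, selfJID: String) -> Bool {
    if jid.hasSuffix("@g.us") { return false }  // group messages filtered out
    return jid == selfJID                        // only the self-chat JID passes
}
```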

Development Conventions

  • SwiftUI + AppKit: Use SwiftUI for views inside windows; AppKit (NSPanel/NSWindow) for window management
  • MainActor: All UI state and AppState mutations happen on @MainActor. Services that cross actor boundaries are marked @unchecked Sendable.
  • Logging: Use Log.info(.pipeline, "..."), Log.warn(.system, "...") — subsystems: .pipeline, .system, .audio, .ui
  • No force-unwraps in production paths. Use guard let or default values.
  • Single source of truth: AppState holds all published state. Don't duplicate state across views.
  • Avoid huge files: If a view exceeds ~300 lines, split into subviews.
  • Glass design system: Use LiquidGlassCard(tint:), GlassButton, GlassIconBadge, GlassDivider, GlassChip, Glass.textPrimary/Secondary/Tertiary — not raw Color.
  • Session transcript: Use appState.clearSessionTranscript() to reset. Never directly nil liveTranscriptText outside AppState.
  • Capability notifications: CapabilityStore.save() posts capabilityStoreDidChange — observe this in views instead of polling.

Common Tasks

Add a new pipeline stage

  1. Add service in Sources/
  2. Inject into PipelineOrchestrator.init()
  3. Call it in processTranscript() after the appropriate stage
  4. Update PipelineSource routing if stage should be skipped for certain modes

Add a new OpenClaw skill

  1. Create directory: ~/.autoclawd/openclaw-skills/{slug}/
  2. Write SKILL.md with: what the tool does, how to invoke it, input/output, workflow tags
  3. Skill appears automatically in SkillStore.refreshOpenClawSkills()
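Steps 1-2 above can be sketched with FileManager. This is an illustrative sketch: the function name and the base-directory parameter are invented (pass `~/.autoclawd/openclaw-skills` in practice), and the markdown body is whatever SKILL.md content you author.

```swift
import Foundation

// Sketch: create a skill directory and write its SKILL.md.
func createSkill(slug: String, markdown: String, base: URL) throws -> URL {
    let dir = base.appendingPathComponent(slug, isDirectory: true)
    try FileManager.default.createDirectory(at: dir, withIntermediateDirectories: true)
    let file = dir.appendingPathComponent("SKILL.md")
    try markdown.write(to: file, atomically: true, encoding: .utf8)
    return file
}
```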

Add a new capability manually

  1. Create Capability with triggers, subWorkflows, skillMDPath
  2. Call CapabilityStore.shared.save(capability)
  3. Notification fires -> AgentsView reloads -> card appears immediately

Add a new setting

  1. Add key constant + computed property in SettingsManager.swift
  2. Add UI control in SettingsConsolidatedView.swift
  3. Use SettingsManager.shared.yourSetting at call sites

Add a panel tab

  1. Add case to PanelTab enum in MainPanelView.swift
  2. Add icon to var icon: String switch
  3. Add view to the ZStack in content with matching .opacity/.allowsHitTesting

Trigger a pipeline manually (testing)

await appState.pipelineOrchestrator.processTranscript(
    text: "test transcript",
    transcriptID: 0,
    sessionID: "test",
    sessionChunkSeq: 0,
    durationSeconds: 5,
    speakerName: "Test",
    source: .ambient
)

Test FUCBC capability toast

// Simulate a detected capability
appState.detectedCapability = CapabilityStore.shared.all().first
// -> CapabilityToastView appears in top-right corner
// Tap -> executeCapability() -> Claude Code streams output