AutoClawd is a macOS ambient AI agent. It runs as a floating pill widget and menu bar icon with an always-on microphone and a background pipeline that listens to conversations, transcribes them with local AI models, extracts tasks, and executes them autonomously via Claude Code — without the user ever typing a prompt.
- Always-on mic → captures 30-second audio chunks continuously with live word-by-word streaming
- Always-on screen → OCR + accessibility tree watching for context; detects app switches, URLs, text
- Always-on intelligence → local Llama 3.2 cleans and analyzes every transcript
- Zero-prompt execution → tasks are created and auto-run based on what was said/seen, not what was asked
- FUCBC → Find Use-Case, Build Capability: watches workflows, builds executable SKILL.md automations
- Agents → library of built capabilities, one click to run from the Agents panel
- Capability toasts → glass notification in top-right when OCR detects an automatable workflow
- World model → persistent per-project knowledge base built from every conversation
```sh
# Build (ad-hoc signed, no provisioning needed)
make

# Build + run immediately
make run
```

The Makefile copies the built bundle to `build/AutoClawd.app`. To install permanently: `cp -r build/AutoClawd.app /Applications/`.

The WhatsApp sidecar is a separate Node.js process:

```sh
cd WhatsAppSidecar && npm install && npm start
```

```
AutoClawd.app (Swift/SwiftUI macOS app)
+-- PillWindow          floating NSPanel widget (always on top)
+-- MainPanelWindow     main dashboard (opens on pill tap or menu bar click)
+-- ToastWindow         capability suggestion toasts (top-right glass card)
+-- SetupWindow         first-run dependency setup
+-- NSStatusBarButton   menu bar icon (primary entry point)

WhatsApp Sidecar (Node.js/Express on localhost:7891)
+-- Baileys WA Web client -> buffers messages -> polled every 2s

AutoClawd MCP Server (Swift HTTP on localhost:7892)
+-- screen context, cursor context, selection, audio transcript tools
```
```
[Mic] -> AudioRecorder -> ChunkManager -> PipelineOrchestrator
              |                                   |
 StreamingLocalTranscriber         +----v------------------+
 (live partials via                | Stage 1: Cleaning     | TranscriptCleaningService
  SFSpeechRecognizer)              | local Llama 3.2       | merge chunks, denoise,
                                   |                       | resolve speaker context
                                   +----------+-----------+
                                              | fires onTranscriptionCleaned for ALL sources
                                   +----------v-----------+
                                   | Stage 2: Analysis    | TranscriptAnalysisService
                                   | local Llama 3.2      | project, priority, tags,
                                   |                      | tasks, world model update
                                   +----------+-----------+
                                              |
                                   +----------v-----------+
                                   | Stage 3: Task        | TaskCreationService
                                   | Creation             | mode: auto / ask / user
                                   +----------+-----------+
                                              |
                                   +----------v-----------+
                                   | Stage 4: Execution   | TaskExecutionService
                                   | Claude Code SDK      | streamed output, auto tasks
                                   +----------------------+
```
```
[Screen] -> ScreenVisionAnalyzer -> OCR + AX tree
                   |
         CapabilityStore.suggest()
                   |
         AppState.detectedCapability
                   |
         ToastWindow -> CapabilityToastView (top-right glass card)
```
```
[LearnMode active]
        |
Every 5s: LearnEvent captured
(app name, window title, detected URLs, OCR snippet, SFSpeech partial)
        |
User signals "build" (or enough context accumulated)
        |
LearnModeService.buildCapability()
        |
buildUserJourney()  <- transforms raw events into coherent narrative
   "T+0s -> User opened Threads - Screen: 'New Llama 3.3 from Meta'"
   "T+5s -> User navigated to youtube.com - User said: 'I should try this'"
        |
buildFUCBCPrompt()  <- story + MCP tools + social media API examples
   instructs Claude Code to write executable SKILL.md
        |
ClaudeCodeRunner.startSession(prompt, in: openClawDir)
   Claude writes: ~/.autoclawd/openclaw-skills/{slug}/SKILL.md
   Claude outputs: JSON manifest
        |
CapabilityStore.save(capability)
   -> posts capabilityStoreDidChange notification
   -> AgentsView reloads grid
```
Each transcript carries a source tag that controls which stages run:
- `.ambient` — full pipeline (clean -> analyze -> task -> execute)
- `.transcription` — clean only (merge/denoise; no task creation)
- `.whatsapp` — full pipeline (same as ambient, with QA reply)
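Under assumed names (the real routing lives in `PipelineOrchestrator.swift`), the source-tag routing reduces to a small switch:

```swift
// Sketch of source-tag routing. Names are assumptions mirroring the README;
// see PipelineOrchestrator for the real logic.
enum PipelineSource {
    case ambient, transcription, whatsapp
}

/// Which of the four pipeline stages run for a given transcript source.
func stages(for source: PipelineSource) -> [String] {
    switch source {
    case .ambient, .whatsapp:
        return ["clean", "analyze", "task", "execute"]  // full pipeline
    case .transcription:
        return ["clean"]  // merge/denoise only; no task creation
    }
}
```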
Three layers of live transcript text accumulate in the widget across ALL modes:
| Property | Source | Opacity |
|---|---|---|
| `liveTranscriptText` | cleaned chunks from Ollama (all sources) | full |
| `pendingRawSegment` | committed audio not yet through Ollama | medium |
| `latestTranscriptChunk` | live SFSpeech streaming partial | faint/italic |
Session lifecycle:
- `onTranscriptionCleaned` fires for ALL pipeline sources, appending to `liveTranscriptText`
- After each chunk cycle, if `Date().timeIntervalSince(lastSpeechTime) >= 10.0`, the session ends: `clearSessionTranscript()` is called
- Mode changes do NOT clear the transcript — only silence-end and the manual Clear button do
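The silence-end rule can be sketched as follows (the function name is hypothetical; the real check runs inside the chunk cycle):

```swift
import Foundation

// Sketch of the 10-second silence-end rule described above.
// `sessionShouldEnd` is a stand-in name, not the real API.
let silenceThreshold: TimeInterval = 10.0

func sessionShouldEnd(lastSpeechTime: Date, now: Date = Date()) -> Bool {
    now.timeIntervalSince(lastSpeechTime) >= silenceThreshold
}
```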
| Stage | Model | Provider | Purpose |
|---|---|---|---|
| Transcription (streaming) | SFSpeechRecognizer | Apple (local) | Live word-by-word partials |
| Transcription (committed) | Whisper | Groq (cloud) or Apple SFSpeech (local) | Final chunk text |
| Cleaning | Llama 3.2 3B | Ollama (local) | Merge, denoise, resolve context |
| Analysis | Llama 3.2 3B | Ollama (local) | Extract tasks, update world model |
| Task framing | Llama 3.2 3B | Ollama (local) | Clean task titles from README/CLAUDE.md |
| FUCBC story + prompt | Llama 3.2 3B | Ollama (local) | Build user journey narrative |
| Capability building | Claude Code | Anthropic API | Write executable SKILL.md |
| Execution | Claude Code | Anthropic API | Run tasks in project folders |
| Tier | Name | Definition | Examples |
|---|---|---|---|
| 1 | Skill | An atomic unit — often just Claude by itself, or a single CLI tool invocation | "Write a tweet thread", yt-dlp {url}, video2ai {file}, "Summarise PDF" |
| 2 | Capability | One or more Skills combined with tool access — built by FUCBC from observed usage | "Post to all platforms" (Threads + Twitter + Buffer), "Ingest reference video" (video2ai + Claude) |
| 3 | Workflow | AutoClawd + an ordered sequence of Skills and Capabilities + Claude Code -> delivers a real output | "Launch Video" (8 capabilities in sequence, 1-click from 10-step manual process) |
Key insight: A Skill is something Claude can do alone. A Capability is a Skill that also needs a specific external tool. A Workflow is what AutoClawd assembles from Capabilities (and Skills) automatically — then hands to the user as a repeatable 1-click automation.
```
// Tier 1 — Skill (OpenClaw skill library)
// Lives in ~/.autoclawd/openclaw-skills/{slug}/SKILL.md
SKILL.md contents:
  name, description, invocation        // shell command or Claude prompt template
  inputs, outputs                      // what it takes, what it produces
  workflowTags: [String]               // ["video-production", "content-creation"]
  source: github | builtin | observed  // where this skill came from

// Tier 2 — Capability (CapabilityStore)
// Stored in ~/.autoclawd/capabilities/index.json
Capability                             // a FUCBC-built modular automation (Skill + tool access)
+-- id, name, description, emoji, category
+-- triggers: CapabilityTriggers
|   +-- apps: [String]                 // e.g. ["Threads", "Safari", "Twitter"]
|   +-- urlPatterns: [String]          // e.g. ["threads.net", "x.com"]
|   +-- keywords: [String]             // e.g. ["launch video", "post this"]
|   +-- ocrPatterns: [String]          // e.g. ["New Post", "Create thread"]
+-- subWorkflows: [SubWorkflow]        // the Skill-level steps inside this capability
|   +-- { name, description, invocation: String? }  // shell cmd or skill slug
+-- skillMDPath: String?               // path to ~/.autoclawd/openclaw-skills/{slug}/SKILL.md

// Tier 3 — Workflow (WorkflowStore) — UPCOMING
// Stored in ~/.autoclawd/workflows/index.json
Workflow
+-- id, name, description
+-- steps: [WorkflowStep]              // ordered Capability / Skill references
+-- inputSpec: WorkflowInputSpec
|   +-- references: [ReferenceField]   // "Target URL", "Reference videos"
|   +-- contextField: String           // free text ("launch video for AI startup")
|   +-- projectSelection: Bool
+-- createdFrom: .observed | .manual | .prebuilt
```

```
~/.autoclawd/
  capabilities/
    index.json        <- all Capability records (CapabilityStore)
  openclaw-skills/
    {slug}/
      SKILL.md        <- executable skill written by Claude Code via FUCBC
  workflows/          <- UPCOMING: WorkflowRecord storage
    index.json
```
Every OCR frame in ScreenVisionAnalyzer calls:

```swift
CapabilityStore.shared.suggest(screenText: ocr, app: appName, urls: urls)
```

Scoring per capability: +4 app match, +3 URL pattern, +2 OCR pattern, +1 keyword.
If the best score is >= 3, `AppState.detectedCapability` is set and the capability toast appears in the top-right.
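A minimal sketch of the scoring, using the weights above (the struct and function names here are assumptions, not the real API):

```swift
// Sketch of suggest() scoring. Weights come from the README;
// TriggerMatch and score(_:) are stand-in names.
struct TriggerMatch {
    var appMatched = false      // +4
    var urlMatched = false      // +3
    var ocrMatched = false      // +2
    var keywordMatched = false  // +1
}

func score(_ m: TriggerMatch) -> Int {
    (m.appMatched ? 4 : 0) + (m.urlMatched ? 3 : 0)
        + (m.ocrMatched ? 2 : 0) + (m.keywordMatched ? 1 : 0)
}

let toastThreshold = 3  // best score >= 3 -> toast appears
```

Note that an app match alone (4) clears the threshold, while OCR or keyword matches must combine to fire a toast.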
Glass-styled notification in the top-right corner:
- Emoji + capability name + "Automate this?" subtitle
- Tap -> `AppState.executeCapability(cap)` -> streams via Claude Code
- X dismiss -> `AppState.dismissDetectedCapability()`
- Auto-dismisses after 5 seconds
- Wired via `AppDelegate`: `appState.$detectedCapability` Combine sink
"My Agents" — panel-level grid of all built capabilities:
- Accessible via the "Agents" tab in the MainPanelView sidebar (bolt.fill icon)
- 3-column `LazyVGrid` of `AgentCard` views
- Each card: app icon strip, bold title, description, step count, category pill, Run button
- Run -> `AppState.executeCapability()` -> streams via Claude Code
- Auto-reloads when `capabilityStoreDidChange` fires
- Empty state: CTA to switch to Learn Mode and record a session
`func executeCapability(_ capability: Capability)`:
- Clears `detectedCapability`
- Resolves the project (`projects.first` fallback)
- Builds the prompt: uses `skillMDPath` if present ("Read and execute SKILL.md"), else constructs it from sub-workflow invocations
- Creates a `PipelineTaskRecord` (id: `CAP-{8hex}`)
- Prepends to `pipelineTasks` (visible in LogsPipelineView)
- Starts `claudeCodeRunner.startSession()` -> streams output
| File | Purpose |
|---|---|
| `App.swift` | SwiftUI `@main` entry point (headless — no default window) |
| `AppDelegate.swift` | `NSApplicationDelegate`; creates all windows, menu bar, wires subscriptions |
| `AppState.swift` | Central `ObservableObject` — all shared state, service singletons |
| `AppFonts.swift` | Custom font registration and font accessors |
| `AppTheme.swift` | Appearance system — frosted/solid modes, color scheme, theme tokens |
| `LiquidGlass.swift` | Glass design system: `LiquidGlassCard`, `GlassButton`, `GlassChip`, `Glass.textPrimary/Secondary/Tertiary` |
| `Logger.swift` | Structured logging with subsystems: `.pipeline`, `.system`, `.audio`, `.ui` |
| File | Purpose |
|---|---|
| `PipelineOrchestrator.swift` | Routes transcripts through the 4-stage pipeline |
| `PipelineModels.swift` | Core value types: `CleanedTranscript`, `TranscriptAnalysis`, `PipelineTaskRecord` |
| `PipelineStore.swift` | Persistence layer for pipeline data |
| `PipelineGroup.swift` | Groups related pipeline records for display |
| `ChunkManager.swift` | Buffers audio chunks, manages session lifecycle, calls PipelineOrchestrator |
| `StreamingLocalTranscriber.swift` | Live word-by-word SFSpeechRecognizer streaming; fires onPartial callbacks |
| `TranscriptCleaningService.swift` | Stage 1: Ollama Llama 3.2 transcript cleaning |
| `TranscriptAnalysisService.swift` | Stage 2: Ollama Llama 3.2 analysis, task extraction, world model update |
| `TaskCreationService.swift` | Stage 3: structured task creation with mode assignment |
| `TaskExecutionService.swift` | Stage 4: streams Claude Code sessions for auto tasks |
| `ClaudeCodeRunner.swift` | Low-level Claude Code SDK streaming client |
| `WorkflowRegistry.swift` | Registered execution workflows (e.g. autoclawd-claude-code) |
| File | Purpose |
|---|---|
| `LearnModeModels.swift` | `LearnEvent`, `Capability`, `CapabilityCategory`, `CapabilityTriggers`, `SubWorkflow`, `LearnSession`, `CapabilityManifest` |
| `LearnModeService.swift` | FUCBC service: 5s event timer, `buildUserJourney()`, `buildFUCBCPrompt()`, `buildCapability()` |
| `CapabilityStore.swift` | Persists capabilities to `~/.autoclawd/capabilities/index.json`; `suggest()` for auto-trigger scoring |
| `AICanvasView.swift` | Canvas tab: Learn Mode node graph + capabilities grid subtabs |
| `AgentsView.swift` | "My Agents" panel: 3-col LazyVGrid of AgentCard; Run -> `executeCapability()` |
| File | Purpose |
|---|---|
| `AudioRecorder.swift` | Always-on AVAudioEngine capture; engine stays hot between chunks |
| `SystemAudioCapturer.swift` | ScreenCaptureKit system audio + screen preview capture; SystemAudioMixer for thread-safe mixing |
| `SpeechService.swift` | Groq / Apple SFSpeech transcription of committed audio chunks |
| `TranscriptionService.swift` | Transcription orchestration |
| File | Purpose |
|---|---|
| `TranscriptStore.swift` | SQLite transcript persistence |
| `PipelineStore.swift` | Pipeline record persistence |
| `StructuredTodoStore.swift` | Task queue with status history |
| `QAStore.swift` | Q&A session persistence |
| `ExtractionStore.swift` | Extraction result persistence |
| `ContextCaptureStore.swift` | Clipboard and screenshot context persistence |
| `SessionStore.swift` | Speaking session timeline persistence |
| `ProjectStore.swift` | Project list and metadata |
| `SkillStore.swift` | Built-in and custom skills persistence |
| `FileStorageManager.swift` | Attachment and file storage management |
| File | Purpose |
|---|---|
| `WorldModelService.swift` | Builds and updates per-project markdown world model |
| `WorldModelGraph.swift` | Graph data model parsed from world model markdown |
| `WorldModelGraphParser.swift` | Parses markdown world model into graph nodes/edges |
| `WorldModelGraphLayout.swift` | Force-directed layout for world model graph |
| `WorldModelGraphView.swift` | SwiftUI canvas graph visualization |
| `ExtractionService.swift` | Extracts structured facts, decisions, people from transcripts |
| `ExtractionItem.swift` | Extraction result value type |
| `Episode.swift` | A discrete event (song, place, person) captured in context |
| `NowPlayingService.swift` | ShazamKit song detection; creates Episodes |
| `PeopleTaggingService.swift` | Identifies and tags people mentioned in transcripts |
| `Person.swift` | Person value type |
| File | Purpose |
|---|---|
| `ScreenshotService.swift` | Periodic screen capture for ambient context |
| `ScreenVisionAnalyzer.swift` | OCR + accessibility tree analysis; feeds `CapabilityStore.suggest()` |
| `ClipboardMonitor.swift` | Monitors clipboard changes for context enrichment |
| `LocationService.swift` | Core Location — current place detection |
| `PlaceDetail.swift` | Place value type |
| File | Purpose |
|---|---|
| `QAService.swift` | Handles AI search / Q&A queries against transcript context |
| `QAView.swift` | Q&A results UI |
| `Skill.swift` | Skill value type (built-in + custom) |
| `SkillStore.swift` | Skill persistence, seeding built-in skills on install |
| File | Purpose |
|---|---|
| `TodoService.swift` | Todo list management |
| `TodoFramingService.swift` | Frames task titles using README/CLAUDE.md for context |
| `StructuredTodoStore.swift` | Persists structured task queue |
| File | Purpose |
|---|---|
| `PillView.swift` | Floating widget SwiftUI view |
| `PillWindow.swift` | NSPanel wrapper with drag, snap-to-edge, height animation |
| `PillMode.swift` | PillMode enum (`.ambient`, `.aiSearch`, `.learn`) |
| `MainPanelView.swift` | Main dashboard shell; tabs: World / Agents / Canvas / Projects / Logs / Settings |
| `MainPanelWindow.swift` | NSWindow wrapper for dashboard |
| `ToastView.swift` | CapabilityToastView — glass-styled capability suggestion toast |
| `ToastWindow.swift` | Floating NSPanel for capability toasts (top-right corner) |
| `SetupView.swift` | First-run dependency setup UI |
| `OnboardingView.swift` | Onboarding flow (first launch) |
| File | Purpose |
|---|---|
| `LogsPipelineView.swift` | Pipeline stage visualizer (column view) |
| `AgentsView.swift` | "My Agents" agents grid — 3-col capability cards, one-click run |
| `AICanvasView.swift` | Canvas + capabilities grid (Learn Mode / FUCBC) |
| `SettingsConsolidatedView.swift` | All settings UI |
| `IntelligenceView.swift` | Intelligence/context dashboard |
| `IntelligenceConsolidatedView.swift` | Consolidated intelligence panel |
| `SkillsView.swift` | Skills management UI |
| `QAView.swift` | Q&A results panel |
| `SessionTimelineView.swift` | Session history timeline |
| `UserProfileChatView.swift` | User profile and chat context view |
| `TagView.swift` | Tag display component |
| File | Purpose |
|---|---|
| `WidgetView.swift` | Pill widget root view |
| `WidgetCanvasViews.swift` | Per-mode canvas content |
| `WidgetPanelViews.swift` | Expanded pill panel views |
| File | Purpose |
|---|---|
| `WhatsAppPoller.swift` | Polls sidecar, filters to self-chat, routes to pipeline |
| `WhatsAppService.swift` | WhatsApp message handling and reply logic |
| `WhatsAppSidecar.swift` | Sidecar connection management |
| `ShazamKitService.swift` | ShazamKit audio fingerprinting for now-playing detection |
| `MCPConfigManager.swift` | MCP server configuration management |
| `MCPServer.swift` | Built-in HTTP MCP server (port 7892): screen/cursor/selection/transcript tools |
| `GlobalHotkeyMonitor.swift` | System-wide keyboard shortcut monitoring |
| `HotWordDetector.swift` | Real-time hotword detection in audio stream |
| `HotWordConfig.swift` | Hotword configuration |
| `ClipboardMonitor.swift` | Clipboard change monitoring |
| `UserProfileService.swift` | User profile and preferences |
| `SettingsManager.swift` | All user settings via UserDefaults + API keys |
| `KeychainStorage.swift` | API key storage (Keychain + env var fallback) |
| `DependencyInstaller.swift` | First-run Ollama/dependency setup |
| `CleanupService.swift` | Audio file retention cleanup |
| `Attachment.swift` | File attachment value type for tasks |
144+ skills live in `~/.autoclawd/openclaw-skills/`. Each is a directory with a `SKILL.md`:

```
~/.autoclawd/openclaw-skills/
  video2ai/      <- convert video -> frames + transcript + LLM analysis (Python CLI + web UI)
  yt-dlp/        <- download videos from YouTube, YC, etc.
  remotion/      <- React-based motion graphics / programmatic video
  ffmpeg/        <- video/audio processing, segment assembly
  github/        <- GitHub issues, PRs, releases
  slack/         <- send messages, post to channels
  discord/       <- send messages, webhooks
  gdrive/        <- upload files, get shareable links
  whatsapp/      <- send messages via WhatsApp
  canvas/        <- Canva automation (screenshot + text extraction)
  coding-agent/  <- Claude Code sub-agent for code tasks
  ... 130+ more
```
When FUCBC discovers a new tool it hasn't seen, it auto-creates a new skill directory and SKILL.md.
- `.ambient` — always-on mic -> full pipeline; shows three-layer session transcript
- `.aiSearch` — hotword-triggered QA queries
- `.learn` — FUCBC mode: watches screen+voice, builds capabilities
- `.world` — World model graph visualization
- `.agents` — "My Agents" grid of built capabilities (AgentsView)
- `.canvas` — Learn Mode canvas + capabilities subtab (AICanvasView)
- `.projects` — Project list (ProjectsListView)
- `.logs` — Pipeline stage visualizer (LogsPipelineView)
- `.settings` — All settings (SettingsConsolidatedView)
- `.auto` — executed immediately without approval
- `.ask` — shown to the user for approval in LogsPipelineView
- `.user` — created but not executed (manual)
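As a sketch (the enum name here is assumed), only `.auto` bypasses approval:

```swift
// Sketch of the three task modes. TaskMode and runsWithoutApproval(_:)
// are stand-in names mirroring the behavior described above.
enum TaskMode: String {
    case auto  // executed immediately
    case ask   // waits for approval in LogsPipelineView
    case user  // manual; never auto-executed
}

func runsWithoutApproval(_ mode: TaskMode) -> Bool {
    mode == .auto
}
```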
API keys are resolved in priority order:
- Environment variable (`GROQ_API_KEY`, `ANTHROPIC_API_KEY`)
- macOS Keychain (legacy fallback)

Set env vars in `~/.zshenv` or pass them to the app via launchd/`launchctl setenv`.
Groq is optional — if absent, transcription falls back to Apple SFSpeechRecognizer (fully local). The Anthropic key is required for Claude Code execution (Stage 4) and FUCBC capability building.
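The resolution order can be sketched as follows (the function is hypothetical; the real lookup is split across `SettingsManager` and `KeychainStorage`):

```swift
import Foundation

// Sketch of API key resolution: environment variable first, Keychain fallback.
// `keychainLookup` is a stand-in for KeychainStorage, injected for testability.
func resolveAPIKey(
    _ envVar: String,
    environment: [String: String] = ProcessInfo.processInfo.environment,
    keychainLookup: (String) -> String? = { _ in nil }
) -> String? {
    if let key = environment[envVar], !key.isEmpty {
        return key  // 1. environment variable wins
    }
    return keychainLookup(envVar)  // 2. macOS Keychain (legacy fallback)
}
```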
- Sidecar runs on `localhost:7891`
- Only messages from the self-chat JID (`myNumber@s.whatsapp.net`) are processed
- Group messages (JID ends with `@g.us`) are filtered at the sidecar level
- Voice notes are transcribed, then routed through the pipeline
- Bot replies are sent back with a `"Dot: "` prefix
- SwiftUI + AppKit: Use SwiftUI for views inside windows; AppKit (NSPanel/NSWindow) for window management
- MainActor: All UI state and AppState mutations on `@MainActor`. Services are `@unchecked Sendable` crossing actors.
- Logging: Use `Log.info(.pipeline, "...")`, `Log.warn(.system, "...")` — subsystems: `.pipeline`, `.system`, `.audio`, `.ui`
- No force-unwraps in production paths. Use `guard let` or default values.
- Single source of truth: `AppState` holds all published state. Don't duplicate state across views.
- Avoid huge files: if a view exceeds ~300 lines, split it into subviews
- Glass design system: Use `LiquidGlassCard(tint:)`, `GlassButton`, `GlassIconBadge`, `GlassDivider`, `GlassChip`, `Glass.textPrimary/Secondary/Tertiary` — not raw `Color`
- Session transcript: Use `appState.clearSessionTranscript()` to reset. Never directly nil `liveTranscriptText` outside `AppState`.
- Capability notifications: `CapabilityStore.save()` posts `capabilityStoreDidChange` — observe this in views instead of polling
- Add the service in `Sources/`
- Inject it into `PipelineOrchestrator.init()`
- Call it in `processTranscript()` after the appropriate stage
- Update `PipelineSource` routing if the stage should be skipped for certain modes
- Create the directory: `~/.autoclawd/openclaw-skills/{slug}/`
- Write `SKILL.md` with: what the tool does, how to invoke it, input/output, workflow tags
- The skill appears automatically in `SkillStore.refreshOpenClawSkills()`
- Create a `Capability` with triggers, subWorkflows, skillMDPath
- Call `CapabilityStore.shared.save(capability)`
- The notification fires -> AgentsView reloads -> the card appears immediately
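A sketch of the first step, with stand-in types mirroring the schema above (the real `Capability` and `CapabilityTriggers` live in `LearnModeModels.swift`):

```swift
// Stand-in types for illustration only; field names follow the
// Tier 2 schema above, not the actual declarations.
struct CapabilityTriggers {
    var apps: [String] = []
    var urlPatterns: [String] = []
    var keywords: [String] = []
    var ocrPatterns: [String] = []
}

struct Capability {
    var name: String
    var triggers: CapabilityTriggers
    var subWorkflows: [String]
    var skillMDPath: String?
}

let cap = Capability(
    name: "Post to all platforms",
    triggers: CapabilityTriggers(apps: ["Threads", "Safari"], keywords: ["post this"]),
    subWorkflows: ["threads-post", "buffer-schedule"],
    skillMDPath: nil
)
// In the app: CapabilityStore.shared.save(cap) posts capabilityStoreDidChange,
// and AgentsView reloads its grid.
```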
- Add the key constant + computed property in `SettingsManager.swift`
- Add a UI control in `SettingsConsolidatedView.swift`
- Use `SettingsManager.shared.yourSetting` at call sites
- Add a case to the `PanelTab` enum in `MainPanelView.swift`
- Add an icon to the `var icon: String` switch
- Add the view to the `ZStack` in `content` with matching `.opacity`/`.allowsHitTesting`
```swift
await appState.pipelineOrchestrator.processTranscript(
    text: "test transcript",
    transcriptID: 0,
    sessionID: "test",
    sessionChunkSeq: 0,
    durationSeconds: 5,
    speakerName: "Test",
    source: .ambient
)
```

```swift
// Simulate a detected capability
appState.detectedCapability = CapabilityStore.shared.all().first
// -> CapabilityToastView appears in top-right corner
// Tap -> executeCapability() -> Claude Code streams output
```