architecture: establish a hardware-first real-time interaction loop #2

@zevorn

Description

Background

RT-Claw already has the basic pieces in place: AI chat, tool use, scheduler, heartbeat, and early swarm support.

What is still missing is a clear system-level direction for real-time behavior. Right now the execution model is still too AI-centered:

  • time-sensitive actions can still be shaped by AI latency
  • user input, hardware events, background work, and AI reasoning are not clearly separated
  • the project does not yet present a strong identity around RT responsiveness or smooth hardware interaction

This makes RT-Claw feel more like an agent running on an RTOS than a real-time AI runtime for the physical world.

This issue proposes a clear architectural direction:

RT-Claw should not just run an LLM on embedded hardware. It should run AI on top of a runtime with real-time reflexes.

Goals

  • establish a hardware-first real-time interaction loop
  • keep the fast path independent from LLM latency
  • give users immediate feedback before full completion or explanation
  • turn GPIO, PWM, ADC, LCD, timers, and swarm into part of the interaction model
  • make RT-Claw clearly different from a generic embedded agent

Core Principles

  1. LLM must never sit in the fast path
  2. ack first, complete second, explain last
  3. events must be classified and prioritized, not pushed through one plain FIFO
  4. local actions and local rules come first
  5. AI handles reasoning, planning, and summarization, not every control loop

Proposed Architecture

1. RT Event Fabric

Evolve the current gateway into a real event fabric instead of a message queue skeleton.

At minimum, events should be split into four classes:

  • P0 Reflex: interrupts, limit switches, threshold crossings, emergency stop, critical edge-triggered events
  • P1 Control: GPIO/PWM updates, display changes, scheduled device actions, node state changes
  • P2 Interaction: shell input, IM messages, WebSocket input, progress/status feedback
  • P3 AI/Background: LLM reasoning, heartbeat summaries, memory consolidation, archival work

Each event should carry metadata such as:

  • source
  • priority
  • deadline
  • correlation_id
  • requires_ai
  • state_snapshot_id
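
A minimal sketch of such an event record in C. All names here (`rtc_event_t`, `rtc_priority_t`, and so on) are illustrative, not the actual RT-Claw API:

```c
#include <stdint.h>
#include <stdbool.h>

/* Priority classes, ordered so that a lower value means more urgent. */
typedef enum {
    RTC_P0_REFLEX = 0,   /* interrupts, e-stop, threshold crossings   */
    RTC_P1_CONTROL,      /* GPIO/PWM, display, scheduled device work  */
    RTC_P2_INTERACTION,  /* shell, IM, WebSocket, progress feedback   */
    RTC_P3_AI_BACKGROUND /* LLM reasoning, summaries, memory work     */
} rtc_priority_t;

/* Event record carrying the routing metadata listed above. */
typedef struct {
    uint16_t       source;            /* producing subsystem id        */
    rtc_priority_t priority;          /* P0..P3 class                  */
    uint32_t       deadline_ms;      /* absolute deadline, 0 = none   */
    uint32_t       correlation_id;   /* links ack/result/explanation  */
    bool           requires_ai;      /* true -> eligible for AI plane */
    uint32_t       state_snapshot_id;/* world state at enqueue time   */
} rtc_event_t;

/* A reflex or control event that needs no AI stays on the fast path. */
static inline bool rtc_event_is_fast_path(const rtc_event_t *ev)
{
    return ev->priority <= RTC_P1_CONTROL && !ev->requires_ai;
}
```

The `correlation_id` is what lets a later AI explanation attach to an action that already completed on the fast path.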

2. Fast Path Runtime

Provide a local execution path for hardware interaction that does not depend on LLM calls.

Typical fast-path capabilities include:

  • GPIO input/output
  • PWM control
  • ADC sampling with local threshold/rule checks
  • partial LCD updates
  • simple rule evaluation
  • deadline-sensitive scheduled actions
  • swarm state updates

The point is not "AI can call hardware tools".
The point is "the runtime can perform the right hardware action immediately".
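
To make that concrete, here is one way a local threshold rule could look: an ADC sample crosses a limit and the runtime drives a GPIO immediately, with no LLM call anywhere on the path. The rule struct and the `gpio_write` stub are hypothetical, standing in for the real driver layer:

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical local rule: when an ADC reading crosses a threshold,
 * drive a GPIO pin immediately -- no AI involvement on this path. */
typedef struct {
    uint16_t threshold;  /* raw ADC counts                   */
    bool     above;      /* trigger on >= threshold vs <     */
    int      gpio_pin;   /* pin to drive when the rule fires */
    bool     gpio_level; /* level to set on that pin         */
} local_rule_t;

/* Stand-in for the real GPIO driver; records the last write. */
static int  last_pin   = -1;
static bool last_level = false;
static void gpio_write(int pin, bool level) { last_pin = pin; last_level = level; }

/* Returns true when the rule fired and the action was taken. */
static bool rule_eval(const local_rule_t *r, uint16_t adc_sample)
{
    bool fired = r->above ? (adc_sample >= r->threshold)
                          : (adc_sample <  r->threshold);
    if (fired)
        gpio_write(r->gpio_pin, r->gpio_level);
    return fired;
}
```

AI can still be notified afterwards (for a summary or explanation), but the actuation itself never waits for it.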

3. Slow AI Plane

Keep ai_engine as the slow path for:

  • complex reasoning
  • natural language explanation
  • multi-step tool orchestration
  • periodic summaries
  • memory organization and writeback

The AI plane should only consume events that actually need AI.
It must not define the latency of the whole system.

4. Capability Registry

The current tool model should be extended beyond "tools exposed to the LLM".

RT-Claw should maintain a capability registry that both the runtime and the AI layer can use.

Each capability should eventually describe properties such as:

  • latency_class
  • safe_in_irq
  • safe_in_worker
  • requires_ai
  • display_affinity
  • deadline_hint

That gives the runtime enough information to decide whether something belongs on the fast path or the slow path.
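
A capability descriptor along these lines could look as follows. The field names mirror the list above; everything else (`capability_t`, the `latency_class_t` values, the routing predicate) is an assumed sketch, not a committed interface:

```c
#include <stdbool.h>

/* Coarse latency classes a capability can declare. */
typedef enum { LAT_IRQ, LAT_FAST, LAT_SLOW } latency_class_t;

/* Runtime-facing capability metadata, usable without the LLM. */
typedef struct {
    const char     *name;
    latency_class_t latency_class;
    bool            safe_in_irq;
    bool            safe_in_worker;
    bool            requires_ai;
    int             display_affinity;  /* preferred output surface, -1 = none */
    unsigned        deadline_hint_ms;  /* 0 = no timing expectation           */
} capability_t;

/* Routing decision: fast path iff no AI is required and the
 * capability is not declared slow-class. */
static bool belongs_on_fast_path(const capability_t *cap)
{
    return !cap->requires_ai && cap->latency_class != LAT_SLOW;
}
```

With metadata like this, the same registry entry can be consulted by the event fabric (routing), the scheduler (deadlines), and the AI plane (tool exposure).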

Real-Time Interaction Loop

The target interaction loop should look like this:

  1. event arrives
  2. classify the event
  3. send immediate ack if user-facing
  4. execute local action if possible
  5. push state update to shell / LCD / IM / WebSocket
  6. call AI only if planning or explanation is needed
  7. write memory / logs asynchronously in the background
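
The loop above can be sketched as a single dispatch pass. The event shape and stage flags here are illustrative only; the point is the ordering: ack before work, local action before AI, AI strictly optional:

```c
#include <stdbool.h>

/* Event shape assumed for this sketch (not the real API). */
typedef struct {
    bool user_facing;   /* step 3: ack needed?          */
    bool local_action;  /* step 4: runnable locally?    */
    bool requires_ai;   /* step 6: planning/explaining? */
} loop_event_t;

/* Bit flags recording which stages actually ran. */
enum { DID_ACK = 1, DID_LOCAL = 2, DID_PUSH = 4, DID_AI = 8, DID_LOG = 16 };

/* One pass of the target loop: ack -> act -> push state -> AI only if needed. */
static int interaction_loop(const loop_event_t *ev)
{
    int did = 0;
    if (ev->user_facing)  did |= DID_ACK;   /* immediate ack, before any work */
    if (ev->local_action) did |= DID_LOCAL; /* fast-path execution            */
    did |= DID_PUSH;                        /* state to shell/LCD/IM/WS       */
    if (ev->requires_ai)  did |= DID_AI;    /* slow plane, never blocks above */
    did |= DID_LOG;                         /* async memory/log writeback     */
    return did;
}
```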

The key idea is simple:

action first, explanation later

Example Flows

Sensor / GPIO event

  • interrupt or sampling event arrives
  • runtime classifies it as P0 or P1
  • local rule runs immediately
  • GPIO / LCD / state changes are applied
  • user sees instant feedback
  • AI is called only if a summary or explanation is actually needed

IM command

  • Feishu or future IM message arrives
  • system immediately replies with a short ack such as "received" or "executing"
  • local tools execute first
  • final result is returned
  • optional AI text is added only when needed

Scheduled task

  • scheduler fires a local action or an AI-triggering task
  • local actions must not wait for AI
  • AI-based tasks must run in a worker and must never block scheduling behavior
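
One way to enforce that rule in the scheduler: local tasks run inline on the tick, while AI-triggering tasks are only enqueued for a worker to drain later. The queue and task types below are a hypothetical sketch:

```c
#include <stdbool.h>

typedef struct { bool needs_ai; int id; } sched_task_t;

#define AI_QLEN 8
static sched_task_t ai_queue[AI_QLEN];  /* drained by a worker, not the tick */
static int ai_head = 0;

static int local_runs = 0;                              /* inline executions  */
static void run_local(const sched_task_t *t) { (void)t; local_runs++; }

/* Fires one task: AI work is deferred to the worker queue and never
 * executed inline, so the tick itself never waits on the model. */
static bool scheduler_fire(const sched_task_t *t)
{
    if (t->needs_ai) {
        if (ai_head >= AI_QLEN)
            return false;               /* queue full: report, don't block */
        ai_queue[ai_head++] = *t;
        return true;
    }
    run_local(t);                       /* fast path, runs now */
    return true;
}
```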

Why This Matters

This direction gives RT-Claw a much clearer identity:

  • not just an embedded chatbot
  • not just a tool-calling agent on an RTOS
  • but a runtime that gives AI real-time reflexes in the physical world

Put differently:

  • cloud AI provides intelligence
  • RT-Claw provides reflexes

Impact on Existing Modules

gateway

Evolve into an RT Event Fabric with priority lanes, deadline-aware routing, and deferred work support.

scheduler

Move beyond a coarse polling scheduler.
The current 1s tick is fine for early demos, but it is not enough for the real-time story.

ai_engine

Keep it as a serialized slow-path executor, but make it consume only AI-worthy work.
AI execution should not block event/control handling.

tools

Separate "LLM tool" from "runtime capability".
The runtime should be able to invoke capabilities directly without routing through the model.

heartbeat

Keep the current direction of aggregating events first and only calling AI when useful.

swarm

Treat swarm as a distributed event source and coordination layer, not just as a future messaging feature.

Suggested Milestones

Phase 1: Immediate feedback

  • unify ack/status reporting for shell, LCD, and IM
  • consistently report received / executing / done / failed
  • expose tool execution progress outside the local shell
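
A shared status model for this phase could be as small as one enum that every sink (shell, LCD, IM) renders in its own way. The names below simply mirror the received / executing / done / failed states; the types are illustrative:

```c
/* Unified task status shared by shell, LCD, and IM sinks (sketch). */
typedef enum { ST_RECEIVED, ST_EXECUTING, ST_DONE, ST_FAILED } task_status_t;

/* Canonical short label; each sink can format it further. */
static const char *status_str(task_status_t s)
{
    switch (s) {
    case ST_RECEIVED:  return "received";
    case ST_EXECUTING: return "executing";
    case ST_DONE:      return "done";
    default:           return "failed";
    }
}
```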

Phase 2: Event separation

  • introduce multiple queues or priority lanes
  • separate chat, control, hardware event, and background AI work
  • prevent AI work from blocking control/event handling

Phase 3: Hardware-first interaction

  • add a local rule engine for fast actions
  • support partial LCD refresh / dirty-region updates
  • improve scheduler granularity
  • integrate swarm events into the same event fabric

Acceptance Criteria

  • a time-sensitive local action can complete without waiting for AI
  • interactive requests always get an immediate ack
  • AI execution no longer blocks event/control handling
  • hardware capability metadata is available to runtime scheduling
  • shell / IM / LCD share one status reporting model
  • the architecture clearly separates reflex path from cognitive path
