feat(v0.12): Diagnostic Event Catalog — surface structured OpenClaw events as UI components#262
feat(v0.12): Diagnostic Event Catalog — surface structured OpenClaw events as UI components#262vivekchand wants to merge 1 commit intomainfrom
Conversation
…n.stuck, diagnostic.heartbeat, run.attempt, lane dequeue as UI components (closes #36) - Add session.state and session.stuck OTLP metric handlers to metrics_store - Add openclaw.diagnostic.heartbeat handler → gateway health pulse tracking - Add openclaw.run.attempt handler → per-session retry count tracking - Add /api/diagnostic-events endpoint exposing all 5 event types - Flow tab: Gateway Health stat (green/amber/unknown via OTLP heartbeat) - Flow tab: Queue Wait stat showing enqueue→dequeue delta from log tail - Flow tab: Retry count badge when run.attempt events detected - Flow tab: Stuck session alert banner (dismissable, auto-hides in 30s) - JS log handler: lane dequeue detection with wait-time computation - JS log handler: run.attempt retry badge from log tail - JS log handler: session.stuck alert from log tail - JS: polls /api/diagnostic-events every 15 events to update indicators
vivekchand
left a comment
There was a problem hiding this comment.
Good implementation overall -- all 5 event types are wired up and the new /api/diagnostic-events endpoint is clean.
One gap worth a follow-up: the dequeues bucket in metrics_store is never written to by the OTLP path. There is no handler for openclaw.lane.dequeue in _process_otlp_metrics, so queue_avg_wait_ms in the API response will always be 0 when data arrives via OTLP. Queue wait currently only works from the JS frontend (window._laneEnqueueTs set on log-tail lane enqueue lines). If the gateway emits openclaw.lane.dequeue as a metric, a matching elif name == 'openclaw.lane.dequeue' block in _process_otlp_metrics (storing enqueue_ts from an attribute) would close the loop.
Everything else looks good -- the stuck-session banner, gateway health card, and retry badge all follow the existing patterns correctly.
Closes #36
What
Implements all 5 missing diagnostic event handlers from the catalog spec, surfacing them as visible UI components on the Flow tab.
How
Backend (dashboard.py)
sessions,gw_health,retries,dequeues_process_otlp_metrics:openclaw.session.state→ tracks session state changesopenclaw.session.stuck→ flags stuck sessions (clears with timestamp)openclaw.diagnostic.heartbeat→ gateway health pulse (healthy if < 120s ago)openclaw.run.attempt→ per-session retry counterGET /api/diagnostic-events— returns gateway health, stuck sessions, session states, retry counts, and queue wait time in one responseFrontend (JS log tail + UI)
lane dequeuedetection: records enqueue timestamp atlane enqueue, computes wait-time delta atlane dequeue, displays in flow feedrun.attemptdetection: parses retry number from log line, bumps retry badgesession.stuckdetection: shows dismissable alert banner on Flow tab/api/diagnostic-eventsevery 15 events, shows ✅ Healthy /Acceptance criteria (from issue #36)
openclaw.queue.lane.depthparsed and shown in queue depth panel (pre-existing)openclaw.session.stucksurfaces stuck-session alertopenclaw.diagnostic.heartbeatdrives gateway health indicator in header