feat: per-service local LLM token accounting via Token Spy#384

Open
nt1412 wants to merge 6 commits into Light-Heart-Labs:main from nt1412:feat/token-spy-local-monitoring

Conversation

@nt1412
Contributor

@nt1412 nt1412 commented Mar 18, 2026

Summary

Enables optional per-service token accounting for local LLM traffic. Users can route individual services through Token Spy to track token usage, cost, and per-agent metrics at :3005/dashboard. Off by default — no behavior change for existing installs.

How it works

Token Spy runs multiple uvicorn processes inside a single container, each with its own AGENT_NAME and port. All processes share one SQLite database, so the dashboard shows all agents together.

Port  Agent       Service
----  ----------  ---------------------------------
8080  token-spy   Main instance (dashboard)
8081  open-webui  Open WebUI chat
8082  perplexica  Perplexica deep research
8083  openclaw    OpenClaw agents
8084  litellm     LiteLLM gateway
8085  n8n         n8n workflows (via UI credential)
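The multi-process layout above can be sketched in Python. Note the actual launcher in this PR is a shell script (start-monitoring.sh); the port-to-agent mapping below mirrors the table, while the module path `app:app` and the `DB_PATH` variable name are illustrative assumptions, not the PR's real identifiers.

```python
# Port -> agent name mapping from the table above (8085/n8n is configured
# through the n8n UI, included here only for completeness).
AGENT_PORTS = {
    8080: "token-spy",    # main instance (dashboard)
    8081: "open-webui",
    8082: "perplexica",
    8083: "openclaw",
    8084: "litellm",
    8085: "n8n",
}

def build_uvicorn_commands(db_path="/data/tokenspy.db"):
    """Build one uvicorn invocation per agent.

    Every instance gets its own AGENT_NAME and port but points at the
    same SQLite file, so the dashboard sees all agents together.
    """
    commands = []
    for port, agent in AGENT_PORTS.items():
        env = {"AGENT_NAME": agent, "DB_PATH": db_path}
        cmd = ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", str(port)]
        commands.append((env, cmd))
    return commands

for env, cmd in build_uvicorn_commands():
    print(env["AGENT_NAME"], cmd[-1])
```

In the real container these would be launched as background processes from the shell launcher; the point is simply that one container hosts N identically-coded instances distinguished only by environment.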

To enable

Add to .env and restart:

TOKEN_SPY_AUTH_MODE=local
WEBUI_LLM_URL=http://token-spy:8081

Monitoring instances only start when TOKEN_SPY_AUTH_MODE=local.

Verified on fresh official macOS install

Service     Agent       Input Tok   Output Tok  Status
----------  ----------  ----------  ----------  --------
Open WebUI  open-webui  276–385     370–1K      Verified
Perplexica  perplexica  1.1K–1.3K   12–85       Verified
OpenClaw    openclaw    14.9K       1.0K        Verified

Commits

  1. feat: enable optional per-service LLM monitoring via Token Spy — multi-process launcher (start-monitoring.sh), compose env var overrides for Open WebUI/Perplexica/OpenClaw/LiteLLM, AUTH_MODE=local bypass for internal Docker services, macOS overlay, .env.schema.json + .env.example documentation
  2. fix(token-spy): inject stream_options.include_usage — llama-server only includes token counts in streaming responses when explicitly requested. Without this, all streaming requests log 0/0 tokens.
  3. docs(token-spy): add local LLM monitoring quick start guide — README section with env vars, port mapping, per-service instructions, OpenClaw #token= URL workaround
  4. fix(openclaw): log browser-accessible URL with auth token on startup — OpenClaw's Control UI requires #token= in the URL hash for Docker deployments. Now logged to container output on startup: docker logs dream-openclaw | grep "Control UI"
  5. fix(dashboard): use URL hash for OpenClaw sidebar link — sidebar was generating ?token= (query param) but OpenClaw reads #token= (hash fragment). One character fix.
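The injection logic from commit 2 can be sketched as a small transform on the request body before it is forwarded upstream. This is a minimal illustration, not the PR's actual code: `stream_options.include_usage` is the standard OpenAI field the commit names, but the function name and surrounding shape are assumptions.

```python
def inject_usage_tracking(body: dict) -> dict:
    """Ensure streaming chat completions report token usage.

    llama-server (and other OpenAI-compatible backends) only attach a
    `usage` object to streaming responses when the client sends
    stream_options.include_usage = true. Inject it before forwarding,
    but never overwrite a client-supplied stream_options.
    """
    if body.get("stream") and "stream_options" not in body:
        body = {**body, "stream_options": {"include_usage": True}}
    return body

# Streaming request without stream_options gets the flag injected;
# non-streaming requests and client-set stream_options pass through.
print(inject_usage_tracking({"stream": True, "model": "llama"}))
print(inject_usage_tracking({"model": "llama"}))
```

The guard on an existing `stream_options` matches the commit's stated behavior: the proxy only fills the gap, it does not override clients that already chose their own settings.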

Known limitations

  • AUTH_MODE=local disables auth on Token Spy proxy routes (safe: only reachable within Docker network)
  • Routing through Token Spy adds one network hop to LLM requests
  • Restarting a service drops active connections
  • n8n requires manual credential setup in the n8n UI (not automatable via env var)
  • OpenClaw requires #token= URL hash for Docker — logged on startup, fixed in dashboard sidebar
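The `#token=` vs `?token=` distinction above comes down to where each part of a URL is delivered. OpenClaw's Control UI reads the token from `location.hash` on the client side, so a query-string token is simply never looked at. A quick sketch with the standard library (URLs and token value here are illustrative, not real credentials):

```python
from urllib.parse import urlsplit

query_url = "http://localhost:18789/?token=abc123"  # what the sidebar generated
hash_url = "http://localhost:18789/#token=abc123"   # what OpenClaw expects

# The fragment (#...) is never sent to the server; it stays in the
# browser, where client-side JS reads it via location.hash and passes
# the token to the WebSocket handshake. OpenClaw only checks the hash,
# so a ?token= query param is ignored and the handshake fails with
# "device identity required".
print(urlsplit(query_url).query)    # token lands in the query string
print(urlsplit(hash_url).fragment)  # token lands in the fragment
```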

🤖 Generated with Claude Code

nt1412 and others added 6 commits March 18, 2026 09:19
Users can opt-in to routing individual services through Token Spy
for per-service token accounting. All monitoring runs inside ONE
container using multiple uvicorn processes — each with its own
AGENT_NAME and port, sharing one SQLite database. The dashboard
at :3005 shows all agents together.

Port mapping (inside the token-spy container):
  8080 — main instance (dashboard, cloud/agent monitoring)
  8081 — open-webui monitoring (AGENT_NAME=open-webui)
  8082 — perplexica monitoring (AGENT_NAME=perplexica)
  8083 — openclaw monitoring (AGENT_NAME=openclaw)
  8084 — litellm monitoring (AGENT_NAME=litellm)

To enable, add to .env and restart:
  TOKEN_SPY_AUTH_MODE=local
  WEBUI_LLM_URL=http://token-spy:8081

Off by default. Monitoring instances only start when
TOKEN_SPY_AUTH_MODE=local. No behavior change for existing installs.

Known weaknesses:
- AUTH_MODE=local disables auth on proxy routes (safe: Docker-internal)
- Restarting a service drops active connections
- Routing through Token Spy adds one network hop to LLM requests

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…uests

llama-server (and other OpenAI-compatible endpoints) only include
token usage in streaming responses when stream_options.include_usage
is explicitly set to true. Without this, Token Spy logs 0 input/output
tokens for all streaming requests — making per-service token accounting
useless for OpenClaw and other streaming clients.

Injects {"include_usage": true} into the request body before forwarding,
only when stream_options is not already set by the client. This is a
standard OpenAI field, not a custom extension.

Verified: OpenClaw streaming request now shows 14.9K input, 1.0K output
(was 0/0 before this fix).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Documents how to enable per-service token accounting: env vars,
port mapping, restart commands, and the OpenClaw #token= URL fix
for Docker deployments.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
OpenClaw's Control UI requires the gateway token in the URL hash
(#token=...) for Docker deployments — without it, the browser gets
"device identity required" on every WebSocket connection. The token
is auto-generated but never shown to the user.

Now inject-token.js logs the full URL to container logs on startup:
  docker logs dream-openclaw | grep "Control UI"

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
OpenClaw's Control UI reads the gateway token from the URL hash
fragment, not query params. The sidebar was generating ?token=
which doesn't work — the WebSocket handshake fails with "device
identity required" because the token isn't passed to the WS connect.

One character fix: ? → #

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…l-monitoring

# Conflicts:
#	dream-server/installers/macos/docker-compose.macos.yml
Collaborator

@Lightheartdevs Lightheartdevs left a comment


Approve — Well-designed token monitoring feature.

Architecture

Multi-process uvicorn inside a single Token Spy container, each with its own AGENT_NAME and port (8080-8085). All processes share one SQLite database, so the dashboard shows all agents together. Off by default — only activates when TOKEN_SPY_AUTH_MODE=local.

Changes reviewed

  • start-monitoring.sh: Launches per-service monitoring instances
  • Compose env overrides for Open WebUI, Perplexica, OpenClaw, LiteLLM using ${SERVICE_LLM_URL:-${LLM_API_URL:-default}} pattern — clean fallback chain
  • inject-token.js: OpenClaw provider baseUrl override for monitoring + browser URL logging with #token=
  • Sidebar fix: ?token= → #token= (OpenClaw reads the hash fragment, not the query param)
  • stream_options.include_usage injection for llama-server streaming responses
  • .env.example and .env.schema.json documentation

Verified claims

  • Off by default (no behavior change for existing installs)
  • Per-service agent names in dashboard
  • Tested on fresh macOS install per PR description

Minor notes

  • AUTH_MODE=local disables auth on proxy routes — acceptable since only reachable within Docker network
  • Apple Silicon overlay correctly included

Well-documented, well-tested, clean implementation. LGTM.

