feat: per-service local LLM token accounting via Token Spy #384
Open
nt1412 wants to merge 6 commits into Light-Heart-Labs:main from
Conversation
Users can opt in to routing individual services through Token Spy for per-service token accounting. All monitoring runs inside one container using multiple uvicorn processes — each with its own AGENT_NAME and port, sharing one SQLite database. The dashboard at :3005 shows all agents together.

Port mapping (inside the token-spy container):
- 8080 — main instance (dashboard, cloud/agent monitoring)
- 8081 — open-webui monitoring (AGENT_NAME=open-webui)
- 8082 — perplexica monitoring (AGENT_NAME=perplexica)
- 8083 — openclaw monitoring (AGENT_NAME=openclaw)
- 8084 — litellm monitoring (AGENT_NAME=litellm)

To enable, add to .env and restart:
TOKEN_SPY_AUTH_MODE=local
WEBUI_LLM_URL=http://token-spy:8081

Off by default. Monitoring instances only start when TOKEN_SPY_AUTH_MODE=local. No behavior change for existing installs.

Known weaknesses:
- AUTH_MODE=local disables auth on proxy routes (safe: Docker-internal)
- Restarting a service drops active connections
- Routing through Token Spy adds one network hop to LLM requests

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
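A rough sketch of what a launcher like start-monitoring.sh does, written in Python for illustration (the ASGI module path `token_spy.app:app` and the helper names are hypothetical; the actual script is shell):

```python
import subprocess

# Agent/port pairs from the PR's port mapping (8080 is the main instance).
MONITORS = [
    ("open-webui", 8081),
    ("perplexica", 8082),
    ("openclaw", 8083),
    ("litellm", 8084),
]

def monitor_commands(environ):
    """Build one uvicorn command (plus its env) per monitored service.

    Off by default: nothing launches unless TOKEN_SPY_AUTH_MODE=local.
    """
    if environ.get("TOKEN_SPY_AUTH_MODE") != "local":
        return []
    return [
        (
            # Hypothetical ASGI app path; each process serves one agent.
            ["uvicorn", "token_spy.app:app", "--host", "0.0.0.0",
             "--port", str(port)],
            {**environ, "AGENT_NAME": agent},
        )
        for agent, port in MONITORS
    ]

def start_monitors(environ):
    """Spawn the monitoring processes; all share one SQLite database."""
    return [subprocess.Popen(cmd, env=env)
            for cmd, env in monitor_commands(environ)]
```

The gate on TOKEN_SPY_AUTH_MODE is what makes this a no-op for existing installs.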
…uests
llama-server (and other OpenAI-compatible endpoints) only include
token usage in streaming responses when stream_options.include_usage
is explicitly set to true. Without this, Token Spy logs 0 input/output
tokens for all streaming requests — making per-service token accounting
useless for OpenClaw and other streaming clients.
Injects {"include_usage": true} into the request body before forwarding,
only when stream_options is not already set by the client. This is a
standard OpenAI field, not a custom extension.
Verified: OpenClaw streaming request now shows 14.9K input, 1.0K output
(was 0/0 before this fix).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
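The injection described above can be sketched as a small helper over the raw request body (a minimal sketch, not Token Spy's actual code; the function name is illustrative):

```python
import json

def inject_include_usage(raw_body: bytes) -> bytes:
    """Add stream_options.include_usage to a chat completion request.

    llama-server only reports token usage on streaming responses when
    the client asks for it, so inject the flag — but only when the
    request streams and the client has not set stream_options itself.
    """
    body = json.loads(raw_body)
    if body.get("stream") and "stream_options" not in body:
        body["stream_options"] = {"include_usage": True}
    return json.dumps(body).encode()
```

Non-streaming requests and requests that already carry stream_options pass through untouched, matching the "only when not already set by the client" rule.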
Documents how to enable per-service token accounting: env vars, port mapping, restart commands, and the OpenClaw #token= URL fix for Docker deployments.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
OpenClaw's Control UI requires the gateway token in the URL hash (#token=...) for Docker deployments — without it, the browser gets "device identity required" on every WebSocket connection. The token is auto-generated but never shown to the user.

Now inject-token.js logs the full URL to container logs on startup:
docker logs dream-openclaw | grep "Control UI"

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
OpenClaw's Control UI reads the gateway token from the URL hash fragment, not query params. The sidebar was generating ?token=, which doesn't work — the WebSocket handshake fails with "device identity required" because the token isn't passed to the WS connect.

One-character fix: ? → #

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
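Why the single character matters: a query string is sent to the server, while a hash fragment stays in the browser for client-side code — which is where the Control UI reads the token. Illustrated with Python's urllib (the URL is a made-up example):

```python
from urllib.parse import urlsplit

# ?token= puts the token in the query string (sent to the server);
# #token= puts it in the fragment (kept client-side, readable by the UI).
broken = urlsplit("http://openclaw.example/?token=abc123")
fixed = urlsplit("http://openclaw.example/#token=abc123")

assert broken.query == "token=abc123" and broken.fragment == ""
assert fixed.fragment == "token=abc123" and fixed.query == ""
```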
…l-monitoring

# Conflicts:
#	dream-server/installers/macos/docker-compose.macos.yml
Lightheartdevs
approved these changes
Mar 18, 2026
Collaborator
Lightheartdevs
left a comment
Approve — Well-designed token monitoring feature.
Architecture
Multi-process uvicorn inside a single Token Spy container, each with its own AGENT_NAME and port (8080-8084). All processes share one SQLite database, so the dashboard shows all agents together. Off by default — only activates when TOKEN_SPY_AUTH_MODE=local.
Changes reviewed
- start-monitoring.sh: launches per-service monitoring instances
- Compose env overrides for Open WebUI, Perplexica, OpenClaw, LiteLLM using the ${SERVICE_LLM_URL:-${LLM_API_URL:-default}} pattern — clean fallback chain
- inject-token.js: OpenClaw provider baseUrl override for monitoring + browser URL logging with #token=
- Sidebar fix: ?token= → #token= (OpenClaw reads hash fragment, not query param)
- stream_options.include_usage injection for llama-server streaming responses
- .env.example and .env.schema.json documentation
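The ${SERVICE_LLM_URL:-${LLM_API_URL:-default}} pattern behaves like nested "empty-or-unset" defaults; a Python equivalent (the `resolve` name is illustrative, as is the default URL):

```python
def resolve(per_service, global_url, default):
    """Mirror ${SERVICE_LLM_URL:-${LLM_API_URL:-default}}.

    In compose/shell, `:-` falls through when the variable is unset OR
    empty — which is exactly what Python's `or` does for None and "".
    """
    return per_service or global_url or default
```

So a per-service override wins, otherwise the global LLM_API_URL applies, otherwise the built-in default — and an empty string never shadows the fallback.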
Verified claims
- Off by default (no behavior change for existing installs)
- Per-service agent names in dashboard
- Tested on fresh macOS install per PR description
Minor notes
- AUTH_MODE=local disables auth on proxy routes — acceptable since it is only reachable within the Docker network
- Apple Silicon overlay correctly included
Well-documented, well-tested, clean implementation. LGTM.
Summary

Enables optional per-service token accounting for local LLM traffic. Users can route individual services through Token Spy to track token usage, cost, and per-agent metrics at :3005/dashboard. Off by default — no behavior change for existing installs.

How it works

Token Spy runs multiple uvicorn processes inside a single container, each with its own AGENT_NAME and port. All processes share one SQLite database, so the dashboard shows all agents together.

To enable

Add to .env and restart:

TOKEN_SPY_AUTH_MODE=local
WEBUI_LLM_URL=http://token-spy:8081

Monitoring instances only start when TOKEN_SPY_AUTH_MODE=local.

Verified on fresh official macOS install: open-webui, perplexica, openclaw.

Commits

- feat: enable optional per-service LLM monitoring via Token Spy — multi-process launcher (start-monitoring.sh), compose env var overrides for Open WebUI/Perplexica/OpenClaw/LiteLLM, AUTH_MODE=local bypass for internal Docker services, macOS overlay, .env.schema.json + .env.example documentation
- fix(token-spy): inject stream_options.include_usage — llama-server only includes token counts in streaming responses when explicitly requested. Without this, all streaming requests log 0/0 tokens.
- docs(token-spy): add local LLM monitoring quick start guide — README section with env vars, port mapping, per-service instructions, OpenClaw #token= URL workaround
- fix(openclaw): log browser-accessible URL with auth token on startup — OpenClaw's Control UI requires #token= in the URL hash for Docker deployments. Now logged to container output: docker logs dream-openclaw | grep "Control UI"
- fix(dashboard): use URL hash for OpenClaw sidebar link — the sidebar was generating ?token= (query param) but OpenClaw reads #token= (hash fragment). One-character fix.

Known limitations

- AUTH_MODE=local disables auth on Token Spy proxy routes (safe: only reachable within Docker network)
- OpenClaw's Control UI requires the #token= URL hash for Docker — logged on startup, fixed in dashboard sidebar

🤖 Generated with Claude Code