Quick start for plugin logs in local Grafana/Loki, optional OTLP metrics (OpenTelemetry Collector → Prometheus), and the loki-gateway push endpoint. Canonical log schema and events: docs/GRAFANA-LOGGING.md. Routing rationale (Loki vs metrics): docs/DATA-ROUTING-OBSERVABILITY.md.
Chosen topology: SimHub plugin → OTLP (gRPC default on port 4317, or HTTP/protobuf on 4318) → OpenTelemetry Collector (otel-collector service) → Prometheus text on :8889 → Prometheus scrapes the collector → Grafana datasource prometheus_local (PromQL).
- Why not /metrics inside the plugin: SimHub targets .NET Framework 4.8; exposing a pull endpoint without HttpListener (admin/port issues) or a separate process is awkward. OTLP to a localhost collector matches docs/DATA-ROUTING-OBSERVABILITY.md and keeps a single happy path for local dev.
- Grafana → Prometheus URL: use the Docker service name http://prometheus:9090 in provisioning (not localhost), because Grafana runs inside the compose network.
- Plugin → collector URL: use http://127.0.0.1:4317 (or 4318 with OTEL_EXPORTER_OTLP_PROTOCOL=http/protobuf) from the Windows host so SimHub resolves IPv4 reliably.
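The endpoint and protocol pairing above can be sketched as a small helper. This is an illustration only: `otlp_env` is a hypothetical name, not part of the repo, and it assumes the plugin reads the standard `OTEL_EXPORTER_OTLP_*` variables as described.

```python
from __future__ import annotations


def otlp_env(protocol: str = "grpc", host: str = "127.0.0.1") -> dict[str, str]:
    """Return the environment variables for the chosen OTLP transport.

    Hypothetical helper: ports follow the topology above
    (4317 for gRPC, 4318 for HTTP/protobuf).
    """
    ports = {"grpc": 4317, "http/protobuf": 4318}
    if protocol not in ports:
        raise ValueError(f"unsupported OTLP protocol: {protocol!r}")

    env = {"OTEL_EXPORTER_OTLP_ENDPOINT": f"http://{host}:{ports[protocol]}"}
    # gRPC is the default, so the protocol variable is only needed for HTTP/protobuf.
    if protocol != "grpc":
        env["OTEL_EXPORTER_OTLP_PROTOCOL"] = protocol
    return env


if __name__ == "__main__":
    for name, value in otlp_env("http/protobuf").items():
        print(f"{name}={value}")
```

The default host is 127.0.0.1 rather than localhost, matching the IPv4 note above.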
- Create Loki storage path: the default in observability/local/docker-compose.yml is S:\sim-steward-grafana-storage\. Create it, or set GRAFANA_STORAGE_PATH in observability/local/.env.observability.local.
- Start the stack (repo root): pnpm run obs:up. Or copy observability/local/.env.observability.example → .env.observability.local, set passwords/tokens, then run pnpm run obs:up:env. Check: pnpm run obs:ps.
- Configure the plugin: SimHub does not load .env by default. Recommended: .\scripts\run-simhub-local-observability.ps1 (sets SIMSTEWARD_LOKI_URL=http://localhost:3100, SIMSTEWARD_LOG_ENV=local, and OTLP for metrics; see the script). Or set those in the Windows user environment and restart SimHub. See the “Local Loki” and “OTLP / Prometheus (local metrics)” blocks in .env.example.
- Grafana: http://localhost:3000 → Explore → Loki → {app="sim-steward", env="local"}. The provisioned dashboard Sim Steward — Deploy health (simsteward-deploy-health) correlates deploy.ps1 markers (event=deploy_marker) with plugin bring-up and errors. Put SIMSTEWARD_LOKI_URL (and LOKI_PUSH_TOKEN if using loki-gateway) in the repo .env; deploy.ps1 loads it automatically via scripts/load-dotenv.ps1 (optional merge: observability/local/.env.observability.local).
- Metrics (optional): with the stack up, set OTEL_EXPORTER_OTLP_ENDPOINT=http://127.0.0.1:4317 (or use SIMSTEWARD_OTLP_ENDPOINT) before starting SimHub. After the plugin loads, go to Explore → Prometheus Local and query, for example, simsteward_process_cpu_percent or up{job="otel-collector"}. Smoke test: pnpm run obs:poll:prometheus or .\scripts\poll-prometheus.ps1.
- Generate traffic: use SimHub + web dashboard; confirm logs in Explore with {app="sim-steward", env="local"} (no repo-provisioned Grafana dashboards until you add JSON under observability/local/grafana/provisioning/dashboards/).
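To confirm logs outside Grafana, the same selector can be sent to Loki's HTTP query API. The sketch below only builds the request URL; `loki_query_url` is a hypothetical helper, and the 15-minute window and limit of 100 are arbitrary assumptions rather than values taken from the repo scripts.

```python
import time
from urllib.parse import urlencode

NS_PER_MINUTE = 60 * 1_000_000_000


def loki_query_url(base_url, selector, minutes=15, limit=100, now_ns=None):
    """Build a Loki query_range URL for the last `minutes` minutes.

    Hypothetical helper; Loki accepts start/end as Unix epoch nanoseconds.
    """
    now_ns = time.time_ns() if now_ns is None else now_ns
    params = {
        "query": selector,
        "start": now_ns - minutes * NS_PER_MINUTE,
        "end": now_ns,
        "limit": limit,
        "direction": "backward",  # newest entries first
    }
    return f"{base_url.rstrip('/')}/loki/api/v1/query_range?{urlencode(params)}"


if __name__ == "__main__":
    # Print the URL; fetch it with any HTTP client once the stack is up.
    print(loki_query_url("http://localhost:3100", '{app="sim-steward", env="local"}'))
```

An empty result list in the response usually means the plugin has not pushed anything yet, not that the query is wrong.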
Storage override: Set GRAFANA_STORAGE_PATH in .env.observability.local; compose uses ${GRAFANA_STORAGE_PATH:-S:/sim-steward-grafana-storage}.
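The `${GRAFANA_STORAGE_PATH:-...}` form falls back to the default when the variable is unset or empty. A minimal sketch of that rule, using a hypothetical `resolve_storage_path` helper to predict which directory compose will bind-mount:

```python
import os

DEFAULT_STORAGE_PATH = "S:/sim-steward-grafana-storage"


def resolve_storage_path(env=None):
    """Mimic compose's ${GRAFANA_STORAGE_PATH:-default} substitution.

    Hypothetical helper: ':-' uses the default when the variable is
    unset OR set to an empty string.
    """
    env = os.environ if env is None else env
    return env.get("GRAFANA_STORAGE_PATH") or DEFAULT_STORAGE_PATH


if __name__ == "__main__":
    print(resolve_storage_path())
```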
Terminal tail: pnpm run obs:poll (direct Loki :3100) or pnpm run obs:poll:grafana / .\scripts\poll-loki.ps1 -ViaGrafana using GRAFANA_API_TOKEN (or admin user/password) in repo .env — same path Grafana Explore uses (loki_local datasource). Prometheus: pnpm run obs:poll:prometheus / .\scripts\poll-prometheus.ps1.
To clear Loki chunks/WAL, optional Prometheus TSDB, and optional Grafana bind-mount state without changing compose, loki-config.yml, datasource provisioning, LOKI_PUSH_TOKEN, or SIMSTEWARD_LOKI_*:
- From repo root, run pnpm run obs:wipe -- -Force (clears the loki and prometheus subdirectories under GRAFANA_STORAGE_PATH).
- Optional flags: -Grafana (wipes grafana.db; re-run scripts/grafana-bootstrap.ps1 if you use GRAFANA_API_TOKEN), -SampleLogs (clears observability/local/sample-logs/* files), or -All for both.
Equivalent: .\scripts\obs-wipe-local-data.ps1 -Force (same switches).
Grafana Cloud (delete dashboards and old log lines without rotating Loki credentials): see docs/GRAFANA-LOGGING.md § Housekeeping (Grafana Cloud).
The repo stack includes Grafana, Loki, and loki-gateway (nginx). The plugin writes plugin-structured.jsonl on disk (and streams logs over WebSocket to the dashboard); compose does not tail that file — rely on send-deploy-loki-marker.ps1 (called from deploy.ps1) to POST deploy markers when SIMSTEWARD_LOKI_URL is set. For pushes, use http://localhost:3100 (Loki) or http://localhost:3500 (gateway) with Authorization: Bearer <LOKI_PUSH_TOKEN> on the gateway — see docs/GRAFANA-LOGGING.md.
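For reference, a push to either endpoint is a JSON POST to /loki/api/v1/push. The sketch below builds such a request without sending it. `build_push_request` is a hypothetical helper, and the labels and log line are placeholders based on the selector used in this guide, not the exact payload that send-deploy-loki-marker.ps1 emits.

```python
import json
import time
import urllib.request


def build_push_request(base_url, line, labels, token=None, ts_ns=None):
    """Build (but do not send) a Loki push request.

    Hypothetical helper. Loki expects each value as
    [<unix epoch nanoseconds as a string>, <log line>].
    """
    ts_ns = time.time_ns() if ts_ns is None else ts_ns
    payload = {"streams": [{"stream": labels, "values": [[str(ts_ns), line]]}]}

    headers = {"Content-Type": "application/json"}
    if token:
        # Only the gateway (port 3500) requires the bearer token.
        headers["Authorization"] = f"Bearer {token}"

    return urllib.request.Request(
        f"{base_url.rstrip('/')}/loki/api/v1/push",
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


if __name__ == "__main__":
    req = build_push_request(
        "http://localhost:3500",
        '{"event": "deploy_marker"}',          # placeholder line
        {"app": "sim-steward", "env": "local"},  # labels assumed from the LogQL selector
        token="<LOKI_PUSH_TOKEN>",
    )
    print(req.get_method(), req.full_url)
    # Send with urllib.request.urlopen(req) once the stack is running.
```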
| Service | URL |
|---|---|
| Grafana | http://localhost:3000 |
| Loki (query / direct API) | http://localhost:3100 |
| loki-gateway (push) | http://localhost:3500 |
| OpenTelemetry Collector (OTLP gRPC) | http://127.0.0.1:4317 (host → container) |
| OpenTelemetry Collector (OTLP HTTP) | http://127.0.0.1:4318 |
| Collector Prometheus exporter (host curl / debug) | http://127.0.0.1:18889/metrics (mapped from container 8889; Prometheus scrapes otel-collector:8889 inside compose) |
| Prometheus (UI / API) | http://localhost:9090 |
| Collector health_check | http://127.0.0.1:13133 |
Files under observability/local/. Security: LOKI_PUSH_TOKEN required for POST /loki/api/v1/push on the gateway; gateway denies other routes.
Setup: Copy observability/local/.env.observability.example → .env.observability.local, set LOKI_PUSH_TOKEN, then:
docker compose --env-file .env.observability.local -f observability/local/docker-compose.yml up -d
Validate: Grafana datasource loki_local; LogQL {app="sim-steward",env="local"} once the plugin is pushing to Loki (or your configured SIMSTEWARD_LOKI_URL). MCP: list_datasources, query_loki_logs.
Troubleshooting: Token format Bearer <token>; ensure plugin-structured.jsonl is actually ingested (see docs/TROUBLESHOOTING.md §8).
The stack publishes these host ports together; if another process (or a second compose project) already holds any of them, docker compose up fails:
| Host port | Service |
|---|---|
| 3000 | Grafana |
| 3100 | Loki |
| 3500 | loki-gateway |
| 4317, 4318 | OpenTelemetry Collector (OTLP) |
| 8080 | data-api |
| 9090 | Prometheus |
| 13133 | Collector health_check |
| 18889 | Collector Prometheus exporter (host; container listens on 8889) |
SimHub (separate from Docker) commonly uses 8888 (HTTP) and 19847 (Sim Steward WebSocket default). Those can collide with unrelated tools, not usually with this compose file.
Audit script: from repo root run pwsh -NoProfile -File scripts/check-obs-ports.ps1 to see what is already listening on these ports (and owning process name).
Typical conflicts: 3000 (other Grafana, React dev server), 8080 (many dev backends), 9090 (another Prometheus), 4317/4318 (another OTel collector or agent). 8889: On some setups SimHub (SimHubWPF.exe) also listens on 8889 alongside 8888 — that blocks mapping collector 8889 to the host, which is why compose publishes 18889:8889 (Prometheus still scrapes otel-collector:8889 inside Docker).
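If PowerShell is not available, a rough cross-platform approximation of the port audit is a TCP connect probe against each published port. This sketch is not the repo's check-obs-ports.ps1: the function names are hypothetical, it cannot report the owning process, and it only sees listeners reachable on 127.0.0.1.

```python
import socket

# Host ports published by the compose stack, per the table above.
OBS_PORTS = {
    3000: "Grafana",
    3100: "Loki",
    3500: "loki-gateway",
    4317: "OpenTelemetry Collector (OTLP gRPC)",
    4318: "OpenTelemetry Collector (OTLP HTTP)",
    8080: "data-api",
    9090: "Prometheus",
    13133: "Collector health_check",
    18889: "Collector Prometheus exporter",
}


def port_in_use(port, host="127.0.0.1", timeout=0.5):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an errno value otherwise.
        return sock.connect_ex((host, port)) == 0


def busy_ports(ports=None):
    """Return the subset of ports that already have a listener."""
    ports = OBS_PORTS if ports is None else ports
    return {port: name for port, name in ports.items() if port_in_use(port)}


if __name__ == "__main__":
    for port, name in sorted(busy_ports().items()):
        print(f"{port} is already in use (wanted by {name})")
```

Run it before starting the stack; once the stack is up, every port is expected to show as in use.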
- up{job="otel-collector"} == 0: Prometheus cannot reach the collector on otel-collector:8889 (compose network). Confirm otel-collector is running: pnpm run obs:ps.
- No simsteward_* series: OTLP is off until OTEL_EXPORTER_OTLP_ENDPOINT or SIMSTEWARD_OTLP_ENDPOINT is set before SimHub starts. Use http://127.0.0.1:4317 for gRPC; for port 4318 set OTEL_EXPORTER_OTLP_PROTOCOL=http/protobuf.
- Connection refused on 4317: Collector not started or ports not published; run pnpm run obs:up from repo root.
- Grafana Prometheus query errors: Datasource must be http://prometheus:9090 (container DNS), not localhost:9090.
- Loki remains authoritative for host_resource_sample until you rely on Prom-only SLOs; metrics duplicate CPU/working set at OTLP export cadence.
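The first check in the list can also be scripted against the Prometheus HTTP API from the host. The sketch below builds the instant-query URL and interprets a response body; both function names are hypothetical, and it is not a description of what poll-prometheus.ps1 does.

```python
import json
from urllib.parse import urlencode


def prom_query_url(base_url, promql):
    """Build an instant-query URL for the Prometheus HTTP API."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def collector_is_up(response_body):
    """Return True if any sample in an instant-query response equals 1.

    Each vector sample looks like
    {"metric": {...}, "value": [<timestamp>, "<value as string>"]}.
    """
    doc = json.loads(response_body)
    if doc.get("status") != "success":
        return False
    results = doc.get("data", {}).get("result", [])
    return any(float(sample["value"][1]) == 1.0 for sample in results)


if __name__ == "__main__":
    # From the host, Prometheus is reachable on localhost:9090.
    print(prom_query_url("http://localhost:9090", 'up{job="otel-collector"}'))
```

An empty result (no series at all) is treated as down here, which matches the case where Prometheus has never scraped the collector.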
- docs/GRAFANA-LOGGING.md — Labels, events, LogQL, housekeeping.
- docs/observability-scaling.md — Many users / large grids.
- docs/observability-testing.md — Harness and Explore validation.