From 1c61e0f728a000804ec27f3fc5116f9df25bbd94 Mon Sep 17 00:00:00 2001
From: YK
Date: Wed, 11 Mar 2026 12:11:06 +0800
Subject: [PATCH] Revert "docs: add builder observability page (#140)"

This reverts commit 77ccc384909f2f50d90772886adde47c77232468.
---
 .../observability/builder-observability.mdx | 95 -------------------
 vocs.config.ts                              | 10 --
 2 files changed, 105 deletions(-)
 delete mode 100644 src/pages/guide/observability/builder-observability.mdx

diff --git a/src/pages/guide/observability/builder-observability.mdx b/src/pages/guide/observability/builder-observability.mdx
deleted file mode 100644
index 24aca45..0000000
--- a/src/pages/guide/observability/builder-observability.mdx
+++ /dev/null
@@ -1,95 +0,0 @@
----
-title: Builder Observability
-description: Internal tooling and data sources for investigating Tempo block building and validation performance, outliers, and execution breakdowns.
----
-
-# Builder Observability
-
-Tooling and workflows for investigating Tempo block building and validation performance. The goal is to quickly identify outliers (slow builds, timed-out proposals, slow validations) and drill into per-node execution breakdowns.
-
-## Workflow
-
-The investigation loop:
-
-1. **Spot an outlier** — monitor [ValScope](#valscope) or Grafana dashboards for slow proposals, builds, or validations
-2. **Inspect the network view** — open the block/view in ValScope to see per-validator timelines across the network
-3. **Drill into execution** — jump to [BlockScope](#blockscope) for a detailed per-node execution breakdown (traces, spans, timeline)
-
-## Tools
-
-### ValScope
-
-Real-time validator monitoring dashboard. Ingests consensus and execution logs from all validators, correlates events into per-block timelines, and serves a live web UI.
-
-- **Repo:** [tempoxyz/valscope](https://github.com/tempoxyz/valscope)
-- **Testnet:** `dev-joshie:3004` (Tailscale)
-- **Mainnet:** `dev-joshie:3005` (Tailscale)
-
-**What it shows:**
-- Live block and view tables with validator health stats
-- Per-block swim-lane timelines showing events across all validators
-- Consensus analytics — gas vs quorum scatter, quorum latency, receive delay heatmap
-- Execution analytics — gas vs build time, build time dumbbell, persistence metrics
-- Nullified (failed) consensus views
-
-**Key pages:**
-| Page | Route | Description |
-|---|---|---|
-| Overview | `/` | Live block + view tables, validator health |
-| Consensus | `/consensus` | Quorum latency, receive delays |
-| Execution | `/execution` | Build times, persistence metrics |
-| Block Detail | `/blocks/:height` | Full event timeline for a committed block |
-| View Detail | `/epoch/:epoch/views/:view` | Full event timeline for a consensus view |
-
-**Validator configs:**
-- [Testnet validators](https://github.com/tempoxyz/valscope/blob/main/apps/api/validators.toml)
-- [Mainnet validators](https://github.com/tempoxyz/valscope/blob/main/apps/api/validators-mainnet.toml)
-
-### BlockScope
-
-Execution-level dashboard for comparing block processing across clients. Shows per-block trace breakdowns, execution timelines, and mempool overlap analysis.
-
-- **Repo:** [tempoxyz/blockscope](https://github.com/tempoxyz/blockscope)
-- **Current deploy:** `dev-alexey:5173` (Tailscale, port-forwarded — being migrated)
-
-**What it shows:**
-- Block-by-block comparison across execution clients (reth, nethermind, ethrex)
-- Per-block execution trace timeline (state root, sub-blocks, EVM execution)
-- Mempool overlap analysis — how much of each block was in the local txpool
-- Per-builder block history with overlap stats
-
-**Key pages:**
-| Page | Route | Description |
-|---|---|---|
-| Overview | `/` | Block comparison table across clients |
-| Block Detail | `/blocks/:height` | Execution breakdown with trace timeline |
-| Mempool | `/mempool` | Gas usage vs overlap scatter plot |
-| Builder Detail | `/builder/:name` | Per-builder block history |
-
-## Data Sources
-
-All endpoints are internal Tailscale hostnames — requires being on the Tempo tailnet.
-
-| Service | Env Var | What it does | Testnet | Mainnet |
-|---|---|---|---|---|
-| External VLogs | `VLOGS_URL` | Logs from partner/external validators (VictoriaLogs) | `dev-euw-vl-partners.tail388b2e.ts.net` | _(none)_ |
-| Internal VLogs | `VLOGS_INTERNAL_URL` | Logs from Tempo's own nodes — structured reth output during build/validation (VictoriaLogs) | `stg-nae-vl-internal.tail388b2e.ts.net` | `prd-nae-vl-internal.tail388b2e.ts.net` |
-| VM External | `VM_EXTERNAL_URL` | Prometheus-style metrics (block times, gas, peers) from partner nodes (VictoriaMetrics) | `dev-euw-vm-partners.tail388b2e.ts.net` | same (namespace-filtered) |
-| VM Internal | `VM_INTERNAL_URL` | Prometheus-style metrics (CPU, memory, block processing) from Tempo's own nodes (VictoriaMetrics) | `stg-nae-vm-internal.tail388b2e.ts.net` | `prd-nae-vm-internal.tail388b2e.ts.net` |
-| Tempo Traces | `TEMPO_URL` | Distributed traces/spans — powers execution timeline breakdowns. **Internal nodes only.** (Grafana Tempo) | `stg-nae-grafana-tempo.tail388b2e.ts.net` | `prd-nae-grafana-tempo.tail388b2e.ts.net` |
-| Namespace | `NETWORK` | Cluster/namespace selector | `moderato-stable` | `tempo-mainnet-stable` |
-
-## Known Outlier Patterns
-
-Issues surfaced through monitoring:
-
-- **Execution cache mutex contention** — `Updated execution cache` blocked for 400ms+ during fork/reorg scenarios. Tracked in [RETH-498](https://linear.app/tempoxyz/issue/RETH-498)
-- **Late build start** — building starts after the view has already begun, reducing available build time. See [tempo#2952](https://github.com/tempoxyz/tempo/pull/2952)
-- **Persistence during building** — disk persistence overlapping with block building, observed on memory-constrained machines
-- **Long-running newPayload** — inability to cancel an in-progress `newPayload` execution
-
-## Limitations
-
-- **External validators have no traces/spans** — only logs and metrics are available for partner nodes. Detailed execution breakdowns (Grafana Tempo) are internal-only.
-- **New instrumentation requires a release** — adding new spans or logs to testnet/mainnet requires shipping a new Tempo version. Existing instrumentation must be used until then.
-- **ValScope log parsing limitations** — currently parses log lines with regex, which can be slow and sometimes misses events that need timestamp-based correlation.
diff --git a/vocs.config.ts b/vocs.config.ts
index f5ee342..9ebdd60 100644
--- a/vocs.config.ts
+++ b/vocs.config.ts
@@ -551,16 +551,6 @@ export default defineConfig({
         },
       ],
     },
-    {
-      text: 'Observability',
-      collapsed: true,
-      items: [
-        {
-          text: 'Builder Observability',
-          link: '/guide/observability/builder-observability',
-        },
-      ],
-    },
     // {
     //   text: 'Infrastructure & Tooling',
     //   items: [