[metrics] Add Bitcoin confirmation-count gauges for deposits and withdrawals #454

Draft
0xsiddharthks wants to merge 5 commits into siddharth/telemetry-1 from siddharth/telemetry-2

@0xsiddharthks (Contributor) commented Apr 14, 2026

Adds two new metrics:

  • hashi_deposit_request_confirmations
  • hashi_withdrawal_tx_confirmations

Each metric is labelled by the status set {not_found, mempool, 0, 1, 2, 3, 4, 5, 6_plus}.
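As a rough illustration of how a transaction status could be folded into one of those nine label values, here is a minimal sketch. The `TxStatus` enum and the `status_label` function are hypothetical names for this example, not the PR's types, and the PR does not say how `mempool` differs from `0`, so the sketch simply passes a reported count of zero through as `"0"`.

```rust
/// Observed state of a Bitcoin transaction (hypothetical type for this sketch).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TxStatus {
    NotFound,
    Mempool,
    Confirmed { confirmations: u32 },
}

/// The nine label values listed in the PR description, in display order.
pub const STATUS_LABELS: [&str; 9] = [
    "not_found", "mempool", "0", "1", "2", "3", "4", "5", "6_plus",
];

/// Maps a status to its bucket label; six or more confirmations share one bucket.
pub fn status_label(status: TxStatus) -> &'static str {
    match status {
        TxStatus::NotFound => "not_found",
        TxStatus::Mempool => "mempool",
        // Assumption: a reported count of 0 is kept distinct from `Mempool`.
        TxStatus::Confirmed { confirmations } => match confirmations {
            0 => "0",
            1 => "1",
            2 => "2",
            3 => "3",
            4 => "4",
            5 => "5",
            _ => "6_plus",
        },
    }
}
```

Capping the label set at `6_plus` keeps the number of time series per metric fixed at nine, no matter how long an item stays in the queue.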

The Grafana dashboard can now show the live distribution of in-flight items across Bitcoin confirmation buckets.

The new confirmation_metrics task wakes on every Kyoto block_height tick, snapshots both queues from onchain state (dropping the read lock before any awaits), and queries btc_monitor.get_transaction_status for each txid in bounded parallelism (max 8 concurrent bitcoind RPCs). It runs on every validator, not just the leader, so the dashboard survives leader rotations. The task is gated on kyoto_synced == 1 and skips the gauge write when any query errored, rather than flashing partial data.
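The bounded fan-out and the all-or-nothing gauge write can be sketched with the standard library alone. This is an assumption-laden stand-in, not the PR's code: the real task is presumably async, whereas this sketch uses scoped OS threads; `snapshot_counts`, `publish`, and the `query` closure are hypothetical names, and a plain `BTreeMap` plays the role of the gauge.

```rust
use std::collections::BTreeMap;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;
use std::thread;

/// Upper bound on in-flight queries, matching the "max 8" in the description.
const MAX_CONCURRENT: usize = 8;

/// Runs `query` once per txid using at most MAX_CONCURRENT worker threads.
/// Returns per-label counts, or an error if any single query failed.
pub fn snapshot_counts<F>(
    txids: &[String],
    query: F,
) -> Result<BTreeMap<&'static str, i64>, String>
where
    F: Fn(&str) -> Result<&'static str, String> + Sync,
{
    let next = AtomicUsize::new(0); // index of the next txid to claim
    let results = Mutex::new(Vec::with_capacity(txids.len()));
    let workers = MAX_CONCURRENT.min(txids.len());

    thread::scope(|scope| {
        for _ in 0..workers {
            scope.spawn(|| loop {
                // Each worker claims one index at a time until none remain.
                let i = next.fetch_add(1, Ordering::Relaxed);
                let Some(txid) = txids.get(i) else { break };
                let outcome = query(txid);
                results.lock().unwrap().push(outcome);
            });
        }
    });

    let mut counts = BTreeMap::new();
    for outcome in results.into_inner().unwrap() {
        // `?` aborts the whole snapshot on any failed query.
        *counts.entry(outcome?).or_insert(0) += 1;
    }
    Ok(counts)
}

/// Overwrites `gauge` only when the whole snapshot succeeded; on error the
/// previous values stay in place. Returns whether a write happened.
pub fn publish(
    gauge: &mut BTreeMap<&'static str, i64>,
    snapshot: Result<BTreeMap<&'static str, i64>, String>,
) -> bool {
    match snapshot {
        Ok(counts) => {
            *gauge = counts;
            true
        }
        Err(_) => false,
    }
}
```

Keeping the last good values on error means the dashboard shows slightly stale counts for one tick instead of a distribution that undercounts whichever queries failed.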

@0xsiddharthks requested a review from bmwill as a code owner April 14, 2026 23:30
@0xsiddharthks marked this pull request as draft April 14, 2026 23:36
@0xsiddharthks force-pushed the siddharth/telemetry-1 branch from 5777787 to 2820309 April 15, 2026 00:07
@0xsiddharthks force-pushed the siddharth/telemetry-2 branch from 1dd2c35 to 4715fee April 15, 2026 00:07
Replaces the per-binary init_tracing_subscriber() boilerplate with a shared
hashi-telemetry crate that auto-detects the log format via std::io::IsTerminal
and honors RUST_LOG / RUST_LOG_JSON:

  - stderr is a terminal (local dev) → human format
  - stderr is a pipe (K8s pod)       → JSON (for Loki)
  - RUST_LOG_JSON=1 / =0             → hard override, wins over auto-detect
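The three rules above can be read as a small decision function. The sketch below is illustrative only: `LogFormat`, `choose_format`, and `detect_format` are hypothetical names rather than the crate's API, and it assumes that any value other than `1` or `0` falls through to auto-detection.

```rust
use std::io::IsTerminal;

/// Output format for log lines (hypothetical type for this sketch).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LogFormat {
    Human,
    Json,
}

/// Pure decision rule, kept separate so it can be tested without touching
/// the process environment or the real stderr.
pub fn choose_format(rust_log_json: Option<&str>, stderr_is_terminal: bool) -> LogFormat {
    match rust_log_json {
        // Explicit override wins over auto-detection.
        Some("1") => LogFormat::Json,
        Some("0") => LogFormat::Human,
        // Unset or unrecognised value: fall through to auto-detection.
        _ => {
            if stderr_is_terminal {
                LogFormat::Human
            } else {
                LogFormat::Json
            }
        }
    }
}

/// Reads RUST_LOG_JSON and probes stderr, then applies the rule above.
pub fn detect_format() -> LogFormat {
    let env = std::env::var("RUST_LOG_JSON").ok();
    choose_format(env.as_deref(), std::io::stderr().is_terminal())
}
```

Splitting the rule from the environment lookup makes the override precedence easy to check in a unit test.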

CI workflows that want human-readable output on the piped stderr of a GitHub
Actions runner set RUST_LOG_JSON=0 at the workflow env level.
.github/workflows/ci.yml does this for the hashi CI job.

Instruments the main unit-of-work boundaries across the daemon: the gRPC
handlers in bridge_service, deposit/withdrawal validation, every leader
task (deposit processing, withdrawal approval/commitment/signing/confirmation
fan-outs), the Bitcoin and Sui indexers, MPC run loops and handle_reconfig,
every execute_* method in sui_tx_executor (with sui_digest recorded on the
generic execute), and mpc::signing::sign so presig-reassignment logs carry
their parent span.

Service identification (hashi vs hashi-screener vs hashi-guardian vs
hashi-monitor) is done via Kubernetes pod labels already injected by
Promtail at ingest, not via a log-body field.
@0xsiddharthks force-pushed the siddharth/telemetry-1 branch from 2820309 to 92086db April 15, 2026 00:16
@0xsiddharthks force-pushed the siddharth/telemetry-2 branch from 4715fee to 4cf57df April 15, 2026 00:17
The big module-level bullet list, per-field doc comments, per-method doc
comments, and the inline NOTE block in init() were mostly describing what
the code already says. Keep one short module-level summary and one line on
with_env's RUST_LOG_JSON parsing (the only non-obvious bit — silent
fall-through on unknown values).
Per code-review feedback: hashi-telemetry was small enough that the extra
crate wasn't pulling its weight. Move the whole thing to
hashi_types::telemetry and delete the standalone crate.

- crates/hashi-telemetry/ → crates/hashi-types/src/telemetry.rs
- hashi-types now depends on tracing-subscriber (already transitively present
  in every workspace consumer except internal-tools, which already has it too)
- All 5 binaries drop `hashi-telemetry = ...` and swap
  `hashi_telemetry::TelemetryConfig` for `hashi_types::telemetry::TelemetryConfig`
- Docker Containerfiles for hashi and hashi-screener drop the extra
  COPY/stub-source steps for the removed crate
…drawals

Adds two IntGaugeVecs — hashi_deposit_request_confirmations and
hashi_withdrawal_tx_confirmations — each labelled by the status set
{not_found, mempool, 0, 1, 2, 3, 4, 5, 6_plus}. The Grafana dashboard can
now show the live distribution of in-flight items across Bitcoin
confirmation buckets, so it is obvious at a glance whether the queue is
moving even when the configured threshold is high enough that individual
items sit for an hour or more.

The new confirmation_metrics task wakes on every Kyoto block_height tick,
snapshots both queues from onchain state, and queries btc_monitor for each
txid in bounded parallelism (max 8 concurrent bitcoind RPCs). Runs on
every validator, not just the leader, so the dashboard survives leader
rotations. Gated on kyoto_synced == 1; skips the gauge write when any
query errored rather than flashing partial data.
@0xsiddharthks force-pushed the siddharth/telemetry-2 branch from 4cf57df to 2ed9198 April 15, 2026 01:08
@0xsiddharthks force-pushed the siddharth/telemetry-1 branch from 16e1805 to e4a9c97 April 15, 2026 19:43