[telemetry] Add structured logging with span-based tracing #453

Open
0xsiddharthks wants to merge 6 commits into main from siddharth/telemetry-1

0xsiddharthks commented Apr 14, 2026

Add a new telemetry module in hashi-types.

additional changes:

  • tty vs JSON logging
    • using std::io::IsTerminal to detect stderr (terminal -> tty, pipe -> JSON)
    • override possible via RUST_LOG_JSON
  • service identification
    • no longer requiring explicit service-name in the log body
    • using two mechanisms for service identification:
      • loki stream labels (in k8s)
      • target field on every event (the tracing subscriber emits the module path in this field)

Also add #[tracing::instrument] at the main unit-of-work boundaries:

  • gRPC handlers in bridge_service
  • Deposit / withdrawal validation paths
  • Every leader task (deposit, withdrawal approval / commitment / signing / confirmation fan-outs)
  • Bitcoin and Sui indexers
  • MPC run loops, handle_reconfig, and mpc::signing::sign
  • Every execute_* in sui_tx_executor — the generic execute records sui_digest on success so one span tree covers handler → validator → Sui submission

deployment changes:

MystenLabs/sui-operations#7626

Comment thread crates/hashi-telemetry/Cargo.toml Outdated
0xsiddharthks marked this pull request as ready for review April 15, 2026 01:04
Comment thread crates/hashi-types/src/telemetry.rs Outdated
Comment on lines +100 to +107
        TelemetryGuard { _private: () }
    }
}

#[must_use = "dropping the guard immediately will lose buffered log output"]
pub struct TelemetryGuard {
    _private: (),
}

Why do we have this guard? I don't think it's actually doing what it says it is doing.

Comment thread crates/e2e-tests/src/main.rs
Replaces the per-binary init_tracing_subscriber() boilerplate with a shared
hashi-telemetry crate that auto-detects the log format via std::io::IsTerminal
and honors RUST_LOG / RUST_LOG_JSON:

  - stderr is a terminal (local dev) → human format
  - stderr is a pipe (K8s pod)       → JSON (for Loki)
  - RUST_LOG_JSON=1 / =0             → hard override, wins over auto-detect
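The selection rule in the list above can be sketched with the standard library alone. This is a minimal illustration, not the code from this PR: `LogFormat` and `choose_format` are hypothetical names, and treating any value other than `1` or `0` as "no override" is an assumption based on the fall-through behaviour described for `with_env`.

```rust
use std::io::IsTerminal;

/// Output formats the subscriber could emit (hypothetical name).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum LogFormat {
    Human,
    Json,
}

/// Pick a format from the stderr TTY check and the raw RUST_LOG_JSON value.
/// Hypothetical helper: "1" and "0" are hard overrides, anything else is ignored.
fn choose_format(stderr_is_tty: bool, rust_log_json: Option<&str>) -> LogFormat {
    match rust_log_json {
        Some("1") => LogFormat::Json,
        Some("0") => LogFormat::Human,
        // Unset or unrecognized value: fall through to auto-detection.
        _ => {
            if stderr_is_tty {
                LogFormat::Human
            } else {
                LogFormat::Json
            }
        }
    }
}

fn main() {
    let env = std::env::var("RUST_LOG_JSON").ok();
    let format = choose_format(std::io::stderr().is_terminal(), env.as_deref());
    eprintln!("selected log format: {format:?}");
}
```

Keeping the decision in a pure function makes the override precedence easy to unit-test without touching the process environment or a real terminal.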

CI workflows that want human-readable output even though the GitHub Actions
runner's stderr is a pipe set RUST_LOG_JSON=0 at the workflow env level.
.github/workflows/ci.yml does this for the hashi CI job.

Instruments the main unit-of-work boundaries across the daemon: the gRPC
handlers in bridge_service, deposit/withdrawal validation, every leader
task (deposit processing, withdrawal approval/commitment/signing/confirmation
fan-outs), the Bitcoin and Sui indexers, MPC run loops and handle_reconfig,
every execute_* method in sui_tx_executor (with sui_digest recorded on the
generic execute), and mpc::signing::sign so presig-reassignment logs carry
their parent span.

Service identification (hashi vs hashi-screener vs hashi-guardian vs
hashi-monitor) is done via Kubernetes pod labels already injected by
Promtail at ingest, not via a log-body field.

The big module-level bullet list, per-field doc comments, per-method doc
comments, and the inline NOTE block in init() were mostly describing what
the code already says. Keep one short module-level summary and one line on
with_env's RUST_LOG_JSON parsing (the only non-obvious bit — silent
fall-through on unknown values).

Per code-review feedback: hashi-telemetry was small enough that the extra
crate wasn't pulling its weight. Move the whole thing to
hashi_types::telemetry and delete the standalone crate.

- crates/hashi-telemetry/ → crates/hashi-types/src/telemetry.rs
- hashi-types now depends on tracing-subscriber (already transitively present
  in every workspace consumer except internal-tools, which already has it too)
- All 5 binaries drop `hashi-telemetry = ...` and swap
  `hashi_telemetry::TelemetryConfig` for `hashi_types::telemetry::TelemetryConfig`
- Docker Containerfiles for hashi and hashi-screener drop the extra
  COPY/stub-source steps for the removed crate
0xsiddharthks force-pushed the siddharth/telemetry-1 branch from 16e1805 to e4a9c97 Compare April 15, 2026 19:43
… main

The guard was a ZST marker with no Drop impl and no buffered I/O to
flush, so it could not do what its #[must_use] message claimed. The
hashi-localnet init sat inside cmd_start, which meant FaucetSui and
Deposit silently dropped their spans; lift verbose to a global clap
flag and initialize tracing at the top of main, matching every other
binary.
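The point about the guard can be checked mechanically. The sketch below reproduces the shape of the removed type under the hypothetical name `MarkerGuard` and contrasts it with a hypothetical `FlushGuard` that really does own buffered output; neither type is code from this PR. A `#[must_use]` guard is normally meaningful only in the second case (for example, `WorkerGuard` from tracing-appender flushes on drop).

```rust
use std::io::{BufWriter, Write};
use std::mem::{needs_drop, size_of};

/// Same shape as the removed guard: one private unit field, no Drop impl.
#[allow(dead_code)]
struct MarkerGuard {
    _private: (),
}

/// Hypothetical guard that owns a buffered writer and flushes it on drop.
struct FlushGuard<W: Write> {
    out: BufWriter<W>,
}

impl<W: Write> Drop for FlushGuard<W> {
    fn drop(&mut self) {
        // Best-effort flush; errors cannot be returned from drop.
        let _ = self.out.flush();
    }
}

fn main() {
    // Zero-sized with no drop glue: dropping it early or late changes nothing.
    assert_eq!(size_of::<MarkerGuard>(), 0);
    assert!(!needs_drop::<MarkerGuard>());

    // This guard has real work to do when it goes out of scope.
    assert!(needs_drop::<FlushGuard<Vec<u8>>>());

    let mut guard = FlushGuard {
        out: BufWriter::new(std::io::stderr()),
    };
    writeln!(guard.out, "buffered line, flushed when the guard drops").unwrap();
}
```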
Comment thread crates/hashi-types/src/telemetry.rs
`tracing_subscriber::fmt::layer()` defaults its writer to stdout, so
log output was leaking onto stdout even though every other signal in
this module (TTY autodetect, ANSI colorization) is keyed off stderr.
Explicitly set the writer on both the JSON and TTY branches.
0xsiddharthks enabled auto-merge (squash) April 15, 2026 20:57
0xsiddharthks disabled auto-merge April 15, 2026 20:57
0xsiddharthks enabled auto-merge (squash) April 15, 2026 20:57