@fhanau
Contributor

@fhanau fhanau commented Oct 21, 2025

Purpose:
This PR serves to perform two long-standing cleanup tasks in the STW implementation:

  1. Sending the SpanOpen event as soon as a span is opened instead of when it closes
  2. Getting rid of the CompleteSpan struct, which represents a full span and will no longer be needed once SpanOpen is handled separately.

To implement this in a backwards-compatible way, we need to land it in two parts so that both the old and the new code path are supported until we have phased out the old version, which lacks the APIs for handling SpanOpen separately.

For code that is workerd-only and thus never involved in RPC, or that is solely on the RPC server side, we can already decompose function calls so that we don't need to implement the same functionality twice. This needs to land alongside a downstream PR. A follow-up PR will actually invoke the code path to send SpanOpen first, get rid of the CompleteSpan struct, and perform a bunch of cleanup – see #6051.

Note that:

  • The internal tracing system will not be affected by these changes – we still propagate completed spans there. In the final version, this differentiation is implemented through differences in the SpanObserver implementations.
  • Some functions that are being added here won't actually be called just yet; that will change in the follow-up, and in some cases they are already necessary for backwards compatibility.
  • Commit history still needs to be cleaned up
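
To make the first cleanup task concrete, here is a minimal sketch of the difference in event ordering for two nested spans. It is plain standard C++ and not workerd code: `Event`, `EventLog`, `DeferredSpan`, `EagerSpan` and `traceNested` are hypothetical names invented for illustration, and real STW events carry more data than a kind and a span ID.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical event record; only the fields needed for this illustration.
struct Event {
  std::string kind;  // "SpanOpen" or "SpanClose"
  uint64_t spanId;
};

// Hypothetical sink that records the order in which events are delivered.
struct EventLog {
  std::vector<Event> events;
  void emit(std::string kind, uint64_t spanId) {
    assert(spanId != 0);  // this sketch reserves 0 as "no span"
    events.push_back(Event{std::move(kind), spanId});
  }
};

// Old ordering: nothing is sent while the span is open; both events are
// delivered together when the span closes.
struct DeferredSpan {
  EventLog& log;
  uint64_t id;
  DeferredSpan(EventLog& log, uint64_t id): log(log), id(id) {}
  ~DeferredSpan() {
    log.emit("SpanOpen", id);
    log.emit("SpanClose", id);
  }
};

// New ordering: SpanOpen is delivered as soon as the span is opened,
// SpanClose when it closes.
struct EagerSpan {
  EventLog& log;
  uint64_t id;
  EagerSpan(EventLog& log, uint64_t id): log(log), id(id) { log.emit("SpanOpen", id); }
  ~EagerSpan() { log.emit("SpanClose", id); }
};

// Opens span 1, opens and closes span 2 inside it, then closes span 1.
// Returns the delivered events as "Kind:id" strings.
template <typename SpanT>
std::vector<std::string> traceNested() {
  EventLog log;
  {
    SpanT outer(log, 1);
    {
      SpanT inner(log, 2);
    }
  }
  std::vector<std::string> out;
  for (const auto& e: log.events) {
    out.push_back(e.kind + ":" + std::to_string(e.spanId));
  }
  return out;
}
```

With the deferred variant, the inner span's SpanOpen is delivered before the outer span's, even though the outer span was opened first. The eager variant preserves the actual open order, which is what a streaming consumer would presumably expect.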

@fhanau fhanau requested review from a team as code owners October 21, 2025 20:36
@fhanau fhanau force-pushed the felix/102125-stw-cleanup branch 2 times, most recently from 20b779f to b6452a3 Compare October 22, 2025 20:01
@fhanau fhanau requested a review from a team as a code owner October 22, 2025 20:01
@codspeed-hq

codspeed-hq bot commented Oct 22, 2025

Merging this PR will not alter performance

✅ 70 untouched benchmarks
⏩ 129 skipped benchmarks (see footnote 1)


Comparing felix/102125-stw-cleanup (cc51938) with main (26d1b59)

Footnotes

  1. 129 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

@mar-cf
Contributor

mar-cf commented Oct 28, 2025

Tests fail, but that might just be from being out of sync or on top of something old?

A short PR description would help.

@fhanau
Contributor Author

fhanau commented Dec 24, 2025

Closing this for now – delivering SpanOpen earlier would make it more difficult to implement renaming spans, which we may support in the future.

@fhanau fhanau closed this Dec 24, 2025
@danlapid
Collaborator

We should definitely reopen this and send SpanOpen events when the spans open and not when they close.
That is a key goal of Streaming Tail Workers compared to Buffered Tail Workers, we should not lose sight of that.
The rename does not relate to this.
OTEL officially supports an UpdateName message (https://opentelemetry.io/docs/specs/otel/trace/api/#updatename) which can be emitted at any point between the SpanOpen and the SpanClose to rename the span.
That's what we should also build into the Streaming Tail Workers protocol.
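
A minimal sketch of how a consumer could handle such a rename event, assuming a hypothetical event stream with `SpanOpen`, `UpdateName` and `SpanClose` kinds. None of these types exist in the Streaming Tail Workers protocol as shown in this PR; `Kind`, `Event` and `finalNames` are invented for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical event kinds; UpdateName is the proposed addition.
enum class Kind { SpanOpen, UpdateName, SpanClose };

// Hypothetical event record. `name` is only meaningful for SpanOpen and
// UpdateName.
struct Event {
  Kind kind;
  uint64_t spanId;
  std::string name;
};

// Replays a stream of events and returns the name each span had at the
// moment it closed, in close order.
std::vector<std::string> finalNames(const std::vector<Event>& stream) {
  std::map<uint64_t, std::string> open;  // spanId -> current name
  std::vector<std::string> closed;
  for (const auto& e: stream) {
    switch (e.kind) {
      case Kind::SpanOpen:
        open[e.spanId] = e.name;
        break;
      case Kind::UpdateName: {
        // Renames are only honoured between SpanOpen and SpanClose.
        auto it = open.find(e.spanId);
        if (it != open.end()) it->second = e.name;
        break;
      }
      case Kind::SpanClose: {
        auto it = open.find(e.spanId);
        if (it != open.end()) {
          closed.push_back(it->second);
          open.erase(it);
        }
        break;
      }
    }
  }
  return closed;
}
```

Under this model, sending SpanOpen early does not block renaming: the consumer already knows the span by its ID and simply updates the name it has on record.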

@fhanau fhanau reopened this Dec 26, 2025
@fhanau fhanau marked this pull request as draft December 26, 2025 15:24
@fhanau fhanau force-pushed the felix/102125-stw-cleanup branch 3 times, most recently from 7db22e8 to c73e84d Compare February 9, 2026 19:19
@fhanau fhanau changed the title [o11y] Report SpanOpen event earlier EW-9372 EW-9455 [o11y] Report SpanOpen event earlier Feb 10, 2026
@fhanau fhanau force-pushed the felix/102125-stw-cleanup branch from 6d91a06 to 538c76c Compare February 10, 2026 15:07
@github-actions

github-actions bot commented Feb 10, 2026

The generated output of @cloudflare/workers-types matches the snapshot in types/generated-snapshot 🎉

@codecov-commenter

codecov-commenter commented Feb 10, 2026

Codecov Report

❌ Patch coverage is 31.13208% with 73 lines in your changes missing coverage. Please review.
✅ Project coverage is 70.30%. Comparing base (d2c9058) to head (cc51938).
⚠️ Report is 7 commits behind head on main.

Files with missing lines         Patch %   Lines
src/workerd/io/tracer.c++        37.28%    29 Missing and 8 partials ⚠️
src/workerd/io/trace.c++         5.71%     33 Missing ⚠️
src/workerd/server/server.c++    75.00%    2 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #5370      +/-   ##
==========================================
- Coverage   70.35%   70.30%   -0.06%     
==========================================
  Files         408      408              
  Lines      108651   108735      +84     
  Branches    17991    18007      +16     
==========================================
+ Hits        76444    76448       +4     
- Misses      21409    21482      +73     
- Partials    10798    10805       +7     

@fhanau fhanau force-pushed the felix/102125-stw-cleanup branch from 538c76c to c5c817f Compare February 10, 2026 23:14
@fhanau
Contributor Author

fhanau commented Feb 10, 2026

Codecov Report

❌ Patch coverage is 39.39394% with 80 lines in your changes missing coverage. Please review. ✅ Project coverage is 70.29%. Comparing base (d2c9058) to head (538c76c). ⚠️ Report is 1 commits behind head on main.

The coverage percentage appears lower than it should be here as some functions are only used upstream/not used at all until the follow-up PR. I'm convinced that our coverage doesn't actually get worse when taking that into account.

@fhanau fhanau force-pushed the felix/102125-stw-cleanup branch from c5c817f to cc51938 Compare February 10, 2026 23:35

// helper method for addSpan() implementations
void adjustSpanTime(tracing::CompleteSpan& span);
void adjustSpanTime(tracing::SpanEndData& span);
Contributor Author

These will be deduplicated in the follow-up PR

//
// This should always be called exactly once per observer.
// This should always be called exactly once per observer at span completion time.
virtual void report(const Span& span) = 0;
Contributor Author

@fhanau fhanau Feb 10, 2026

For user tracing, report() will be used to transmit only the span end data in the follow-up. For internal tracing, it will continue to be used to transmit the full span. reportStart() is not yet used and will only be used in the user tracing framework (no-op otherwise).
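
The split described in this comment could look roughly like the following sketch. Only `report(const Span&)` appears in the diff; the `reportStart()` signature, the simplified `Span` struct, the two observer classes and `runSpan` are assumptions made for illustration and do not reflect the actual workerd types.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for the span passed to report().
struct Span {
  std::string operationName;
  int64_t startNs;
  int64_t endNs;
};

class SpanObserver {
public:
  virtual ~SpanObserver() = default;
  // Assumed signature: called when a span opens. No-op unless overridden.
  virtual void reportStart(const std::string& /*operationName*/, int64_t /*startNs*/) {}
  // Called exactly once per observer at span completion time.
  virtual void report(const Span& span) = 0;
};

// User tracing: SpanOpen goes out at start, only end data at completion.
class UserSpanObserver final: public SpanObserver {
public:
  std::vector<std::string> sent;
  void reportStart(const std::string& operationName, int64_t) override {
    sent.push_back("SpanOpen:" + operationName);
  }
  void report(const Span& span) override {
    assert(span.endNs >= span.startNs);
    sent.push_back("SpanClose");
  }
};

// Internal tracing: keeps the inherited no-op reportStart() and transmits
// the full span at completion.
class InternalSpanObserver final: public SpanObserver {
public:
  std::vector<std::string> sent;
  void report(const Span& span) override {
    sent.push_back("CompleteSpan:" + span.operationName);
  }
};

// Drives one span through an observer: start first, completion second.
void runSpan(SpanObserver& observer, const std::string& name) {
  observer.reportStart(name, 100);
  observer.report(Span{name, 100, 250});
}
```

The caller stays identical for both tracing systems; the difference in what gets transmitted lives entirely in the observer implementations.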

kj::HashMap<kj::ConstString, tracing::Attribute::Value> tags;

// Convert CompleteSpan to SpanEndData
explicit SpanEndData(CompleteSpan&& span);
Contributor Author

This will also no longer be needed with the follow-up.

}
}

void BaseTracer::adjustSpanTime(tracing::SpanEndData& span) {
Contributor Author

This is copied from BaseTracer::adjustSpanTime(tracing::CompleteSpan&) and will be deduplicated in the follow-up PR. Note that the operationName variable is no longer available, but if we need to debug an error here we can still infer the operation using the spanId, which maps to the span's SpanOpen event, which has the operationName.
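
As a rough illustration of that spanId-based lookup, a debugging aid could index SpanOpen events by span ID. This is a sketch in plain standard C++; `SpanNameIndex` and its methods are hypothetical and not part of this PR.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical index built from SpanOpen events, mapping spanId to the
// operationName that the SpanOpen event carried.
class SpanNameIndex {
public:
  void onSpanOpen(uint64_t spanId, std::string operationName) {
    names[spanId] = std::move(operationName);
  }

  // Returns the operation name for a span, or nullopt if no SpanOpen event
  // was seen for that ID.
  std::optional<std::string> lookup(uint64_t spanId) const {
    auto it = names.find(spanId);
    if (it == names.end()) return std::nullopt;
    return it->second;
  }

private:
  std::unordered_map<uint64_t, std::string> names;
};
```

Given only the span ID from an error raised while handling span end data, `lookup()` recovers the operation name as long as the matching SpanOpen event was recorded.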

# Representation of an event that indicates completion of a user span. This information is
# provided to the streaming tail worker in the Attributes and SpanClose events.

# TODO(cleanup): startTimeNs is merely used as a fallback timestamp, consider obsoleting it.
Contributor Author

This is only used as a fallback timestamp in the case of errors, but @mar-cf urged me to still support it.

}
}

void SpanEndData::copyTo(rpc::SpanEndData::Builder builder) const {
Contributor Author

You'll notice that this is also duplicated with CompleteSpan, but that will go away in the follow-up as discussed.
