
Conversation

Contributor

@jcjones jcjones commented Jan 15, 2026

This change plumbs asynchronous stream processing through the Report upload path: bytes arriving via Trillium's methods are decoded without buffering more than a few reports at a time, then processed as before, in whatever order and as fast as the executor can go. The resulting errors are collected and returned at the conclusion.

I've implemented an HTTP/1.1 test of this using raw chunked encoding.

Implements #4149.

@jcjones jcjones added this to the draft-ietf-ppm-dap-16 milestone Jan 15, 2026
@jcjones jcjones self-assigned this Jan 15, 2026
@jcjones jcjones added the aggregator-api Issues relating to the aggregator API (/aggregator_api/) label Jan 15, 2026
@jcjones jcjones marked this pull request as ready for review January 15, 2026 18:41
@jcjones jcjones requested a review from a team as a code owner January 15, 2026 18:41

select! {
// Poll for new reports from the stream (only if not eof)
stream_result = report_stream.next(), if !stream_eof => {
Contributor

Does something bad happen if we call report_stream.next() when it's already yielded None? What I'm getting at is, can we avoid tracking stream_eof?

Contributor Author

We'd need to fuse the stream, but I think I'd still need a status variable to be sure both that the stream has yielded None and that futures.is_empty().

Collaborator

FWIW the private type inside async-stream-impl does implement FusedStream already, but this isn't documented. If it has yielded None already, it will keep doing so, leading to the select! macro wasting time by randomly polling it during some loop iterations.

Contributor Author

Oooh, I think I see!

Contributor Author

I fixed it in c4b0be1



@divergentdave divergentdave removed the aggregator-api Issues relating to the aggregator API (/aggregator_api/) label Jan 16, 2026
@divergentdave
Collaborator

I removed the "aggregator-api" label, as that's for things in the janus_aggregator_api crate. (not the clearest name)

@jcjones
Contributor Author

jcjones commented Jan 16, 2026

Moving this to draft because my changes aren't testing the way I expect with the as-yet-unpushed new tests.

@jcjones jcjones marked this pull request as draft January 16, 2026 23:25
@jcjones jcjones marked this pull request as ready for review January 21, 2026 02:06
}

/// These are all tests of the decode_reports_stream method in http_handlers.rs
mod decode_reports_stream_tests {
Contributor

This module could go in its own file but I don't mind it as-is.

Comment on lines 664 to 668
// Consume bytes up to the point of failure (including metadata and
// any successfully decoded fields). This allows the stream to continue
// processing subsequent reports after a corrupted one.
bytes_consumed = cursor.position() as usize;
yield Err(decode_error);
Collaborator

I don't think we'll be able to continue decoding after encountering a codec error in one report. The reports are only delimited by their own internal sequence of length prefixes, and if decoding fails, the method leaves the cursor at an indeterminate position.

If we want to do better in this regard, we'd have to write a separate routine that peeks at the length prefixes and determines the total length of the report, or returns an error for an early EOF. If we did that before trying to decode a report, that would let us isolate the impact of a report containing an invalid encoding of a field element, for example.

Contributor Author

Since we don't have a length covering the whole Report, I don't think we can peek and figure this out in any way more meaningful than what currently happens, as we don't have bounds on any given element. To peek and figure out, say, the leader_encrypted_input_share length, we first need to determine the metadata.public_extensions and public_share lengths. Whether we do that while filling in the structures (as this change currently does) or beforehand doesn't change the failure scenario: we don't know we have a corrupt report until something doesn't match expectations.

All that to say, I agree with you that continuing from here will only work if the corrupted report is an early EOF that ends the entire report, not an early EOF in the middle of the report, and we cannot tell the difference.

On further consideration, it's best to just fail the stream and ensure that if at all possible we emit this CodecError for the failed report, assuming we could get the metadata out.

Collaborator

Yeah, we'd have to peek six times to get all the variable lengths.

Contributor Author

Digging more, it's not actually possible to ever return a decoding error corresponding to a specific report ID, metadata or not, because all decoding errors are effectively fatal. The only thing we can detect going wrong during a decode is incorrect lengths, which are indistinguishable from the stream still being in-progress. The whole Result<Result<Report, ReportDecodeError>> is, I'm afraid, ineffective.

Contributor Author

In 433c4da I did a partial revert of efa84bb so we're back to the whole-stream-aborts thing. I also adjusted the tests, and particularly changed upload_report_decode_failure (new name) to confirm that if you upload stream[Good Report, Undecodable Gibberish] that you get a stream error and the Good Report gets processed, per spec.

Comment on lines +1563 to +1565
// Create malformed data that fails to decode metadata
// Using an invalid fixed size value (all 0xFF) should cause early failure
let bad_data = vec![0xFF; 100];
Collaborator

I was surprised to see an input this long for an error in the ReportMetadata, but it turns out that the metadata includes a length prefix for the public extensions, and thus the 0xFFs lead to an EOF error.

@jcjones jcjones requested a review from divergentdave January 23, 2026 17:49
@jcjones
Contributor Author

jcjones commented Jan 23, 2026

---- aggregator::upload_tests::upload_report_decode_failure stdout ----

thread 'aggregator::upload_tests::upload_report_decode_failure' (10859) panicked at aggregator/src/aggregator/upload_tests.rs:1101:6:
called `Result::unwrap()` on an `Err` value: Elapsed(())

Timing out on runtime_manager.wait_for_completed_tasks("aggregator", 1). Naturally, this test isn't brittle locally; I'm ... pondering.

@jcjones
Contributor Author

jcjones commented Jan 23, 2026


Oh, this is a race: a future failing to finish processing in time. Ugh.

@jcjones
Contributor Author

jcjones commented Jan 23, 2026


70582cc fixes the race and makes the behavior predictable. I've re-run the test in a loop for the last 20 minutes without issue. I should have caught this before.

The only reason I see not to do this is if there's a prohibition in the protocol spec, but I don't see one.
