Skip to content

Conversation

@prestonvasquez
Copy link
Member

@prestonvasquez prestonvasquez commented Jan 27, 2026

DRIVERS-3269

Please complete the following before merging:

  • Is the relevant DRIVERS ticket in the PR title?
  • Update changelog.
  • Test changes in at least one language driver.
  • Test these changes against all server versions and topologies (including standalone, replica set, and sharded
    clusters).

@prestonvasquez prestonvasquez marked this pull request as ready for review January 28, 2026 17:33
@prestonvasquez prestonvasquez requested a review from a team as a code owner January 28, 2026 17:33
@prestonvasquez prestonvasquez changed the title DRIVERS-3269 Move aggregation tests out of tailable awaidData cursor DRIVERS-3269 Move aggregation tests out of tailable awaitData cursor Jan 28, 2026
- client: *client
events: []

- description: "error on aggregate if maxAwaitTimeMS is greater than timeoutMS"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no client-side validation that would produce an error here: https://github.com/mongodb/specifications/blob/master/source/client-side-operations-timeout/client-side-operations-timeout.md#non-tailable-cursors

The validation of maxAwaitTimeMS vs timeoutMS only comes into play when the cursor is tailable + awaitData

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Crud spec says:

This options only applies to aggregations which return a TAILABLE_AWAIT cursor. Drivers SHOULD always send this value, if the cursor is not a TAILABLE_AWAIT cursor the server will ignore it.

It sounds like we should not validate the values in case of non-tailable cursors. Should we revert the expected result to be succeeded?

Copy link
Member Author

@prestonvasquez prestonvasquez Jan 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sanych-sun @baileympearson

It sounds like we should not validate the values in case of non-tailable cursors. Should we revert the expected result to be succeeded?

The server will reject getMore w/ maxTimeMS for non-awaitData cursors. Notably, the only team to have synced this is the Java Driver:

Looks like in Java we pass the test because its a client validation on the aggregation iterable which triggers when setting maxAwaitTimeMS.

Node skips, but claims

We have the same validation [as Java] [...] but it doesn't trigger unless awaitData is provided.

I think these tests are valid. Without upfront validation, this could happen:

  1. User creates aggregate cursor with maxAwaitTimeMS
  2. First batch returns fine
  3. N-th call needs getMore
  4. getMore fails with confusing server error mid-iteration (DRIVERS-2868)
  5. User has already processed some documents

This fits the mantra exception: "Exceptions should be rare: for cases where the server might not error and correctness is at stake." The server does error, but it errors mid-iteration, which is arguably a correctness/UX issue. Failing fast is probably better than failing halfway through.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand your proposed scenario, a getMore throws an error because it includes maxTimeMS. But attaching maxTimeMS to getMore is already prohibited by the CSOT spec:

Drivers MUST NOT append a maxTimeMS field to getMore commands.

To clarify my comment about Node having this validation: I still do not understand why this test passes in Java (this seems like a bug imo). The spec is pretty clear this validation is only relevant for tailable awaitData cursors

Copy link
Member Author

@prestonvasquez prestonvasquez Jan 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC the above was the argument made in a CSOT triage and I currently don't have the bandwidth to determine if the spec is mistaken / there's a gap in rationale. More specifically, perhaps it was to avoid checking pipeline stages for the $changeStream cases and just do a catch-all validation for simplicity, which IMO is arguably reasonable.

Notably, the part of the spec you quoted is sort of ambiguous-y to me in the sense that you can use maxAwaitTimeMS for aggregate using a $changeStream pipeline. I understand that's tailable but you'd have to inspect client input to realize that. I'm not sure that's a reasonable driver-level requirement. The alternatives are a high-level catch-all validation or accepting the pathological case above.

If this isn't satisfactory, suggest we remove these tests as an amendment to DRIVERS-3092, closing DRIVERS-3269 as "won't do". I don't think we should impress opinions on not doing client-level validations for driver team. I can open a separate drivers ticket to clarify why the pathological case is legitimate.


Edit: differentiate possible motives
Edit: clarify that it's reasonable to do a catch-all
Edit: Clean up intention
Edit: Remove confusing comment about $out / $merge
Edit: point out spec wrongness

Copy link
Member Author

@prestonvasquez prestonvasquez Jan 31, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants