Glob collections behavior on partial responses #99

I have a question about how clients are supposed to interpret glob collection responses from an xDS control plane. gRPC has a default message size limit of 4MB, which can cause clients to reject a response from the control plane if it is too large. In practice, most glob collections are small enough to fit in a single response. However, at LinkedIn, some clusters cross this limit under high load, which caused some clients to simply reject the response. This is especially likely during startup, since clients may request multiple collections at once, which can easily push a response over the size threshold. Because the limit is not trivial to raise (and there is no guarantee a single value will fit all use cases), our control plane implementation instead splits the response into multiple "chunks", each representing a subset of the collection, such that each response is smaller than 4MB. However, this raises the question of how the client should behave when a collection arrives in chunks.
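To make the chunking concrete, here is a minimal sketch of one way a control plane could split a collection. It is not our actual implementation: the `chunk_collection` helper, the `(name, serialized_body)` representation, and the size estimate are all assumptions made for illustration.

```python
from typing import Iterable, Iterator, List, Tuple

# gRPC's default message size limit, as described above.
MAX_RESPONSE_BYTES = 4 * 1024 * 1024


def chunk_collection(
    resources: Iterable[Tuple[str, bytes]],
    max_bytes: int = MAX_RESPONSE_BYTES,
) -> Iterator[List[Tuple[str, bytes]]]:
    """Greedily pack (name, serialized_body) pairs into chunks.

    Assumption: a resource's size is approximated as the byte length of
    its name plus its body. Real protobuf framing adds overhead, so a
    real control plane would leave some headroom below the limit.
    """
    chunk: List[Tuple[str, bytes]] = []
    chunk_size = 0
    for name, body in resources:
        size = len(name.encode()) + len(body)
        if size > max_bytes:
            # A single oversized resource cannot be fixed by chunking.
            raise ValueError(f"resource {name!r} alone exceeds the limit")
        if chunk and chunk_size + size > max_bytes:
            # Adding this resource would overflow: flush the current chunk.
            yield chunk
            chunk, chunk_size = [], 0
        chunk.append((name, body))
        chunk_size += size
    if chunk:
        yield chunk
```

Each yielded chunk would be sent as its own response. Note that nothing in a chunk tells the client whether more chunks will follow, which is exactly the ambiguity this issue is about.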

The spec does not dictate that the collection be sent as a whole every time (nor should it, for the reason given above), but it also provides no way to mark the "end" of a collection or to communicate the collection's size. This means that, in extreme cases, the client may receive only a very small subset of the collection in the initial response from the control plane. In this scenario, should the client:
1. Wait an arbitrary amount of time for the control plane to send the rest of the collection? If the client has already received everything, this introduces unwanted latency.
2. Simply treat the contents of the response as the full collection, even if it is partial? This is equally bad, since it could cause the client to send too much traffic to a subset of hosts if the collection is being used for LEDS.
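To put a rough number on the second option, here is a back-of-the-envelope sketch. It assumes the client spreads a fixed request rate evenly across the hosts it currently knows about; the `per_host_load_multiplier` helper and the host counts are hypothetical.

```python
def per_host_load_multiplier(total_hosts: int, visible_hosts: int) -> float:
    """How much more traffic each visible host gets from one client,
    compared to the case where the client knows the full collection.

    Assumption: the client balances evenly across known hosts and its
    total request rate does not change.
    """
    if not 0 < visible_hosts <= total_hosts:
        raise ValueError("visible_hosts must be in (0, total_hosts]")
    return total_hosts / visible_hosts


# Hypothetical example: the first chunk carries 50 of 1000 endpoints.
print(per_host_load_multiplier(total_hosts=1000, visible_hosts=50))  # 20.0
```

If many clients start at once and all receive the same first chunk, that subset of hosts would see the multiplied load from every one of them until the remaining chunks arrive.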

The protocol today has no way to communicate the size of the collection, and arguably such a field would serve little purpose beyond this specific edge case. My suggestion would be to mimic the glob collection deletion notification, but in reverse. Here is what it would look like (following the example in [TP1](https://github.com/cncf/xds/blob/main/proposals/TP1-xds-transport-next.md#glob)):
1. Client requests `xdstp://some-authority/envoy.config.listener.v3.Listener/foo/*`.
2. Server responds with resources `[xdstp://some-authority/envoy.config.listener.v3.Listener/foo/bar, xdstp://some-authority/envoy.config.listener.v3.Listener/foo/baz, xdstp://some-authority/envoy.config.listener.v3.Listener/foo/*]`.

By including the glob collection's name in the response, the control plane can signal to the client that it has sent everything. This effectively bookends the response from the control plane. The client can then wait for this "end-of-glob-collection" notification to determine unambiguously whether it has received every resource in the collection. The resource named after the collection would have to be null or some special value to prevent it from being interpreted as an actual member of the collection. This proposal could require some changes on clients, but the problem seems important to address as more systems adopt the xDS protocol.
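On the client side, the proposed behavior could look roughly like the sketch below. This is an illustration of the idea only, not an existing client API: the `GlobCollectionState` class is hypothetical, and representing the marker's "null or special value" as `None` is an assumption.

```python
from typing import Dict, Iterable, Optional, Tuple


class GlobCollectionState:
    """Accumulates chunks of one glob collection until the marker arrives."""

    def __init__(self, collection_name: str) -> None:
        self.collection_name = collection_name
        self.resources: Dict[str, bytes] = {}
        self.complete = False

    def on_response(
        self, resources: Iterable[Tuple[str, Optional[bytes]]]
    ) -> bool:
        """Process one response; return True once the collection is complete."""
        for name, body in resources:
            if name == self.collection_name:
                # End-of-glob-collection marker: not a real member.
                self.complete = True
            elif body is not None:
                self.resources[name] = body
        return self.complete


prefix = "xdstp://some-authority/envoy.config.listener.v3.Listener/foo/"
state = GlobCollectionState(prefix + "*")

# First chunk has no marker, so the client keeps waiting.
print(state.on_response([(prefix + "bar", b"...")]))  # False
# Final chunk carries the marker, so the collection can now be used.
print(state.on_response([(prefix + "baz", b"..."), (prefix + "*", None)]))  # True
```

With this, the client no longer has to choose between an arbitrary timeout and acting on a partial collection. It only starts using the collection once `on_response` returns `True`.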
