Description
We're a bit stuck on some weird behaviour in our application.
We're using fs2-pubsub with an EmberClient (with HTTP/2 support) to make gRPC requests to the Google Pub/Sub API.
I.e. a setup something like this:
val httpClient = EmberClientBuilder
  .default[F]
  .withHttp2
  .build

[...]

// passed in here
override def messages: Stream[F, AcknowledgeablePubSubMessage[F, UserOptStatus]] =
  PubSubSubscriber
    .grpc[F]
    .projectId(config.subscriber.projectId)
    .subscription(Subscription(subscription))
    .uri(config.subscriber.uri)
    .httpClient(client)
    .retryPolicy(RetryPolicy(exponentialBackoff(1.minute, maxRetry = 3)))
    .noErrorHandling // We had error logging here which was never triggered
    .batchSize(config.subscriber.batchSize)
    .maxLatency(config.subscriber.maxLatency)
    .readMaxMessages(config.subscriber.readMaxMessages)
    .readConcurrency(config.subscriber.readConcurrency)
    .raw

We're seeing a periodic buildup of un-acked messages.
A newly started pod behaves properly for a while, but after 1-2 hours we see these logs:
2024-10-11 01:06:54.215 opt-out-gateway pubsub.googleapis.com:443 Read - GoAway(identifier=0, lastStreamId=581, errorCode=NoError, additionalDebugData=Some(ByteVector(7 bytes, 0x6d61785f616765)))
2024-10-11 01:06:54.216 opt-out-gateway pubsub.googleapis.com:443 Write - Ping.Ack
2024-10-11 01:06:54.217 opt-out-gateway pubsub.googleapis.com:443 Read - Ping
2024-10-11 01:06:54.217 opt-out-gateway HTTP/2.0 200 OK Headers(content-disposition: attachment, content-type: application/grpc, date: Fri, 11 Oct 2024 00:06:54 GMT) body=""
2024-10-11 01:06:54.218 opt-out-gateway pubsub.googleapis.com:443 Write - Ping.Ack
2024-10-11 01:06:54.218 opt-out-gateway pubsub.googleapis.com:443 Write - Ping.Ack
2024-10-11 01:06:54.219 opt-out-gateway pubsub.googleapis.com:443 Read - GoAway(identifier=0, lastStreamId=381, errorCode=EnhanceYourCalm, additionalDebugData=Some(ByteVector(14 bytes, 0x746f6f5f6d616e795f70696e6773)))
2024-10-11 01:06:54.219 opt-out-gateway Connection pubsub.googleapis.com:443 readLoop Terminated with empty
2024-10-11 01:06:54.219 opt-out-gateway HTTP/1.1 GET http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/token Headers(Metadata-Flavor: Google, Accept: application/json) body=""
2024-10-11 01:06:54.220 opt-out-gateway writeLoop terminated
2024-10-15 13:06:06.805 HTTP/1.1 200 OK Headers(Content-Type: application/json, Metadata-Flavor: Google[...]
2024-10-15 13:06:06.808 HTTP/2.0 POST https://pubsub.googleapis.com/google.pubsub.v1.Subscriber/Pull He[...]
2024-10-15 13:06:37.161 Shutting Down Connection - RequestKey: http://metadata.google.internal
2024-10-15 13:06:37.582 Shutting Down Connection - RequestKey: http://metadata.google.internal
GoAway HTTP/2 frames indicate the server intends to close the connection.
The first GoAway's additionalDebugData decodes to max_age; the second decodes to too_many_pings. Notice the two back-to-back Ping.Ack writes, which trigger the GoAway with errorCode=EnhanceYourCalm.
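For reference, both debug payloads are just ASCII; a minimal sketch decoding them with scodec-bits (already a transitive dependency of fs2/http4s):

import scodec.bits.ByteVector

// additionalDebugData from the two GoAway frames above
val first  = ByteVector.fromValidHex("6d61785f616765").decodeUtf8               // Right("max_age")
val second = ByteVector.fromValidHex("746f6f5f6d616e795f70696e6773").decodeUtf8 // Right("too_many_pings")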
After these logs, that pod no longer processes any new Pub/Sub messages:
Stream cancellation with reason: [java.util.concurrent.CancellationException: Received GoAway, cancelling: GoAway(identifier=0, lastStreamId=387, errorCode=EnhanceYourCalm, additionalDebugData=Some(ByteVector(14 bytes, 0x746f6f5f6d616e795f70696e6773)))]
We have (unsuccessfully) tried many things to fix this:
- Adjusting retries on the http4s client (see the sketch after this list). This increases the number of times we see blocks of logs similar to the above, but ultimately still ends in stream cancellation.
- Trying HTTP only (no gRPC).
- And many more small things, like adjusting the idle connection time in the pool and the pool size.
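For context, the retry and pool tweaks were roughly along these lines (a sketch only; the helper name and the values are placeholders, not our actual settings):

import scala.concurrent.duration._
import cats.effect.Async
import org.http4s.client.middleware.{Retry, RetryPolicy}
import org.http4s.ember.client.EmberClientBuilder

// Sketch of the client-side variants: wrapping the Ember client in the http4s
// Retry middleware and tuning the connection pool. Placeholder values throughout.
def pooledRetryingClient[F[_]: Async] =
  EmberClientBuilder
    .default[F]
    .withHttp2
    .withMaxTotal(64)               // pool size
    .withIdleTimeInPool(30.seconds) // idle connection time in the pool
    .build
    .map(client => Retry[F](RetryPolicy[F](RetryPolicy.exponentialBackoff(1.minute, maxRetry = 3)))(client))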
We're a bit confused, but it appears the issue may lie in the interaction between fs2-pubsub and http4s' EmberClient, and in how the HTTP/2 connection lifecycle is managed.
