GoAway Frames Leading to App Errors #544

@sam0jones0

We're a bit stuck on some weird behaviour in our application.

We're using fs2-pubsub with EmberClient (HTTP/2 support enabled) to make gRPC requests to the Google Pub/Sub API.

The setup looks something like this:

val httpClient = EmberClientBuilder
        .default[F]
        .withHttp2
        .build

[...]

// passed in here

    override def messages: Stream[F, AcknowledgeablePubSubMessage[F, UserOptStatus]] =
      PubSubSubscriber
        .grpc[F]
        .projectId(config.subscriber.projectId)
        .subscription(Subscription(subscription))
        .uri(config.subscriber.uri)
        .httpClient(client)
        .retryPolicy(RetryPolicy(exponentialBackoff(1.minute, maxRetry = 3)))
        .noErrorHandling  // We had error logging here which was never triggered
        .batchSize(config.subscriber.batchSize)
        .maxLatency(config.subscriber.maxLatency)
        .readMaxMessages(config.subscriber.readMaxMessages)
        .readConcurrency(config.subscriber.readConcurrency)
        .raw

We're seeing a periodic buildup of un-acked messages:

[Screenshot 2024-10-15 at 15 11 05]

A newly started pod behaves properly for a while, but after 1-2 hours we see these logs:

2024-10-11 01:06:54.215 opt-out-gateway pubsub.googleapis.com:443 Read - GoAway(identifier=0, lastStreamId=581, errorCode=NoError, additionalDebugData=Some(ByteVector(7 bytes, 0x6d61785f616765)))
2024-10-11 01:06:54.216 opt-out-gateway pubsub.googleapis.com:443 Write - Ping.Ack
2024-10-11 01:06:54.217 opt-out-gateway pubsub.googleapis.com:443 Read - Ping
2024-10-11 01:06:54.217 opt-out-gateway HTTP/2.0 200 OK Headers(content-disposition: attachment, content-type: application/grpc, date: Fri, 11 Oct 2024 00:06:54 GMT) body=""
2024-10-11 01:06:54.218 opt-out-gateway pubsub.googleapis.com:443 Write - Ping.Ack
2024-10-11 01:06:54.218 opt-out-gateway pubsub.googleapis.com:443 Write - Ping.Ack
2024-10-11 01:06:54.219 opt-out-gateway pubsub.googleapis.com:443 Read - GoAway(identifier=0, lastStreamId=381, errorCode=EnhanceYourCalm, additionalDebugData=Some(ByteVector(14 bytes, 0x746f6f5f6d616e795f70696e6773)))
2024-10-11 01:06:54.219 opt-out-gateway Connection pubsub.googleapis.com:443 readLoop Terminated with empty
2024-10-11 01:06:54.219 opt-out-gateway HTTP/1.1 GET http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/token Headers(Metadata-Flavor: Google, Accept: application/json) body=""
2024-10-11 01:06:54.220 opt-out-gateway writeLoop terminated
2024-10-15 13:06:06.805 HTTP/1.1 200 OK Headers(Content-Type: application/json, Metadata-Flavor: Google[...]
2024-10-15 13:06:06.808 HTTP/2.0 POST https://pubsub.googleapis.com/google.pubsub.v1.Subscriber/Pull He[...]
2024-10-15 13:06:37.161 Shutting Down Connection - RequestKey: http://metadata.google.internal
2024-10-15 13:06:37.582 Shutting Down Connection - RequestKey: http://metadata.google.internal

HTTP/2 GoAway frames indicate that the server intends to close the connection.
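For anyone less familiar with the frame, here is a minimal sketch of how a client can interpret a GoAway, based on my reading of the HTTP/2 spec (RFC 9113): streams with an id above lastStreamId were not processed by the server and can be retried on a new connection. The `GoAway` case class and the `split` helper are hypothetical names made up for illustration; they are not the http4s or fs2-pubsub types.

```scala
object GoAwaySketch {
  // Simplified, hypothetical stand-in for the frame seen in the logs
  // (NOT the actual http4s ember type).
  final case class GoAway(lastStreamId: Int, errorCode: Long)

  // Error code values as defined by the HTTP/2 specification.
  val NoError: Long         = 0x0L
  val EnhanceYourCalm: Long = 0xbL

  // Splits in-flight stream ids into (possibly processed, safe to retry).
  // Streams above lastStreamId were never acted on by the peer.
  def split(inFlight: Set[Int], frame: GoAway): (Set[Int], Set[Int]) =
    inFlight.partition(_ <= frame.lastStreamId)

  def main(args: Array[String]): Unit = {
    val (maybeProcessed, retryable) =
      split(Set(579, 581, 583), GoAway(lastStreamId = 581, errorCode = NoError))
    println(s"possibly processed: $maybeProcessed, safe to retry: $retryable")
  }
}
```

With the first GoAway above (lastStreamId=581, NoError), a client following this model would finish streams up to 581 and open a new connection for anything later.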

The additionalDebugData of the first GoAway decodes to max_age. The additionalDebugData of the second GoAway decodes to too_many_pings. Notice the two back-to-back Ping.Ack writes, which appear to trigger the second GoAway with errorCode=EnhanceYourCalm.

After these logs, the pod no longer processes any new Pub/Sub messages, and we see:
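The decoding can be reproduced with a few lines of plain Scala. This is just a sketch using the standard library; `hexToAscii` is a helper name made up here, and it assumes the debug data is ASCII text (which it is for both frames in the logs).

```scala
object DecodeDebugData {
  // Decode a hex string (with or without a leading "0x") into ASCII text.
  // Assumes an even number of hex digits, one byte per character.
  def hexToAscii(hex: String): String =
    hex
      .stripPrefix("0x")
      .grouped(2)
      .map(pair => Integer.parseInt(pair, 16).toChar)
      .mkString

  def main(args: Array[String]): Unit = {
    println(hexToAscii("0x6d61785f616765"))               // max_age
    println(hexToAscii("0x746f6f5f6d616e795f70696e6773")) // too_many_pings
  }
}
```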

Stream cancellation with reason: [java.util.concurrent.CancellationException: Received GoAway, cancelling: GoAway(identifier=0, lastStreamId=387, errorCode=EnhanceYourCalm, additionalDebugData=Some(ByteVector(14 bytes, 0x746f6f5f6d616e795f70696e6773)))]

We have tried many things to fix this, all unsuccessfully:

  • Adjusting retries on the http4s client. This increases the number of times we see blocks of logs like the one above, but still ends in stream cancellation.
  • Using plain HTTP only (no gRPC).
  • Many smaller changes, such as adjusting the pool's idle connection time and the pool size.

We're a bit confused, but the issue appears to lie in the interaction between fs2-pubsub and http4s' EmberClient, specifically in how the HTTP/2 connection lifecycle is handled.
