
hardisty commented May 23, 2021

Currently our router's performance degrades significantly when jobs are finishing. After looking into the gRPC keepalive settings, I believe one cause is that, by default, gRPC servers only permit 2 keepalive pings sent while no data is being exchanged. Our Python clients send a keepalive ping every 5 minutes, so the servers automatically disconnect idle clients after 10 minutes. A client does not find out until it makes its next call, and that call fails with an error. There may be other issues as well, but I would like to test changing this one setting first and observe its effect.
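To make the failure mode above concrete, here is a simplified, hypothetical model of server-side keepalive enforcement on an idle connection. The class and method names are made up for illustration, and this is not grpc-java's implementation. It assumes the server tolerates 2 disallowed ("strike") pings and closes the connection on the next one, and that the strike count is never reset because no data flows. Under those assumptions, with a ping every 5 minutes, the connection survives the pings at 5 and 10 minutes and is closed when the third ping arrives. Allowing idle pings at a 5-minute minimum interval (the idea behind `NettyServerBuilder.permitKeepAliveWithoutCalls` and `permitKeepAliveTime`) keeps it open.

```java
import java.time.Duration;

// Hypothetical, simplified model of how a gRPC server might enforce a keepalive
// policy on a connection with no active calls. NOT grpc-java's actual code.
final class KeepAliveEnforcementModel {
    // Assumption: this many disallowed pings are tolerated; the next one closes the connection.
    static final int MAX_PING_STRIKES = 2;

    private final boolean permitWithoutCalls; // the idea behind permitKeepAliveWithoutCalls
    private final Duration permitTime;        // the idea behind permitKeepAliveTime
    private Duration lastValidPing;           // null until the first accepted ping
    private int strikes;

    KeepAliveEnforcementModel(boolean permitWithoutCalls, Duration permitTime) {
        this.permitWithoutCalls = permitWithoutCalls;
        this.permitTime = permitTime;
    }

    /** Handles a ping arriving at time {@code at} with no active calls; false means the server closes the connection. */
    boolean onIdlePing(Duration at) {
        boolean allowed = permitWithoutCalls
                && (lastValidPing == null || at.minus(lastValidPing).compareTo(permitTime) >= 0);
        if (allowed) {
            lastValidPing = at;
            return true;
        }
        strikes++; // model assumption: strikes are never reset, since no data flows
        return strikes <= MAX_PING_STRIKES;
    }

    /** Minute at which the ping that closes the idle connection arrives, or -1 if it survives the horizon. */
    static long disconnectMinute(boolean permitWithoutCalls, Duration permitTime,
                                 Duration pingInterval, Duration horizon) {
        KeepAliveEnforcementModel server = new KeepAliveEnforcementModel(permitWithoutCalls, permitTime);
        for (Duration t = pingInterval; t.compareTo(horizon) <= 0; t = t.plus(pingInterval)) {
            if (!server.onIdlePing(t)) {
                return t.toMinutes();
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        Duration fiveMinutes = Duration.ofMinutes(5);
        Duration oneDay = Duration.ofDays(1);
        // Idle pings not permitted: pings at 5 and 10 minutes are tolerated strikes, the third closes it.
        System.out.println(disconnectMinute(false, fiveMinutes, fiveMinutes, oneDay)); // 15
        // Idle pings permitted at a 5-minute minimum interval: the connection survives.
        System.out.println(disconnectMinute(true, fiveMinutes, fiveMinutes, oneDay));  // -1
    }
}
```

The model deliberately ignores data frames; per the description above, only pings sent while no data is exchanged count against the server's limit.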

Helpful documentation:
https://grpc.github.io/grpc-java/javadoc/io/grpc/netty/NettyServerBuilder.html
https://grpc.io/docs/what-is-grpc/core-concepts/#deadlines
grpc/grpc-java#7237
grpc/grpc#17667

hardisty marked this pull request as draft May 24, 2021 00:04
hardisty requested review from mark-idleman and michaz May 24, 2021 00:05
hardisty (author) commented:

If changing this setting works as I hope, the other setting I am currently interested in is something like
`.keepAliveTime(20, TimeUnit.MINUTES)`
so that the server can free up connections that have gone stale. The docs say the default is 2 hours, which seems too long for life in a K8s cluster.
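A rough way to compare the two values is the worst-case time for the server to notice a peer that silently went away. The sketch below assumes a simplified model (not grpc-java code): the server sends a keepalive ping after `keepAliveTime` without activity and closes the connection if no ack arrives within a keepalive timeout, taken here as an assumed 20 seconds. `StaleConnectionEstimate` and `worstCaseDetection` are hypothetical names.

```java
import java.time.Duration;

// Back-of-the-envelope model (an assumption, not grpc-java code): a server-side
// keepalive ping goes out after keepAliveTime without activity, and the connection
// is closed if no ack arrives within keepAliveTimeout. On a real server this would
// correspond to something like NettyServerBuilder.forPort(port)
//     .keepAliveTime(20, TimeUnit.MINUTES)
//     .keepAliveTimeout(20, TimeUnit.SECONDS)
final class StaleConnectionEstimate {
    /** Worst-case time for the server to notice a peer that silently went away. */
    static Duration worstCaseDetection(Duration keepAliveTime, Duration keepAliveTimeout) {
        return keepAliveTime.plus(keepAliveTimeout);
    }

    public static void main(String[] args) {
        Duration timeout = Duration.ofSeconds(20); // assumed keepalive timeout
        // Documented default keepAliveTime of 2 hours vs. the proposed 20 minutes.
        System.out.println(worstCaseDetection(Duration.ofHours(2), timeout));    // PT2H20S
        System.out.println(worstCaseDetection(Duration.ofMinutes(20), timeout)); // PT20M20S
    }
}
```

Under these assumptions, a dead connection would be held for at most about 20 minutes instead of about 2 hours.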
