Conversation
* Generic 500 for a client disconnect is not good for observability
* Littering the code with gRPC error handling everywhere would not be great either

Handle the specific cases of gRPC client disconnect and gateway timeout in the generic error handler (only when the current error code is 500). This should make it easier to find actual issues in logs.

Signed-off-by: Jussi Kukkonen <jkukkonen@google.com>
ERROR panic detected: write tcp 10.1.4.25:3000->35.191.75.48:57722: i/o timeout

This looks scary in the logs but seems to be just a client connection issue: Rekor has not failed here, the client connection has just been lost. Make sure the middleware recoverer does not log client connection issues as errors.

Signed-off-by: Jussi Kukkonen <jkukkonen@google.com>
Codecov Report: ❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #2775 +/- ##
===========================================
- Coverage 66.46% 26.20% -40.26%
===========================================
Files 92 191 +99
Lines 9258 20145 +10887
===========================================
- Hits 6153 5280 -873
- Misses 2359 14037 +11678
- Partials 746 828 +82
loosebazooka left a comment:
yeah this is sorely needed going through the logs
func handleRekorAPIError(params interface{}, code int, err error, message string, fields ...interface{}) middleware.Responder {
	code = mapGRPCToHTTP(code, err)

	if message == "" {
Apparently 499 has no default message in http.StatusText. So if we want to print "Client Closed Request" to the logs, we would need to check whether the code is 499 and then set our own message.
{http.StatusOK, status.Error(codes.Canceled, "context canceled"), http.StatusOK},
{http.StatusInternalServerError, status.Error(codes.Canceled, "context canceled"), 499},
{http.StatusInternalServerError, status.Error(codes.DeadlineExceeded, "deadline exceeded"), http.StatusGatewayTimeout},
{http.StatusInternalServerError, status.Error(codes.DataLoss, "dataloss"), http.StatusInternalServerError},
add test for wrapped errors?
{http.StatusInternalServerError, fmt.Errorf("outer: %w", status.Error(codes.Canceled, "ctx")), 499},
log.ContextLogger(ctx).With(fields...).Errorf("panic detected: %v", rvr)
// Check if the panic is due to a connection issue: Don't log these
// cases as serious errors
isNetworkError := false
This is wading into "style" territory and I don't care either way, but we can compress this with a lambda or a separate function
isClientConnError := func(err error) bool {
var netErr net.Error
return go_errors.Is(err, io.EOF) || go_errors.Is(err, syscall.EPIPE) || go_errors.Is(err, syscall.ECONNRESET) || (go_errors.As(err, &netErr) && netErr.Timeout())
}
if err, ok := rvr.(error); ok && isClientConnError(err) {
log.ContextLogger(ctx).With(fields...).Debugf("client connection closed: %v", rvr)
} else {
log.ContextLogger(ctx).With(fields...).Errorf("panic detected: %v", rvr)
}
This is a potential fix for some observability issues. These are separate issues; I can file separate PRs.

The recoverer fix is mostly from AI but makes sense to me, and should handle the

panic detected: write tcp 10.1.4.25:3000->35.191.75.48:57722: i/o timeout

error that started this.