Skip to content

Handle client connection issues better#2775

Open
jku wants to merge 3 commits intosigstore:mainfrom
jku:handle-client-closed-request
Open

Handle client connection issues better#2775
jku wants to merge 3 commits intosigstore:mainfrom
jku:handle-client-closed-request

Conversation

@jku
Copy link
Member

@jku jku commented Mar 13, 2026

This is a potential fix for some observability issues:

  • we return and log generic 500 errors for GRPC issues where we can reasonably use a better HTTP code (without bloating every callsite with grpc error handling)
  • Internally we log ERRORs for client connection issues: That seems unnecessary

These are separate issues, I can file separate PRs.

The recoverer fix is mostly from AI but makes sense to me and should handle the panic detected: write tcp 10.1.4.25:3000->35.191.75.48:57722: i/o timeout error that started this

jku added 2 commits March 13, 2026 16:06
* Generic 500 for a client disconnect is not good for observability
* Littering the code with GRPC error handling everywhere would not be
  great either

Handle the specific cases of GRPC client disconnect and gateway
timeout in the generic error handler (only when the current
error code is 500).

This should make it easier to find actual issues in logs

Signed-off-by: Jussi Kukkonen <jkukkonen@google.com>
  ERROR panic detected: write tcp 10.1.4.25:3000->35.191.75.48:57722: i/o timeout

This looks scary in the logs but seems to be just a client connection issue:
Rekor has not failed here, the client connection has just been lost.

Make sure the middleware recoverer does not log client connection issues
as errors.

Signed-off-by: Jussi Kukkonen <jkukkonen@google.com>
@jku jku requested a review from a team as a code owner March 13, 2026 14:22
Signed-off-by: Jussi Kukkonen <jkukkonen@google.com>
@codecov
Copy link

codecov bot commented Mar 13, 2026

Codecov Report

❌ Patch coverage is 54.16667% with 11 lines in your changes missing coverage. Please review.
✅ Project coverage is 26.20%. Comparing base (488eb97) to head (07b7688).
⚠️ Report is 650 commits behind head on main.

Files with missing lines Patch % Lines
pkg/generated/restapi/configure_rekor_server.go 0.00% 10 Missing ⚠️
pkg/api/error.go 92.85% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##             main    #2775       +/-   ##
===========================================
- Coverage   66.46%   26.20%   -40.26%     
===========================================
  Files          92      191       +99     
  Lines        9258    20145    +10887     
===========================================
- Hits         6153     5280      -873     
- Misses       2359    14037    +11678     
- Partials      746      828       +82     
Flag Coverage Δ
e2etests 49.55% <28.57%> (+1.99%) ⬆️
unittests 16.75% <50.00%> (-30.93%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Member

@loosebazooka loosebazooka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah this is sorely needed going through the logs

func handleRekorAPIError(params interface{}, code int, err error, message string, fields ...interface{}) middleware.Responder {
code = mapGRPCToHTTP(code, err)

if message == "" {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

apparently 499s have no default message in the http.StatusText. So if we want to print Client Closed Context to logs, we would need to check if code is 499 and then set our own message.

{http.StatusOK, status.Error(codes.Canceled, "context canceled"), http.StatusOK},
{http.StatusInternalServerError, status.Error(codes.Canceled, "context canceled"), 499},
{http.StatusInternalServerError, status.Error(codes.DeadlineExceeded, "deadline exceeded"), http.StatusGatewayTimeout},
{http.StatusInternalServerError, status.Error(codes.DataLoss, "dataloss"), http.StatusInternalServerError},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add test for wrapped errors?

  {http.StatusInternalServerError, fmt.Errorf("outer: %w", status.Error(codes.Canceled, "ctx")), 499}, 

log.ContextLogger(ctx).With(fields...).Errorf("panic detected: %v", rvr)
// Check if the panic is due to a connection issue: Don't log these
// cases as serious errors
isNetworkError := false
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is wading into "style" territory and I don't care either way, but we can compress this with a lambda or a separate function

  isClientConnError := func(err error) bool {                                                                                                                                                                                                 
      var netErr net.Error                                                                                                                                                                                                                    
      return go_errors.Is(err, io.EOF) || go_errors.Is(err, syscall.EPIPE) || go_errors.Is(err, syscall.ECONNRESET) || (go_errors.As(err, &netErr) && netErr.Timeout())                                                                       
  }                                                                                                                                                                                                                                           
                  
  if err, ok := rvr.(error); ok && isClientConnError(err) {                                                                                                                                                                                   
      log.ContextLogger(ctx).With(fields...).Debugf("client connection closed: %v", rvr)
  } else {                                                                                                                                                                                                                                    
      log.ContextLogger(ctx).With(fields...).Errorf("panic detected: %v", rvr)
  }

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants