Sporadic 500 errors on /statement route in production #370

@anthonybailey

Problem

Sporadic 500 Internal Server Error responses for the /statement route in production, despite the route working correctly most of the time.

Investigation Findings

Local Reproduction Attempt

  • Environment: Main branch, en-only mode (PARAGLIDE_LOCALES=en)
  • Test method: 30+ concurrent requests to /statement route via netlify serve
  • Result: All requests returned 200 status codes
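The concurrent test described above could be scripted roughly as follows. This is a minimal sketch, not the script actually used: `BASE_URL`, `probe`, and `tallyStatuses` are hypothetical names, and the port is an assumption about the local `netlify serve` setup.

```typescript
// Assumed local address for `netlify serve`; adjust to your setup.
const BASE_URL = "http://localhost:8888";

// Count how many responses came back with each status code.
function tallyStatuses(statuses: number[]): Map<number, number> {
  const counts = new Map<number, number>();
  for (const status of statuses) {
    counts.set(status, (counts.get(status) ?? 0) + 1);
  }
  return counts;
}

// Fire `n` requests at once and collect the status codes.
// A network-level failure is recorded as status 0 instead of being thrown.
async function probe(path: string, n: number): Promise<Map<number, number>> {
  const requests = Array.from({ length: n }, () =>
    fetch(`${BASE_URL}${path}`).then(
      (res) => res.status,
      () => 0
    )
  );
  return tallyStatuses(await Promise.all(requests));
}
```

Calling `probe("/statement", 30).then((counts) => console.log(counts))` would print a map of status code to count; any key other than 200 would indicate a reproduced failure.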

Root Cause Analysis

The /statement route has graceful error handling that masks most API failures:

  1. API Endpoint Behavior (/api/signatories):

    try {
      // Fetch from Airtable API
      const records = await fetchAllPages(fetch, url)
      // Process and return real data
    } catch (e) {
      console.error('Error fetching signatories:', e)
      // Always returns HTTP 200 with fallback data
      return json({
        signatories: fallbackSignatories,
        totalCount: 0
      })
    }
  2. Page Loader (+page.ts):

    const response = await fetch('api/signatories');
    // No error handling - assumes API always returns 200
    const { signatories, totalCount } = await response.json();

Key Insight

Why we see 200s despite errors: The API endpoint catches Airtable failures and returns HTTP 200 with placeholder data ("Error", "This should be" signatories) instead of propagating a 500. Any 500 seen in production must therefore originate somewhere this try/catch does not cover.
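Because the fallback is indistinguishable from success at the HTTP level, a caller would have to inspect the body to notice it. A minimal sketch of such a check, assuming that a successful fetch reports a non-zero `totalCount` whenever it returns signatories (the snippets above only confirm that the fallback reports 0); `SignatoriesPayload` and `looksLikeFallback` are hypothetical names:

```typescript
// Shape assumed from the endpoint snippet above.
interface SignatoriesPayload {
  signatories: unknown[];
  totalCount: number;
}

// Heuristic: the fallback branch returns placeholder signatories together
// with totalCount 0. Real data is ASSUMED to carry a matching non-zero count.
function looksLikeFallback(payload: SignatoriesPayload): boolean {
  return payload.totalCount === 0 && payload.signatories.length > 0;
}
```

A loader or monitoring script could log whenever `looksLikeFallback` returns true, which would surface Airtable failures that currently pass as 200s.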

Performance Observations

  • Without Airtable API key: Sub-second responses (fallback data)
  • With Airtable API key: 1-4 second responses (real API calls)
  • Implication: Airtable API latency could contribute to timeout-related 500s
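If latency is a contributor, one option is to bound the Airtable call with an explicit deadline so that a slow response fails inside the endpoint's try/catch (and falls back) rather than at the platform level. A minimal sketch using the standard `AbortController`; `withTimeout` is a hypothetical helper, not code from the repository:

```typescript
// Run `run` with an AbortSignal that fires after `ms` milliseconds.
// The callee is responsible for honouring the signal (fetch does).
async function withTimeout<T>(
  run: (signal: AbortSignal) => Promise<T>,
  ms: number
): Promise<T> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), ms);
  try {
    return await run(controller.signal);
  } finally {
    // Always clear the timer so it cannot fire after completion.
    clearTimeout(timer);
  }
}
```

For example, `withTimeout((signal) => fetch(url, { signal }), 3000)` would reject with an abort error if the request takes longer than three seconds; the 3000 ms budget is illustrative only.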

Likely Causes of Production 500s

Since local testing exercised only the graceful-degradation path and did not reproduce any 500s, the production failures likely stem from:

  1. Infrastructure-level failures:

    • Edge function timeouts before reaching API endpoint
    • Netlify platform issues
    • Memory/resource exhaustion
  2. Race conditions under high load:

    • Concurrent requests overwhelming edge function
    • Airtable API rate limiting causing different error paths
  3. Unhandled JavaScript errors:

    • Runtime errors that bypass the try/catch in API endpoint
    • Edge function bundling/execution issues
  4. Network/timeout issues:

    • Airtable API taking >4 seconds (production timeout threshold)
    • DNS resolution failures
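To tell these candidate causes apart in logs, failures could be tagged with a coarse category before being logged. The sketch below is an assumption about what would be useful, not existing code: `FailureKind` and `classifyFailure` are hypothetical, and the only external fact relied on is that HTTP 429 signals rate limiting.

```typescript
type FailureKind = "timeout" | "rate-limited" | "upstream-error" | "unknown";

// Read an error's `name` without assuming it is an Error instance.
function errorName(error: unknown): string {
  return typeof error === "object" && error !== null && "name" in error
    ? String((error as { name: unknown }).name)
    : "";
}

// Map an upstream status code and/or thrown error to a coarse category.
function classifyFailure(error: unknown, status?: number): FailureKind {
  if (status === 429) return "rate-limited";
  if (status !== undefined && status >= 500) return "upstream-error";
  const name = errorName(error);
  // Aborted or timed-out fetches reject with one of these names.
  if (name === "AbortError" || name === "TimeoutError") return "timeout";
  return "unknown";
}
```

Logging `classifyFailure(e, response?.status)` alongside the existing `console.error` call would make it easier to see which of the listed causes actually occurs in production.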

Next Steps (Claude's suggestions - not yet endorsed)

  1. Add error handling to page loader:

    try {
      const response = await fetch('api/signatories');
      if (!response.ok) throw new Error(`API error: ${response.status}`);
      const data = await response.json();
    } catch (e) {
      // Handle API failures gracefully
    }
  2. Monitor edge function logs in production for specific error patterns

  3. Consider caching strategy for signatory data to reduce Airtable API dependency

  4. Add timeout handling for slow Airtable API responses
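The caching suggestion could take the form of a small time-to-live cache that serves the last good value when a refresh fails. This is a minimal sketch under assumptions: `createCachedLoader` is a hypothetical helper, and module-level memory in an edge or serverless function is not guaranteed to survive between invocations, so a shared store may be needed in practice.

```typescript
interface CacheEntry<T> {
  value: T;
  storedAt: number;
}

// Wrap `load` so results are reused for `ttlMs` milliseconds.
// If a refresh fails and an older value exists, the stale value is served.
function createCachedLoader<T>(
  load: () => Promise<T>,
  ttlMs: number,
  now: () => number = Date.now
): () => Promise<T> {
  let entry: CacheEntry<T> | undefined;
  return async function get(): Promise<T> {
    if (entry && now() - entry.storedAt < ttlMs) return entry.value;
    try {
      const value = await load();
      entry = { value, storedAt: now() };
      return value;
    } catch (e) {
      if (entry) return entry.value; // refresh failed: serve stale data
      throw e; // nothing cached yet: let the caller handle it
    }
  };
}
```

Wrapping the Airtable fetch in such a loader would reduce the number of upstream calls per request and give the endpoint real (if slightly old) data to return instead of placeholders.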

Environment Details

  • Branch tested: main
  • Local tool: netlify serve
  • Configuration: English-only mode
  • Airtable access: Read-only API key configured
