[HIGH] Load Test Distributed Webhook Circuit Breaker #828

@nickna

Context

Following the implementation of the distributed webhook circuit breaker and metrics (#818), we need to validate that the system behaves correctly under production-like load with multiple instances.

Requirements

Create and execute load tests to verify:

  1. Circuit breaker opens after exactly 5 failures across all instances (not 5 per instance)
  2. Metrics are correctly aggregated from multiple instances
  3. No duplicate webhook deliveries occur
  4. Circuit transitions (Closed → Open → Half-Open → Closed) work correctly
  5. System handles Redis failures gracefully (fallback to in-memory)
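
Requirement 3 can be checked on the receiving side by logging an identifier for every delivery and looking for repeats. The sketch below assumes each webhook event carries a unique delivery ID that the test endpoint can record; the function name and the `evt-…` IDs are hypothetical, not part of the Core API.

```python
from collections import Counter


def find_duplicate_deliveries(delivery_ids):
    """Return {delivery_id: count} for every ID the receiver saw more than once."""
    counts = Counter(delivery_ids)
    return {delivery_id: n for delivery_id, n in counts.items() if n > 1}


# Hypothetical receiver log: "evt-2" was delivered twice.
received = ["evt-1", "evt-2", "evt-3", "evt-2"]
print(find_duplicate_deliveries(received))  # {'evt-2': 2}
```

An empty result means no duplicates were observed for that run.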

Test Scenarios

Scenario 1: Multi-Instance Circuit Breaking

  • Run 4 instances of the Core API
  • Configure a test webhook endpoint that succeeds for the first 3 requests and fails every request after that
  • Send 100 webhook events concurrently
  • Expected: Circuit opens after 5 total failures across all instances (not 20, which is what 5 failures on each of the 4 instances would give)
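
The expected behaviour in Scenario 1 can be modelled in-process before running it against real instances. This is only a sketch: `SharedFailureCounter` is a hypothetical stand-in for the Redis-backed state, a lock plays the role of Redis atomicity, and threads play the role of the 4 Core API instances. It models failures only, not the 3 initial successes.

```python
import threading


class SharedFailureCounter:
    """In-process stand-in for a failure counter shared by all instances."""

    def __init__(self, threshold=5):
        self.threshold = threshold
        self.failures = 0
        self.open = False
        self._lock = threading.Lock()

    def record_failure(self):
        """Count one failure atomically; return True if this call opened the circuit."""
        with self._lock:
            if self.open:
                return False  # circuit already open, nothing more is counted
            self.failures += 1
            if self.failures >= self.threshold:
                self.open = True
                return True
            return False


def simulate(instances=4, failures_per_instance=25, threshold=5):
    """Let several 'instances' report failures concurrently against one counter."""
    counter = SharedFailureCounter(threshold)
    trips = []

    def worker():
        for _ in range(failures_per_instance):
            if counter.record_failure():
                trips.append(1)

    threads = [threading.Thread(target=worker) for _ in range(instances)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.failures, len(trips)


failures, trips = simulate()
print(failures, trips)  # 5 1
```

The real load test should assert the same two things: the circuit opened after 5 failures in total, and it opened exactly once.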

Scenario 2: Metrics Aggregation

  • Run 3 instances
  • Send 1000 webhook events spread across instances
  • Expected: Redis metrics report totals summed across all 3 instances (1000 events overall), not fragmented per-instance counts
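
One way to evaluate Scenario 2 is to compare what the load generator sent to each instance with the total read back from Redis. The helper below is a minimal sketch; the instance names and the way the reported total is obtained are assumptions, since the issue does not specify the metrics key layout.

```python
def check_aggregated_total(sent_per_instance, reported_total):
    """Classify the reported total as 'ok', 'fragmented', or 'mismatch'."""
    expected = sum(sent_per_instance.values())
    if reported_total == expected:
        return "ok"
    if reported_total in sent_per_instance.values():
        # The total equals a single instance's count, which suggests
        # that instances are not writing to the same aggregate.
        return "fragmented"
    return "mismatch"


# Hypothetical split of 1000 events across 3 instances.
sent = {"core-api-1": 334, "core-api-2": 333, "core-api-3": 333}
print(check_aggregated_total(sent, 1000))  # ok
print(check_aggregated_total(sent, 334))   # fragmented
print(check_aggregated_total(sent, 990))   # mismatch
```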

Scenario 3: Circuit Recovery

  • Trigger circuit breaker to open
  • Wait for half-open transition (5 minutes or configured duration)
  • Send test request that succeeds
  • Expected: Circuit closes and normal operation resumes
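
The recovery path can be written down as a small reference model to compare observed transitions against. This is a sketch under assumptions, not the implementation from #818: the threshold of 5 and the 300-second open duration come from this issue, while the class, the injectable clock, and the rule that a failure in Half-Open reopens the circuit are illustrative choices.

```python
import enum


class State(enum.Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"


class CircuitModel:
    """Reference model: Closed -> Open -> Half-Open -> Closed."""

    def __init__(self, clock, threshold=5, open_seconds=300.0):
        self.clock = clock              # callable returning the current time in seconds
        self.threshold = threshold
        self.open_seconds = open_seconds
        self.failures = 0
        self.opened_at = None
        self._state = State.CLOSED

    @property
    def state(self):
        # Open becomes Half-Open once the configured duration has elapsed.
        if self._state is State.OPEN and self.clock() - self.opened_at >= self.open_seconds:
            self._state = State.HALF_OPEN
        return self._state

    def record_failure(self):
        state = self.state
        if state is State.OPEN:
            return  # requests are rejected while open, nothing to count
        self.failures += 1
        # Assumption: a failed probe in Half-Open reopens the circuit immediately.
        if state is State.HALF_OPEN or self.failures >= self.threshold:
            self._state = State.OPEN
            self.opened_at = self.clock()

    def record_success(self):
        if self.state is State.HALF_OPEN:
            self._state = State.CLOSED
            self.failures = 0


# A fake clock lets the test skip the 5-minute wait.
now = [0.0]
cb = CircuitModel(clock=lambda: now[0])
for _ in range(5):
    cb.record_failure()
print(cb.state)   # State.OPEN
now[0] += 300
print(cb.state)   # State.HALF_OPEN
cb.record_success()
print(cb.state)   # State.CLOSED
```

Against the real system the wait cannot be faked, so configuring a short open duration for the test environment keeps the scenario practical.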

Scenario 4: Redis Failure Handling

  • Run load test
  • Kill Redis mid-test
  • Expected: System continues operating with in-memory fallback
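
The fallback behaviour under test can be pictured roughly as follows. This is a sketch, not the Core API's code: `FallbackCounter` and `FlakyStore` are hypothetical, and the built-in `ConnectionError` stands in for whatever exception the real Redis client raises.

```python
class FlakyStore:
    """Fake remote store that can be switched off to imitate a Redis outage."""

    def __init__(self):
        self.data = {}
        self.up = True

    def incr(self, key):
        if not self.up:
            raise ConnectionError("store unavailable")
        self.data[key] = self.data.get(key, 0) + 1
        return self.data[key]


class FallbackCounter:
    """Count in the remote store, falling back to process memory on failure."""

    def __init__(self, remote_incr):
        self.remote_incr = remote_incr  # callable(key) -> int, may raise ConnectionError
        self.local = {}
        self.degraded = False

    def incr(self, key):
        try:
            value = self.remote_incr(key)
            self.degraded = False
            return value
        except ConnectionError:
            # Remote store is down: keep counting locally for this instance only.
            self.degraded = True
            self.local[key] = self.local.get(key, 0) + 1
            return self.local[key]


store = FlakyStore()
counter = FallbackCounter(store.incr)
print(counter.incr("failures"))  # 1 (remote)
print(counter.incr("failures"))  # 2 (remote)
store.up = False                 # imitate killing Redis mid-test
print(counter.incr("failures"))  # 1 (local count starts from zero)
print(counter.degraded)          # True
```

In this model the local count restarts at zero and is private to one instance, so during an outage the threshold would effectively apply per instance. The load test should record whether the real system behaves that way and what happens to the counts once Redis returns.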

Implementation Suggestions

  • Use k6 or similar load testing tool
  • Create docker-compose setup with multiple Core API instances
  • Use WireMock or similar for controllable webhook endpoints
  • Monitor with Prometheus/Grafana if available

Success Criteria

  • Load test scripts created and documented
  • All scenarios pass with expected behavior
  • Performance metrics documented (throughput, latency)
  • Any issues found are addressed
  • Results shared with team

Priority

HIGH - Should be completed before next production deployment to ensure the distributed circuit breaker works correctly under load.
