
Allow circuit to close even when open with failures < threshold #7

Open
njbennett wants to merge 1 commit into ahawkins:master from vmware-archive:concurrency_fix


@njbennett

  • We observed an issue recently where the circuit breakers for our
    application got stuck open and had to be manually closed by editing
    the fuse database.
  • We believe that multiple threads moving through the breaker
    simultaneously triggered a race condition, where the breaker recorded
    a failure while it was also opening the fuse. This left the fuse in a
    state where it was open with failures below the failure threshold.
  • At this point (as demonstrated by the test we've added) the circuit
    will stay open forever: 'tripped' will always return 'false'
    because the failure count is below the threshold, so it will never enter
    the half-open state and allow a successful test request to close the circuit.
  • By also sending test requests when the circuit is open but not tripped
    (which we think should only ever happen in this error state) the
    circuits will be able to close again once the system that they guard
    against returns to normal, even if request volume during an outage is
    high enough to put them into this state.
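
The stuck state and the proposed behaviour can be illustrated with a small model. This is only a sketch in Python under stated assumptions, not the library's actual code: the `Fuse` record, `tripped`, `allows_request`, `run`, and the `patched` flag are all hypothetical names, and retry timeouts are left out for brevity.

```python
from dataclasses import dataclass


@dataclass
class Fuse:
    """Hypothetical stand-in for a persisted fuse record."""
    state: str = "closed"        # "closed" or "open"
    failure_count: int = 0
    failure_threshold: int = 3


def tripped(fuse: Fuse) -> bool:
    # As described above: true only at or above the failure threshold.
    return fuse.failure_count >= fuse.failure_threshold


def allows_request(fuse: Fuse, patched: bool) -> bool:
    """Decide whether a request (normal or test) may go through.

    Retry timeouts are omitted to keep the sketch small.
    """
    if fuse.state == "closed":
        return True
    if tripped(fuse):
        return True  # half-open: a test request is allowed
    # Open but not tripped: the inconsistent state left by the race.
    # Unpatched behaviour rejects forever; patched sends a test request.
    return patched


def run(fuse: Fuse, call, patched: bool) -> str:
    if not allows_request(fuse, patched):
        return "rejected"
    try:
        call()
    except Exception:
        fuse.failure_count += 1
        if tripped(fuse):
            fuse.state = "open"
        return "failed"
    # A success closes the circuit and clears the failure count.
    fuse.state = "closed"
    fuse.failure_count = 0
    return "ok"


# The inconsistent state: open, with failures below the threshold.
stuck = Fuse(state="open", failure_count=2, failure_threshold=3)
before = run(stuck, lambda: None, patched=False)

fixed = Fuse(state="open", failure_count=2, failure_threshold=3)
after = run(fixed, lambda: None, patched=True)
```

In this model the unpatched fuse rejects every request even though the guarded call would succeed, while the patched one lets a test request through and closes again.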

Signed-off-by: Natalie Bennett <nbennett@pivotal.io>
Signed-off-by: Tom Viehman <tviehman@pivotal.io>
