When the connection to photon fails, we need to define how recovery works and how visible the failure is.

Currently, when the connected photon restarts, the following happens:

* `Shutting down shared-route due to channel failure` - Muon Core detects the photon instance is no longer available
* `Error in subscription Connection lost to remote service, the channel has shut down due to a transport failure` - Newton drops the active replay
* `Subscribing to event stream 'newton-sample/Task' for full local replay` - attempts to reconnect to the streams
* `NewtonEvent subscription has ended, will attempt to reconnect in 5000ms` - backoff behaviour
* `Subscribing from index 26 to event stream saga-manager-newton-sample/Task 'newton-sample/Task'` - Connection succeeds, replay now continues.

The issue here is the gap between disconnect and reconnect for event emit. Currently `MuonEventSourceRepository` calls `EventClient.event()` without checking the return value. This means that events can be emitted but not fully persisted.
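A minimal sketch of what checking the emit result could look like. `EmitClient`, `EmitResult`, `EventPersistException` and `CheckedEventRepository` are hypothetical stand-ins written for illustration; they are not the real Muon or Newton types, and the actual return type of `EventClient.event()` may differ.

```java
import java.util.List;

// Hypothetical stand-in for the value returned by an event emit (not the real Muon type).
final class EmitResult {
    enum Status { PERSISTED, FAILED }

    final Status status;
    final String cause;

    EmitResult(Status status, String cause) {
        this.status = status;
        this.cause = cause;
    }
}

// Hypothetical stand-in for the event client used by the repository.
interface EmitClient {
    EmitResult event(Object event);
}

// Raised when an emit did not report a persisted event.
class EventPersistException extends RuntimeException {
    EventPersistException(String message) {
        super(message);
    }
}

// Sketch of a repository that refuses to silently drop a failed emit.
class CheckedEventRepository {
    private final EmitClient client;

    CheckedEventRepository(EmitClient client) {
        this.client = client;
    }

    void persist(List<?> events) {
        for (Object event : events) {
            EmitResult result = client.event(event);
            // Treat a missing result the same as an explicit failure.
            if (result == null || result.status != EmitResult.Status.PERSISTED) {
                String cause = (result == null) ? "no result returned" : result.cause;
                throw new EventPersistException("Event was not persisted: " + cause);
            }
        }
    }
}
```

With a check like this, a failed emit during the disconnect window surfaces to the caller as an exception instead of passing silently.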

Adding a check that fails the operation when an emit fails exposes a second issue: what to do when that happens. A failed emit takes a *max* of 1000ms (less in some circumstances) to be declared an error. If photon drops, a full reconnect over the AMQP transport takes on the order of 5-7s. As such, failure is fairly expensive. When an event emit fails, should the event persist operation be retried, or should it simply fail? Lastly, should the event protocol be updated to have a fallback SEDA mode so that the transport can give further reliability?
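If retrying turns out to be the preferred answer, one option is a bounded retry with exponential backoff. The sketch below is an illustration only, not an agreed design: `EmitRetry` and its parameters are hypothetical, the emit is modelled as a `BooleanSupplier` that returns `true` once the event is persisted, and the sleep is injected so the behaviour can be exercised without real delays.

```java
import java.util.function.BooleanSupplier;
import java.util.function.LongConsumer;

// Hypothetical helper: bounded retry with exponential backoff around an emit.
final class EmitRetry {
    private EmitRetry() {
    }

    /**
     * Calls emit until it returns true or maxAttempts is reached.
     * Waits between attempts, doubling the delay each time up to maxDelayMs.
     * Returns true if an attempt succeeded, false if every attempt failed.
     */
    static boolean emitWithRetry(BooleanSupplier emit,
                                 int maxAttempts,
                                 long initialDelayMs,
                                 long maxDelayMs,
                                 LongConsumer sleeper) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        long delay = initialDelayMs;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (emit.getAsBoolean()) {
                return true;
            }
            // No point waiting after the final failed attempt.
            if (attempt < maxAttempts) {
                sleeper.accept(delay);
                delay = Math.min(delay * 2, maxDelayMs);
            }
        }
        return false;
    }
}
```

As an assumed starting point, four attempts with an initial delay of 1000ms wait 1000ms, 2000ms and 4000ms between attempts, which spans the 5-7s reconnect window described above. Each failed attempt may itself take up to 1000ms to be declared an error, so the caller can block for noticeably longer than the sum of the delays.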

Updates to muon-core will improve this via client-side load balancing (https://github.com/muoncore/muon-java/issues/62), making HA photon that much more reliable and enabling transparent failover.


In the case of the connection to photon failing, need to define the recovery and visibility of the failure #61
