This repository was archived by the owner on Jun 2, 2025. It is now read-only.
Currently, when the connected photon restarts, the following happens:
Shutting down shared-route due to channel failure - Muon Core detects the photon instance is no longer available
Error in subscription Connection lost to remote service, the channel has shut down due to a transport failure - Newton drops the active replay
Subscribing to event stream 'newton-sample/Task' for full local replay - attempts to reconnect to the streams
NewtonEvent subscription has ended, will attempt to reconnect in 5000ms - backoff behaviour
Subscribing from index 26 to event stream saga-manager-newton-sample/Task 'newton-sample/Task' - connection succeeds and the replay continues
The issue here is the gap between disconnect and reconnect, during which events may still be emitted. Currently, MuonEventSourceRepository calls EventClient.event() without checking the return value. This means that events can be emitted but not fully persisted.
Adding a check that fails on event failure will then expose a second issue: what to do when an emit fails. When a failure occurs, it takes at most 1000ms (less in some circumstances) to be declared an error. If photon drops, a full reconnect over the AMQP transport takes on the order of 5-7s. As such, failure is fairly expensive. When an event emit fails, should the event persist operation be retried, or should it simply fail? Lastly, should the event protocol be updated to have a fallback SEDA mode so that the transport can provide further reliability?
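To make the two options concrete, here is a minimal sketch of checking the emit result and either failing fast or retrying a bounded number of times. It does not use the real Muon API: `EmitResult`, `EventPersistException` and `emitOrFail` are hypothetical names invented for illustration, and the `Supplier` stands in for whatever call MuonEventSourceRepository makes to `EventClient.event()`. The actual return type of that call may look quite different.

```java
import java.util.function.Supplier;

public class EmitWithRetry {

    /** Hypothetical stand-in for the result of an event emit; not the real Muon type. */
    public static final class EmitResult {
        private final boolean success;
        private final String cause;

        public EmitResult(boolean success, String cause) {
            this.success = success;
            this.cause = cause;
        }

        public boolean isSuccess() { return success; }
        public String getCause() { return cause; }
    }

    /** Hypothetical exception raised when an event could not be persisted. */
    public static final class EventPersistException extends RuntimeException {
        public EventPersistException(String message) { super(message); }
    }

    /**
     * Calls emit up to maxAttempts times, sleeping delayMs between failed attempts.
     * Returns the number of attempts used; throws if every attempt failed.
     * With maxAttempts == 1 this is the "simply fail" option.
     */
    public static int emitOrFail(Supplier<EmitResult> emit, int maxAttempts, long delayMs) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        String lastCause = "unknown";
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            EmitResult result = emit.get();
            if (result != null && result.isSuccess()) {
                return attempt; // event was acknowledged as persisted
            }
            lastCause = (result == null) ? "no result" : result.getCause();
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(delayMs); // wait before the next attempt
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new EventPersistException("interrupted while waiting to retry");
                }
            }
        }
        throw new EventPersistException(
            "event not persisted after " + maxAttempts + " attempts: " + lastCause);
    }
}
```

Given the timings above, a retry policy would presumably need a total window longer than the 5-7s reconnect to be useful, which means the caller blocks for that long in the worst case. That trade-off is the open question; the sketch only shows where the decision would sit.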
Updates to muon-core will improve this via client-side load balancing (muoncore/muon-java#62), making HA photon that much more reliable and enabling transparent failover.