feat: web socket health check (WPB-21876)#3837
Conversation
There was a problem hiding this comment.
Pull request overview
This PR implements a WebSocket health check mechanism to detect stale connections that appear active but have stopped receiving events. The solution tracks the timestamp of the last WebSocket event to enable detection of "zombie connections" in environments where network issues or load balancer disconnections can cause events to stop flowing without proper disconnection notification.
Changes:
- Added timestamp tracking in IncrementalSyncRepository with lastWebSocketEventInstant() and recordLastWebSocketEvent() methods
- Integrated timestamp recording in IncrementalSyncManager when processing LIVE WebSocket events
- Exposed GetLastWebSocketEventInstantUseCase in UserSessionScope for application-layer queries
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| logic/src/commonMain/kotlin/com/wire/kalium/logic/data/sync/IncrementalSyncRepository.kt | Adds timestamp tracking with @volatile field and methods to record/query last WebSocket event time |
| logic/src/commonMain/kotlin/com/wire/kalium/logic/sync/incremental/IncrementalSyncManager.kt | Integrates recordLastWebSocketEvent() call when transitioning to LIVE state |
| logic/src/commonMain/kotlin/com/wire/kalium/logic/feature/user/webSocketStatus/GetLastWebSocketEventInstantUseCase.kt | New use case interface and implementation to expose timestamp query functionality |
| logic/src/commonMain/kotlin/com/wire/kalium/logic/feature/UserSessionScope.kt | Exposes the new use case in the public API |
| logic/src/commonTest/kotlin/com/wire/kalium/logic/data/sync/IncrementalSyncRepositoryTest.kt | Comprehensive tests for timestamp tracking functionality |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
logic/src/commonMain/kotlin/com/wire/kalium/logic/data/sync/IncrementalSyncRepository.kt
Outdated
Show resolved
Hide resolved
logic/src/commonMain/kotlin/com/wire/kalium/logic/sync/incremental/IncrementalSyncManager.kt
Outdated
Show resolved
Hide resolved
| if (eventSource == EventSource.LIVE) { | ||
| incrementalSyncRepository.recordLastWebSocketEvent() | ||
| Uuid.random().toString() to Clock.System.now() | ||
| } else { | ||
| syncData | ||
| } |
There was a problem hiding this comment.
Consider adding a test in IncrementalSyncManagerTest to verify that recordLastWebSocketEvent is called when processing LIVE events. This would ensure the integration between IncrementalSyncManager and IncrementalSyncRepository works as expected. A test similar to the existing "givenSyncIsLive_whenWorkerEmitsSources_thenShouldResetBackoffForUserConfigSync" test would be appropriate.
|
| Branch | feat/socket-health-check |
| Testbed | ubuntu-latest |
⚠️ WARNING: No Threshold found!Without a Threshold, no Alerts will ever be generated.
Click here to create a new Threshold
For more information, see the Threshold documentation.
To only post results if a Threshold exists, set the--ci-only-thresholdsflag.
Click to view all benchmark results
| Benchmark | Latency | microseconds (µs) |
|---|---|---|
| com.wire.kalium.benchmarks.logic.CoreLogicBenchmark.createObjectInFiles | 📈 view plot | 675.48 µs |
| com.wire.kalium.benchmarks.logic.CoreLogicBenchmark.createObjectInMemory | 📈 view plot | 336,808.04 µs |
| com.wire.kalium.benchmarks.persistence.MessagesNoPragmaTuneBenchmark.messageInsertionBenchmark | 📈 view plot | 1,356,210.19 µs |
| com.wire.kalium.benchmarks.persistence.MessagesNoPragmaTuneBenchmark.queryMessagesBenchmark | 📈 view plot | 21,197.24 µs |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## develop #3837 +/- ##
===========================================
+ Coverage 59.56% 59.59% +0.02%
===========================================
Files 1898 1899 +1
Lines 59227 59293 +66
Branches 6417 6421 +4
===========================================
+ Hits 35277 35334 +57
- Misses 21037 21043 +6
- Partials 2913 2916 +3
... and 7 files with indirect coverage changes Continue to review full report in Codecov by Sentry.
🚀 New features to boost your workflow:
|
f9cb4ee to
dfb27ec
Compare
dfb27ec to
f5c53da
Compare
|
| } | ||
| .filterIsInstance<EventStreamData.NewEvents>() | ||
| .collect { streamData -> | ||
| if (isLive) { |
There was a problem hiding this comment.
Do we need to record only when we are live or for each event we receive via socket?
I mean again the async notifications - what if there is a scenario that we only got some messages when we were offline - in that case when we connect the socket we do receive these messages as pending non-live events so it means that at this moment the socket is healthy, but with this if (isLive) we don't record that for these non-live events, so if we don't receive any live events after pending ones then the LastWebSocketEvent will not update and will be outdated. 🤔



https://wearezeta.atlassian.net/browse/WPB-21876
Issue
WebSocket connections can become stale - appearing connected while not actually receiving events. This "zombie connection" state is undetectable because the connection technically remains open, but the server has stopped sending events (e.g., due to network issues, server-side timeouts, or load balancer disconnections).
Solution
See wireapp/wire-android#4561 for application update.