-
Notifications
You must be signed in to change notification settings - Fork 18
Bug: connection state report mismatch #53
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
When the network switches (e.g., WiFi to cellular), the SDK restarts
while the VPN tunnel remains connected. Previously, the UI would show
"Disconnected" during this ~10 second window, confusing users who were
actually still protected by the VPN.
Changes:
- Add `isRestarting` field to StatusDetails for IPC communication
- Update iOS and tvOS PacketTunnelProvider to include restart status
- Update MainViewModel to track isRestarting from polled status
- Update CustomLottieView state machine to show "Reconnecting..." when
tunnel is up but SDK reports disconnected during a restart sequence
This fixes the status mismatch where users saw "Disconnected" when the
VPN tunnel was actually still running during network transitions.
The iOS app was showing incorrect connection states when the SDK lost connectivity to management/signal servers but the VPN tunnel was still active. Users would see "Disconnected" when they could still connect, or "Connected" when they couldn't. Root cause: The Go SDK's gRPC keepalive settings (30s ping, 10s timeout) are at the edge of cellular NAT timeout windows, causing frequent disconnections on mobile networks. The SDK would enter "Connecting" state but fail to recover. Changes: UI (CustomLottieView.swift): - Always show "Reconnecting..." when tunnel is up but SDK is not connected, regardless of isRestarting flag - Removed premature exit to "Disconnected" state during reconnection - This covers both iOS-initiated and SDK-initiated reconnections ConnectionListener.swift: - Don't reset isRestarting in onDisconnected() - wait for onConnected() - Prevents UI flashing "Disconnected" during network switch restarts PacketTunnelProvider.swift (iOS & tvOS): - Added stuck "Connecting" state detection with auto-recovery - Parameters: 90s threshold, max 3 attempts, 60s backoff - Checks network availability before attempting restart - Thread-safe restart flag to prevent concurrent restarts This is a client-side workaround for Go SDK keepalive issues on cellular networks (same behavior seen on Android).
Add early returns on animations if the network is unavailable (airplane mode)
Also lower the detection interval to 60 seconds
onConnected will now leave it to the restart completion handler
So that the UI won't report a wrong state when the TUN is still up
For networkUnavailable purposes; when app was closed with that value saved to UserPreferences, it would then report the wrong UI state when the app was reopened.
The stale value wrongly sets the UI to disconnected even after starting a new TUN successfully.
Also display "Reconnecting..." as well.
Version is bundled with NetBirdSDK 0.64.3
|
Important Review skippedDraft detected. Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the
✨ Finishing touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@NetbirdNetworkExtension/PacketTunnelProvider.swift`:
- Around line 30-50: The stuck-state tracking fields (stuckStateStartTime,
stuckRestartAttempts, lastStuckRestartTime) are not thread-safe because they’re
read from getStatus/handleAppMessage and modified from
monitorQueue/restartClient; add a dedicated lock (e.g., stuckStateLock =
NSLock()) and guard all reads/writes to these three vars with
stuckStateLock.lock()/unlock() (including inside checkAndRecoverFromStuckState,
restartClient, and any access in getStatus/handleAppMessage), keeping
isRestartInProgress lock as-is; alternatively you can dispatch all accesses to
monitorQueue instead—pick one approach and apply it consistently to those
functions to ensure serialization.
In `@NetBirdTVNetworkExtension/PacketTunnelProvider.swift`:
- Around line 43-50: The tvOS PacketTunnelProvider uses a plain Bool
isRestartInProgress which can be modified in restartClient() and read in
getStatus(), causing potential data races; change isRestartInProgress to a
thread-safe property (mirror the iOS pattern) by introducing a private lock
(e.g., DispatchQueue or NSLock) and exposing a computed property
isRestartInProgress that reads/writes under the lock, then update all references
in restartClient() and getStatus() to use the new computed property so all
access is serialized and race-free.
🧹 Nitpick comments (3)
NetBirdTVNetworkExtension/PacketTunnelProvider.swift (2)
213-232: Consider adding a restart timeout like iOS implementation.The iOS version (lines 249-256 of
NetbirdNetworkExtension/PacketTunnelProvider.swift) includes a 30-second timeout to reset flags if the restart hangs. The tvOS version doesn't have this safeguard, which could leave the extension stuck ifadapter?.startnever completes.⏱️ Suggested timeout addition
func restartClient() { guard !isRestartInProgress else { logger.info("restartClient: skipping - restart already in progress") return } isRestartInProgress = true adapter?.isRestarting = true + // Timeout after 30 seconds to reset flags if restart hangs + let timeoutWorkItem = DispatchWorkItem { [weak self] in + guard let self = self, self.isRestartInProgress else { return } + logger.info("restartClient: timeout - resetting flags") + self.adapter?.isRestarting = false + self.isRestartInProgress = false + } + monitorQueue.asyncAfter(deadline: .now() + 30, execute: timeoutWorkItem) + logger.info("restartClient: Restarting client due to network change") adapter?.stop() adapter?.start { [weak self] error in + timeoutWorkItem.cancel() self?.isRestartInProgress = false self?.adapter?.isRestarting = false
273-277: tvOS silently gives up after max retries, unlike iOS which tears down the tunnel.On iOS (line 326-336), after max restart attempts, the tunnel is canceled with an error so the user knows to reconnect manually. On tvOS, the code just logs and returns, leaving the tunnel in a potentially broken state without user feedback.
Consider whether tvOS should also cancel the tunnel or at least provide user feedback.
NetBird/Source/App/Views/Components/CustomLottieView.swift (1)
6-14: Consider documenting theisRestartingbinding's purpose in a comment.The
isRestartingbinding is used as a trigger for state re-evaluation (line 50) rather than being directly checked in conditionals. While this pattern is valid and the behavior is documented in comments at lines 227-228, adding a brief doc comment at the property declaration would help future maintainers understand this design choice.`@Binding` var networkUnavailable: Bool + /// Used to trigger UI state re-evaluation when restart begins/ends. + /// Actual reconnecting behavior is determined by extensionStatus/engineStatus combination. `@Binding` var isRestarting: Bool
|
|
||
| /// Stuck state detection - for auto-recovery when SDK fails to reconnect | ||
| /// Tracks both "Connecting" and "Disconnected" states when tunnel is up | ||
| private var stuckStateStartTime: Date? | ||
| private var stuckRestartAttempts: Int = 0 | ||
| private var lastStuckRestartTime: Date? | ||
| private let stuckStateThreshold: TimeInterval = 60 // seconds before considering "stuck" | ||
| private let maxStuckRestartAttempts: Int = 3 | ||
| private let minTimeBetweenStuckRestarts: TimeInterval = 60 // minimum backoff between attempts | ||
|
|
||
| /// Thread-safe restart progress flag | ||
| private let restartLock = NSLock() | ||
| private var _isRestartInProgress = false | ||
| private var isRestartInProgress: Bool { | ||
| get { restartLock.lock(); defer { restartLock.unlock() }; return _isRestartInProgress } | ||
| set { restartLock.lock(); defer { restartLock.unlock() }; _isRestartInProgress = newValue } | ||
| } | ||
|
|
||
| /// Flag to indicate tunnel is being intentionally stopped (user disconnect) | ||
| /// Prevents stuck recovery from triggering during shutdown | ||
| private var isTunnelStopping = false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thread-safety concern with stuck-state tracking fields.
The fields stuckStateStartTime, stuckRestartAttempts, and lastStuckRestartTime are accessed from getStatus (which runs on an unspecified queue via handleAppMessage) but also modified from monitorQueue via restartClient. Only isRestartInProgress is protected with a lock.
Consider either:
- Protecting these fields with a lock (similar to
isRestartInProgress) - Dispatching
checkAndRecoverFromStuckStatetomonitorQueueto ensure all accesses are serialized
🔒 Option 1: Add lock protection
+ private let stuckStateLock = NSLock()
private var stuckStateStartTime: Date?
private var stuckRestartAttempts: Int = 0
private var lastStuckRestartTime: Date?Then wrap accesses in stuckStateLock.lock()/unlock().
🤖 Prompt for AI Agents
In `@NetbirdNetworkExtension/PacketTunnelProvider.swift` around lines 30 - 50, The
stuck-state tracking fields (stuckStateStartTime, stuckRestartAttempts,
lastStuckRestartTime) are not thread-safe because they’re read from
getStatus/handleAppMessage and modified from monitorQueue/restartClient; add a
dedicated lock (e.g., stuckStateLock = NSLock()) and guard all reads/writes to
these three vars with stuckStateLock.lock()/unlock() (including inside
checkAndRecoverFromStuckState, restartClient, and any access in
getStatus/handleAppMessage), keeping isRestartInProgress lock as-is;
alternatively you can dispatch all accesses to monitorQueue instead—pick one
approach and apply it consistently to those functions to ensure serialization.
| /// Stuck "Connecting" state detection - for auto-recovery when SDK fails to reconnect | ||
| private var connectingStartTime: Date? | ||
| private var stuckRestartAttempts: Int = 0 | ||
| private var lastStuckRestartTime: Date? | ||
| private let stuckConnectingThreshold: TimeInterval = 90 // seconds before considering "stuck" | ||
| private let maxStuckRestartAttempts: Int = 3 | ||
| private let minTimeBetweenStuckRestarts: TimeInterval = 60 // minimum backoff between attempts | ||
| private var isRestartInProgress: Bool = false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isRestartInProgress lacks thread-safety on tvOS.
Unlike the iOS implementation which uses a lock-protected computed property, tvOS uses a simple Bool. Since restartClient() modifies this flag while getStatus() reads it (potentially from different threads), this could cause data races.
🔒 Suggested fix for thread safety
+ private let restartLock = NSLock()
+ private var _isRestartInProgress: Bool = false
- private var isRestartInProgress: Bool = false
+ private var isRestartInProgress: Bool {
+ get { restartLock.lock(); defer { restartLock.unlock() }; return _isRestartInProgress }
+ set { restartLock.lock(); defer { restartLock.unlock() }; _isRestartInProgress = newValue }
+ }📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| /// Stuck "Connecting" state detection - for auto-recovery when SDK fails to reconnect | |
| private var connectingStartTime: Date? | |
| private var stuckRestartAttempts: Int = 0 | |
| private var lastStuckRestartTime: Date? | |
| private let stuckConnectingThreshold: TimeInterval = 90 // seconds before considering "stuck" | |
| private let maxStuckRestartAttempts: Int = 3 | |
| private let minTimeBetweenStuckRestarts: TimeInterval = 60 // minimum backoff between attempts | |
| private var isRestartInProgress: Bool = false | |
| /// Stuck "Connecting" state detection - for auto-recovery when SDK fails to reconnect | |
| private var connectingStartTime: Date? | |
| private var stuckRestartAttempts: Int = 0 | |
| private var lastStuckRestartTime: Date? | |
| private let stuckConnectingThreshold: TimeInterval = 90 // seconds before considering "stuck" | |
| private let maxStuckRestartAttempts: Int = 3 | |
| private let minTimeBetweenStuckRestarts: TimeInterval = 60 // minimum backoff between attempts | |
| private let restartLock = NSLock() | |
| private var _isRestartInProgress: Bool = false | |
| private var isRestartInProgress: Bool { | |
| get { restartLock.lock(); defer { restartLock.unlock() }; return _isRestartInProgress } | |
| set { restartLock.lock(); defer { restartLock.unlock() }; _isRestartInProgress = newValue } | |
| } |
🤖 Prompt for AI Agents
In `@NetBirdTVNetworkExtension/PacketTunnelProvider.swift` around lines 43 - 50,
The tvOS PacketTunnelProvider uses a plain Bool isRestartInProgress which can be
modified in restartClient() and read in getStatus(), causing potential data
races; change isRestartInProgress to a thread-safe property (mirror the iOS
pattern) by introducing a private lock (e.g., DispatchQueue or NSLock) and
exposing a computed property isRestartInProgress that reads/writes under the
lock, then update all references in restartClient() and getStatus() to use the
new computed property so all access is serialized and race-free.
UI is still getting stuck on iOS when switching between network types
This PR addresses it by ignoring certain state updates when a restart is triggered, so that it doesn't wrongly report that the app is disconnected while the VPN Tunnel is still up. Restarting only triggers the SDK engine stop and start mechanisms.
It also adds mechanisms to automatically restart the engine if the app is stuck in connecting or disconnected states if the VPN tunnel has already been established, for a maximum of three times; if still stuck after that, it will fully tear down the VPN and require user to manually reconnect.
This also addresses the networkUnavailable flag set in UserDefaults that was left stale if the app closed with airplane mode toggled on, or when a VPN tunnel is being created. This flag's value will be ignored when the app is opened AND there's no current VPN tunnel, and unset when a new tunnel is established.
Summary by CodeRabbit
Release Notes – Version 0.0.18
New Features
Bug Fixes
✏️ Tip: You can customize this high-level summary in your review settings.