Skip to content

Conversation

@doromaraujo
Copy link
Contributor

@doromaraujo doromaraujo commented Jan 31, 2026

UI is still getting stuck on iOS when switching between network types

This PR addresses it by ignoring certain state updates when a restart is triggered, so that it doesn't wrongly report that the app is disconnected while the VPN Tunnel is still up. Restarting only triggers the SDK engine stop and start mechanisms.

It also adds mechanisms to automatically restart the engine if the app is stuck in connecting or disconnected states if the VPN tunnel has already been established, for a maximum of three times; if still stuck after that, it will fully tear down the VPN and require user to manually reconnect.

This also addresses the networkUnavailable flag set in UserDefaults that was left stale if the app closed with airplane mode toggled on, or when a VPN tunnel is being created. This flag's value will be ignored when the app is opened AND there's no current VPN tunnel, and unset when a new tunnel is established.

Summary by CodeRabbit

Release Notes – Version 0.0.18

  • New Features

    • Added "Reconnecting..." status indicator during SDK restart scenarios for clearer user feedback.
  • Bug Fixes

    • Improved network transition handling to enable faster reconnection when network becomes available.
    • Implemented automatic recovery mechanism to detect and resolve stuck connection states.
    • Enhanced UI state transitions for smoother behavior during network changes.

✏️ Tip: You can customize this high-level summary in your review settings.

When the network switches (e.g., WiFi to cellular), the SDK restarts
while the VPN tunnel remains connected. Previously, the UI would show
"Disconnected" during this ~10 second window, confusing users who were
actually still protected by the VPN.

Changes:
  - Add `isRestarting` field to StatusDetails for IPC communication
  - Update iOS and tvOS PacketTunnelProvider to include restart status
  - Update MainViewModel to track isRestarting from polled status
  - Update CustomLottieView state machine to show "Reconnecting..." when
    tunnel is up but SDK reports disconnected during a restart sequence

This fixes the status mismatch where users saw "Disconnected" when the
VPN tunnel was actually still running during network transitions.
The iOS app was showing incorrect connection states when the SDK
lost connectivity to management/signal servers but the VPN tunnel
was still active. Users would see "Disconnected" when they could
still connect, or "Connected" when they couldn't.

Root cause: The Go SDK's gRPC keepalive settings (30s ping, 10s
timeout) are at the edge of cellular NAT timeout windows, causing
frequent disconnections on mobile networks. The SDK would enter
"Connecting" state but fail to recover.

Changes:

UI (CustomLottieView.swift):
- Always show "Reconnecting..." when tunnel is up but SDK is not
connected, regardless of isRestarting flag
- Removed premature exit to "Disconnected" state during reconnection
- This covers both iOS-initiated and SDK-initiated reconnections

ConnectionListener.swift:
- Don't reset isRestarting in onDisconnected() - wait for onConnected()
- Prevents UI flashing "Disconnected" during network switch restarts

PacketTunnelProvider.swift (iOS & tvOS):
- Added stuck "Connecting" state detection with auto-recovery
- Parameters: 90s threshold, max 3 attempts, 60s backoff
- Checks network availability before attempting restart
- Thread-safe restart flag to prevent concurrent restarts

This is a client-side workaround for Go SDK keepalive issues on
  cellular networks (same behavior seen on Android).
Add early returns on animations if the network is unavailable
(airplane mode)
Also lower the detection interval to 60 seconds
onConnected will now leave it to the restart completion handler
So that the UI won't report a wrong state when the TUN is still up
For networkUnavailable purposes; when app was closed
with that value saved to UserPreferences, it would then
report the wrong UI state when the app was reopened.
The stale value wrongly sets the UI to disconnected even
after starting a new TUN successfully.
Version is bundled with NetBirdSDK 0.64.3
@coderabbitai
Copy link

coderabbitai bot commented Jan 31, 2026

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

  • ✅ Full review completed - (🔄 Check again to review again)
✨ Finishing touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch bug/connection-state-report-mismatch

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@NetbirdNetworkExtension/PacketTunnelProvider.swift`:
- Around line 30-50: The stuck-state tracking fields (stuckStateStartTime,
stuckRestartAttempts, lastStuckRestartTime) are not thread-safe because they’re
read from getStatus/handleAppMessage and modified from
monitorQueue/restartClient; add a dedicated lock (e.g., stuckStateLock =
NSLock()) and guard all reads/writes to these three vars with
stuckStateLock.lock()/unlock() (including inside checkAndRecoverFromStuckState,
restartClient, and any access in getStatus/handleAppMessage), keeping
isRestartInProgress lock as-is; alternatively you can dispatch all accesses to
monitorQueue instead—pick one approach and apply it consistently to those
functions to ensure serialization.

In `@NetBirdTVNetworkExtension/PacketTunnelProvider.swift`:
- Around line 43-50: The tvOS PacketTunnelProvider uses a plain Bool
isRestartInProgress which can be modified in restartClient() and read in
getStatus(), causing potential data races; change isRestartInProgress to a
thread-safe property (mirror the iOS pattern) by introducing a private lock
(e.g., DispatchQueue or NSLock) and exposing a computed property
isRestartInProgress that reads/writes under the lock, then update all references
in restartClient() and getStatus() to use the new computed property so all
access is serialized and race-free.
🧹 Nitpick comments (3)
NetBirdTVNetworkExtension/PacketTunnelProvider.swift (2)

213-232: Consider adding a restart timeout like iOS implementation.

The iOS version (lines 249-256 of NetbirdNetworkExtension/PacketTunnelProvider.swift) includes a 30-second timeout to reset flags if the restart hangs. The tvOS version doesn't have this safeguard, which could leave the extension stuck if adapter?.start never completes.

⏱️ Suggested timeout addition
 func restartClient() {
     guard !isRestartInProgress else {
         logger.info("restartClient: skipping - restart already in progress")
         return
     }
     isRestartInProgress = true
     adapter?.isRestarting = true

+    // Timeout after 30 seconds to reset flags if restart hangs
+    let timeoutWorkItem = DispatchWorkItem { [weak self] in
+        guard let self = self, self.isRestartInProgress else { return }
+        logger.info("restartClient: timeout - resetting flags")
+        self.adapter?.isRestarting = false
+        self.isRestartInProgress = false
+    }
+    monitorQueue.asyncAfter(deadline: .now() + 30, execute: timeoutWorkItem)
+
     logger.info("restartClient: Restarting client due to network change")
     adapter?.stop()
     adapter?.start { [weak self] error in
+        timeoutWorkItem.cancel()
         self?.isRestartInProgress = false
         self?.adapter?.isRestarting = false

273-277: tvOS silently gives up after max retries, unlike iOS which tears down the tunnel.

On iOS (line 326-336), after max restart attempts, the tunnel is canceled with an error so the user knows to reconnect manually. On tvOS, the code just logs and returns, leaving the tunnel in a potentially broken state without user feedback.

Consider whether tvOS should also cancel the tunnel or at least provide user feedback.

NetBird/Source/App/Views/Components/CustomLottieView.swift (1)

6-14: Consider documenting the isRestarting binding's purpose in a comment.

The isRestarting binding is used as a trigger for state re-evaluation (line 50) rather than being directly checked in conditionals. While this pattern is valid and the behavior is documented in comments at lines 227-228, adding a brief doc comment at the property declaration would help future maintainers understand this design choice.

     `@Binding` var networkUnavailable: Bool
+    /// Used to trigger UI state re-evaluation when restart begins/ends.
+    /// Actual reconnecting behavior is determined by extensionStatus/engineStatus combination.
     `@Binding` var isRestarting: Bool

Comment on lines +30 to +50

/// Stuck state detection - for auto-recovery when SDK fails to reconnect
/// Tracks both "Connecting" and "Disconnected" states when tunnel is up
private var stuckStateStartTime: Date?
private var stuckRestartAttempts: Int = 0
private var lastStuckRestartTime: Date?
private let stuckStateThreshold: TimeInterval = 60 // seconds before considering "stuck"
private let maxStuckRestartAttempts: Int = 3
private let minTimeBetweenStuckRestarts: TimeInterval = 60 // minimum backoff between attempts

/// Thread-safe restart progress flag
private let restartLock = NSLock()
private var _isRestartInProgress = false
private var isRestartInProgress: Bool {
get { restartLock.lock(); defer { restartLock.unlock() }; return _isRestartInProgress }
set { restartLock.lock(); defer { restartLock.unlock() }; _isRestartInProgress = newValue }
}

/// Flag to indicate tunnel is being intentionally stopped (user disconnect)
/// Prevents stuck recovery from triggering during shutdown
private var isTunnelStopping = false
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Thread-safety concern with stuck-state tracking fields.

The fields stuckStateStartTime, stuckRestartAttempts, and lastStuckRestartTime are accessed from getStatus (which runs on an unspecified queue via handleAppMessage) but also modified from monitorQueue via restartClient. Only isRestartInProgress is protected with a lock.

Consider either:

  1. Protecting these fields with a lock (similar to isRestartInProgress)
  2. Dispatching checkAndRecoverFromStuckState to monitorQueue to ensure all accesses are serialized
🔒 Option 1: Add lock protection
+    private let stuckStateLock = NSLock()
     private var stuckStateStartTime: Date?
     private var stuckRestartAttempts: Int = 0
     private var lastStuckRestartTime: Date?

Then wrap accesses in stuckStateLock.lock()/unlock().

🤖 Prompt for AI Agents
In `@NetbirdNetworkExtension/PacketTunnelProvider.swift` around lines 30 - 50, The
stuck-state tracking fields (stuckStateStartTime, stuckRestartAttempts,
lastStuckRestartTime) are not thread-safe because they’re read from
getStatus/handleAppMessage and modified from monitorQueue/restartClient; add a
dedicated lock (e.g., stuckStateLock = NSLock()) and guard all reads/writes to
these three vars with stuckStateLock.lock()/unlock() (including inside
checkAndRecoverFromStuckState, restartClient, and any access in
getStatus/handleAppMessage), keeping isRestartInProgress lock as-is;
alternatively you can dispatch all accesses to monitorQueue instead—pick one
approach and apply it consistently to those functions to ensure serialization.

Comment on lines +43 to +50
/// Stuck "Connecting" state detection - for auto-recovery when SDK fails to reconnect
private var connectingStartTime: Date?
private var stuckRestartAttempts: Int = 0
private var lastStuckRestartTime: Date?
private let stuckConnectingThreshold: TimeInterval = 90 // seconds before considering "stuck"
private let maxStuckRestartAttempts: Int = 3
private let minTimeBetweenStuckRestarts: TimeInterval = 60 // minimum backoff between attempts
private var isRestartInProgress: Bool = false
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

isRestartInProgress lacks thread-safety on tvOS.

Unlike the iOS implementation which uses a lock-protected computed property, tvOS uses a simple Bool. Since restartClient() modifies this flag while getStatus() reads it (potentially from different threads), this could cause data races.

🔒 Suggested fix for thread safety
+    private let restartLock = NSLock()
+    private var _isRestartInProgress: Bool = false
-    private var isRestartInProgress: Bool = false
+    private var isRestartInProgress: Bool {
+        get { restartLock.lock(); defer { restartLock.unlock() }; return _isRestartInProgress }
+        set { restartLock.lock(); defer { restartLock.unlock() }; _isRestartInProgress = newValue }
+    }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
/// Stuck "Connecting" state detection - for auto-recovery when SDK fails to reconnect
private var connectingStartTime: Date?
private var stuckRestartAttempts: Int = 0
private var lastStuckRestartTime: Date?
private let stuckConnectingThreshold: TimeInterval = 90 // seconds before considering "stuck"
private let maxStuckRestartAttempts: Int = 3
private let minTimeBetweenStuckRestarts: TimeInterval = 60 // minimum backoff between attempts
private var isRestartInProgress: Bool = false
/// Stuck "Connecting" state detection - for auto-recovery when SDK fails to reconnect
private var connectingStartTime: Date?
private var stuckRestartAttempts: Int = 0
private var lastStuckRestartTime: Date?
private let stuckConnectingThreshold: TimeInterval = 90 // seconds before considering "stuck"
private let maxStuckRestartAttempts: Int = 3
private let minTimeBetweenStuckRestarts: TimeInterval = 60 // minimum backoff between attempts
private let restartLock = NSLock()
private var _isRestartInProgress: Bool = false
private var isRestartInProgress: Bool {
get { restartLock.lock(); defer { restartLock.unlock() }; return _isRestartInProgress }
set { restartLock.lock(); defer { restartLock.unlock() }; _isRestartInProgress = newValue }
}
🤖 Prompt for AI Agents
In `@NetBirdTVNetworkExtension/PacketTunnelProvider.swift` around lines 43 - 50,
The tvOS PacketTunnelProvider uses a plain Bool isRestartInProgress which can be
modified in restartClient() and read in getStatus(), causing potential data
races; change isRestartInProgress to a thread-safe property (mirror the iOS
pattern) by introducing a private lock (e.g., DispatchQueue or NSLock) and
exposing a computed property isRestartInProgress that reads/writes under the
lock, then update all references in restartClient() and getStatus() to use the
new computed property so all access is serialized and race-free.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant