fix: WebSocket reconnect v2 — backoff ineffective, kicked bots never recover, silent close not handled #139

@Jerry-Xin

Problem

Follow-up to #135. The initial reconnection storm fix (merged via #134) resolved the token refresh flood but introduced two regressions, and one pre-existing issue remains unaddressed.

Observed in production (14-bot deployment, 72+ hours of logs):

Bug A: Exponential backoff is ineffective

reconnectAttempts resets to 0 on every successful CONNACK. When the server closes each connection only ~300ms after it is established, the counter is zeroed before it can ever grow, so backoff never kicks in and the reconnect interval stays fixed at 3s indefinitely.

Evidence: 3/30 03:00 logs show exactly 60 connects + 60 disconnects per minute (fixed 3s interval), no backoff progression.

Bug B: Kicked bots permanently die during cooldown window

When a bot is kicked, needReconnect is set to false in socket.ts. If the kick happens within the 60s token-refresh cooldown, onError does nothing — no refresh, no reconnect. The bot silently dies and never recovers.

Evidence: 3/30 22:15:34 token refresh → 22:15:40 kicked again (6s < 60s cooldown) → zero logs for 33+ minutes → bot confirmed dead until manual gateway restart.

This is the direct cause of "bots show offline" reports.

Bug C: Server-side silent close does not trigger token refresh

When the WuKongIM server restarts, it closes WebSocket connections without sending a Kicked/DISCONNECT packet. The adapter then keeps reconnecting with the old (now invalid) IM token indefinitely, because token refresh is triggered only by "Kicked by server" errors.

Evidence: 192.168.201.101 — 15 bots in connect→close loop (7,600 disconnects/hour) for 8+ hours, zero token refreshes triggered.

Proposed Fix

A: Delayed backoff reset

Don't reset reconnectAttempts on CONNACK. Instead, reset only after the connection has been stable for 30 seconds (via a delayed timer cleared on disconnect).
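
A minimal sketch of the delayed reset, assuming a doubling backoff from 3s capped at 60s (consistent with the intervals quoted in fix B). The class name BackoffTracker and its methods are hypothetical and are not taken from socket.ts; only the 30-second stability window comes from the proposal.

```typescript
// Sketch only: BackoffTracker, onConnected and onDisconnected are
// hypothetical names, not the actual socket.ts API.
class BackoffTracker {
  private attempts = 0;
  private stableTimer: ReturnType<typeof setTimeout> | null = null;
  private readonly baseMs: number;
  private readonly maxMs: number;
  private readonly stableMs: number;

  constructor(baseMs = 3000, maxMs = 60000, stableMs = 30000) {
    this.baseMs = baseMs;
    this.maxMs = maxMs;
    this.stableMs = stableMs;
  }

  // Call on CONNACK. The counter is NOT reset here; a reset is only
  // scheduled, and it fires if the connection survives stableMs.
  onConnected(): void {
    this.cancelPendingReset();
    this.stableTimer = setTimeout(() => {
      this.attempts = 0;
      this.stableTimer = null;
    }, this.stableMs);
  }

  // Call on close. Cancels the pending reset (the connection was not
  // stable) and returns the delay to wait before the next attempt.
  onDisconnected(): number {
    this.cancelPendingReset();
    const delay = Math.min(this.baseMs * 2 ** this.attempts, this.maxMs);
    this.attempts += 1;
    return delay;
  }

  private cancelPendingReset(): void {
    if (this.stableTimer !== null) {
      clearTimeout(this.stableTimer);
      this.stableTimer = null;
    }
  }
}
```

With this shape, a connection that is closed after ~300ms cancels its own reset timer, so consecutive short-lived connections keep increasing the delay instead of restarting at 3s.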

B: Cooldown limits refresh only, not reconnection

When kicked during cooldown: skip the token refresh but still reconnect with the current credentials. Combined with backoff (fix A), reconnect intervals grow (3s→6s→...→60s), which prevents storms while keeping bots alive.
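
The decision can be sketched as a small pure function. The names decideOnKick and KickDecision are hypothetical and chosen for illustration; the 60s cooldown is the value quoted in Bug B, and how channel.ts actually tracks the last refresh time is an assumption.

```typescript
// Sketch only: decideOnKick and KickDecision are hypothetical names.
interface KickDecision {
  refreshToken: boolean; // request a new IM token before reconnecting
  reconnect: boolean;    // schedule a reconnect (with backoff)
}

function decideOnKick(
  nowMs: number,
  lastRefreshMs: number | null, // null if no refresh has happened yet
  cooldownMs = 60000,
): KickDecision {
  const inCooldown =
    lastRefreshMs !== null && nowMs - lastRefreshMs < cooldownMs;
  // The cooldown gates the refresh only. Reconnection always proceeds,
  // so a kick inside the cooldown window can no longer strand the bot.
  return { refreshToken: !inCooldown, reconnect: true };
}
```

Applied to the Bug B timeline (refresh at 22:15:34, kick at 22:15:40), this would skip the refresh but still reconnect, rather than doing neither.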

C: Rapid-disconnect detection triggers token refresh

Track connection duration. If 3 consecutive connections each last less than 5 seconds, emit a synthetic "Connect failed" error to trigger a token refresh (subject to the cooldown).
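
A rough sketch of the detector, using the thresholds from the proposal (3 consecutive connections, under 5 seconds each). RapidDisconnectDetector and recordConnection are hypothetical names. Resetting the counter after it fires is an assumption made here so the signal is not re-emitted on every following disconnect.

```typescript
// Sketch only: RapidDisconnectDetector is a hypothetical helper,
// not part of the existing adapter.
class RapidDisconnectDetector {
  private consecutiveShort = 0;
  private readonly shortMs: number;
  private readonly limit: number;

  constructor(shortMs = 5000, limit = 3) {
    this.shortMs = shortMs;
    this.limit = limit;
  }

  // Call on every close with how long the connection lived.
  // Returns true when the caller should emit the synthetic error
  // that triggers a token refresh.
  recordConnection(durationMs: number): boolean {
    if (durationMs < this.shortMs) {
      this.consecutiveShort += 1;
    } else {
      // One healthy connection breaks the streak.
      this.consecutiveShort = 0;
    }
    if (this.consecutiveShort >= this.limit) {
      // Assumption: start a fresh streak after firing.
      this.consecutiveShort = 0;
      return true;
    }
    return false;
  }
}
```

The caller would still pass the resulting error through the existing cooldown check, so a silent server restart leads to at most one refresh per cooldown window.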

Impact

All three fixes are in socket.ts and channel.ts only. No config changes or server-side changes needed.
