Problem
Follow-up to #135. The initial reconnection storm fix (merged via #134) resolved the token refresh flood but introduced two regressions, and one pre-existing issue remains unaddressed.
Observed in production (14-bot deployment, 72+ hours of logs):
Bug A: Exponential backoff is ineffective
reconnectAttempts resets to 0 on every successful CONNACK. When connections live only ~300ms before being closed by the server, backoff never kicks in — reconnect interval stays fixed at 3s forever.
Evidence: 3/30 03:00 logs show exactly 60 connects + 60 disconnects per minute (fixed 3s interval), no backoff progression.
Bug B: Kicked bots permanently die during cooldown window
When a bot is kicked, needReconnect is set to false in socket.ts. If the kick happens within the 60s token-refresh cooldown, onError does nothing — no refresh, no reconnect. The bot silently dies and never recovers.
Evidence: 3/30 22:15:34 token refresh → 22:15:40 kicked again (6s < 60s cooldown) → zero logs for 33+ minutes → bot confirmed dead until manual gateway restart.
This is the direct cause of "bots show offline" reports.
Bug C: Server-side silent close does not trigger token refresh
When the WuKongIM server restarts, it closes WebSocket connections without sending a Kicked/DISCONNECT packet. The adapter then reconnects indefinitely with the old (now invalid) IM token, because token refresh only triggers on "Kicked by server" errors.
Evidence: 192.168.201.101 — 15 bots in connect→close loop (7,600 disconnects/hour) for 8+ hours, zero token refreshes triggered.
Proposed Fix
A: Delayed backoff reset
Don't reset reconnectAttempts on CONNACK. Instead, reset only after the connection has been stable for 30 seconds (via a delayed timer cleared on disconnect).
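A minimal sketch of how the delayed reset could look, assuming a small helper class. ReconnectBackoff, Schedule, and the constant names are hypothetical and not taken from socket.ts; the 3s base, 60s cap, and 30s stability window come from this issue, while the doubling factor is inferred from the 3s→6s progression in fix B. The scheduler is injected so the logic can be exercised without real timers.

```typescript
// Hypothetical helper; names are illustrative, not the adapter's actual API.
type Cancel = () => void;
type Schedule = (fn: () => void, ms: number) => Cancel;

const BASE_DELAY_MS = 3000;  // initial reconnect interval (3s, per this issue)
const MAX_DELAY_MS = 60000;  // backoff cap (60s, per this issue)
const STABLE_MS = 30000;     // connection must survive this long before reset

class ReconnectBackoff {
  private attempts = 0;
  private cancelReset: Cancel | null = null;
  private schedule: Schedule;

  constructor(schedule: Schedule) {
    this.schedule = schedule;
  }

  // Call on CONNACK: arm a delayed reset instead of resetting immediately.
  onConnected(): void {
    this.cancelReset?.();
    this.cancelReset = this.schedule(() => {
      this.attempts = 0;
      this.cancelReset = null;
    }, STABLE_MS);
  }

  // Call on disconnect: cancel any pending reset and return the delay
  // to wait before the next connection attempt.
  onDisconnected(): number {
    this.cancelReset?.();
    this.cancelReset = null;
    const delay = Math.min(BASE_DELAY_MS * 2 ** this.attempts, MAX_DELAY_MS);
    this.attempts += 1;
    return delay;
  }
}
```

In real code the scheduler would presumably wrap setTimeout/clearTimeout. With this shape, a connection that dies after ~300ms never reaches the 30s mark, so the attempt counter keeps growing and the interval climbs toward the 60s cap.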
B: Cooldown limits refresh only, not reconnection
When kicked during cooldown, skip the token refresh but still reconnect with the current credentials. Combined with backoff (fix A), reconnect intervals grow (3s→6s→...→60s), which prevents storms while keeping bots alive.
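The decision this fix describes can be sketched as a pure function. decideOnKick and KickDecision are hypothetical names for illustration; only the 60s cooldown is taken from this issue, and the real onError handler in channel.ts may be structured differently.

```typescript
// Hypothetical sketch; not the adapter's actual onError implementation.
const REFRESH_COOLDOWN_MS = 60000; // 60s token-refresh cooldown, per this issue

interface KickDecision {
  refreshToken: boolean; // fetch a new IM token before reconnecting
  reconnect: boolean;    // schedule a reconnect (with backoff)
}

// lastRefreshMs is null when no refresh has happened yet.
function decideOnKick(nowMs: number, lastRefreshMs: number | null): KickDecision {
  const inCooldown =
    lastRefreshMs !== null && nowMs - lastRefreshMs < REFRESH_COOLDOWN_MS;
  // The cooldown gates only the refresh. Reconnection always proceeds,
  // so a kick inside the window can no longer leave the bot dead.
  return { refreshToken: !inCooldown, reconnect: true };
}
```

Applied to the 22:15:34 refresh followed by the 22:15:40 kick from Bug B, this would return refreshToken: false and reconnect: true, so the bot retries with its current token instead of going silent.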
C: Rapid-disconnect detection triggers token refresh
Track connection duration. If 3 consecutive connections each last <5 seconds, emit a synthetic "Connect failed" error to trigger token refresh (subject to cooldown).
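One way the detection could be structured is shown below, assuming a small tracker that the socket layer feeds with timestamps. RapidDisconnectDetector and its method names are hypothetical; the thresholds (3 consecutive connections, under 5 seconds each) come from this issue. The caller would emit the synthetic error when onDisconnected returns true.

```typescript
// Hypothetical tracker; names are illustrative, not the adapter's actual API.
const RAPID_MS = 5000;  // a connection shorter than this counts as "rapid"
const RAPID_LIMIT = 3;  // consecutive rapid disconnects before signalling

class RapidDisconnectDetector {
  private connectedAt: number | null = null;
  private streak = 0;

  // Call on CONNACK with the current time in milliseconds.
  onConnected(nowMs: number): void {
    this.connectedAt = nowMs;
  }

  // Call on close. Returns true when the caller should emit the synthetic
  // error that triggers a token refresh.
  onDisconnected(nowMs: number): boolean {
    if (this.connectedAt === null) return false; // never fully connected
    const livedMs = nowMs - this.connectedAt;
    this.connectedAt = null;
    if (livedMs >= RAPID_MS) {
      this.streak = 0; // a healthy connection breaks the streak
      return false;
    }
    this.streak += 1;
    if (this.streak < RAPID_LIMIT) return false;
    this.streak = 0; // start counting again after signalling
    return true;
  }
}
```

Resetting the streak after signalling means a persistent connect→close loop would signal again every third rapid disconnect, with the refresh cooldown deciding whether each signal actually fetches a new token.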
Impact
All three fixes are in socket.ts and channel.ts only. No config changes or server-side changes needed.