[DEV-1437] Add leader failover reconnection tests #475
test: fix leader failover tests for dual-connection architecture
- Remove redundant write reconnection test (covered by UnavailableError.test.ts)
- Remove test.only marker
- Remove customer name references from comments
- Fix "readStream after kill" test with retry loop for Rust client stabilization
- Fix "mixed read/write" test to trigger NotLeader on both gRPC and bridge paths
e72d149 to 1170335
Review Summary by Qodo
Add leader failover reconnection tests for dual-connection architecture
Walkthroughs
Description
• Add comprehensive leader failover reconnection tests for dual-connection architecture
• Cover gRPC write path with leader kill/resurrect scenarios
• Cover Rust bridge read path with NotLeader error recovery
• Test concurrent operations and mixed read/write during failover
Diagram
flowchart LR
A["Test Suite"] --> B["Write Operations<br/>gRPC Path"]
A --> C["Read Operations<br/>Rust Bridge Path"]
A --> D["Concurrent Operations<br/>Mixed Paths"]
B --> B1["Leader Kill/Resurrect"]
C --> C1["NotLeader Error Recovery"]
C --> C2["Cluster Stabilization"]
D --> D1["Parallel Writes"]
D --> D2["Mixed Read/Write"]
File Changes
1. packages/test/src/connection/reconnect/leader-failover.test.ts
Code Review by Qodo
    // First operation should fail
    try {
      await client.appendToStream(
        "resurrect-stream",
        jsonEvent({ type: "should-fail", data: { message: "test" } }),
        { credentials: { username: "admin", password: "changeit" } }
      );
    } catch (error) {
      // Expected failure
    }

    // Resurrect the killed leader (it comes back as follower)
    await cluster.resurrect();
    await delay(5_000);

    // Subsequent operations should succeed on the new leader
    const afterResurrect = await client.appendToStream(
      "resurrect-stream",
      jsonEvent({
        type: "after-resurrect",
        data: { message: "test" },
      }),
      { credentials: { username: "admin", password: "changeit" } }
    );
    expect(afterResurrect).toBeDefined();
1. Missing failover assertions 🐞 Bug ≡ Correctness
The leader-resurrection write test never asserts that the post-kill append actually failed and never verifies the client did not reconnect to the resurrected (now follower) node, so it can pass while not validating the behavior described in the test comments.
Agent Prompt
### Issue description
The test `should reconnect after leader is killed and resurrected as follower` can pass even if the post-kill operation does not fail and even if the client reconnects back to the resurrected node, because it does not assert either condition.
### Issue Context
The test’s comment requires: "Client should NOT reconnect to the old (now follower) node", but the test only checks that a later append returns a defined result.
### Fix Focus Areas
- packages/test/src/connection/reconnect/leader-failover.test.ts[29-82]
### Suggested changes
- Replace the post-kill `try/catch` with an explicit rejection assertion (e.g., `await expect(client.appendToStream(...)).rejects.toBeDefined()` or `rejects.toBeInstanceOf(UnavailableError)` if appropriate).
- After the successful post-resurrect append, call `getCurrentConnection(client)` again and assert it is **not** equal to the original `leaderConnection` (or assert it matches the newly elected leader if you can derive that reliably).
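
A minimal, framework-free sketch of the two assertions suggested above. Everything in it is a hypothetical stand-in: `FakeClient`, `FakeUnavailableError`, `rejectionOf`, `runScenario`, and the port numbers are invented for illustration and are not part of the client library or of this PR. In the real test, the Jest forms named in the suggestion (`rejects.toBeInstanceOf(...)` and a second `getCurrentConnection(client)` comparison) would take their place.

```typescript
// Local stand-in for a typed "node unavailable" error (NOT the library class).
class FakeUnavailableError extends Error {
  constructor(message: string) {
    super(message);
    // Keeps `instanceof` working even when compiled to older JS targets.
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

type Endpoint = { address: string; port: number };

// Hypothetical client: fails one call when its node is down, then moves on.
class FakeClient {
  private current: Endpoint;
  private readonly others: Endpoint[];
  private readonly down = new Set<number>();

  constructor(leader: Endpoint, others: Endpoint[]) {
    this.current = leader;
    this.others = others;
  }

  kill(port: number): void {
    this.down.add(port);
  }

  resurrect(port: number): void {
    this.down.delete(port);
  }

  currentConnection(): Endpoint {
    return this.current;
  }

  async append(): Promise<string> {
    if (this.down.has(this.current.port)) {
      // Simulated rediscovery: switch to a live node, but fail this call.
      const next = this.others.find((n) => !this.down.has(n.port));
      if (next !== undefined) this.current = next;
      throw new FakeUnavailableError("leader unavailable");
    }
    return `appended via ${this.current.port}`;
  }
}

// Resolves with the rejection reason; throws if the promise fulfils instead.
async function rejectionOf(promise: Promise<unknown>): Promise<unknown> {
  try {
    await promise;
  } catch (reason) {
    return reason;
  }
  throw new Error("expected the promise to reject, but it fulfilled");
}

async function runScenario(): Promise<{
  failure: unknown;
  before: Endpoint;
  after: Endpoint;
}> {
  const client = new FakeClient({ address: "127.0.0.1", port: 2111 }, [
    { address: "127.0.0.1", port: 2112 },
    { address: "127.0.0.1", port: 2113 },
  ]);
  const before = client.currentConnection();

  client.kill(before.port);
  // Assertion 1: the post-kill append must reject (no empty catch block).
  const failure = await rejectionOf(client.append());

  client.resurrect(before.port);
  await client.append(); // must succeed on the newly selected node

  // Assertion 2 is made by the caller: `after` must differ from `before`.
  return { failure, before, after: client.currentConnection() };
}
```

The caller then checks that `failure` is an instance of the expected error class and that `after` is not the original leader endpoint, which are the two conditions the current test leaves unverified.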
    // Read with requiresLeader: true should fail on follower
    try {
      const events = await collect(
        client.readStream("read-reconnect-stream", {
          maxCount: 10,
          fromRevision: START,
          requiresLeader: true,
        })
      );
      expect(events).toBe("unreachable");
    } catch (error) {
      // Expected: NotLeaderError from the Rust bridge
    }
2. Caught errors not asserted 🐞 Bug ≡ Correctness
Multiple tests catch and ignore errors without asserting the expected error type (e.g., NotLeaderError) or even asserting that an error occurred, which can hide regressions and allow unintended success/failure modes to slip through.
Agent Prompt
### Issue description
The tests currently swallow “expected” errors (NotLeader, node-down) without asserting they actually happened and/or without asserting the error type. This reduces the tests’ ability to detect regressions and makes failures harder to diagnose.
### Issue Context
For Rust bridge reads, `convertBridgeError` maps bridge errors into typed JS errors (including `NotLeaderError`), so tests can validate behavior precisely.
### Fix Focus Areas
- packages/test/src/connection/reconnect/leader-failover.test.ts[57-66]
- packages/test/src/connection/reconnect/leader-failover.test.ts[119-131]
- packages/test/src/connection/reconnect/leader-failover.test.ts[157-169]
- packages/test/src/connection/reconnect/leader-failover.test.ts[220-230]
- packages/test/src/connection/reconnect/leader-failover.test.ts[274-285]
### Suggested changes
- Import and assert specific error classes where appropriate (e.g., `NotLeaderError`, `UnavailableError`).
- Use `await expect(promise).rejects.toBeInstanceOf(NotLeaderError)` (or `await expect(promise).rejects.toThrow(NotLeaderError)`), rather than empty catch blocks.
- Where a rejection is required for the test’s logic, add an assertion that ensures the failure path executed (e.g., `expect.assertions(n)` or a `let failed = false` guard).
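
To illustrate why the empty catch blocks are a problem, here is a small framework-free sketch. It assumes nothing about the real client: `FakeNotLeaderError`, `swallowingCheck`, and `guardedCheck` are hypothetical names, and the thrown `Error("unreachable")` stands in for a failing `expect(...)` call. The first function mirrors the pattern in the quoted test, where an assertion placed inside the `try` is caught by the same `catch` that was meant for the expected error. The second uses the `let failed = false` guard from the suggestion and checks the error type outside the `try`.

```typescript
// Local stand-in for a typed NotLeader error (NOT the library class).
class FakeNotLeaderError extends Error {
  constructor(message: string) {
    super(message);
    // Keeps `instanceof` working even when compiled to older JS targets.
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

// Anti-pattern: the catch block swallows the expected error AND the
// "unreachable" failure, so this reports success whatever `read` does.
async function swallowingCheck(
  read: () => Promise<unknown>
): Promise<"passed"> {
  try {
    await read();
    throw new Error("unreachable"); // stands in for a failing expect(...)
  } catch {
    // Expected: NotLeaderError ... but nothing verifies that.
  }
  return "passed";
}

// Guarded version: records whether the rejection happened, then asserts
// both the failure and its type outside the try/catch.
async function guardedCheck(read: () => Promise<unknown>): Promise<void> {
  let failed = false;
  let caught: unknown;
  try {
    await read();
  } catch (error) {
    failed = true;
    caught = error;
  }
  if (!failed) {
    throw new Error("expected the read to reject, but it succeeded");
  }
  if (!(caught instanceof FakeNotLeaderError)) {
    throw new Error(`unexpected error type: ${String(caught)}`);
  }
}
```

With a read that unexpectedly succeeds, `swallowingCheck` still resolves to `"passed"`, while `guardedCheck` rejects. With Jest available, `await expect(promise).rejects.toBeInstanceOf(...)` expresses the guarded version in one line.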
Cover reconnection behavior for both gRPC (write) and Rust bridge (read) code paths during leader failover scenarios including node kill/resurrect, NotLeader error recovery, concurrent operations, and repeated reads with requiresLeader.