[DEV-1437] Add leader failover reconnection tests#475

Merged
w1am merged 1 commit into master from w1am/test-leader-failover
Apr 13, 2026
Conversation

@w1am
Collaborator

@w1am w1am commented Apr 6, 2026

Cover reconnection behavior for both gRPC (write) and Rust bridge (read) code paths during leader failover scenarios including node kill/resurrect, NotLeader error recovery, concurrent operations, and repeated reads with requiresLeader.

test: fix leader failover tests for dual-connection architecture

- Remove redundant write reconnection test (covered by UnavailableError.test.ts)
- Remove test.only marker
- Remove customer name references from comments
- Fix "readStream after kill" test with retry loop for Rust client stabilization
- Fix "mixed read/write" test to trigger NotLeader on both gRPC and bridge paths
@w1am w1am force-pushed the w1am/test-leader-failover branch from e72d149 to 1170335 on April 9, 2026 09:24
@w1am w1am changed the title from "add leader failover reconnection tests" to "Add leader failover reconnection tests" Apr 13, 2026
@w1am w1am marked this pull request as ready for review April 13, 2026 10:21
@qodo-code-review

Review Summary by Qodo

Add leader failover reconnection tests for dual-connection architecture

🧪 Tests


Walkthroughs

Description
• Add comprehensive leader failover reconnection tests for dual-connection architecture
• Cover gRPC write path with leader kill/resurrect scenarios
• Cover Rust bridge read path with NotLeader error recovery
• Test concurrent operations and mixed read/write during failover
Diagram
flowchart LR
  A["Test Suite"] --> B["Write Operations<br/>gRPC Path"]
  A --> C["Read Operations<br/>Rust Bridge Path"]
  A --> D["Concurrent Operations<br/>Mixed Paths"]
  B --> B1["Leader Kill/Resurrect"]
  C --> C1["NotLeader Error Recovery"]
  C --> C2["Cluster Stabilization"]
  D --> D1["Parallel Writes"]
  D --> D2["Mixed Read/Write"]


File Changes

1. packages/test/src/connection/reconnect/leader-failover.test.ts 🧪 Tests +441/-0

Leader failover reconnection test suite

• New test file with 441 lines covering leader failover reconnection scenarios
• Tests write operations (gRPC path) with leader kill/resurrect and reconnection validation
• Tests read operations (Rust bridge path) with NotLeader error handling and recovery
• Tests concurrent write operations and mixed read/write operations during failover
• Includes retry loops for Rust client stabilization and cluster recovery delays




@qodo-code-review

qodo-code-review bot commented Apr 13, 2026

Code Review by Qodo

🐞 Bugs (4)   📘 Rule violations (0)   📎 Requirement gaps (0)   🖥 UI issues (0)   🎨 UX issues (0)
Bugs: ≡ Correctness (2) · ☼ Reliability (2)



Action required

1. Missing failover assertions 🐞
Description
The leader-resurrection write test never asserts that the post-kill append actually failed and never
verifies the client did not reconnect to the resurrected (now follower) node, so it can pass while
not validating the behavior described in the test comments.
Code

packages/test/src/connection/reconnect/leader-failover.test.ts[R57-81]

+      // First operation should fail
+      try {
+        await client.appendToStream(
+          "resurrect-stream",
+          jsonEvent({ type: "should-fail", data: { message: "test" } }),
+          { credentials: { username: "admin", password: "changeit" } }
+        );
+      } catch (error) {
+        // Expected failure
+      }
+
+      // Resurrect the killed leader (it comes back as follower)
+      await cluster.resurrect();
+      await delay(5_000);
+
+      // Subsequent operations should succeed on the new leader
+      const afterResurrect = await client.appendToStream(
+        "resurrect-stream",
+        jsonEvent({
+          type: "after-resurrect",
+          data: { message: "test" },
+        }),
+        { credentials: { username: "admin", password: "changeit" } }
+      );
+      expect(afterResurrect).toBeDefined();
Evidence
The test explicitly claims it must not reconnect to the resurrected follower, but it only swallows
the expected failure and then checks that a later append is defined; it never validates that an
error occurred nor that the connection endpoint changed away from the killed node.

packages/test/src/connection/reconnect/leader-failover.test.ts[29-35]
packages/test/src/connection/reconnect/leader-failover.test.ts[51-66]
packages/test/src/connection/reconnect/leader-failover.test.ts[68-82]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
The test `should reconnect after leader is killed and resurrected as follower` can pass even if the post-kill operation does not fail and even if the client reconnects back to the resurrected node, because it does not assert either condition.

### Issue Context
The test’s comment requires: "Client should NOT reconnect to the old (now follower) node", but the test only checks that a later append returns a defined result.

### Fix Focus Areas
- packages/test/src/connection/reconnect/leader-failover.test.ts[29-82]

### Suggested changes
- Replace the post-kill `try/catch` with an explicit rejection assertion (e.g., `await expect(client.appendToStream(...)).rejects.toBeDefined()` or `rejects.toBeInstanceOf(UnavailableError)` if appropriate).
- After the successful post-resurrect append, call `getCurrentConnection(client)` again and assert it is **not** equal to the original `leaderConnection` (or assert it matches the newly elected leader if you can derive that reliably).
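The endpoint check suggested above can be sketched as a plain comparison. This is a hedged illustration only: the `Endpoint` shape and the captured connection values are assumptions based on the review text, not the verified client API.

```typescript
// Hypothetical endpoint shape; the real `getCurrentConnection(client)` return
// type may differ.
interface Endpoint {
  address: string;
  port: number;
}

function sameEndpoint(a: Endpoint, b: Endpoint): boolean {
  return a.address === b.address && a.port === b.port;
}

// Captured before the kill (illustrative values):
const leaderConnection: Endpoint = { address: "node1", port: 2113 };
// Re-read after the post-resurrect append succeeds (illustrative values):
const currentConnection: Endpoint = { address: "node3", port: 2113 };

// The client must have moved off the killed (now follower) node.
console.log(!sameEndpoint(currentConnection, leaderConnection)); // true
```

In the real test, `currentConnection` would come from calling `getCurrentConnection(client)` again after the post-resurrect append, and the assertion would be `expect(sameEndpoint(...)).toBe(false)`.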



2. Caught errors not asserted 🐞
Description
Multiple tests catch and ignore errors without asserting the expected error type (e.g.,
NotLeaderError) or even asserting that an error occurred, which can hide regressions and allow
unintended success/failure modes to slip through.
Code

packages/test/src/connection/reconnect/leader-failover.test.ts[R119-131]

+      // Read with requiresLeader: true should fail on follower
+      try {
+        const events = await collect(
+          client.readStream("read-reconnect-stream", {
+            maxCount: 10,
+            fromRevision: START,
+            requiresLeader: true,
+          })
+        );
+        expect(events).toBe("unreachable");
+      } catch (error) {
+        // Expected: NotLeaderError from the Rust bridge
+      }
Evidence
Several try/catch blocks treat errors as “expected” but never assert what was thrown; for
Rust-bridge reads, the library explicitly maps bridge NotLeader errors into NotLeaderError, so the
tests can and should validate the specific exception.

packages/test/src/connection/reconnect/leader-failover.test.ts[119-131]
packages/test/src/connection/reconnect/leader-failover.test.ts[157-169]
packages/test/src/connection/reconnect/leader-failover.test.ts[220-230]
packages/test/src/connection/reconnect/leader-failover.test.ts[274-285]
packages/db-client/src/utils/convertBridgeError.ts[29-44]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
The tests currently swallow “expected” errors (NotLeader, node-down) without asserting they actually happened and/or without asserting the error type. This reduces the tests’ ability to detect regressions and makes failures harder to diagnose.

### Issue Context
For Rust bridge reads, `convertBridgeError` maps bridge errors into typed JS errors (including `NotLeaderError`), so tests can validate behavior precisely.

### Fix Focus Areas
- packages/test/src/connection/reconnect/leader-failover.test.ts[57-66]
- packages/test/src/connection/reconnect/leader-failover.test.ts[119-131]
- packages/test/src/connection/reconnect/leader-failover.test.ts[157-169]
- packages/test/src/connection/reconnect/leader-failover.test.ts[220-230]
- packages/test/src/connection/reconnect/leader-failover.test.ts[274-285]

### Suggested changes
- Import and assert specific error classes where appropriate (e.g., `NotLeaderError`, `UnavailableError`).
- Use `await expect(promise).rejects.toBeInstanceOf(NotLeaderError)` (or `.toThrow(NotLeaderError)`), rather than empty catch blocks.
- Where a rejection is required for the test’s logic, add an assertion that ensures the failure path executed (e.g., `expect.assertions(n)` or a `let failed = false` guard).
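The pattern behind these suggestions can be shown without the real client. The sketch below defines a stand-in `NotLeaderError` and a minimal analogue of Jest's `await expect(p).rejects.toBeInstanceOf(Cls)`; both names are illustrative, not the actual `@db-client` exports.

```typescript
// Hypothetical stand-in for the client's NotLeaderError class.
class NotLeaderError extends Error {
  constructor(public leaderEndpoint?: string) {
    super("not leader");
    this.name = "NotLeaderError";
  }
}

// Minimal analogue of `await expect(promise).rejects.toBeInstanceOf(errorClass)`:
// resolves with the error if it matches, otherwise fails loudly. Unlike an
// empty catch block, a resolved promise is also a failure.
async function assertRejectsWith<T extends Error>(
  promise: Promise<unknown>,
  errorClass: new (...args: any[]) => T
): Promise<T> {
  try {
    await promise;
  } catch (error) {
    if (error instanceof errorClass) return error;
    throw new Error(`rejected with ${String(error)}, expected ${errorClass.name}`);
  }
  throw new Error(`promise resolved, expected rejection with ${errorClass.name}`);
}

// Demo: a fake read that fails the way a follower would.
async function readFromFollower(): Promise<never> {
  throw new NotLeaderError("node2:2113");
}

assertRejectsWith(readFromFollower(), NotLeaderError).then((err) =>
  console.log(err.name) // NotLeaderError
);
```

In the actual suite, Jest's built-in `rejects.toBeInstanceOf(NotLeaderError)` does the same job, and `expect.assertions(n)` guards that the failure path really executed.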




Remediation recommended

3. Resurrect leader readiness assumed 🐞
Description
Tests assume a fixed delay after cluster.resurrect() is sufficient, but resurrect() only waits
for liveness and does not wait for leader election, which can cause intermittent failures when
operations run before the cluster is stable.
Code

packages/test/src/connection/reconnect/leader-failover.test.ts[R68-71]

+      // Resurrect the killed leader (it comes back as follower)
+      await cluster.resurrect();
+      await delay(5_000);
+
Evidence
Cluster.resurrect() returns after waiting for /health/live only, whereas leader election
readiness is a separate check (leaderElected) used during initial up(). The new tests use
hardcoded sleeps after resurrection instead of waiting for leader election or retrying
leader-dependent operations.

packages/test/src/utils/Cluster.ts[385-392]
packages/test/src/utils/Cluster.ts[503-542]
packages/test/src/connection/reconnect/leader-failover.test.ts[68-71]
packages/test/src/connection/reconnect/leader-failover.test.ts[232-235]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
After `cluster.resurrect()`, tests use fixed sleeps before proceeding. Since `resurrect()` only checks liveness, leader election may still be in progress, making tests timing-dependent.

### Issue Context
`Cluster.up()` waits for leader election for multi-node clusters, but `Cluster.resurrect()` does not.

### Fix Focus Areas
- packages/test/src/utils/Cluster.ts[385-396]
- packages/test/src/connection/reconnect/leader-failover.test.ts[68-82]
- packages/test/src/connection/reconnect/leader-failover.test.ts[232-254]

### Suggested changes
- Prefer polling/attempt loops (as already done later in this file) for post-resurrect operations rather than a single fixed `delay(5_000)`.
- Optionally extend `Cluster.resurrect()` to wait for leader election when `count > 1` (e.g., call an internal/public leader-election wait) so all tests get consistent semantics.
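A readiness poll of the kind suggested above might look like the following. This is a generic sketch, not the project's `Cluster` API: the `leaderElected` check here is a simulated stand-in for whatever leader-election probe the cluster exposes.

```typescript
// Generic replacement for a fixed `delay(5_000)` after `cluster.resurrect()`:
// poll a condition until it holds or a deadline passes.
async function waitUntil(
  check: () => Promise<boolean>,
  { timeoutMs = 30_000, intervalMs = 500 } = {}
): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    if (await check()) return;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`condition not met within ${timeoutMs}ms`);
}

// Demo: a condition that becomes true on the third poll, standing in for a
// real leader-election probe.
let polls = 0;
async function leaderElected(): Promise<boolean> {
  polls += 1;
  return polls >= 3;
}

waitUntil(leaderElected, { timeoutMs: 5_000, intervalMs: 10 }).then(() =>
  console.log(`elected after ${polls} polls`) // elected after 3 polls
);
```

Folding such a wait into `Cluster.resurrect()` itself (when `count > 1`) would give every test the same post-resurrect semantics instead of per-test sleeps.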



4. Concurrent reconnect race in test 🐞
Description
The concurrent write test assumes reconnection has completed immediately after the first NotLeader
failure, but reconnection is triggered asynchronously (handleError is not awaited), so parallel
appends can race with reconnection and intermittently reject.
Code

packages/test/src/connection/reconnect/leader-failover.test.ts[R331-379]

+      // Trigger NotLeader error with requiresLeader
+      try {
+        await client.appendToStream(
+          "concurrent-stream",
+          jsonEvent({ type: "trigger-not-leader", data: {} }),
+          {
+            requiresLeader: true,
+            credentials: { username: "admin", password: "changeit" },
+          }
+        );
+        expect("should not reach").toBe("here");
+      } catch (error) {
+        // Expected: NotLeaderError triggers reconnection
+      }
+
+      // Now fire multiple operations rapidly in parallel
+      // If handleError has a race condition, some of these could fail
+      const results = await Promise.allSettled([
+        client.appendToStream(
+          "concurrent-stream",
+          jsonEvent({ type: "concurrent-1", data: {} }),
+          {
+            requiresLeader: true,
+            credentials: { username: "admin", password: "changeit" },
+          }
+        ),
+        client.appendToStream(
+          "concurrent-stream",
+          jsonEvent({ type: "concurrent-2", data: {} }),
+          {
+            requiresLeader: true,
+            credentials: { username: "admin", password: "changeit" },
+          }
+        ),
+        client.appendToStream(
+          "concurrent-stream",
+          jsonEvent({ type: "concurrent-3", data: {} }),
+          {
+            requiresLeader: true,
+            credentials: { username: "admin", password: "changeit" },
+          }
+        ),
+      ]);
+
+      // All operations should succeed since the client should have
+      // reconnected to the leader after the first NotLeader error
+      for (const result of results) {
+        expect(result.status).toBe("fulfilled");
+      }
Evidence
After the NotLeader-triggering append, the test immediately starts parallel leader-required appends
and expects all fulfilled. In the client, execute() calls handleError() without await, meaning
the reconnection state update can still be in flight when subsequent operations begin.

packages/test/src/connection/reconnect/leader-failover.test.ts[331-379]
packages/db-client/src/Client/index.ts[466-478]
packages/db-client/src/Client/index.ts[530-564]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
The test fires concurrent `requiresLeader` appends immediately after a NotLeader-triggering call and asserts all are fulfilled. Because reconnection is initiated asynchronously, this can race and produce intermittent failures.

### Issue Context
`Client.execute()` triggers `handleError()` but does not await it, so reconnection may not be fully applied before the next operations run.

### Fix Focus Areas
- packages/test/src/connection/reconnect/leader-failover.test.ts[314-380]
- packages/db-client/src/Client/index.ts[466-478]

### Suggested changes
- Add a small “reconnection gate” before launching parallel writes, e.g. retry a single `appendToStream(... requiresLeader: true ...)` until it succeeds, then run `Promise.allSettled`.
- Alternatively, relax the assertion to allow transient failures and retry rejected results (if the goal is to validate eventual success under concurrency).
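The "reconnection gate" idea can be sketched generically: retry one leader-required operation until it succeeds, and only then fire the parallel batch. `appendOnce` below is a simulated stand-in for a single `appendToStream(..., { requiresLeader: true })` call, not the real client method.

```typescript
// Retry an operation until it fulfills, absorbing transient failures such as
// NotLeader errors thrown while reconnection is still in flight.
async function retryUntilFulfilled<T>(
  op: () => Promise<T>,
  attempts = 10,
  backoffMs = 200
): Promise<T> {
  let lastError: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await op();
    } catch (error) {
      lastError = error;
      await new Promise((resolve) => setTimeout(resolve, backoffMs));
    }
  }
  throw lastError;
}

// Demo: the first two attempts race with "reconnection" and fail.
let calls = 0;
async function appendOnce(): Promise<string> {
  calls += 1;
  if (calls < 3) throw new Error("NotLeader");
  return "ok";
}

async function main() {
  // The gate: one operation must succeed before the concurrent batch starts.
  await retryUntilFulfilled(appendOnce, 10, 1);
  const results = await Promise.allSettled([appendOnce(), appendOnce(), appendOnce()]);
  console.log(results.every((r) => r.status === "fulfilled")); // true
}
main();
```

With the gate in place, asserting that every `Promise.allSettled` result is fulfilled no longer races against the unawaited `handleError()` reconnection path.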




@w1am w1am added the cherry-pick:release/v1.1 (Cherry picks PR into v1.1 release branch) and cherry-pick:release/v1.2 labels Apr 13, 2026
@w1am w1am merged commit 72f5dda into master Apr 13, 2026
30 checks passed
@w1am w1am deleted the w1am/test-leader-failover branch April 13, 2026 10:23

@github-actions github-actions bot left a comment


@w1am 👉 Created pull request targeting release/v1.1: #477


@github-actions github-actions bot left a comment


@w1am 👉 Created pull request targeting release/v1.2: #478

@w1am w1am changed the title from "Add leader failover reconnection tests" to "[DEV-1437] Add leader failover reconnection tests" Apr 13, 2026
@linear

linear bot commented Apr 13, 2026

