fix: improve leader failover reconnection and error mapping by w1am · Pull Request #479 · kurrent-io/KurrentDB-Client-NodeJS

w1am · 2026-04-13T10:24:08Z

Summary

Add comprehensive reconnection tests for leader failover scenarios covering both gRPC write and Rust bridge read code paths, including concurrent operations and repeated reads
Map gRPC INTERNAL and DATA_LOSS status codes to UnavailableError so the client retries on these transient failures during leader failover
Improve bridge error conversion to handle UnavailableError, DeadlineExceededError, and UnknownError, use string literals for error name matching, and default unrecognized errors to UnknownError
Bump @kurrent/bridge from 0.1.3 to 0.1.5

Test plan

Verify new leader-failover.test.ts tests pass against a 3-node cluster
Confirm write operations reconnect after leader kill/resurrect
Confirm read operations (Rust bridge path) reconnect after NotLeaderError
Confirm concurrent mixed read/write operations recover after NotLeaderError
Verify existing reconnection tests still pass

test: fix leader failover tests for dual-connection architecture - Remove redundant write reconnection test (covered by UnavailableError.test.ts) - Remove test.only marker - Remove customer name references from comments - Fix "readStream after kill" test with retry loop for Rust client stabilization - Fix "mixed read/write" test to trigger NotLeader on both gRPC and bridge paths

Bump @kurrent/bridge from 0.1.3 to 0.1.5 in db-client and benchmark packages to pick up improved error type support for leader failover scenarios.

- Map gRPC INTERNAL and DATA_LOSS status codes to UnavailableError so the client retries on these transient failures during leader failover - Add UnavailableError, DeadlineExceededError, and UnknownError handling to bridge error conversion - Use string literals for error name matching instead of class .name references for reliability - Remove stale commented-out code in convertBridgeError - Default unrecognized bridge errors to UnknownError instead of passing through the raw error

qodo-code-review · 2026-04-13T10:24:25Z

Review Summary by Qodo

Add leader failover tests and improve error mapping for reconnection

🐞 Bug fix 🧪 Tests ✨ Enhancement

Walkthroughs

Description

• Add comprehensive leader failover reconnection tests covering gRPC writes and Rust bridge reads
• Map gRPC INTERNAL and DATA_LOSS status codes to UnavailableError for automatic client retry
• Improve bridge error conversion with string literals and handle UnavailableError,
  DeadlineExceededError, UnknownError
• Bump @kurrent/bridge dependency from 0.1.3 to 0.1.5

Diagram

flowchart LR
  A["gRPC Status Codes<br/>INTERNAL, DATA_LOSS"] -->|"Map to UnavailableError"| B["Client Retry Logic"]
  C["Bridge Error Conversion<br/>String Literals"] -->|"Handle Error Types"| D["UnavailableError<br/>DeadlineExceededError<br/>UnknownError"]
  E["Leader Failover Tests<br/>Write/Read/Concurrent"] -->|"Verify Reconnection"| F["Cluster Stability"]
  G["@kurrent/bridge 0.1.5"] -->|"Improved Error Support"| B

File Changes

1. packages/db-client/src/utils/CommandError.ts 🐞 Bug fix +2/-0

Map gRPC status codes to UnavailableError
• Map gRPC StatusCode.INTERNAL and StatusCode.DATA_LOSS to UnavailableError
• Enables automatic client retry on these transient failures during leader failover
packages/db-client/src/utils/CommandError.ts

2. packages/db-client/src/utils/convertBridgeError.ts ✨ Enhancement +14/-25

Improve bridge error conversion and handling

• Replace dynamic error class name references with string literals for reliability
• Add handling for UnavailableError, DeadlineExceededError, and UnknownError
• Default unrecognized errors to UnknownError instead of passing through raw error
• Remove stale commented-out code block

packages/db-client/src/utils/convertBridgeError.ts

3. packages/test/src/connection/reconnect/leader-failover.test.ts 🧪 Tests +441/-0

Add leader failover reconnection test suite

• Add comprehensive test suite for leader failover scenarios with 7 test cases
• Test write operations (gRPC path) with leader kill/resurrect cycles
• Test read operations (Rust bridge path) with NotLeader errors and reconnection
• Test concurrent mixed read/write operations during failover
• Include retry loops for Rust client stabilization after cluster changes

packages/test/src/connection/reconnect/leader-failover.test.ts

View more (2)

4. packages/benchmark/package.json Dependencies +1/-1

Update bridge dependency version
• Bump @kurrent/bridge from 0.1.3 to 0.1.5
packages/benchmark/package.json

5. packages/db-client/package.json Dependencies +1/-1

Update bridge dependency version
• Bump @kurrent/bridge from 0.1.3 to 0.1.5
packages/db-client/package.json

qodo-code-review · 2026-04-13T10:24:27Z

Code Review by Qodo

🐞 Bugs (3) 📘 Rule violations (0) 📎 Requirement gaps (0) 🖥 UI issues (0) 🎨 UX Issues (0)

🐞\ ≡ Correctness (1) ☼ Reliability (1) ⚙ Maintainability (1)

1. Unknown errors get masked 🐞 ≡

Description

convertBridgeError now converts any unrecognized thrown value into UnknownError, so non-bridge
exceptions (e.g., bugs thrown while converting/yielding events) get misclassified and lose their
original error type/stack. This is especially problematic because readStream/readAll catch broadly
and route all exceptions through convertBridgeError.

Code

packages/db-client/src/utils/convertBridgeError.ts[R16-33]

  switch (error.name) {
-    case StreamNotFoundError.name:
+    case "StreamNotFoundError":
      return new StreamNotFoundError(serviceError, stream);
-    case StreamDeletedError.name:
+    case "StreamDeletedError":
      return StreamDeletedError.fromStreamName(stream);
-    case NotLeaderError.name:
+    case "NotLeaderError":
      return new NotLeaderError(serviceError);
-    case AccessDeniedError.name:
+    case "AccessDeniedError":
      return new AccessDeniedError(serviceError);
+    case "UnavailableError":
+      return new UnavailableError(serviceError);
+    case "DeadlineExceededError":
+      return new DeadlineExceededError(serviceError);
+    case "UnknownError":
+      return new UnknownError(serviceError);
    default:
-      return error;
+      return new UnknownError(serviceError);
  }

Evidence
readStream/readAll catch *any* exception from the async iteration (including conversion/yield code
paths) and rethrow convertBridgeError(...). With the new default branch returning UnknownError,
unexpected runtime errors are now always wrapped instead of propagating as-is.
packages/db-client/src/utils/convertBridgeError.ts[12-33]
packages/db-client/src/streams/readStream.ts[114-126]
packages/db-client/src/streams/readAll.ts[84-96]

Agent prompt

The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`convertBridgeError` now wraps *all* unrecognized errors into `UnknownError`. Because `readStream` / `readAll` catch broadly around async iteration, this change masks non-bridge exceptions (e.g., `TypeError`, conversion bugs) and discards the original error identity/stack.

## Issue Context
The bridge conversion helper should only translate *known* bridge-originated errors. Unexpected errors should propagate unchanged (or be wrapped while preserving the original error as `cause` and keeping the original stack).

## Fix Focus Areas
- packages/db-client/src/utils/convertBridgeError.ts[12-33]
- packages/db-client/src/streams/readStream.ts[114-126]
- packages/db-client/src/streams/readAll.ts[84-96]

## Suggested approach
- In `convertBridgeError`, before switching/wrapping, detect whether the error is a bridge/grpc-like error (e.g., has `code` and/or `metadata`) or matches a known bridge error name.
- If it’s not a recognized bridge error, return/throw the original `error` unchanged.
- If you still want to map unknown *bridge* errors, wrap them but preserve the original error via `cause` (and/or attach the original error object on the wrapper) so debugging remains possible.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools

2. INTERNAL treated unavailable 🐞 ☼

Description

convertToCommandError now maps gRPC INTERNAL and DATA_LOSS to UnavailableError, which the client
explicitly treats as “server unavailable” and uses to trigger channel teardown/reconnect. This
broadens reconnection/retry classification to errors that are not necessarily
connectivity/unavailability related.

Code

packages/db-client/src/utils/CommandError.ts[R640-645]

    case StatusCode.DEADLINE_EXCEEDED:
      return new DeadlineExceededError(error);
    case StatusCode.UNAVAILABLE:
+    case StatusCode.INTERNAL:
+    case StatusCode.DATA_LOSS:
      return new UnavailableError(error);

Evidence
convertToCommandError now returns UnavailableError for INTERNAL/DATA_LOSS. Client.shouldReconnect
interprets UnavailableError as “Server is unavailable to take request” and will reconnect/tear down
the channel when seen, changing behavior for these status codes across operations that call
convertToCommandError.
packages/db-client/src/utils/CommandError.ts[622-646]
packages/db-client/src/Client/index.ts[513-528]
packages/db-client/src/streams/appendToStream/append.ts[59-69]

Agent prompt

The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`StatusCode.INTERNAL` and `StatusCode.DATA_LOSS` are now mapped to `UnavailableError`, which drives reconnection logic (channel teardown). This expands “unavailable” handling beyond the cases the code comment describes.

## Issue Context
`Client.shouldReconnect` explicitly treats `UnavailableError` as a reconnection signal. If INTERNAL/DATA_LOSS can represent non-unavailability failures in your system, this may cause unnecessary reconnections and retries.

## Fix Focus Areas
- packages/db-client/src/utils/CommandError.ts[622-646]
- packages/db-client/src/Client/index.ts[513-528]

## Suggested approach
- Add a guard so INTERNAL/DATA_LOSS are only mapped to `UnavailableError` when you can identify the specific transient failover signature (e.g., via details/metadata markers used during leader failover).
- Alternatively, consider mapping only one of the codes (or keeping them as `UnknownError`) and add targeted tests proving the intended leader-failover behavior for whichever code(s) you keep.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools

3. Failover test lacks asserts 🐞 ⚙

Description

Some steps labeled “should fail” swallow errors without asserting the failure happened, so the test
can pass even if the expected failure does not occur. This reduces the reliability of the new
reconnection coverage.

Code

packages/test/src/connection/reconnect/leader-failover.test.ts[R57-66]

+      // First operation should fail
+      try {
+        await client.appendToStream(
+          "resurrect-stream",
+          jsonEvent({ type: "should-fail", data: { message: "test" } }),
+          { credentials: { username: "admin", password: "changeit" } }
+        );
+      } catch (error) {
+        // Expected failure
+      }

Evidence
In the write failover test, the “First operation should fail” block has no assertion if the append
unexpectedly succeeds. Similarly, in the read failover test after killing the leader, the “Read
should fail” block has no ‘unreachable’ assertion and will silently continue if the read succeeds.
packages/test/src/connection/reconnect/leader-failover.test.ts[57-66]
packages/test/src/connection/reconnect/leader-failover.test.ts[220-230]

Agent prompt

The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
The new leader failover tests contain “should fail” steps that do not assert that a failure actually occurred, which can allow false positives.

## Issue Context
Try/catch with an empty catch block only verifies behavior if there is an explicit assertion in the try block (e.g., an “unreachable” expectation) or a flag checked after the catch.

## Fix Focus Areas
- packages/test/src/connection/reconnect/leader-failover.test.ts[57-66]
- packages/test/src/connection/reconnect/leader-failover.test.ts[220-230]

## Suggested approach
- Replace the try/catch blocks with `await expect(promise).rejects.toThrow()` (and ideally assert the error type).
- Or set a boolean like `let failed = false;` and assert `expect(failed).toBe(true)` after the try/catch.
- Consider asserting the error class (e.g., `UnavailableError`, `NotLeaderError`) to ensure you’re exercising the intended failure mode.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools

ⓘ The new review experience is currently in Beta. Learn more

qodo-code-review · 2026-04-13T10:32:24Z

  switch (error.name) {
-    case StreamNotFoundError.name:
+    case "StreamNotFoundError":
      return new StreamNotFoundError(serviceError, stream);
-    case StreamDeletedError.name:
+    case "StreamDeletedError":
      return StreamDeletedError.fromStreamName(stream);
-    case NotLeaderError.name:
+    case "NotLeaderError":
      return new NotLeaderError(serviceError);
-    case AccessDeniedError.name:
+    case "AccessDeniedError":
      return new AccessDeniedError(serviceError);
+    case "UnavailableError":
+      return new UnavailableError(serviceError);
+    case "DeadlineExceededError":
+      return new DeadlineExceededError(serviceError);
+    case "UnknownError":
+      return new UnknownError(serviceError);
    default:
-      return error;
+      return new UnknownError(serviceError);
  }


1. Unknown errors get masked 🐞 Bug ≡ Correctness

convertBridgeError now converts any unrecognized thrown value into UnknownError, so non-bridge exceptions (e.g., bugs thrown while converting/yielding events) get misclassified and lose their original error type/stack. This is especially problematic because readStream/readAll catch broadly and route all exceptions through convertBridgeError.

Agent Prompt

## Issue description `convertBridgeError` now wraps *all* unrecognized errors into `UnknownError`. Because `readStream` / `readAll` catch broadly around async iteration, this change masks non-bridge exceptions (e.g., `TypeError`, conversion bugs) and discards the original error identity/stack. ## Issue Context The bridge conversion helper should only translate *known* bridge-originated errors. Unexpected errors should propagate unchanged (or be wrapped while preserving the original error as `cause` and keeping the original stack). ## Fix Focus Areas - packages/db-client/src/utils/convertBridgeError.ts[12-33] - packages/db-client/src/streams/readStream.ts[114-126] - packages/db-client/src/streams/readAll.ts[84-96] ## Suggested approach - In `convertBridgeError`, before switching/wrapping, detect whether the error is a bridge/grpc-like error (e.g., has `code` and/or `metadata`) or matches a known bridge error name. - If it’s not a recognized bridge error, return/throw the original `error` unchanged. - If you still want to map unknown *bridge* errors, wrap them but preserve the original error via `cause` (and/or attach the original error object on the wrapper) so debugging remains possible.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools

w1am added 3 commits April 9, 2026 13:23

chore(deps): update @kurrent/bridge to 0.1.5

a9f8bc0

Bump @kurrent/bridge from 0.1.3 to 0.1.5 in db-client and benchmark packages to pick up improved error type support for leader failover scenarios.

w1am closed this Apr 13, 2026

w1am deleted the w1am/test-leader-failover branch April 13, 2026 10:25

qodo-code-review bot reviewed Apr 13, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: improve leader failover reconnection and error mapping#479

fix: improve leader failover reconnection and error mapping#479
w1am wants to merge 3 commits intomasterfrom
w1am/test-leader-failover

w1am commented Apr 13, 2026

Uh oh!

qodo-code-review bot commented Apr 13, 2026

Uh oh!

qodo-code-review bot commented Apr 13, 2026 •

edited

Loading

Uh oh!

qodo-code-review bot Apr 13, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

w1am commented Apr 13, 2026

Summary

Test plan

Uh oh!

qodo-code-review bot commented Apr 13, 2026

Review Summary by Qodo

Walkthroughs

File Changes

Uh oh!

qodo-code-review bot commented Apr 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Code Review by Qodo

Uh oh!

qodo-code-review bot Apr 13, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

qodo-code-review bot commented Apr 13, 2026 •

edited

Loading