fix: improve leader failover reconnection and error mapping #479
test: fix leader failover tests for dual-connection architecture

- Remove redundant write reconnection test (covered by UnavailableError.test.ts)
- Remove test.only marker
- Remove customer name references from comments
- Fix "readStream after kill" test with retry loop for Rust client stabilization
- Fix "mixed read/write" test to trigger NotLeader on both gRPC and bridge paths
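The retry loop mentioned for the "readStream after kill" test could look roughly like the sketch below. This is not the code from the PR: `retryUntilStable`, its parameters, and the default attempt count and delay are hypothetical names and values chosen for illustration.

```typescript
// Hypothetical helper: retries an async operation until it succeeds or the
// attempt budget is exhausted. Names and defaults are assumptions, not PR code.
async function retryUntilStable<T>(
  operation: () => Promise<T>,
  maxAttempts: number = 5,
  delayMs: number = 200,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await operation();
    } catch (error) {
      lastError = error;
      if (attempt < maxAttempts) {
        // Give the client time to reconnect before trying again.
        await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
      }
    }
  }
  // Every attempt failed: surface the most recent error to the test.
  throw lastError;
}
```

In a failover test, the operation passed in would be the read that is expected to fail briefly after the leader is killed. The loop bounds how long the test waits rather than sleeping for a fixed interval.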
Bump @kurrent/bridge from 0.1.3 to 0.1.5 in db-client and benchmark packages to pick up improved error type support for leader failover scenarios.
- Map gRPC INTERNAL and DATA_LOSS status codes to UnavailableError so the client retries on these transient failures during leader failover
- Add UnavailableError, DeadlineExceededError, and UnknownError handling to bridge error conversion
- Use string literals for error name matching instead of class .name references for reliability
- Remove stale commented-out code in convertBridgeError
- Default unrecognized bridge errors to UnknownError instead of passing through the raw error
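As a rough illustration of the status-code mapping described in this commit, the sketch below treats INTERNAL and DATA_LOSS the same way as UNAVAILABLE. The numeric values are the standard gRPC status codes. The helper name `toRetryableErrorName` and the idea of returning an error name are assumptions made for the example and do not reflect the actual contents of `CommandError.ts`.

```typescript
// Standard gRPC status code values for the codes discussed in this commit.
const GrpcStatus = {
  INTERNAL: 13,
  UNAVAILABLE: 14,
  DATA_LOSS: 15,
} as const;

// Hypothetical helper: returns "UnavailableError" for status codes that should
// trigger a reconnect and retry, and undefined for everything else.
function toRetryableErrorName(code: number): "UnavailableError" | undefined {
  switch (code) {
    case GrpcStatus.UNAVAILABLE:
    case GrpcStatus.INTERNAL:
    case GrpcStatus.DATA_LOSS:
      // Treated as transient during leader failover, so the client retries.
      return "UnavailableError";
    default:
      // Left to whatever other mapping the client applies.
      return undefined;
  }
}
```

Grouping the three codes under one case keeps the retry decision in a single place, which is the property the commit relies on during failover.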
Review Summary by Qodo
Add leader failover tests and improve error mapping for reconnection
Walkthroughs
Description
- Add comprehensive leader failover reconnection tests covering gRPC writes and Rust bridge reads
- Map gRPC INTERNAL and DATA_LOSS status codes to UnavailableError for automatic client retry
- Improve bridge error conversion with string literals and handle UnavailableError, DeadlineExceededError, UnknownError
- Bump @kurrent/bridge dependency from 0.1.3 to 0.1.5
Diagram
flowchart LR
A["gRPC Status Codes<br/>INTERNAL, DATA_LOSS"] -->|"Map to UnavailableError"| B["Client Retry Logic"]
C["Bridge Error Conversion<br/>String Literals"] -->|"Handle Error Types"| D["UnavailableError<br/>DeadlineExceededError<br/>UnknownError"]
E["Leader Failover Tests<br/>Write/Read/Concurrent"] -->|"Verify Reconnection"| F["Cluster Stability"]
G["@kurrent/bridge 0.1.5"] -->|"Improved Error Support"| B
File Changes
1. packages/db-client/src/utils/CommandError.ts
Code Review by Qodo
      switch (error.name) {
    -   case StreamNotFoundError.name:
    +   case "StreamNotFoundError":
          return new StreamNotFoundError(serviceError, stream);
    -   case StreamDeletedError.name:
    +   case "StreamDeletedError":
          return StreamDeletedError.fromStreamName(stream);
    -   case NotLeaderError.name:
    +   case "NotLeaderError":
          return new NotLeaderError(serviceError);
    -   case AccessDeniedError.name:
    +   case "AccessDeniedError":
          return new AccessDeniedError(serviceError);
    +   case "UnavailableError":
    +     return new UnavailableError(serviceError);
    +   case "DeadlineExceededError":
    +     return new DeadlineExceededError(serviceError);
    +   case "UnknownError":
    +     return new UnknownError(serviceError);
        default:
    -     return error;
    +     return new UnknownError(serviceError);
      }
1. Unknown errors get masked 🐞 Bug ≡ Correctness
convertBridgeError now converts any unrecognized thrown value into UnknownError, so non-bridge exceptions (e.g., bugs thrown while converting/yielding events) get misclassified and lose their original error type/stack. This is especially problematic because readStream/readAll catch broadly and route all exceptions through convertBridgeError.
Agent Prompt
## Issue description
`convertBridgeError` now wraps *all* unrecognized errors into `UnknownError`. Because `readStream` / `readAll` catch broadly around async iteration, this change masks non-bridge exceptions (e.g., `TypeError`, conversion bugs) and discards the original error identity/stack.
## Issue Context
The bridge conversion helper should only translate *known* bridge-originated errors. Unexpected errors should propagate unchanged (or be wrapped while preserving the original error as `cause` and keeping the original stack).
## Fix Focus Areas
- packages/db-client/src/utils/convertBridgeError.ts[12-33]
- packages/db-client/src/streams/readStream.ts[114-126]
- packages/db-client/src/streams/readAll.ts[84-96]
## Suggested approach
- In `convertBridgeError`, before switching/wrapping, detect whether the error is a bridge/grpc-like error (e.g., has `code` and/or `metadata`) or matches a known bridge error name.
- If it’s not a recognized bridge error, return/throw the original `error` unchanged.
- If you still want to map unknown *bridge* errors, wrap them but preserve the original error via `cause` (and/or attach the original error object on the wrapper) so debugging remains possible.
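A minimal sketch of the suggested approach follows, assuming a simplified error model. The `UnknownError` class here is a stand-in defined for the example, not the client's real class, and `looksLikeBridgeError`, `convertBridgeErrorSketch`, and the `originalError` property are hypothetical names. The known-name branch returns the error unchanged only to keep the example short.

```typescript
// Stand-in wrapper for this sketch: keeps a reference to the error it wraps.
class UnknownError extends Error {
  readonly originalError: unknown;
  constructor(originalError: unknown) {
    super(originalError instanceof Error ? originalError.message : String(originalError));
    this.name = "UnknownError";
    this.originalError = originalError;
  }
}

// Error names taken from the switch statement under review.
const KNOWN_BRIDGE_ERROR_NAMES: string[] = [
  "StreamNotFoundError",
  "StreamDeletedError",
  "NotLeaderError",
  "AccessDeniedError",
  "UnavailableError",
  "DeadlineExceededError",
  "UnknownError",
];

// Hypothetical guard: an error counts as bridge-originated if it has a known
// name or carries gRPC-like `code` / `metadata` fields.
function looksLikeBridgeError(error: unknown): error is Error {
  if (!(error instanceof Error)) return false;
  return (
    KNOWN_BRIDGE_ERROR_NAMES.indexOf(error.name) !== -1 ||
    "code" in error ||
    "metadata" in error
  );
}

function convertBridgeErrorSketch(error: unknown): unknown {
  // Non-bridge failures (e.g. a TypeError from a conversion bug) pass through
  // untouched, so their type and stack survive.
  if (!looksLikeBridgeError(error)) return error;
  // Simplification: the real helper would build the typed client error here.
  if (KNOWN_BRIDGE_ERROR_NAMES.indexOf(error.name) !== -1) return error;
  // Unrecognized bridge error: wrap it, keeping the original for debugging.
  return new UnknownError(error);
}
```

With this shape, a `TypeError` thrown while yielding events comes back as the same object, while an unnamed error carrying a `code` field is wrapped and remains reachable through `originalError`.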
Summary
- Map gRPC `INTERNAL` and `DATA_LOSS` status codes to `UnavailableError` so the client retries on these transient failures during leader failover
- Add bridge error conversion handling for `UnavailableError`, `DeadlineExceededError`, and `UnknownError`, use string literals for error name matching, and default unrecognized errors to `UnknownError`
- Bump `@kurrent/bridge` from 0.1.3 to 0.1.5

Test plan
- `leader-failover.test.ts` tests pass against a 3-node cluster
- … `NotLeaderError` …
- … `NotLeaderError` …