
Fix ResultSet.json() race condition on JSONEachRow streams #603

Open
Onyx2406 wants to merge 4 commits into ClickHouse:main from Onyx2406:fix/result-set-stream-consumed-race


Onyx2406 commented Mar 8, 2026

Summary

Fix ResultSet.json() throwing "Stream has been already consumed" on the first call for JSONEachRow format due to a race condition between two readableEnded checks.

Fixes #575

Root Cause

For streamable JSON formats (JSONEachRow, etc.), json() calls stream() internally:

  1. json() checks this._stream.readableEnded → false
  2. json() calls this.stream()
  3. Race window: the stream receives all data and fires the 'end' event → readableEnded becomes true
  4. stream() checks this._stream.readableEnded → true
  5. stream() throws "Stream has been already consumed", on the very first call!

This is triggered by fast/small responses (e.g. SELECT number FROM numbers(1)) where the response arrives fully before json() reaches the stream() call.
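To make the double check concrete, here is a minimal, self-contained model of the pre-fix control flow. FakeStream, OldResultSet, and makeFastEndingStream are hypothetical names; the fake stream reports readableEnded = false on the first check and true afterwards, standing in for an 'end' event that lands between json()'s check and stream()'s check. This is a sketch of the failure mode under that assumption, not the actual clickhouse-js code.

```typescript
// Hypothetical model of the pre-fix pattern: two separate readableEnded checks.
const streamAlreadyConsumedMessage = 'Stream has been already consumed'

interface FakeStream {
  readonly readableEnded: boolean
  chunks: string[]
}

class OldResultSet {
  constructor(private readonly _stream: FakeStream) {}

  stream(): string[] {
    // Second check (step 4): sees the stream as ended.
    if (this._stream.readableEnded) {
      throw Error(streamAlreadyConsumedMessage)
    }
    return this._stream.chunks
  }

  async json<T>(): Promise<T[]> {
    // First check (step 1): the stream has not ended yet.
    if (this._stream.readableEnded) {
      throw Error(streamAlreadyConsumedMessage)
    }
    // Step 2: delegate to stream(), which re-checks readableEnded.
    return this.stream().map((line) => JSON.parse(line) as T)
  }
}

// Reports false on the first read of readableEnded and true afterwards,
// simulating 'end' firing inside the race window (step 3).
function makeFastEndingStream(chunks: string[]): FakeStream {
  let checks = 0
  return {
    chunks,
    get readableEnded() {
      return checks++ > 0
    },
  }
}

new OldResultSet(makeFastEndingStream(['{"number":"0"}']))
  .json()
  .then(() => console.log('unexpected success'))
  .catch((e: Error) => console.log(e.message)) // Stream has been already consumed
```

Nothing has read the data, yet the first json() call rejects, which matches the symptom reported in #575.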

Fix (44 insertions, 2 files)

packages/client-node/src/result_set.ts:

  • Add _consumed boolean flag to ResultSet class
  • Add markAsConsumed() private method that checks _consumed || readableEnded and sets _consumed = true synchronously
  • text(), json(), stream() all call markAsConsumed() as their first operation
  • Split stream() into public stream() (with check) and private _streamImpl() (without check)
  • json() calls _streamImpl() after already calling markAsConsumed(), eliminating the double-check race
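A sketch of the shape described in the list above, reusing the same hypothetical fast-ending fake stream. markAsConsumed() follows the first revision's `_consumed || readableEnded` check; the row parsing is simplified and all class and helper names other than markAsConsumed(), stream(), _streamImpl(), and json() are assumptions.

```typescript
// Hypothetical sketch of the fix: one synchronous guard per consumption call.
const streamAlreadyConsumedMessage = 'Stream has been already consumed'

interface FakeStream {
  readonly readableEnded: boolean
  chunks: string[]
}

class FixedResultSet {
  private _consumed = false

  constructor(private readonly _stream: FakeStream) {}

  // Runs once at the start of every public consumption method.
  private markAsConsumed(): void {
    if (this._consumed || this._stream.readableEnded) {
      throw Error(streamAlreadyConsumedMessage)
    }
    this._consumed = true
  }

  // Public entry point: performs the consumption check.
  stream(): string[] {
    this.markAsConsumed()
    return this._streamImpl()
  }

  // Internal entry point: no check; only called after markAsConsumed().
  private _streamImpl(): string[] {
    return this._stream.chunks
  }

  async json<T>(): Promise<T[]> {
    this.markAsConsumed()
    return this._streamImpl().map((line) => JSON.parse(line) as T)
  }
}

// Same simulation as before: false on the first check, true afterwards.
function makeFastEndingStream(chunks: string[]): FakeStream {
  let checks = 0
  return {
    chunks,
    get readableEnded() {
      return checks++ > 0
    },
  }
}

const rs = new FixedResultSet(makeFastEndingStream(['{"number":"0"}']))
rs.json<{ number: string }>().then((rows) => console.log(rows.length)) // 1
```

Because json() reads through _streamImpl(), readableEnded is consulted only once per call, so a flip after that single check can no longer reject the first consumer. A second call is still rejected, now by the _consumed flag.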

This matches the pattern already used by the web client's ResultSet, which tracks consumption with an isAlreadyConsumed boolean.

packages/client-node/__tests__/unit/node_result_set.test.ts:

  • Add regression test that yields to the event loop (setImmediate) before calling json(), simulating the race condition

Test Plan

  • New unit test for the race condition scenario
  • CI (full test suite)
  • Existing consumption tests (text(), json(), stream() only-once) continue to pass since markAsConsumed() covers all the same cases

When calling json() on a JSONEachRow result, the method first checks
_stream.readableEnded and then calls stream() which checks
readableEnded again. If the stream ends between these two checks
(common with fast/small responses), stream() throws "Stream has been
already consumed" even though this is the first consumption call.

Fix by introducing a _consumed boolean flag that is set synchronously
when any consumption method (text/json/stream) is called. This
eliminates the race window between the two readableEnded checks.

The fix splits stream() into a public method (with consumption check)
and a private _streamImpl() (without check) that json() calls
internally after already marking as consumed.

This matches the pattern used by the web client's ResultSet, which
uses an isAlreadyConsumed boolean instead of readableEnded.

Fixes ClickHouse#575
Copilot AI review requested due to automatic review settings March 8, 2026 07:41
readableEnded=true just means the 'end' event fired, NOT that someone
already consumed the data. For fast/small responses, the stream can
end before json() is even called, making readableEnded=true while
data is still buffered and available. Checking readableEnded would
reject the first consumption call — exactly the bug reported in ClickHouse#575.

Only use the _consumed boolean flag, which tracks actual consumption
by our code, not stream lifecycle events.
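The difference between the first-revision guard and the flag-only guard can be shown in isolation. The sketch below assumes, as this commit message states, that readableEnded may already be true when the first consumer arrives; StreamLike and the guard factories are hypothetical helpers, not library APIs.

```typescript
// Hypothetical comparison of the two guard variants discussed in this PR.
const streamAlreadyConsumedMessage = 'Stream has been already consumed'

interface StreamLike {
  readonly readableEnded: boolean
}

// First revision: `_consumed || readableEnded`.
function makeGuardWithReadableEnded(stream: StreamLike): () => void {
  let consumed = false
  return () => {
    if (consumed || stream.readableEnded) throw Error(streamAlreadyConsumedMessage)
    consumed = true
  }
}

// This commit: only the flag set by our own consumption methods.
function makeFlagOnlyGuard(): () => void {
  let consumed = false
  return () => {
    if (consumed) throw Error(streamAlreadyConsumedMessage)
    consumed = true
  }
}

function firstCallThrows(guard: () => void): boolean {
  try {
    guard()
    return false
  } catch {
    return true
  }
}

// A stream whose 'end' has already fired before anything consumed it.
const endedEarly: StreamLike = { readableEnded: true }

console.log(firstCallThrows(makeGuardWithReadableEnded(endedEarly))) // true
console.log(firstCallThrows(makeFlagOnlyGuard())) // false
```

Only the flag-only guard accepts the first call in that situation, while still rejecting any later call.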
Contributor

Copilot AI left a comment

Pull request overview

Fixes a race condition in the Node.js client ResultSet.json() for streamable JSON formats (e.g. JSONEachRow) where the first call could incorrectly throw "Stream has been already consumed" due to separate readableEnded checks.

Changes:

  • Add synchronous consumption tracking (_consumed + markAsConsumed()) to ResultSet and apply it to text(), json(), and stream().
  • Refactor stream() to delegate to a private _streamImpl() so json() can stream rows without re-checking consumption.
  • Add a unit regression test intended to simulate the fast-end/race scenario for JSONEachRow.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

Reviewed files:
  • packages/client-node/src/result_set.ts: introduces a boolean consumption flag and refactors stream()/json() to avoid double-check races on fast-ending streams.
  • packages/client-node/__tests__/unit/node_result_set.test.ts: adds a regression test targeting the reported "already consumed" first-call failure.
Comments suppressed due to low confidence (1)

packages/client-node/src/result_set.ts:136

  • json() marks the ResultSet as consumed before verifying that the current format can actually be decoded as JSON. In the non-JSON case (Cannot decode ${this.format} as JSON), this is a behavior change: callers can no longer fall back to text()/stream() even though the underlying stream was never read. Consider checking the format first (and throwing) before calling markAsConsumed(), or only calling markAsConsumed() inside the JSON-capable branches right before consuming the stream.
    this.markAsConsumed()
    return (await getAsText(this._stream)).toString()
  }

  /** See {@link BaseResultSet.json}. */
  async json<T>(): Promise<ResultJSONType<T, Format>> {
    this.markAsConsumed()
    // JSONEachRow, etc.
    if (isStreamableJSONFamily(this.format as DataFormat)) {
      const result: T[] = []
      const stream = this._streamImpl<T>()
      for await (const rows of stream) {
        for (const row of rows) {
          result.push(row.json() as T)
        }
      }
      return result as any
    }
    // JSON, JSONObjectEachRow, etc.
    if (isNotStreamableJSONFamily(this.format as DataFormat)) {
      const text = await getAsText(this._stream)


Comment thread packages/client-node/src/result_set.ts Outdated
Comment thread packages/client-node/__tests__/unit/node_result_set.test.ts
Address Copilot review feedback:

1. Move validateStreamFormat() before markAsConsumed() in stream().
   Previously, if the format was invalid, the ResultSet was permanently
   marked as consumed even though nothing was actually read, preventing
   a subsequent text() call from working.

2. Make regression test deterministic by overriding readableEnded to
   always return true, simulating a fast response. The old code would
   throw on this; the new code only checks the _consumed flag.
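Point 1 is an ordering rule: validate first, then mark as consumed. A minimal sketch follows, assuming a simplified format set in which JSONEachRow and CSV are streamable and JSON is not. SketchResultSet, streamableFormats, and the error text thrown by the validateStreamFormat() stand-in are illustrative assumptions, not the client's actual definitions.

```typescript
// Hypothetical sketch: stream() validates the format before marking consumption.
type DataFormat = 'JSONEachRow' | 'JSON' | 'CSV'

const streamAlreadyConsumedMessage = 'Stream has been already consumed'

// Assumed split for illustration only.
const streamableFormats: ReadonlySet<DataFormat> = new Set<DataFormat>(['JSONEachRow', 'CSV'])

class SketchResultSet {
  private _consumed = false

  constructor(
    private readonly format: DataFormat,
    private readonly body: string,
  ) {}

  private markAsConsumed(): void {
    if (this._consumed) {
      throw Error(streamAlreadyConsumedMessage)
    }
    this._consumed = true
  }

  // Stand-in for validateStreamFormat(); the message is illustrative.
  private validateStreamFormat(): void {
    if (!streamableFormats.has(this.format)) {
      throw Error(`${this.format} format is not streamable`)
    }
  }

  stream(): string[] {
    // Validate first: a rejected format leaves the result set unconsumed.
    this.validateStreamFormat()
    this.markAsConsumed()
    return this.body.split('\n')
  }

  async text(): Promise<string> {
    this.markAsConsumed()
    return this.body
  }
}
```

With the two calls in stream() swapped, the failed stream() on a JSON result would set _consumed, and the follow-up text() would throw even though no data had been read.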
Comment thread packages/client-node/src/result_set.ts
Comment thread packages/client-node/src/result_set.ts Outdated
Comment thread packages/client-node/src/result_set.ts
Collaborator

@peter-leonov-ch peter-leonov-ch left a comment

Thanks a bunch for looking into this tricky issue and bringing a fresh look at the problem. Let's see if we can avoid making "dark" API changes here while addressing the issue in question.

Address reviewer feedback from peter-leonov-ch:

Move `markAsConsumed()` from the top of `json()` into each branch
that actually consumes the stream (streamable JSON and non-streamable
JSON). The unsupported-format path (CSV, etc.) no longer marks the
ResultSet as consumed, preserving the ability to call `text()`
afterwards — matching the pre-PR exception semantics.

Add tests verifying:
- json() on CSV throws without marking consumed, text() still works
- stream() on non-streamable format throws, text() still works
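The same ordering applied to json(): each JSON-capable branch marks consumption right before it reads, and the unsupported-format path throws without touching the flag. This is a simplified sketch; the branch conditions stand in for isStreamableJSONFamily() / isNotStreamableJSONFamily(), and the parsing is reduced to plain strings. Only the "Cannot decode … as JSON" message comes from the review thread above.

```typescript
// Hypothetical sketch: markAsConsumed() lives inside the consuming branches.
type DataFormat = 'JSONEachRow' | 'JSON' | 'CSV'

const streamAlreadyConsumedMessage = 'Stream has been already consumed'

class SketchResultSet {
  private _consumed = false

  constructor(
    private readonly format: DataFormat,
    private readonly body: string,
  ) {}

  private markAsConsumed(): void {
    if (this._consumed) {
      throw Error(streamAlreadyConsumedMessage)
    }
    this._consumed = true
  }

  async text(): Promise<string> {
    this.markAsConsumed()
    return this.body
  }

  async json<T>(): Promise<T[] | T> {
    if (this.format === 'JSONEachRow') {
      // Streamable JSON branch: mark right before reading.
      this.markAsConsumed()
      return this.body
        .split('\n')
        .filter((line) => line.length > 0)
        .map((line) => JSON.parse(line) as T)
    }
    if (this.format === 'JSON') {
      // Non-streamable JSON branch: mark right before reading.
      this.markAsConsumed()
      return JSON.parse(this.body) as T
    }
    // Unsupported format: throw without marking, so text() stays available.
    throw Error(`Cannot decode ${this.format} as JSON`)
  }
}
```

This mirrors the first new test: json() on a CSV result rejects, and a subsequent text() still returns the body.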
Copilot AI review requested due to automatic review settings March 19, 2026 18:16
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.


Comment on lines +95 to +112
  /**
   * Mark the result set as consumed and throw if it was already consumed.
   * Uses a boolean flag instead of checking `readableEnded` to avoid a race
   * condition where the stream's 'end' event fires between two separate
   * `readableEnded` checks (e.g. when `json()` calls `stream()` internally
   * for JSONEachRow). See: https://github.com/ClickHouse/clickhouse-js/issues/575
   *
   * We intentionally do NOT check `readableEnded` here. A stream can have
   * `readableEnded=true` (the 'end' event fired) while its data is still
   * buffered and available for reading. Checking readableEnded would falsely
   * reject the first consumption call for fast/small responses.
   */
  private markAsConsumed(): void {
    if (this._consumed) {
      throw Error(streamAlreadyConsumedMessage)
    }
    this._consumed = true
  }
@Onyx2406
Author

@peter-leonov-ch can you review this please?


Development

Successfully merging this pull request may close these issues.

ResultSet.json() may throw "Stream has been already consumed" on first call (JSONEachRow)