Skip to content

feat(ResultSet): add rawStream() to access underlying stream without row parsing#638

Open
renatocron wants to merge 2 commits intoClickHouse:mainfrom
renatocron:feat/raw-stream
Open

feat(ResultSet): add rawStream() to access underlying stream without row parsing#638
renatocron wants to merge 2 commits intoClickHouse:mainfrom
renatocron:feat/raw-stream

Conversation

@renatocron
Copy link
Copy Markdown

Summary

  • Adds rawStream() method to BaseResultSet interface and both Node.js/Web implementations
  • Returns the raw decompressed stream without the row-parsing Transform pipeline
  • Useful for bulk CSV/TSV export where you want to pipe directly to a file

Motivation

When using .query() with CSVWithNames format, the client enables HTTP gzip compression automatically (~15x smaller transfer). However, the only way to get a stream is .stream(), which wraps the response in a row-parsing Transform — unnecessary overhead when you just want the raw CSV bytes.

The alternative, .exec(), returns a raw stream but does not enable HTTP compression, resulting in ~3x slower transfers for large datasets.

rawStream() gives the best of both worlds: compressed transfer + raw byte stream.

Benchmark (1.4M rows, 121MB CSV)

Method Transfer time
.exec().stream (no compression) ~6.3s
.query().text() (compressed, buffered) ~2.0s
.query().rawStream() (compressed, streaming) ~1.8s

Test plan

  • Unit tests for Node.js ResultSet: rawStream returns data, throws after consumption, mutual exclusion with text/json
  • All existing tests pass

…ithout row parsing

Exposes the raw decompressed stream from ResultSet, useful when you need
to pipe CSV/TSV data directly to a file or another stream without the
overhead of row-by-row parsing via the Transform pipeline.

This is particularly valuable for bulk data export scenarios (e.g.,
loading ClickHouse CSV into DuckDB via temp files) where .query()
provides HTTP gzip compression (~15x smaller transfer) but the parsed
.stream() adds unnecessary overhead.

Implemented for both Node.js (Stream.Readable) and Web (ReadableStream).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings March 24, 2026 19:52
@CLAassistant
Copy link
Copy Markdown

CLAassistant commented Mar 24, 2026

CLA assistant check
All committers have signed the CLA.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds a rawStream() API to result sets so callers can access the underlying decompressed response stream directly (bypassing the row-parsing transform), enabling efficient bulk CSV/TSV exports while still benefiting from HTTP compression.

Changes:

  • Extend BaseResultSet with a new rawStream() method and documentation.
  • Implement rawStream() in both Node.js and Web ResultSet implementations.
  • Add Node.js unit tests covering rawStream() basic behavior and consumption rules.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

File Description
packages/client-common/src/result.ts Adds rawStream() to the shared BaseResultSet interface with usage/behavior docs.
packages/client-web/src/result_set.ts Implements rawStream() for the web client by returning the underlying ReadableStream.
packages/client-node/src/result_set.ts Implements rawStream() for Node by returning the underlying stream.Readable.
packages/client-node/tests/unit/node_result_set.test.ts Adds unit tests for rawStream() and consumption interactions.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread packages/client-node/src/result_set.ts
Comment thread packages/client-web/src/result_set.ts
Comment thread packages/client-common/src/result.ts
Comment thread packages/client-node/__tests__/unit/node_result_set.test.ts Outdated
Addresses review feedback:
- Node ResultSet now uses an explicit _consumed flag (like Web) instead of
  relying solely on readableEnded, ensuring mutual exclusion between
  rawStream()/text()/json()/stream() is enforced immediately.
- Web close() no longer throws if called after a consumer method, allowing
  safe cleanup after rawStream() or stream().
- Added test for calling text()/stream() immediately after rawStream()
  without draining.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@renatocron
Copy link
Copy Markdown
Author

I initially didn't want to include new prop _consumed, but to be 100% wrong api usage, it's needed. Otherwise the previous code was fine

@peter-leonov-ch
Copy link
Copy Markdown
Collaborator

I've looked at the PR more closely again. What makes me a bit hesitant to right away merge this IMO important feature:

  • we're changing semantics a bit which I'm not sure we're ready to invest in testing
  • we'd need to support exceptions that are in-bound in the HTTP/1

I'd like to be able to give away a really raw unfiltered stream and give an option for the consumer to bolt on the exception detection as a stream transformer. This might require some refactoring and documentation.

Having said this, the exception detection can wait, we can always just expose the raw response stream and document that it's ATM not supporting dealing with in-bound exceptions.

WDYT?

@renatocron
Copy link
Copy Markdown
Author

Thanks for the response, @peter-leonov-ch !

I agree on both points. The _consumed flag refactor (638e020) was mostly a response to Copilot's review comments and isn't strictly necessary - the original readableEnded check was fine for the existing methods. I'm happy to revert that commit to keep the PR scoped to the new feature only.

For the in-band exceptions, that's new to me, but I think the pragmatic path is to document that rawStream() does not handle them, and defer exception detection to a follow-up, maybe even mark them as rawStreamUnsafe (but I mean, raw is.. raw). We can always add an optional transform later without breaking the API.

So the plan would be:

  • Revert 638e020
  • Add a note to the rawStream() docstring that in-band ClickHouse exceptions are not detected
  • Keep everything else as-is

Is that right?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants