feat(ResultSet): add rawStream() to access underlying stream without row parsing #638

renatocron wants to merge 2 commits into ClickHouse:main
Conversation
feat(ResultSet): add rawStream() to access underlying stream without row parsing

Exposes the raw decompressed stream from ResultSet, useful when you need to pipe CSV/TSV data directly to a file or another stream without the overhead of row-by-row parsing via the Transform pipeline. This is particularly valuable for bulk data export scenarios (e.g., loading ClickHouse CSV into DuckDB via temp files) where .query() provides HTTP gzip compression (~15x smaller transfer) but the parsed .stream() adds unnecessary overhead. Implemented for both Node.js (Stream.Readable) and Web (ReadableStream).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Pull request overview
This PR adds a rawStream() API to result sets so callers can access the underlying decompressed response stream directly (bypassing the row-parsing transform), enabling efficient bulk CSV/TSV exports while still benefiting from HTTP compression.
Changes:
- Extend `BaseResultSet` with a new `rawStream()` method and documentation.
- Implement `rawStream()` in both Node.js and Web `ResultSet` implementations.
- Add Node.js unit tests covering `rawStream()` basic behavior and consumption rules.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| packages/client-common/src/result.ts | Adds `rawStream()` to the shared `BaseResultSet` interface with usage/behavior docs. |
| packages/client-web/src/result_set.ts | Implements `rawStream()` for the web client by returning the underlying `ReadableStream`. |
| packages/client-node/src/result_set.ts | Implements `rawStream()` for Node by returning the underlying `stream.Readable`. |
| packages/client-node/tests/unit/node_result_set.test.ts | Adds unit tests for `rawStream()` and consumption interactions. |
Addresses review feedback:

- Node ResultSet now uses an explicit _consumed flag (like Web) instead of relying solely on readableEnded, ensuring mutual exclusion between rawStream()/text()/json()/stream() is enforced immediately.
- Web close() no longer throws if called after a consumer method, allowing safe cleanup after rawStream() or stream().
- Added a test for calling text()/stream() immediately after rawStream() without draining.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
I initially didn't want to include new prop
I've looked at the PR more closely again. What makes me a bit hesitant to merge this IMO important feature right away:

I'd like to be able to give away a really raw, unfiltered stream and give the consumer the option to bolt on the exception detection as a stream transformer. This might require some refactoring and documentation. Having said this, the exception detection can wait; we can always just expose the raw response stream and document that it ATM does not support dealing with in-band exceptions. WDYT?
Thanks for the response, @peter-leonov-ch! I agree on both points. The _consumed flag refactor (638e020) was mostly a response to Copilot's review comments and isn't strictly necessary - the original

For the in-band exceptions, that's new to me, but I think the pragmatic path is to document that rawStream() does not handle them and defer exception detection to a follow-up - maybe even mark them as rawStreamUnsafe (but I mean, raw is.. raw). We can always add an optional transform later without breaking the API. So the plan would be:

Is that right?
Summary

- Add a rawStream() method to the BaseResultSet interface and both Node.js/Web implementations

Motivation

When using .query() with the CSVWithNames format, the client enables HTTP gzip compression automatically (~15x smaller transfer). However, the only way to get a stream is .stream(), which wraps the response in a row-parsing Transform — unnecessary overhead when you just want the raw CSV bytes.

The alternative, .exec(), returns a raw stream but does not enable HTTP compression, resulting in ~3x slower transfers for large datasets. rawStream() gives the best of both worlds: compressed transfer + raw byte stream.

Benchmark (1.4M rows, 121MB CSV)

- .exec().stream (no compression)
- .query().text() (compressed, buffered)
- .query().rawStream() (compressed, streaming)

Test plan