refactor: unify ResultSet implementations on Arrow-backed path #175
mkaufmann wants to merge 15 commits into
Conversation
…ArrowStreamReader

Rework the ResultSet unification to address two reviewer requests on #175:

1. Share the vector-building code with the parameter-encoding path instead of having a dedicated MetadataArrowBuilder. VectorPopulator now exposes a row-indexed primitive (setCell) used by both callers. The existing single-row parameter-binding overload and a new many-row metadata overload both funnel through it, and all the individual vector setters are parameterised by row index.
2. Keep ArrowStreamReaderCursor on its original ArrowStreamReader-only interface. The metadata path now serialises a populated VSR to Arrow IPC bytes and wraps the result in a ByteArrayInputStream-backed ArrowStreamReader, so both streaming and metadata result sets travel through exactly the same reader/cursor plumbing (see the sketch below).

Supporting changes:

- typeName overrides (e.g. "TEXT" for JDBC-spec metadata columns) now round-trip through Arrow via a jdbc:type_name field-metadata key rather than a columns-override parameter on StreamingResultSet. HyperTypeToArrow stamps the key on write; ArrowToHyperTypeMapper.toColumnMetadata reads it back.
- StreamingResultSet drops the ofInMemory(...) factory and the columns override; callers construct an ArrowStreamReader + BufferAllocator pair and hand them to of(reader, allocator, queryId, zone). The cursor owns both and closes reader-then-allocator on close.
- QueryResultArrowStream.toArrowStreamReader returns a simple Result holder (reader + allocator) instead of an AutoCloseable bundle.
- MetadataResultSets is the single entry point for Arrow-backed metadata result sets; MetadataArrowBuilder is deleted.
- Empty metadata results skip writeBatch() entirely so ArrowStreamReaderCursor doesn't interpret a zero-row batch as "at least one row available".
- Tests updated to the new API; StreamingResultSetMethodTest builds its in-memory ResultSet the same way as the metadata path (IPC round-trip).
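For orientation, here is a minimal, self-contained sketch of the VSR-to-IPC-bytes-to-ArrowStreamReader round-trip the metadata path now uses. Everything here is stock Arrow Java; the column name and value are illustrative, not the driver's:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import org.apache.arrow.memory.BufferAllocator;
import org.apache.arrow.memory.RootAllocator;
import org.apache.arrow.vector.IntVector;
import org.apache.arrow.vector.VectorSchemaRoot;
import org.apache.arrow.vector.ipc.ArrowStreamReader;
import org.apache.arrow.vector.ipc.ArrowStreamWriter;

public class IpcRoundTripSketch {
    public static void main(String[] args) throws Exception {
        try (BufferAllocator allocator = new RootAllocator(Long.MAX_VALUE)) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            // 1. Populate a one-column, one-row VectorSchemaRoot and serialise it to IPC bytes.
            try (VectorSchemaRoot root = VectorSchemaRoot.of(new IntVector("col", allocator))) {
                IntVector col = (IntVector) root.getVector("col");
                col.allocateNew(1);
                col.setSafe(0, 42);
                root.setRowCount(1);
                try (ArrowStreamWriter writer = new ArrowStreamWriter(root, null, out)) {
                    writer.start();
                    writer.writeBatch();
                    writer.end();
                }
            }
            // 2. Wrap the bytes in an ArrowStreamReader; from here on, an in-memory result
            //    is indistinguishable from a streamed one to the reader/cursor plumbing.
            try (ArrowStreamReader reader =
                    new ArrowStreamReader(new ByteArrayInputStream(out.toByteArray()), allocator)) {
                while (reader.loadNextBatch()) {
                    System.out.println(reader.getVectorSchemaRoot().getRowCount()); // prints 1
                }
            }
        }
    }
}
```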
Force-pushed from 9ecba8a to 08d62bc.
Force-pushed from b648940 to 07125b1.
Codecov Report: ❌ Patch coverage is 76.53%. Your patch check has failed because the patch coverage (76.53%) is below the target coverage (90.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files:

@@ Coverage Diff @@
## main #175 +/- ##
============================================
- Coverage 82.37% 80.64% -1.73%
+ Complexity 1867 1706 -161
============================================
Files 125 123 -2
Lines 5009 4939 -70
Branches 537 521 -16
============================================
- Hits 4126 3983 -143
- Misses 641 725 +84
+ Partials 242 231 -11
} catch (SQLException ex) {
    // Accessor does not implement typed getObject — fall back to raw + isInstance check.
    val raw = accessor.getObject();
    updateWasNull(accessor);
    if (raw == null) {
        return null;
    }
    if (type.isInstance(raw)) {
        return type.cast(raw);
    }
    throw new SQLException(
            "Cannot convert column value to " + type.getName() + "; actual type is "
                    + raw.getClass().getName(),
            ex);
}
Isn't this putting a bandaid over bad behavior of the accessors? Shouldn't the accessor already implement this behavior so that we don't need the bandaid here? If true, check if this proper fix can be factored into a separate commit (before the others on this branch) so that I can maybe ship it independently
Agreed — fixed in 322663e by giving QueryJDBCAccessor.getObject(Class) a default raw + isInstance implementation, so accessors that don't need typed conversion just inherit the right behavior. Then a6da614 collapses StreamingResultSet.getObject(int, Class) to a direct dispatch (no try/catch).
Per your "factor into a separate commit at the start of the branch" ask: 322663e is now the first commit on the branch (touches only QueryJDBCAccessor.java, applies cleanly on main), so it can be cherry-picked and shipped independently of the rest of this PR. The follow-up that drops the bandaid (a6da614) sits at the end since it depends on the rest of the unified StreamingResultSet.
 * <p>The column metadata (including any {@link ColumnMetadata#getTypeName()} override
 * stamped under {@link com.salesforce.datacloud.jdbc.protocol.data.HyperTypeToArrow#JDBC_TYPE_NAME_METADATA_KEY})
 * is derived from the Arrow schema via {@link ArrowToHyperTypeMapper#toColumnMetadata(org.apache.arrow.vector.types.pojo.Field)}.
This comment is too low-level; remove it
Removed in e329860 — the paragraph that mentioned JDBC_TYPE_NAME_METADATA_KEY is gone; the docstring now stays at the level of "this is what of(reader, allocator, queryId, sessionZone) does" and leaves the field-metadata details documented at HyperTypeToArrow / ColumnMetadata where they live.
/**
 * Hand the reader + allocator pair from {@link QueryResultArrowStream.Result} to {@link
 * #of(ArrowStreamReader, BufferAllocator, String, ZoneId)} and close both on construction
 * failure. Without this, an {@code of} call that throws (for example {@code SQLException}
 * wrapping an unsupported Arrow type) would leak the 100 MB
 * {@link org.apache.arrow.memory.RootAllocator} held by {@code Result}.
 */
public static StreamingResultSet ofClosingOnFailure(
        QueryResultArrowStream.Result arrowStream, String queryId, ZoneId sessionZone) throws SQLException {
    try {
        return of(arrowStream.getReader(), arrowStream.getAllocator(), queryId, sessionZone);
    } catch (SQLException | RuntimeException ex) {
        try {
            arrowStream.getReader().close();
        } catch (Exception suppressed) {
            ex.addSuppressed(suppressed);
        }
        arrowStream.getAllocator().close();
        throw ex;
    }
}
This looks like an independent fix; if yes, factor it (and the tests, of course) into a separate commit at the start of the branch so that I can potentially ship it independently
Looked at this carefully — the fix is already in its own commit (26bdc37 fix: close allocator on StreamingResultSet.of failure in query path with its regression test in StreamingResultSetMethodTest), but I left it where it is on the branch rather than moving it to the very front. The reason: as coded, this fix is not independent of the unify refactor. Specifically, it adds StreamingResultSet.ofClosingOnFailure(QueryResultArrowStream.Result, ...) and switches the four call sites to it — and on main:
- QueryResultArrowStream doesn't return a Result wrapping (reader, allocator); toArrowStreamReader just returns an ArrowStreamReader.
- StreamingResultSet.of(...) only takes (reader, queryId) — there's no allocator to leak on construction failure, because the allocator isn't owned by the result set on main at all.
So "pull this fix to the front of the branch" would mean rewriting the fix as a different fix against the pre-unify surface, which feels like it deserves its own decision rather than a sneaky reorder here. Two options if you'd like to ship the leak fix independently of this PR:
1. I can carve out a minimal "pre-unify" leak fix against main directly (different shape — likely closes the allocator that today lives inside QueryResultArrowStream rather than in StreamingResultSet) and open it as its own PR. Happy to do this.
2. Land it as part of this PR and treat the unify+leak-fix as a unit.
Lean toward (1) if you want to ship it ahead of this PR, otherwise (2) is fine. Let me know.
Try to move it to a separate commit (along with test coverage). Would it make sense to move the QueryResultArrowStream as well then?
Honest answer: yes, that would make the leak fix sit on a clean base — but it's a non-trivial split because the unify refactor is what introduced the allocator-ownership concept (Result holder, cursor's BufferAllocator field, StreamingResultSet.of taking an allocator). To pull "QueryResultArrowStream introduces Result" + "StreamingResultSet/cursor own the allocator" + "ofClosingOnFailure leak fix" forward as a single commit before the unify refactor, I'd be effectively breaking the unify commit into two halves and rebuilding the unify on top of the pre-extracted half. Doable, but it means rewriting a ~800-line refactor.
Given (a) you have one substantive comment left to evaluate (whether commit 2088116 lands as a separate PR), and (b) the leak fix is already in its own commit (4d04fbb, regression test included), I'd propose:
1. Now: I land 2088116 (cursor zero-row-skip fix + tests against main's signature) as the cherry-pickable first commit. Same for 093d692 (getObject(Class) base-class fallback).
2. If you want the leak fix shipped independently: I cherry-pick 4d04fbb's intent into a new PR against main, reshaping the fix to whatever the pre-unify code's leak surface actually is (likely a fix in QueryResultArrowStream.toArrowStreamReader to wrap the ArrowStreamReader so its close also closes the RootAllocator — cleaner than threading the allocator through StreamingResultSet.of).
3. This PR stays as-is for the leak fix: it lands as part of the unified refactor.
Lean toward not splitting the unify commit unless there's a strong reason. Want me to open the separate "main-targeted leak fix" PR per (2)?
/**
 * Load the next batch that has at least one row, skipping any zero-row batches in between.
 * Empty IPC batches are valid per the Arrow spec and can be emitted by upstream producers
 * (long-running queries returning empty initial chunks, schema-only metadata streams), so
 * "successfully loaded" must not be conflated with "row available."
 */
private boolean loadNextNonEmptyBatch() throws IOException {
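The diff excerpt above ends at the signature; a sketch of the loop body it describes, assuming the cursor holds the reader and its VectorSchemaRoot in fields named reader and root (names are assumptions):

```java
// Sketch: keep loading until a batch actually carries rows, or the stream ends.
private boolean loadNextNonEmptyBatch() throws IOException {
    while (reader.loadNextBatch()) {      // false once the IPC stream is exhausted
        if (root.getRowCount() > 0) {
            return true;                  // a batch with at least one row is loaded
        }
        // Zero-row batch: valid Arrow IPC, but invisible to ResultSet.next(); skip it.
    }
    return false;
}
```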
Why do we need to skip empty IPC batches if they are valid? Shouldn't callers handle them correctly?
Fair question. The cursor is the caller for a JDBC ResultSet — there's no layer above it that has a sensible "I just consumed an empty batch, please advance" hook to expose. Sharpened the docstring on this method (e329860) to explain it directly:
Empty IPC batches are valid per the Arrow spec, and producers in the wild emit them — Hyper / query-federator can send a schema-only initial chunk, and async chunked-query paths use empty batches as a keep-alive between data chunks. JDBC has no notion of a "batch": ResultSet.next() only knows "is there another row?". This cursor is the single place that translates batch-level signals from ArrowStreamReader.loadNextBatch() into row-level advances, so it must consume the empty batches itself rather than push that responsibility outward.
Same fix is also independently shipped as commit 4d2e47f (fix: skip zero-row batches in ArrowStreamReaderCursor). I left that commit where it is on the branch rather than moving it to the start, because as coded it also removes the now-redundant empty-batch workaround in MetadataResultSets.writeArrowStream — which doesn't exist on main, so the pre-unify version of the fix would have a different shape (just the cursor side, no metadata-side cleanup). Happy to carve that out if you want a main-targeted preview.
Can you move this to an independent commit along with test coverage?
Done — carved out as commit 2088116 fix: skip zero-row batches in ArrowStreamReaderCursor, now the first commit on the branch (touches only ArrowStreamReaderCursor.java + ArrowStreamReaderCursorTest.java, applies cleanly on main's pre-unify cursor signature (reader, sessionZone)). It can be cherry-picked and shipped independently.
The downstream commit 049e82d (formerly the same-named "skip zero-row batches" commit) now contains only the residual: removing the now-redundant MetadataResultSets.writeArrowStream "skip writeBatch when rowCount==0" workaround, since the cursor handles the empty-only case at the seam.
Test coverage that ships with the pre-unify commit:
- skipsZeroRowBatchAndYieldsSubsequentNonEmptyRows — real Arrow IPC stream of {0-row, 1-row} → next() reports the one real row, then false.
- zeroRowOnlyBatchYieldsNoRows — real Arrow IPC stream of {0-row} → next() returns false.
- firstNextReturnsTrueWhenInitialBatchHasRows / firstNextReturnsFalseWhenStreamHasNoBatches — pure-mock pinning of the first-batch logic, replacing the old forwardsLoadNextBatch(true|false) parameterised test that would loop forever under the new control flow.
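For context, building the {0-row, 1-row} fixture those tests describe takes only stock Arrow Java. A sketch (not the actual test code; column name and value are illustrative):

```java
// Sketch: serialise a zero-row batch followed by a one-row batch into IPC bytes.
try (BufferAllocator allocator = new RootAllocator(Long.MAX_VALUE);
        VectorSchemaRoot root = VectorSchemaRoot.of(new IntVector("col", allocator))) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    try (ArrowStreamWriter writer = new ArrowStreamWriter(root, null, out)) {
        writer.start();
        root.setRowCount(0);
        writer.writeBatch();                 // the zero-row batch the cursor must skip
        IntVector col = (IntVector) root.getVector("col");
        col.allocateNew(1);
        col.setSafe(0, 42);
        root.setRowCount(1);
        writer.writeBatch();                 // the one real row next() should report
        writer.end();
    }
    // out.toByteArray() is what the ArrowStreamReader under test consumes.
}
```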
Force-pushed from 07125b1 to e329860.
Force-pushed from e329860 to d17f1b0.
Per the two review threads, split out the cherry-pickable fixes as their own PRs against main (#185 and #186).

This PR (#175) keeps the same fixes as the first two commits — when #185 / #186 land, those commits will collapse to no-ops at rebase time. For the remaining "should QueryResultArrowStream allocator-ownership move pre-unify too?" thread (#175 review): waiting on your call before I do that split. As I noted there, it's a non-trivial surgery on the unify commit and I'd rather get your sign-off before rewriting ~800 lines of refactor.
Force-pushed from d17f1b0 to f4cad29.
Force-pushed from 67ddd24 to b97abe2.
Force-pushed from b97abe2 to 8c8254f.
Force-pushed from e40a2a8 to 84391cd.
Force-pushed from 84391cd to d90bb9a.
QueryJDBCAccessor.getObject(Class) threw "Operation not supported" no matter what. That's a problem: JDBC says ResultSet.getObject(int, Class<T>) is supposed to return the value as T when the conversion is trivial, but every accessor that didn't override it would throw, even on the identity case like getObject(col, String.class) against a VARCHAR. Callers had to either know which accessors implement typed conversion or wrap calls in catch-and-retry.

The new default covers the trivial cases:

1. null type -> SQLException with SQLState 22023.
2. String.class -> delegate to getString(). JDBC 4.2 Table B-5 mandates this conversion for every column type.
3. Otherwise raw + isInstance, accepting any supertype/interface match. wasNull is set defensively when the raw value is null so stale state from an earlier non-null read does not leak.
4. Anything left throws SQLFeatureNotSupportedException (matching the JDBC spec's expectation for unsupported conversions, and consistent with the existing TimeStampVectorAccessor override).

Numeric narrowing (e.g. getObject(col, Integer.class) on a BIGINT column) is intentionally NOT handled here. The accessor-level primitive getters (getInt, getShort, getByte) use unchecked Java casts -- delegating to them would silently truncate Long.MAX_VALUE to -1 instead of refusing (see the example below). Accessors that need lossless cross-type conversion override getObject(Class) (TimeStampVectorAccessor for the Instant / OffsetDateTime / LocalDateTime / etc. paths).

Tests in StreamingResultSetMethodTest cover:

- getObjectWithClassUsesAccessorBaseFallback: identity case (VARCHAR -> String) goes through the inherited String fast-path.
- getObjectWithSupertypeOrInterfaceReturnsValue: Object.class and CharSequence.class both work via isInstance -- proves polymorphic callers (raw Object holders, generic tooling) aren't blocked.
- getObjectWithNullClassThrows: null type parameter raises SQLException with the expected "must not be null" message.
- getObjectWithIncompatibleClassThrows: requesting an unrelated type (String column, StringBuilder asked) raises the typed conversion error.
- getObjectWithClassReturnsNullForNullValue: a null column value short-circuits and returns null regardless of the requested type; wasNull() is true afterwards.
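The truncation the commit message warns about is plain Java narrowing, e.g.:

```java
long big = Long.MAX_VALUE;   // 0x7fff_ffff_ffff_ffff
int truncated = (int) big;   // Java keeps the low 32 bits: 0xffff_ffff == -1
```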
Collapse the two ResultSet families (streaming Arrow + row-based metadata) into a single Arrow-backed implementation so there is one accessor pipeline, one set of type semantics, and one place to fix bugs. Changes:

- ArrowStreamReaderCursor becomes source-agnostic: a BatchLoader drives a VectorSchemaRoot, whether sourced from an ArrowStreamReader or a pre-populated in-memory batch. The cursor also owns an AutoCloseable so it is responsible for releasing the allocator + reader on close — the old ArrowStreamReader.close() would only tear down vectors and leak the 100 MB RootAllocator.
- QueryResultArrowStream.toArrowStreamReader returns a Result holder that pairs the reader with the allocator and closes both in the right order so Arrow's accounting invariants hold.
- StreamingResultSet gains ofInMemory(root, owned, queryId, zone, cols) so metadata results funnel through the same result set. A columns override preserves the JDBC-spec typeName labels (e.g. TEXT) that would otherwise be lost when deriving from the Arrow schema.
- MetadataArrowBuilder materialises List<List<Object>> metadata rows into a populated VectorSchemaRoot using the existing HyperTypeToArrow mapping; MetadataResultSets is the factory callers use.
- QueryMetadataUtil and DataCloudDatabaseMetadata route getTables, getColumns, getSchemas, getTypeInfo and empty metadata results through the Arrow-backed StreamingResultSet.
- DataCloudMetadataResultSet, SimpleResultSet, and ColumnAccessor are removed now that no caller depends on them.
- StreamingResultSet.getObject(int, Class) gains an isInstance-based fallback so callers can retrieve String-typed VARCHAR columns without each accessor having to implement typed getObject.
- Tests moved to the unified path; integer-accessor-only assertions in DataCloudDatabaseMetadataTest updated to reflect stricter Arrow accessor semantics.
…ArrowStreamReader

Rework the ResultSet unification to address two reviewer requests on #175:

1. Share the vector-building code with the parameter-encoding path instead of having a dedicated MetadataArrowBuilder. VectorPopulator now exposes a row-indexed primitive (setCell) used by both callers. The existing single-row parameter-binding overload and a new many-row metadata overload both funnel through it, and all the individual vector setters are parameterised by row index.
2. Keep ArrowStreamReaderCursor on its original ArrowStreamReader-only interface. The metadata path now serialises a populated VSR to Arrow IPC bytes and wraps the result in a ByteArrayInputStream-backed ArrowStreamReader, so both streaming and metadata result sets travel through exactly the same reader/cursor plumbing.

Supporting changes:

- typeName overrides (e.g. "TEXT" for JDBC-spec metadata columns) now round-trip through Arrow via a jdbc:type_name field-metadata key rather than a columns-override parameter on StreamingResultSet. HyperTypeToArrow stamps the key on write; ArrowToHyperTypeMapper.toColumnMetadata reads it back.
- StreamingResultSet drops the ofInMemory(...) factory and the columns override; callers construct an ArrowStreamReader + BufferAllocator pair and hand them to of(reader, allocator, queryId, zone). The cursor owns both and closes reader-then-allocator on close.
- QueryResultArrowStream.toArrowStreamReader returns a simple Result holder (reader + allocator) instead of an AutoCloseable bundle.
- MetadataResultSets is the single entry point for Arrow-backed metadata result sets; MetadataArrowBuilder is deleted.
- Empty metadata results skip writeBatch() entirely so ArrowStreamReaderCursor doesn't interpret a zero-row batch as "at least one row available".
- Tests updated to the new API; StreamingResultSetMethodTest builds its in-memory ResultSet the same way as the metadata path (IPC round-trip).
StreamingResultSet.of catches IOException and IllegalArgumentException from the Arrow schema decode and rewraps as SQLException. At all four query-path call sites (DataCloudConnection.getRowBasedResultSet, getChunkBasedResultSet, DataCloudStatement.executeQuery, getResultSet) the surrounding try-catch only catches StatusRuntimeException, so a SQLException thrown from of() bypasses it and leaks the 100 MB RootAllocator returned by QueryResultArrowStream.toArrowStreamReader.

Introduce StreamingResultSet.ofClosingOnFailure(Result, queryId, sessionZone) that takes the reader+allocator pair and closes both on construction failure (reader first so its buffers release before the allocator's budget check). Switch all four call sites to it. The metadata path in MetadataResultSets.of already had this shape; this fixes the matching gap on the query side.

Add a regression test that builds an Arrow IPC stream with an unsupported field type (LargeUtf8) and asserts the helper closes both the reader and the allocator on the resulting SQLException.
The Int/SmallInt/TinyInt setters widened from concrete boxed types (Integer/Short/Byte) to Number so metadata rows could pass long values, but lost the implicit "right boxed type" check at the call sites that went through DataCloudPreparedStatement.setObject for parameter binding. A user binding Long.MAX_VALUE to an INT32 parameter would silently get (int) Long.MAX_VALUE = -1 written to the vector.

Add an explicit range check on Int/SmallInt/TinyInt setters before narrowing. Both the metadata path and the parameter-binding path go through these setters, so strict checks here mean strict on both paths. BigInt accepts the full long range and is unchanged. Pin the behavior with a focused unit test (IntegerVectorSetterRangeCheckTest).
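A minimal sketch of the check this commit describes (method name and message text are illustrative, not the driver's):

```java
// Sketch: refuse lossy narrowing before writing to an INT32 vector.
static void setIntCell(IntVector vector, int rowIndex, Number value) throws SQLException {
    long v = value.longValue();
    if (v < Integer.MIN_VALUE || v > Integer.MAX_VALUE) {
        throw new SQLException("Value " + v + " is out of range for an INT32 parameter");
    }
    vector.setSafe(rowIndex, (int) v);
}
```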
The driver round-trips JDBC-spec type-name overrides (e.g. "TEXT" for metadata columns) through Arrow field metadata under a custom key. The previous key "jdbc:type_name" used an unprefixed namespace not reserved by the Arrow spec — Hyper, query-federator, or another Arrow producer could emit a same-named key in a future protocol version, in which case ArrowToHyperTypeMapper would silently override its own derived type name with whatever upstream stamped.

Rename to "datacloud-jdbc:type_name" so the namespace is unambiguous, and expand the field's javadoc to document the namespace rationale.
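For orientation, this is how such an override rides along as Arrow field metadata (stock Arrow Java; the column name and override value are illustrative):

```java
import java.util.Collections;
import java.util.Map;
import org.apache.arrow.vector.types.pojo.ArrowType;
import org.apache.arrow.vector.types.pojo.Field;
import org.apache.arrow.vector.types.pojo.FieldType;

// Sketch: stamp the namespaced key on write, read it back on the other side.
Map<String, String> meta =
        Collections.singletonMap("datacloud-jdbc:type_name", "TEXT");
Field field = new Field(
        "TABLE_NAME",
        new FieldType(/*nullable=*/ true, ArrowType.Utf8.INSTANCE, /*dictionary=*/ null, meta),
        /*children=*/ null);
String override = field.getMetadata().get("datacloud-jdbc:type_name"); // "TEXT"
```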
The fallback in ArrowToHyperTypeMapper.toColumnMetadata — when a field
has no datacloud-jdbc:type_name override, ColumnMetadata.typeName is
null and the JDBC layer derives the column type-name from the
HyperType — was load-bearing but unasserted. Real Hyper Arrow streams
never stamp the override, so every functional query test exercised the
fallback implicitly; if a future refactor broke it, the regression
would not surface in the existing suite.
Two new pin tests:
- ArrowToHyperTypeMapperTest at the unit boundary: field with override
-> typeName matches; field without override (null metadata, empty
metadata) -> typeName is null.
- StreamingResultSetTest.getColumnTypeNameFallsBackToDerivedNameOnRealHyperStream
end-to-end against local Hyper: executeQuery on a select with INT,
VARCHAR, DECIMAL columns asserts ResultSetMetaData.getColumnTypeName
returns the derived names ("INTEGER", "VARCHAR", "DECIMAL").
Drive-by pin test: StreamingResultSet.getObject(int, Map<String,Class<?>>) with a null or empty type map should behave like plain getObject(int) per the JDBC spec. Previously not asserted anywhere. The companion getObject(Class) fallback test landed earlier on this branch, bundled into the QueryJDBCAccessor base-class fix commit so the fix and its end-to-end coverage ship as a single cherry-pickable unit.
Previously a row with the wrong number of elements would silently leave the trailing columns as Arrow null (interpreted as missing values). Today every caller routes through MetadataSchemas so the sizes match by construction, but a future caller bug would surface only inside vector population, far from the boundary.

Add an explicit arity check at the of(...) entrypoint: each non-null row must have exactly columns.size() elements. Null rows are accepted as the all-nulls row (matching the legacy coerceRows convention of turning null into emptyList). Empty rows are accepted only when the schema is also empty. Pin behavior with MetadataResultSetsTest covering short, long, correct-arity, null-row, and empty-rows cases.
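A sketch of the boundary check this commit describes (shape assumed; per the message, the real entrypoint is MetadataResultSets.of):

```java
// Sketch: fail fast at the entrypoint instead of deep inside vector population.
for (List<Object> row : rows) {
    if (row == null) {
        continue; // a null row is accepted as the all-nulls row
    }
    if (row.size() != columns.size()) {
        throw new SQLException("Metadata row has " + row.size()
                + " values but the schema defines " + columns.size() + " columns");
    }
}
```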
Now that ArrowStreamReaderCursor.loadNextNonEmptyBatch (introduced earlier on this branch as a pre-unify cursor fix) consumes empty batches at the cursor seam, MetadataResultSets.writeArrowStream no longer needs its own "skip writeBatch when rowCount==0" workaround: the cursor handles the empty-only case correctly. Remove the special case and always emit a batch. Tightens the zeroRowOnlyBatchYieldsNoRows test docstring to match.
DataCloudMetadataResultSet was deleted in this PR, but the test file retained the old name and lived in the wrong package. Merge its empty-result-set JDBC-shape smoke tests into the new MetadataResultSetsTest under the .core.metadata package and delete the legacy file. No behavior change.
Now that QueryJDBCAccessor.getObject(Class) provides the raw + isInstance fallback as its base-class default, StreamingResultSet no longer needs the catch-and-retry path that worked around accessors which threw "Operation not supported." Collapse getObject(int, Class) to direct dispatch and update the regression test's WHY comment to point at the accessor base class as the load-bearing layer. Addresses: review comment on PR #175 line 388.
Three small follow-ups from PR #175 review:

- StreamingResultSet.of: drop the paragraph that pointed at the HyperTypeToArrow.JDBC_TYPE_NAME_METADATA_KEY field-metadata key. The docstring spilled implementation detail of the metadata-stamping path into a generic "create a result set from a reader" entry-point; the type-name override is documented at HyperTypeToArrow / ColumnMetadata where it's relevant.
- ArrowStreamReaderCursor.loadNextNonEmptyBatch: rewrite the rationale to answer "why does the cursor consume empty batches instead of the caller?" directly. Empty IPC batches are valid Arrow and producers emit them; JDBC's next() only knows rows, so this cursor is the seam that translates batch-level signals into row-level advances.
- MetadataResultSetsTest: drop the JDBC ResultSet-shape slice (next / isClosed / getStatement / unwrap / isWrapperFor / getHoldability / getFetchSize / setFetchSize / getWarnings / getConcurrency / getType / getFetchDirection). Those test the StreamingResultSet plumbing shared by every result set on this branch and are already covered by StreamingResultSetMethodTest. Keep the arity-contract slice (short/long/right/null/empty rows) — that is the metadata-result-set-specific behavior.

Addresses: review comments on PR #175.
StreamingResultSet had two public factories — of(reader, allocator, queryId[, zone]) (4 callers) and ofClosingOnFailure(Result, queryId, zone) (5 callers). Every production caller wanted the close-on-failure behavior; only tests and the metadata helper used the bare of(). Two factories with overlapping responsibilities is one too many — a caller hitting the bare of() and not knowing about ofClosingOnFailure would silently leak the 100 MB RootAllocator on construction failure.

Collapse to one public factory:

- of(QueryResultArrowStream.Result, queryId, sessionZone) — the only factory callers see, always closes both reader and allocator on failure. Name is the unambiguous "of" because there is no other.
- create(reader, allocator, queryId, sessionZone) — private; just the construction body the factory wraps.

Production call sites (DataCloudConnection, DataCloudStatement) and MetadataResultSets were already passing a (reader, allocator) pair, so the call shape collapses to passing the Result holder. Tests that were building the pair locally now wrap it in a Result the same way.
…r interface

Pre-unify there were three result-set implementations: StreamingResultSet (streaming Arrow query results), DataCloudMetadataResultSet (metadata), SimpleResultSet (in-memory rows). The DataCloudResultSet interface — a one-method (getQueryId) extension over java.sql.ResultSet — was the common "implements" the public API surfaced; StreamingResultSet was the only one that ever implemented it as a non-trivial impl. The unify refactor collapsed all three implementations into StreamingResultSet, but kept the interface and the "Streaming" name. Two problems fall out:

- The "Streaming" name now lies. Metadata results flow through the same class but they're a one-shot in-memory IPC blob — nothing streaming about them. MetadataResultSets.of even passes /*queryId=*/ null because there is no query.
- The DataCloudResultSet interface has one implementer and one method. Layering an interface for one impl is just a reader trap: callers instinctively look for "what other implementations exist" and find none.

Collapse the two:

- Rename the class StreamingResultSet -> DataCloudResultSet.
- Delete the old DataCloudResultSet interface (the public method getQueryId() now lives directly on the class via @Getter).
- Update all production and test references; rename the affected test files to match (StreamingResultSet*Test -> DataCloudResultSet*Test).

The public API surface is unchanged in source for the common cases: DataCloudConnection.getRowBasedResultSet / getChunkBasedResultSet still return DataCloudResultSet, just as a class instead of an interface. This is binary-incompatible for any caller that ever cast to or implemented the old interface; in practice only StreamingResultSet implemented it on the read side, and no code outside the driver implemented it on the write side.
Force-pushed from d90bb9a to b83e8b7.
NOTE: I'm not happy with the PR and the approach; this is for me to review and steer the agent.
Summary
Collapse the two ResultSet families (streaming Arrow + row-based metadata) into a single Arrow-backed implementation so there is one accessor pipeline, one set of type semantics, and one place to fix bugs. Also tightens root-allocator hygiene.
Built on top of # moritz/centralize-types-via-hypertype.

What changed

Unified result set. DataCloudMetadataResultSet, SimpleResultSet, and ColumnAccessor are removed. JDBC metadata calls (getTables, getColumns, getSchemas, getTypeInfo, and all empty-metadata helpers) now funnel through StreamingResultSet via a new MetadataArrowBuilder that materialises List<List<Object>> metadata rows into a populated Arrow VectorSchemaRoot. MetadataResultSets is the factory callers use.

Source-agnostic cursor. ArrowStreamReaderCursor now accepts either a streaming ArrowStreamReader or a pre-populated in-memory VectorSchemaRoot, driven by a pluggable BatchLoader. The cursor owns an AutoCloseable holding the backing resources and closes it on cursor close.

Root allocator hygiene. QueryResultArrowStream.toArrowStreamReader previously leaked a 100 MB RootAllocator — ArrowStreamReader.close() only tears down vectors, not the allocator. It now returns a Result holder that pairs the reader with the allocator and closes both in the correct order (reader first, so ArrowBuf accounting clears before the allocator's budget check; see the sketch after the test plan). StreamingResultSet.ofInMemory(root, owned, queryId, zone, cols) similarly takes ownership of the allocator + VSR through an AutoCloseable, so every code path closes its allocator.

typeName preservation. ofInMemory accepts an optional columns override so JDBC-spec labels like "TEXT" / "INTEGER" / "SHORT" survive a round-trip through Arrow (the derived HyperType names would otherwise be "VARCHAR" etc.). StreamingResultSet.getObject(int, Class) gains an isInstance fallback so getObject(col, String.class) on a VARCHAR works without each accessor implementing typed getObject.

Behavior changes worth calling out

Accessor semantics on metadata rows are now the same as on query results, which is stricter than the old row-based SimpleResultSet: getBoolean / getDate / getTime / getTimestamp on an integer column throw SQLException instead of loose-coercing. getByte on an integer column is now supported (previously threw in the metadata path). DataCloudDatabaseMetadataTest assertions were updated accordingly.

Test plan

- ./gradlew clean build passes (includes spotlessCheck, all tests, JaCoCo coverage, verification).
- ./gradlew :jdbc-core:test — 1222 tests, 0 failed.
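Per the root-allocator-hygiene item above, a sketch of the close-order invariant (reader first, then allocator), assuming the owning object holds both in fields:

```java
// Sketch: reader.close() releases its ArrowBufs; only then can allocator.close()
// pass its accounting check (it throws IllegalStateException on outstanding buffers).
@Override
public void close() throws Exception {
    try {
        reader.close();
    } finally {
        allocator.close();
    }
}
```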