
refactor: unify ResultSet implementations on Arrow-backed path #175

Draft

mkaufmann wants to merge 15 commits into main from moritz/unify-result-sets

Conversation

@mkaufmann mkaufmann (Member) commented Apr 24, 2026

NOTE: I'm not happy with the PR or the approach yet; this is for me to review and steer the agent.

Summary

Collapse the two ResultSet families (streaming Arrow + row-based metadata) into a single Arrow-backed implementation so there is one accessor pipeline, one set of type semantics, and one place to fix bugs. Also tightens root-allocator hygiene.

Built on top of moritz/centralize-types-via-hypertype.

What changed

Unified result set. DataCloudMetadataResultSet, SimpleResultSet, and ColumnAccessor are removed. JDBC metadata calls (getTables, getColumns, getSchemas, getTypeInfo, and all empty-metadata helpers) now funnel through StreamingResultSet via a new MetadataArrowBuilder that materialises List<List<Object>> metadata rows into a populated Arrow VectorSchemaRoot. MetadataResultSets is the factory callers use.
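
For illustration, a minimal sketch of the materialisation step this describes, assuming an all-VARCHAR schema; the class and method names are hypothetical, not the actual MetadataArrowBuilder API:

```java
import java.nio.charset.StandardCharsets;
import java.util.List;
import org.apache.arrow.memory.BufferAllocator;
import org.apache.arrow.vector.VarCharVector;
import org.apache.arrow.vector.VectorSchemaRoot;
import org.apache.arrow.vector.types.pojo.Schema;

final class MetadataRootSketch {
    // Turn row-major metadata (List<List<Object>>) into a populated
    // column-major VectorSchemaRoot. Assumes every column is VARCHAR;
    // the real builder dispatches per Arrow type.
    static VectorSchemaRoot buildRoot(Schema schema, List<List<Object>> rows, BufferAllocator allocator) {
        VectorSchemaRoot root = VectorSchemaRoot.create(schema, allocator);
        root.allocateNew();
        for (int row = 0; row < rows.size(); row++) {
            List<Object> cells = rows.get(row);
            for (int col = 0; col < cells.size(); col++) {
                VarCharVector vector = (VarCharVector) root.getVector(col);
                Object cell = cells.get(col);
                if (cell == null) {
                    vector.setNull(row);
                } else {
                    vector.setSafe(row, cell.toString().getBytes(StandardCharsets.UTF_8));
                }
            }
        }
        root.setRowCount(rows.size()); // also sets each vector's value count
        return root;
    }
}
```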

Source-agnostic cursor. ArrowStreamReaderCursor now accepts either a streaming ArrowStreamReader or a pre-populated in-memory VectorSchemaRoot, driven by a pluggable BatchLoader. The cursor owns an AutoCloseable holding the backing resources and closes it on cursor close.
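
The PR names a pluggable BatchLoader but does not show its interface, so the following is an assumed minimal shape of that seam, for orientation only:

```java
import java.io.IOException;
import org.apache.arrow.vector.ipc.ArrowStreamReader;

// Assumed minimal loader interface; the actual shape in the PR may differ.
interface BatchLoader {
    /** Advance to the next batch; return false once the source is exhausted. */
    boolean loadNextBatch() throws IOException;
}

final class BatchLoaders {
    // Streaming source: delegate straight to the reader.
    static BatchLoader streaming(ArrowStreamReader reader) {
        return reader::loadNextBatch;
    }

    // In-memory source: the pre-populated VectorSchemaRoot is the single batch.
    static BatchLoader inMemory() {
        return new BatchLoader() {
            private boolean consumed = false;

            @Override
            public boolean loadNextBatch() {
                boolean first = !consumed;
                consumed = true;
                return first;
            }
        };
    }
}
```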

Root allocator hygiene.

  • QueryResultArrowStream.toArrowStreamReader previously leaked a 100 MB RootAllocator: ArrowStreamReader.close() only tears down vectors, not the allocator. It now returns a Result holder that pairs the reader with the allocator and closes both in the correct order (reader first, so ArrowBuf accounting clears before the allocator's budget check).
  • StreamingResultSet.ofInMemory(root, owned, queryId, zone, cols) similarly takes ownership of the allocator + VSR through an AutoCloseable, so every code path closes its allocator.
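
For clarity, a sketch of the reader-first close order described in the first bullet; the class and field names follow the prose above, not necessarily the actual code:

```java
import org.apache.arrow.memory.BufferAllocator;
import org.apache.arrow.vector.ipc.ArrowStreamReader;

// Closing the reader first releases its ArrowBufs, so the allocator's
// outstanding-buffer check passes when it is closed afterwards.
final class Result implements AutoCloseable {
    private final ArrowStreamReader reader;
    private final BufferAllocator allocator;

    Result(ArrowStreamReader reader, BufferAllocator allocator) {
        this.reader = reader;
        this.allocator = allocator;
    }

    @Override
    public void close() throws Exception {
        reader.close();    // tear down vectors and release buffers first...
        allocator.close(); // ...so Arrow's accounting invariant holds here
    }
}
```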

typeName preservation. ofInMemory accepts an optional columns override so JDBC-spec labels like "TEXT" / "INTEGER" / "SHORT" survive a round-trip through Arrow (the derived HyperType names would otherwise be "VARCHAR" etc.).

StreamingResultSet.getObject(int, Class) gains an isInstance fallback so getObject(col, String.class) on a VARCHAR works without each accessor implementing typed getObject.

Behavior changes worth calling out

Accessor semantics on metadata rows are now the same as on query results, which is stricter than the old row-based SimpleResultSet:

  • getBoolean / getDate / getTime / getTimestamp on an integer column throw SQLException instead of loose-coercing.
  • getByte on an integer column is now supported (previously threw in the metadata path).
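
For concreteness, a hedged example of how this difference surfaces through DatabaseMetaData (table name illustrative; ORDINAL_POSITION is an integer column per the JDBC getColumns contract):

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;

final class StrictnessDemo {
    static void demo(Connection connection) throws SQLException {
        try (ResultSet cols = connection.getMetaData().getColumns(null, null, "my_table", null)) {
            while (cols.next()) {
                byte pos = cols.getByte("ORDINAL_POSITION"); // now supported on the metadata path
                System.out.println(pos);
                // cols.getBoolean("ORDINAL_POSITION");      // now throws SQLException instead of coercing
            }
        }
    }
}
```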

DataCloudDatabaseMetadataTest assertions were updated accordingly.

Test plan

  • ./gradlew clean build passes (includes spotlessCheck, all tests, JaCoCo coverage, verification).
  • ./gradlew :jdbc-core:test — 1222 tests, 0 failed.
  • Spot-check downstream: Spark datasource still compiles (covered by full build).

mkaufmann added a commit that referenced this pull request Apr 24, 2026
…ArrowStreamReader

Rework the ResultSet unification to address two reviewer requests on #175:

1. Share the vector-building code with the parameter-encoding path instead
   of having a dedicated MetadataArrowBuilder. VectorPopulator now exposes a
   row-indexed primitive (setCell) used by both callers. The existing
   single-row parameter-binding overload and a new many-row metadata
   overload both funnel through it, and all the individual vector setters
   are parameterised by row index.

2. Keep ArrowStreamReaderCursor on its original ArrowStreamReader-only
   interface. The metadata path now serialises a populated VSR to Arrow
   IPC bytes and wraps the result in a ByteArrayInputStream-backed
   ArrowStreamReader, so both streaming and metadata result sets travel
   through exactly the same reader/cursor plumbing.

Supporting changes:

- typeName overrides (e.g. "TEXT" for JDBC-spec metadata columns) now
  round-trip through Arrow via a jdbc:type_name field-metadata key rather
  than a columns-override parameter on StreamingResultSet. HyperTypeToArrow
  stamps the key on write; ArrowToHyperTypeMapper.toColumnMetadata reads it
  back.
- StreamingResultSet drops the ofInMemory(...) factory and the columns
  override; callers construct an ArrowStreamReader + BufferAllocator pair
  and hand them to of(reader, allocator, queryId, zone). The cursor owns
  both and closes reader-then-allocator on close.
- QueryResultArrowStream.toArrowStreamReader returns a simple Result
  holder (reader + allocator) instead of an AutoCloseable bundle.
- MetadataResultSets is the single entry point for Arrow-backed metadata
  result sets; MetadataArrowBuilder is deleted.
- Empty metadata results skip writeBatch() entirely so ArrowStreamReaderCursor
  doesn't interpret a zero-row batch as "at least one row available".
- Tests updated to the new API; StreamingResultSetMethodTest builds its
  in-memory ResultSet the same way as the metadata path (IPC round-trip).
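
The second point describes a VSR-to-IPC round-trip; a hedged sketch of that mechanism using stock Arrow APIs follows (the helper name is an assumption):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.channels.Channels;
import org.apache.arrow.memory.BufferAllocator;
import org.apache.arrow.vector.VectorSchemaRoot;
import org.apache.arrow.vector.ipc.ArrowStreamReader;
import org.apache.arrow.vector.ipc.ArrowStreamWriter;

final class IpcRoundTripSketch {
    // Serialise a populated root to Arrow IPC bytes, then re-read it through
    // the same ArrowStreamReader plumbing the streaming path uses.
    static ArrowStreamReader toIpcBackedReader(VectorSchemaRoot root, BufferAllocator allocator)
            throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ArrowStreamWriter writer = new ArrowStreamWriter(root, null, Channels.newChannel(out))) {
            writer.start();
            if (root.getRowCount() > 0) { // empty results skip writeBatch(), per the commit message
                writer.writeBatch();
            }
            writer.end();
        }
        return new ArrowStreamReader(new ByteArrayInputStream(out.toByteArray()), allocator);
    }
}
```
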
@mkaufmann mkaufmann force-pushed the moritz/centralize-types-via-hypertype branch 2 times, most recently from 9ecba8a to 08d62bc on May 7, 2026 19:10
Base automatically changed from moritz/centralize-types-via-hypertype to main May 7, 2026 19:25
@mkaufmann mkaufmann force-pushed the moritz/unify-result-sets branch from b648940 to 07125b1 on May 11, 2026 13:38
@codecov codecov Bot commented May 11, 2026

Codecov Report

❌ Patch coverage is 76.53631% with 84 lines in your changes missing coverage. Please review.
✅ Project coverage is 80.64%. Comparing base (1bda7d4) to head (b83e8b7).

Files with missing lines | Patch % | Lines
...sforce/datacloud/jdbc/core/DataCloudResultSet.java | 76.28% | 45 Missing and 1 partial ⚠️
.../datacloud/jdbc/protocol/data/VectorPopulator.java | 72.05% | 12 Missing and 7 partials ⚠️
...tacloud/jdbc/core/metadata/MetadataResultSets.java | 72.91% | 9 Missing and 4 partials ⚠️
...atacloud/jdbc/core/accessor/QueryJDBCAccessor.java | 75.00% | 2 Missing and 1 partial ⚠️
...datacloud/jdbc/protocol/data/HyperTypeToArrow.java | 81.81% | 1 Missing and 1 partial ⚠️
...oud/jdbc/protocol/data/ArrowToHyperTypeMapper.java | 75.00% | 0 Missing and 1 partial ⚠️

❌ Your patch check has failed because the patch coverage (76.53%) is below the target coverage (90.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files
@@             Coverage Diff              @@
##               main     #175      +/-   ##
============================================
- Coverage     82.37%   80.64%   -1.73%     
+ Complexity     1867     1706     -161     
============================================
  Files           125      123       -2     
  Lines          5009     4939      -70     
  Branches        537      521      -16     
============================================
- Hits           4126     3983     -143     
- Misses          641      725      +84     
+ Partials        242      231      -11     
Components | Coverage Δ
JDBC Core | 81.02% <76.53%> (-2.13%) ⬇️
JDBC Main | 40.69% <ø> (ø)
JDBC HTTP | 90.30% <ø> (ø)
JDBC Utilities | 65.25% <ø> (ø)
Spark Datasource | ∅ <ø> (∅)
Files with missing lines | Coverage Δ
...e/datacloud/jdbc/core/ArrowStreamReaderCursor.java | 91.17% <100.00%> (+0.55%) ⬆️
...force/datacloud/jdbc/core/DataCloudConnection.java | 57.04% <100.00%> (ø)
...datacloud/jdbc/core/DataCloudDatabaseMetadata.java | 98.34% <100.00%> (-0.01%) ⬇️
...sforce/datacloud/jdbc/core/DataCloudStatement.java | 81.09% <100.00%> (ø)
...esforce/datacloud/jdbc/core/QueryMetadataUtil.java | 95.40% <100.00%> (ø)
...oud/jdbc/core/SQLExceptionQueryResultIterator.java | 75.00% <ø> (ø)
...atacloud/jdbc/protocol/QueryResultArrowStream.java | 87.50% <100.00%> (+0.83%) ⬆️
...oud/jdbc/protocol/data/ArrowToHyperTypeMapper.java | 67.79% <75.00%> (+5.97%) ⬆️
...datacloud/jdbc/protocol/data/HyperTypeToArrow.java | 76.92% <81.81%> (-1.34%) ⬇️
...atacloud/jdbc/core/accessor/QueryJDBCAccessor.java | 93.87% <75.00%> (-3.50%) ⬇️
... and 3 more

... and 5 files with indirect coverage changes



Comment on lines +374 to +388
} catch (SQLException ex) {
    // Accessor does not implement typed getObject — fall back to raw + isInstance check.
    val raw = accessor.getObject();
    updateWasNull(accessor);
    if (raw == null) {
        return null;
    }
    if (type.isInstance(raw)) {
        return type.cast(raw);
    }
    throw new SQLException(
            "Cannot convert column value to " + type.getName() + "; actual type is "
                    + raw.getClass().getName(),
            ex);
}
Member Author

Isn't this putting a bandaid over bad behavior of the accessors? Shouldn't the accessor already implement this behavior so that we don't need the bandaid here? If true, check whether this proper fix can be factored into a separate commit (before the others on this branch) so that I can maybe ship it independently.

Member Author

Agreed — fixed in 322663e by giving QueryJDBCAccessor.getObject(Class) a default raw + isInstance implementation, so accessors that don't need typed conversion just inherit the right behavior. Then a6da614 collapses StreamingResultSet.getObject(int, Class) to a direct dispatch (no try/catch).

Per your "factor into a separate commit at the start of the branch" ask: 322663e is now the first commit on the branch (touches only QueryJDBCAccessor.java, applies cleanly on main), so it can be cherry-picked and shipped independently of the rest of this PR. The follow-up that drops the bandaid (a6da614) sits at the end since it depends on the rest of the unified StreamingResultSet.
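
For reference, a minimal sketch of what such a base-class default can look like; the real QueryJDBCAccessor also tracks wasNull and adds a String fast-path, and the names here are illustrative:

```java
import java.sql.SQLException;

// Shows only the raw + isInstance core of the default conversion.
abstract class AccessorSketch {
    abstract Object getObject() throws SQLException; // untyped read

    <T> T getObject(Class<T> type) throws SQLException {
        Object raw = getObject();
        if (raw == null) {
            return null;
        }
        if (type.isInstance(raw)) {
            return type.cast(raw); // identity / supertype case, e.g. VARCHAR -> String
        }
        throw new SQLException("Cannot convert column value to " + type.getName()
                + "; actual type is " + raw.getClass().getName());
    }
}
```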

Comment on lines +90 to +92
* <p>The column metadata (including any {@link ColumnMetadata#getTypeName()} override
* stamped under {@link com.salesforce.datacloud.jdbc.protocol.data.HyperTypeToArrow#JDBC_TYPE_NAME_METADATA_KEY})
* is derived from the Arrow schema via {@link ArrowToHyperTypeMapper#toColumnMetadata(org.apache.arrow.vector.types.pojo.Field)}.
Member Author

This comment is too low-level; remove it.

Member Author

Removed in e329860 — the paragraph that mentioned JDBC_TYPE_NAME_METADATA_KEY is gone; the docstring now stays at the level of "this is what of(reader, allocator, queryId, sessionZone) does" and leaves the field-metadata details documented at HyperTypeToArrow / ColumnMetadata where they live.

Comment on lines +120 to +141
/**
 * Hand the reader + allocator pair from {@link QueryResultArrowStream.Result} to {@link
 * #of(ArrowStreamReader, BufferAllocator, String, ZoneId)} and close both on construction
 * failure. Without this, an {@code of} call that throws (for example {@code SQLException}
 * wrapping an unsupported Arrow type) would leak the 100 MB
 * {@link org.apache.arrow.memory.RootAllocator} held by {@code Result}.
 */
public static StreamingResultSet ofClosingOnFailure(
        QueryResultArrowStream.Result arrowStream, String queryId, ZoneId sessionZone) throws SQLException {
    try {
        return of(arrowStream.getReader(), arrowStream.getAllocator(), queryId, sessionZone);
    } catch (SQLException | RuntimeException ex) {
        try {
            arrowStream.getReader().close();
        } catch (Exception suppressed) {
            ex.addSuppressed(suppressed);
        }
        arrowStream.getAllocator().close();
        throw ex;
    }
}

Member Author

This looks like an independent fix; if so, factor it (and the tests, of course) into a separate commit at the start of the branch so that I can potentially ship it independently.

Member Author

Looked at this carefully — the fix is already in its own commit (26bdc37 fix: close allocator on StreamingResultSet.of failure in query path with its regression test in StreamingResultSetMethodTest), but I left it where it is on the branch rather than moving it to the very front. The reason: as coded, this fix is not independent of the unify refactor. Specifically, it adds StreamingResultSet.ofClosingOnFailure(QueryResultArrowStream.Result, ...) and switches the four call sites to it — and on main:

  • QueryResultArrowStream doesn't return a Result wrapping (reader, allocator); toArrowStreamReader just returns an ArrowStreamReader.
  • StreamingResultSet.of(...) only takes (reader, queryId) — there's no allocator to leak on construction failure, because the allocator isn't owned by the result set on main at all.

So "pull this fix to the front of the branch" would mean rewriting the fix as a different fix against the pre-unify surface, which feels like it deserves its own decision rather than a sneaky reorder here. Two options if you'd like to ship the leak fix independently of this PR:

  1. I can carve out a minimal "pre-unify" leak fix against main directly (different shape — likely closes the allocator that today lives inside QueryResultArrowStream rather than in StreamingResultSet) and open it as its own PR. Happy to do this.
  2. Land it as part of this PR and treat the unify+leak-fix as a unit.

Lean toward (1) if you want to ship it ahead of this PR, otherwise (2) is fine. Let me know.

Member Author

Try to move it to a separate commit (along with test coverage). Would it make sense to move the QueryResultArrowStream as well then?

Member Author

Honest answer: yes, that would make the leak fix sit on a clean base — but it's a non-trivial split because the unify refactor is what introduced the allocator-ownership concept (Result holder, cursor's BufferAllocator field, StreamingResultSet.of taking an allocator). To pull "QueryResultArrowStream introduces Result" + "StreamingResultSet/cursor own the allocator" + "ofClosingOnFailure leak fix" forward as a single commit before the unify refactor, I'd be effectively breaking the unify commit into two halves and rebuilding the unify on top of the pre-extracted half. Doable, but it means rewriting a ~800-line refactor.

Given (a) you have one substantive comment left to evaluate (whether commit 2088116 lands as a separate PR), and (b) the leak fix is already in its own commit (4d04fbb, regression test included), I'd propose:

  1. Now: I land 2088116 (cursor zero-row-skip fix + tests against main's signature) as the cherry-pickable first commit. Same for 093d692 (getObject(Class) base-class fallback).
  2. If you want the leak fix shipped independently: I cherry-pick 4d04fbb's intent into a new PR against main, reshaping the fix to whatever the pre-unify code's leak surface actually is (likely a fix in QueryResultArrowStream.toArrowStreamReader to wrap the ArrowStreamReader so its close also closes the RootAllocator — cleaner than threading the allocator through StreamingResultSet.of).
  3. This PR stays as-is for the leak fix: it lands as part of the unified refactor.

Lean toward not splitting the unify commit unless there's a strong reason. Want me to open the separate "main-targeted leak fix" PR per (2)?

Comment on lines +69 to +75
/**
* Load the next batch that has at least one row, skipping any zero-row batches in between.
* Empty IPC batches are valid per the Arrow spec and can be emitted by upstream producers
* (long-running queries returning empty initial chunks, schema-only metadata streams), so
* "successfully loaded" must not be conflated with "row available."
*/
private boolean loadNextNonEmptyBatch() throws IOException {
Member Author

Why do we need to skip empty IPC batches if they are valid? Shouldn't callers handle them correctly?

Member Author

Fair question. The cursor is the caller for a JDBC ResultSet — there's no layer above it that has a sensible "I just consumed an empty batch, please advance" hook to expose. Sharpened the docstring on this method (e329860) to explain it directly:

Empty IPC batches are valid per the Arrow spec, and producers in the wild emit them — Hyper / query-federator can send a schema-only initial chunk, and async chunked-query paths use empty batches as a keep-alive between data chunks. JDBC has no notion of a "batch": ResultSet.next() only knows "is there another row?". This cursor is the single place that translates batch-level signals from ArrowStreamReader.loadNextBatch() into row-level advances, so it must consume the empty batches itself rather than push that responsibility outward.

Same fix is also independently shipped as commit 4d2e47f (fix: skip zero-row batches in ArrowStreamReaderCursor). I left that commit where it is on the branch rather than moving it to the start, because as coded it also removes the now-redundant empty-batch workaround in MetadataResultSets.writeArrowStream — which doesn't exist on main, so the pre-unify version of the fix would have a different shape (just the cursor side, no metadata-side cleanup). Happy to carve that out if you want a main-targeted preview.
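
A minimal sketch of the skip loop that docstring describes (field names assumed; the real cursor also tracks a row index within the loaded batch):

```java
import java.io.IOException;
import org.apache.arrow.vector.ipc.ArrowStreamReader;

final class CursorSketch {
    private final ArrowStreamReader reader;

    CursorSketch(ArrowStreamReader reader) {
        this.reader = reader;
    }

    boolean loadNextNonEmptyBatch() throws IOException {
        while (reader.loadNextBatch()) { // a batch loaded successfully...
            if (reader.getVectorSchemaRoot().getRowCount() > 0) {
                return true;             // ...and it actually contains rows
            }
            // zero-row batch (keep-alive / schema-only chunk): keep consuming
        }
        return false; // stream exhausted without another non-empty batch
    }
}
```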

Member Author (@mkaufmann, May 11, 2026)

Can you move this to an independent commit along with test coverage?

Member Author

Done — carved out as commit 2088116 fix: skip zero-row batches in ArrowStreamReaderCursor, now the first commit on the branch (touches only ArrowStreamReaderCursor.java + ArrowStreamReaderCursorTest.java, applies cleanly on main's pre-unify cursor signature (reader, sessionZone)). It can be cherry-picked and shipped independently.

The downstream commit 049e82d (formerly the same-named "skip zero-row batches" commit) now contains only the residual: removing the now-redundant MetadataResultSets.writeArrowStream "skip writeBatch when rowCount==0" workaround, since the cursor handles the empty-only case at the seam.

Test coverage that ships with the pre-unify commit:

  • skipsZeroRowBatchAndYieldsSubsequentNonEmptyRows — real Arrow IPC stream of {0-row, 1-row} → next() reports the one real row, then false.
  • zeroRowOnlyBatchYieldsNoRows — real Arrow IPC stream of {0-row} → next() returns false.
  • firstNextReturnsTrueWhenInitialBatchHasRows / firstNextReturnsFalseWhenStreamHasNoBatches — pure-mock pinning of the first-batch logic, replacing the old forwardsLoadNextBatch(true|false) parameterised test that would loop forever under the new control flow.
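
As a rough illustration, this is how a {0-row, 1-row} Arrow IPC stream like the one these tests consume can be constructed; the fixture shape is an assumption, not the actual test code:

```java
import java.io.ByteArrayOutputStream;
import java.nio.channels.Channels;
import java.nio.charset.StandardCharsets;
import org.apache.arrow.memory.RootAllocator;
import org.apache.arrow.vector.VarCharVector;
import org.apache.arrow.vector.VectorSchemaRoot;
import org.apache.arrow.vector.ipc.ArrowStreamWriter;

final class ZeroRowStreamFixture {
    static byte[] zeroRowThenOneRowStream() throws Exception {
        try (RootAllocator allocator = new RootAllocator();
                VarCharVector vector = new VarCharVector("col", allocator);
                VectorSchemaRoot root = VectorSchemaRoot.of(vector)) {
            vector.allocateNew();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            try (ArrowStreamWriter writer = new ArrowStreamWriter(root, null, Channels.newChannel(out))) {
                writer.start();
                root.setRowCount(0); // first batch: valid IPC, but zero rows
                writer.writeBatch();
                vector.setSafe(0, "row".getBytes(StandardCharsets.UTF_8));
                root.setRowCount(1); // second batch: one real row
                writer.writeBatch();
                writer.end();
            }
            return out.toByteArray();
        }
    }
}
```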

mkaufmann added a commit that referenced this pull request May 11, 2026
Now that QueryJDBCAccessor.getObject(Class) provides the raw + isInstance
fallback as its base-class default, StreamingResultSet no longer needs
the catch-and-retry path that worked around accessors which threw
"Operation not supported." Collapse getObject(int, Class) to direct
dispatch and update the regression test's WHY comment to point at the
accessor base class as the load-bearing layer.

Addresses: review comment on PR #175 line 388.
mkaufmann added a commit that referenced this pull request May 11, 2026
Three small follow-ups from PR #175 review:

- StreamingResultSet.of: drop the paragraph that pointed at the
  HyperTypeToArrow.JDBC_TYPE_NAME_METADATA_KEY field-metadata key. The
  docstring spilled implementation detail of the metadata-stamping path
  into a generic "create a result set from a reader" entry-point; the
  type-name override is documented at HyperTypeToArrow / ColumnMetadata
  where it's relevant.

- ArrowStreamReaderCursor.loadNextNonEmptyBatch: rewrite the rationale
  to answer "why does the cursor consume empty batches instead of the
  caller?" directly. Empty IPC batches are valid Arrow and producers
  emit them; JDBC's next() only knows rows, so this cursor is the seam
  that translates batch-level signals into row-level advances.

- MetadataResultSetsTest: drop the JDBC ResultSet-shape slice (next /
  isClosed / getStatement / unwrap / isWrapperFor / getHoldability /
  getFetchSize / setFetchSize / getWarnings / getConcurrency / getType
  / getFetchDirection). Those test the StreamingResultSet plumbing
  shared by every result set on this branch and are already covered by
  StreamingResultSetMethodTest. Keep the arity-contract slice
  (short/long/right/null/empty rows) — that is the
  metadata-result-set-specific behavior.

Addresses: review comments on PR #175.
@mkaufmann mkaufmann force-pushed the moritz/unify-result-sets branch from 07125b1 to e329860 on May 11, 2026 15:02
@mkaufmann mkaufmann force-pushed the moritz/unify-result-sets branch from e329860 to d17f1b0 on May 11, 2026 16:20
@mkaufmann mkaufmann (Member Author) commented

Per the two review threads, split out the cherry-pickable fixes as their own PRs against main: #185 and #186.

This PR (#175) keeps the same fixes as the first two commits — when #185 / #186 land, those commits will collapse to no-ops at rebase time.

For the remaining "should QueryResultArrowStream allocator-ownership move pre-unify too?" thread (#175 review): waiting on your call before I do that split. As I noted there, it's a non-trivial surgery on the unify commit and I'd rather get your sign-off before rewriting ~800 lines of refactor.

@mkaufmann mkaufmann force-pushed the moritz/unify-result-sets branch from d17f1b0 to f4cad29 on May 11, 2026 17:16
@mkaufmann mkaufmann force-pushed the moritz/unify-result-sets branch 2 times, most recently from 67ddd24 to b97abe2 on May 11, 2026 18:37
@mkaufmann mkaufmann force-pushed the moritz/unify-result-sets branch from b97abe2 to 8c8254f on May 11, 2026 19:08
@mkaufmann mkaufmann force-pushed the moritz/unify-result-sets branch from e40a2a8 to 84391cd on May 12, 2026 14:28
@mkaufmann mkaufmann force-pushed the moritz/unify-result-sets branch from 84391cd to d90bb9a on May 12, 2026 14:57
mkaufmann added 15 commits May 12, 2026 21:11
QueryJDBCAccessor.getObject(Class) threw "Operation not supported" no
matter what. That's a problem: JDBC says
ResultSet.getObject(int, Class<T>) is supposed to return the value as
T when the conversion is trivial, but every accessor that didn't
override it would throw, even on the identity case like
getObject(col, String.class) against a VARCHAR. Callers had to either
know which accessors implement typed conversion or wrap calls in
catch-and-retry.

The new default covers the trivial cases:

1. null type -> SQLException with SQLState 22023.
2. String.class -> delegate to getString(). JDBC 4.2 Table B-5
   mandates this conversion for every column type.
3. Otherwise raw + isInstance, accepting any supertype/interface
   match. wasNull is set defensively when the raw value is null so
   stale state from an earlier non-null read does not leak.
4. Anything left throws SQLFeatureNotSupportedException (matching the
   JDBC spec's expectation for unsupported conversions, and consistent
   with the existing TimeStampVectorAccessor override).

Numeric narrowing (e.g. getObject(col, Integer.class) on a BIGINT
column) is intentionally NOT handled here. The accessor-level
primitive getters (getInt, getShort, getByte) use unchecked Java
casts -- delegating to them would silently truncate Long.MAX_VALUE
to -1 instead of refusing. Accessors that need lossless cross-type
conversion override getObject(Class) (TimeStampVectorAccessor for
the Instant / OffsetDateTime / LocalDateTime / etc. paths).

Tests in StreamingResultSetMethodTest cover:
- getObjectWithClassUsesAccessorBaseFallback: identity case
  (VARCHAR -> String) goes through the inherited String fast-path.
- getObjectWithSupertypeOrInterfaceReturnsValue: Object.class and
  CharSequence.class both work via isInstance -- proves polymorphic
  callers (raw Object holders, generic tooling) aren't blocked.
- getObjectWithNullClassThrows: null type parameter raises
  SQLException with the expected "must not be null" message.
- getObjectWithIncompatibleClassThrows: requesting an unrelated type
  (String column, StringBuilder asked) raises the typed conversion
  error.
- getObjectWithClassReturnsNullForNullValue: a null column value
  short-circuits and returns null regardless of the requested type;
  wasNull() is true afterwards.
Collapse the two ResultSet families (streaming Arrow + row-based
metadata) into a single Arrow-backed implementation so there is one
accessor pipeline, one set of type semantics, and one place to fix bugs.

Changes:

- ArrowStreamReaderCursor becomes source-agnostic: a BatchLoader drives
  a VectorSchemaRoot, whether sourced from an ArrowStreamReader or a
  pre-populated in-memory batch. The cursor also owns an AutoCloseable
  so it is responsible for releasing the allocator + reader on close —
  the old ArrowStreamReader.close() would only tear down vectors and
  leak the 100 MB RootAllocator.
- QueryResultArrowStream.toArrowStreamReader returns a Result holder
  that pairs the reader with the allocator and closes both in the
  right order so Arrow's accounting invariants hold.
- StreamingResultSet gains ofInMemory(root, owned, queryId, zone, cols)
  so metadata results funnel through the same result set. A columns
  override preserves the JDBC-spec typeName labels (e.g. TEXT) that
  would otherwise be lost when deriving from the Arrow schema.
- MetadataArrowBuilder materialises List<List<Object>> metadata rows
  into a populated VectorSchemaRoot using the existing HyperTypeToArrow
  mapping; MetadataResultSets is the factory callers use.
- QueryMetadataUtil and DataCloudDatabaseMetadata route getTables,
  getColumns, getSchemas, getTypeInfo and empty metadata results
  through the Arrow-backed StreamingResultSet.
- DataCloudMetadataResultSet, SimpleResultSet, and ColumnAccessor are
  removed now that no caller depends on them.
- StreamingResultSet.getObject(int, Class) gains an isInstance-based
  fallback so callers can retrieve String-typed VARCHAR columns without
  each accessor having to implement typed getObject.
- Tests moved to the unified path; integer-accessor-only assertions in
  DataCloudDatabaseMetadataTest updated to reflect stricter Arrow
  accessor semantics.
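
A rough sketch of the seam named in the first bullet (the BatchLoader
name comes from this commit; the exact shape is an assumption, and a
later commit on this branch replaces it with an IPC round-trip):

```java
import java.io.IOException;

// Pluggable batch source: one implementation wraps an ArrowStreamReader,
// another serves a single pre-populated in-memory VectorSchemaRoot.
interface BatchLoader extends AutoCloseable {
    /** Load the next batch into the cursor's VectorSchemaRoot; false at end. */
    boolean loadNextBatch() throws IOException;
}
```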

…ArrowStreamReader

Rework the ResultSet unification to address two reviewer requests on #175:

1. Share the vector-building code with the parameter-encoding path instead
   of having a dedicated MetadataArrowBuilder. VectorPopulator now exposes a
   row-indexed primitive (setCell) used by both callers. The existing
   single-row parameter-binding overload and a new many-row metadata
   overload both funnel through it, and all the individual vector setters
   are parameterised by row index.

2. Keep ArrowStreamReaderCursor on its original ArrowStreamReader-only
   interface. The metadata path now serialises a populated VSR to Arrow
   IPC bytes and wraps the result in a ByteArrayInputStream-backed
   ArrowStreamReader, so both streaming and metadata result sets travel
   through exactly the same reader/cursor plumbing.
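
A hedged sketch of that round-trip (helper and variable names are
illustrative; only the Arrow IPC calls are the real API):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.channels.Channels;

import org.apache.arrow.memory.BufferAllocator;
import org.apache.arrow.vector.VectorSchemaRoot;
import org.apache.arrow.vector.ipc.ArrowStreamReader;
import org.apache.arrow.vector.ipc.ArrowStreamWriter;

final class IpcRoundTripSketch {
    /** Serialise a populated root to IPC bytes, then re-read it as a stream. */
    static ArrowStreamReader toInMemoryReader(VectorSchemaRoot populated, BufferAllocator allocator)
            throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ArrowStreamWriter writer =
                new ArrowStreamWriter(populated, /* dictionaries */ null, Channels.newChannel(bytes))) {
            writer.start();       // schema message
            writer.writeBatch();  // the single in-memory record batch
            writer.end();         // end-of-stream marker
        }
        return new ArrowStreamReader(new ByteArrayInputStream(bytes.toByteArray()), allocator);
    }
}
```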

Supporting changes:

- typeName overrides (e.g. "TEXT" for JDBC-spec metadata columns) now
  round-trip through Arrow via a jdbc:type_name field-metadata key rather
  than a columns-override parameter on StreamingResultSet. HyperTypeToArrow
  stamps the key on write; ArrowToHyperTypeMapper.toColumnMetadata reads it
  back.
- StreamingResultSet drops the ofInMemory(...) factory and the columns
  override; callers construct an ArrowStreamReader + BufferAllocator pair
  and hand them to of(reader, allocator, queryId, zone). The cursor owns
  both and closes reader-then-allocator on close.
- QueryResultArrowStream.toArrowStreamReader returns a simple Result
  holder (reader + allocator) instead of an AutoCloseable bundle.
- MetadataResultSets is the single entry point for Arrow-backed metadata
  result sets; MetadataArrowBuilder is deleted.
- Empty metadata results skip writeBatch() entirely so ArrowStreamReaderCursor
  doesn't interpret a zero-row batch as "at least one row available".
- Tests updated to the new API; StreamingResultSetMethodTest builds its
  in-memory ResultSet the same way as the metadata path (IPC round-trip).

StreamingResultSet.of catches IOException and IllegalArgumentException
from the Arrow schema decode and rewraps as SQLException. At all four
query-path call sites (DataCloudConnection.getRowBasedResultSet,
getChunkBasedResultSet, DataCloudStatement.executeQuery, getResultSet)
the surrounding try-catch only catches StatusRuntimeException, so a
SQLException thrown from of() bypasses it and leaks the 100 MB
RootAllocator returned by QueryResultArrowStream.toArrowStreamReader.

Introduce StreamingResultSet.ofClosingOnFailure(Result, queryId,
sessionZone) that takes the reader+allocator pair and closes both on
construction failure (reader first so its buffers release before the
allocator's budget check). Switch all four call sites to it.

The metadata path in MetadataResultSets.of already had this shape; this
fixes the matching gap on the query side.
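
The shape of the helper, as a minimal sketch (Result is assumed to be a
plain reader+allocator holder; create stands in for the private
construction body):

```java
import java.sql.SQLException;
import java.time.ZoneId;

import org.apache.arrow.memory.BufferAllocator;
import org.apache.arrow.vector.ipc.ArrowStreamReader;

abstract class CloseOnFailureSketch {
    /** Hypothetical (reader, allocator) holder mirroring the Result described above. */
    static final class Result {
        final ArrowStreamReader reader;
        final BufferAllocator allocator;

        Result(ArrowStreamReader reader, BufferAllocator allocator) {
            this.reader = reader;
            this.allocator = allocator;
        }
    }

    /** Stand-in for the private construction body the factory wraps. */
    abstract Object create(ArrowStreamReader reader, BufferAllocator allocator,
            String queryId, ZoneId zone) throws SQLException;

    Object ofClosingOnFailure(Result result, String queryId, ZoneId zone) throws SQLException {
        try {
            return create(result.reader, result.allocator, queryId, zone);
        } catch (SQLException e) {
            // try-with-resources closes in reverse declaration order: the
            // reader first, so ArrowBuf accounting clears before the
            // allocator's budget check, then the allocator itself.
            try (BufferAllocator a = result.allocator; ArrowStreamReader r = result.reader) {
                // nothing to do; the close order in the header is the point
            } catch (Exception closeFailure) {
                e.addSuppressed(closeFailure);
            }
            throw e;
        }
    }
}
```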

Add a regression test that builds an Arrow IPC stream with an
unsupported field type (LargeUtf8) and asserts the helper closes both
the reader and the allocator on the resulting SQLException.

The Int/SmallInt/TinyInt setters widened from concrete boxed types
(Integer/Short/Byte) to Number so metadata rows could pass long values,
but lost the implicit "right boxed type" check at the call sites that
went through DataCloudPreparedStatement.setObject for parameter
binding. A user binding Long.MAX_VALUE to an INT32 parameter would
silently get (int) Long.MAX_VALUE = -1 written to the vector.

Add an explicit range check on Int/SmallInt/TinyInt setters before
narrowing. Both the metadata path and the parameter-binding path go
through these setters, so strict checks here mean strict on both
paths. BigInt accepts the full long range and is unchanged.
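
A focused sketch of the guard; the setter shape (vector, row index,
Number) follows this commit's description and is not the exact source:

```java
import java.sql.SQLException;

import org.apache.arrow.vector.IntVector;

final class IntSetterRangeCheckSketch {
    /** Widen to long, range-check, then narrow; refuse instead of truncating. */
    static void setInt(IntVector vector, int row, Number value) throws SQLException {
        long widened = value.longValue();
        if (widened < Integer.MIN_VALUE || widened > Integer.MAX_VALUE) {
            // SQLState 22003: numeric value out of range
            throw new SQLException("value " + widened + " out of range for INTEGER", "22003");
        }
        vector.setSafe(row, (int) widened); // narrowing is now provably lossless
    }
}
```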

Pin the behavior with a focused unit test (IntegerVectorSetterRangeCheckTest).

The driver round-trips JDBC-spec type-name overrides (e.g. "TEXT" for
metadata columns) through Arrow field metadata under a custom key. The
previous key "jdbc:type_name" sat in a generic namespace that nothing
reserves for this driver: Hyper, query-federator, or another Arrow
producer could emit a same-named key in a future protocol version, in
which case ArrowToHyperTypeMapper would silently override its own
derived type name with whatever upstream stamped.

Rename to "datacloud-jdbc:type_name" so the namespace is unambiguous,
and expand the field's javadoc to document the namespace rationale.
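
A sketch of the stamp/read pair under the new key (the key string is
from this commit; the helper methods are illustrative):

```java
import java.util.HashMap;
import java.util.Map;

import org.apache.arrow.vector.types.pojo.Field;
import org.apache.arrow.vector.types.pojo.FieldType;

final class TypeNameMetadataSketch {
    /** Namespaced so no other Arrow producer can collide with it. */
    static final String TYPE_NAME_KEY = "datacloud-jdbc:type_name";

    /** Return a copy of the field with the JDBC type-name override stamped on. */
    static Field withTypeName(Field field, String jdbcTypeName) {
        Map<String, String> metadata = new HashMap<>(field.getMetadata());
        metadata.put(TYPE_NAME_KEY, jdbcTypeName);
        return new Field(
                field.getName(),
                new FieldType(field.isNullable(), field.getType(), field.getDictionary(), metadata),
                field.getChildren());
    }

    /** Null means "no override: derive the type name from the Arrow type". */
    static String readTypeName(Field field) {
        return field.getMetadata().get(TYPE_NAME_KEY);
    }
}
```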

The fallback in ArrowToHyperTypeMapper.toColumnMetadata — when a field
has no datacloud-jdbc:type_name override, ColumnMetadata.typeName is
null and the JDBC layer derives the column type-name from the
HyperType — was load-bearing but unasserted. Real Hyper Arrow streams
never stamp the override, so every functional query test exercised the
fallback implicitly; if a future refactor broke it, the regression
would not surface in the existing suite.

Two new pin tests:
- ArrowToHyperTypeMapperTest at the unit boundary: field with override
  -> typeName matches; field without override (null metadata, empty
  metadata) -> typeName is null.
- StreamingResultSetTest.getColumnTypeNameFallsBackToDerivedNameOnRealHyperStream
  end-to-end against local Hyper: executeQuery on a select with INT,
  VARCHAR, DECIMAL columns asserts ResultSetMetaData.getColumnTypeName
  returns the derived names ("INTEGER", "VARCHAR", "DECIMAL").

Drive-by pin test: StreamingResultSet.getObject(int, Map<String,Class<?>>)
with a null or empty type map should behave like plain getObject(int)
per the JDBC spec. Previously not asserted anywhere.
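
The spec shape being pinned, as a sketch (the non-empty-map branch is an
assumption about this driver, since custom type-map support is not
mentioned anywhere on this branch):

```java
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.SQLFeatureNotSupportedException;
import java.util.Map;

abstract class TypeMapSketch implements ResultSet {
    @Override
    public Object getObject(int columnIndex, Map<String, Class<?>> map) throws SQLException {
        if (map == null || map.isEmpty()) {
            return getObject(columnIndex); // JDBC: no custom mappings to apply
        }
        throw new SQLFeatureNotSupportedException("custom type maps");
    }
}
```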

The companion getObject(Class) fallback test landed earlier on this
branch, bundled into the QueryJDBCAccessor base-class fix commit so
the fix and its end-to-end coverage ship as a single cherry-pickable
unit.

Previously a row with the wrong number of elements would silently leave
the trailing columns as Arrow null (interpreted as missing values).
Today every caller routes through MetadataSchemas so the sizes match by
construction, but a future caller bug would surface only inside vector
population, far from the boundary.

Add an explicit arity check at the of(...) entrypoint: each non-null
row must have exactly columns.size() elements. Null rows are accepted
as the all-nulls row (matching the legacy coerceRows convention of
turning null into emptyList). Empty rows are accepted only when the
schema is also empty.
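
A sketch of the check at the boundary (only the column count matters
for arity, so the schema is reduced to an int here):

```java
import java.sql.SQLException;
import java.util.List;

final class RowAritySketch {
    static void checkRowArity(int columnCount, List<List<Object>> rows) throws SQLException {
        for (int i = 0; i < rows.size(); i++) {
            List<Object> row = rows.get(i);
            if (row == null) {
                continue; // null row == the all-nulls row (legacy coerceRows convention)
            }
            if (row.size() != columnCount) {
                // also covers empty rows: size 0 only passes when the schema is empty
                throw new SQLException("metadata row " + i + " has " + row.size()
                        + " values, expected " + columnCount);
            }
        }
    }
}
```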

Pin behavior with MetadataResultSetsTest covering short, long,
correct-arity, null-row, and empty-rows cases.

Now that ArrowStreamReaderCursor.loadNextNonEmptyBatch (introduced
earlier on this branch as a pre-unify cursor fix) consumes empty
batches at the cursor seam, MetadataResultSets.writeArrowStream no
longer needs its own "skip writeBatch when rowCount==0" workaround:
the cursor handles the empty-only case correctly. Remove the special
case and always emit a batch.
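
The seam in sketch form (the method name is from this branch; the loop
body is an assumption consistent with the description):

```java
import java.io.IOException;

import org.apache.arrow.vector.ipc.ArrowStreamReader;

final class CursorSeamSketch {
    /** Advance past zero-row batches; true only when a batch has rows. */
    static boolean loadNextNonEmptyBatch(ArrowStreamReader reader) throws IOException {
        while (reader.loadNextBatch()) {                        // false at end of stream
            if (reader.getVectorSchemaRoot().getRowCount() > 0) {
                return true;                                    // rows available to serve
            }
            // zero-row batches are valid Arrow IPC: consume and keep reading
        }
        return false;
    }
}
```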

Tightens the zeroRowOnlyBatchYieldsNoRows test docstring to match.

DataCloudMetadataResultSet was deleted in this PR, but the test file
retained the old name and lived in the wrong package. Merge its empty-
result-set JDBC-shape smoke tests into the new MetadataResultSetsTest
under the .core.metadata package and delete the legacy file. No
behavior change.

Now that QueryJDBCAccessor.getObject(Class) provides the raw + isInstance
fallback as its base-class default, StreamingResultSet no longer needs
the catch-and-retry path that worked around accessors which threw
"Operation not supported." Collapse getObject(int, Class) to direct
dispatch and update the regression test's WHY comment to point at the
accessor base class as the load-bearing layer.

Addresses: review comment on PR #175 line 388.

Three small follow-ups from PR #175 review:

- StreamingResultSet.of: drop the paragraph that pointed at the
  HyperTypeToArrow.JDBC_TYPE_NAME_METADATA_KEY field-metadata key. The
  docstring spilled implementation detail of the metadata-stamping path
  into a generic "create a result set from a reader" entry-point; the
  type-name override is documented at HyperTypeToArrow / ColumnMetadata
  where it's relevant.

- ArrowStreamReaderCursor.loadNextNonEmptyBatch: rewrite the rationale
  to answer "why does the cursor consume empty batches instead of the
  caller?" directly. Empty IPC batches are valid Arrow and producers
  emit them; JDBC's next() only knows rows, so this cursor is the seam
  that translates batch-level signals into row-level advances.

- MetadataResultSetsTest: drop the JDBC ResultSet-shape slice (next /
  isClosed / getStatement / unwrap / isWrapperFor / getHoldability /
  getFetchSize / setFetchSize / getWarnings / getConcurrency / getType
  / getFetchDirection). Those test the StreamingResultSet plumbing
  shared by every result set on this branch and are already covered by
  StreamingResultSetMethodTest. Keep the arity-contract slice
  (short/long/right/null/empty rows) — that is the
  metadata-result-set-specific behavior.

Addresses: review comments on PR #175.

StreamingResultSet had two public factories — of(reader, allocator,
queryId[, zone]) (4 callers) and ofClosingOnFailure(Result, queryId,
zone) (5 callers). Every production caller wanted the close-on-failure
behavior; only tests and the metadata helper used the bare of(). Two
factories with overlapping responsibilities are one too many: a caller
hitting the bare of() without knowing about ofClosingOnFailure would
silently leak the 100 MB RootAllocator on construction failure.

Collapse to one public factory:
- of(QueryResultArrowStream.Result, queryId, sessionZone) — the only
  factory callers see, always closes both reader and allocator on
  failure. Name is the unambiguous "of" because there is no other.
- create(reader, allocator, queryId, sessionZone) — private; just the
  construction body the factory wraps.

Production call sites (DataCloudConnection, DataCloudStatement) and
MetadataResultSets were already passing a (reader, allocator) pair,
so the call shape collapses to passing the Result holder. Tests that
were building the pair locally now wrap it in a Result the same way.

…r interface

Pre-unify there were three result-set implementations: StreamingResultSet
(streaming Arrow query results), DataCloudMetadataResultSet (metadata),
SimpleResultSet (in-memory rows). The DataCloudResultSet interface, a
one-method (getQueryId) extension over java.sql.ResultSet, was the
common supertype the public API exposed; StreamingResultSet was the
only non-trivial implementation it ever had.

The unify refactor collapsed all three implementations into
StreamingResultSet, but kept the interface and the "Streaming" name. Two
problems fall out:

- The "Streaming" name now lies. Metadata results flow through the same
  class but they're a one-shot in-memory IPC blob — nothing streaming
  about them. MetadataResultSets.of even passes /*queryId=*/ null
  because there is no query.
- The DataCloudResultSet interface has one implementer and one method.
  Layering an interface for one impl is just a reader trap: callers
  instinctively look for "what other implementations exist" and find
  none.

Collapse the two:
- Rename the class StreamingResultSet -> DataCloudResultSet.
- Delete the old DataCloudResultSet interface (the public method
  getQueryId() now lives directly on the class via @Getter).
- Update all production and test references; rename the affected test
  files to match (StreamingResultSet*Test -> DataCloudResultSet*Test).

The public API surface is source-compatible for the common cases:
DataCloudConnection.getRowBasedResultSet / getChunkBasedResultSet still
return DataCloudResultSet, just as a class instead of an interface.
The change is binary-incompatible for any caller that ever cast to or
implemented the old interface; in practice StreamingResultSet was its
only implementation inside the driver, and no code outside the driver
implemented it at all.

@mkaufmann force-pushed the moritz/unify-result-sets branch from d90bb9a to b83e8b7 on May 12, 2026 at 19:13