Add Iceberg snapshot-id time travel support (version=)#4231
Add Iceberg snapshot-id time travel support (version=)#4231sfc-gh-igarish wants to merge 4 commits into
Conversation
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #4231 +/- ##
==========================================
- Coverage 95.40% 95.39% -0.02%
==========================================
Files 171 171
Lines 44205 44235 +30
Branches 7548 7557 +9
==========================================
+ Hits 42174 42198 +24
- Misses 1247 1250 +3
- Partials 784 787 +3 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
sfc-gh-mayliu
left a comment
There was a problem hiding this comment.
Is there a Snowflake doc explaining that the AT(VERSION=> <id>) syntax is officially supported? Is VERSION/snapshot-id specific to iceberg tables only? It'd be great to get more context for the review.
Also, could you add some integ tests in test_dataframe.py for E2E validation?
Yes this is for unmanaged Iceberg table only. https://docs.google.com/document/d/1Z7GlEWbqZRCYr4NDjh-7klgvJJAp0jF45p9GH1MbQ4M/edit?tab=t.0 |
sfc-gh-mayliu
left a comment
There was a problem hiding this comment.
As per offline discussions, please look into:
- add parameter protection for this change to isolate downstream SCOS use and Snowpark-python usage
- add integ tests to validate E2E behaviors; if Iceberg credentials cannot be resolved systematically, mark it with
pytest.skipbut reusable for manual testing
…enabled
Adds an umbrella session flag (default ``False``) that gates the
``version=`` / ``snapshot-id`` time-travel surface added by the previous
commit:
* ``Session.iceberg_features_enabled`` — public bool property + setter
(locking + bool coercion). Mirrors the existing pattern used by
``sql_simplifier_enabled`` / ``cte_optimization_enabled``.
* ``Session._require_iceberg_features_enabled(feature=...)`` — internal
helper that raises ``SnowparkClientException`` pointing users at the
flag.
Gates are placed at all three entry points so the kwarg and the Spark
Iceberg ``snapshot-id`` / ``snapshot_id`` reader option are both
rejected when the flag is off:
* ``Session.table(name, version=...)``
* ``DataFrameReader.table(name, version=...)`` and
``.option("snapshot-id", N).table(...)`` (also catches ``snapshot_id``)
* ``Table(name, ..., version=...)`` direct construction
Default ``False`` because the underlying Snowflake feature
(``AT(VERSION => N)`` on unmanaged Iceberg tables) is itself gated
behind the ``FEATURE_ICEBERG_TIME_TRAVEL`` account parameter and
currently scoped to Catalog-Linked Databases. Snowpark Connect (SAS)
flips the flag per-session when its
``snowpark.connect.iceberg.timeTravel.enabled`` config is set.
Tests
-----
``tests/unit/test_iceberg_features_flag.py`` (new, 13 tests):
* Default + setter + helper semantics.
* Each of the three entry-point gates raises when the flag is off and
is bypassed cleanly when the flag is on.
* Three AST-emission coverage tests patch ``with_src_position`` to
return a stand-in with a ``version`` field, so the
``hasattr(ast, "version")``-guarded ``ast.version.value = version``
line in ``session.py`` / ``dataframe_reader.py`` / ``table.py``
actually executes. This addresses the codecov gap (3 missing lines
flagged by the patch report on PR #4231).
``tests/integ/test_dataframe.py`` (3 new tests, all
``@pytest.mark.skip``):
* End-to-end ``Session.table(..., version=N)`` against a multi-snapshot
Iceberg table.
* End-to-end ``session.read.option('snapshot-id', N).table(...)``.
* End-to-end gate verification (flag off + ``version=`` → raises).
All three are skipped because they need a Catalog-Linked Database
(cldUnity / cldglue) with an unmanaged Iceberg table that has multiple
snapshots AND ``FEATURE_ICEBERG_TIME_TRAVEL`` enabled on the account
— neither of which the standard snowpark-python integ accounts have
today. They run manually against sfctest0. A TODO above the block
tracks wiring them into CI once an Iceberg-capable integ account
exists.
Addresses @sfc-gh-mayliu review comments on PR #4231 (integ test
coverage) and the codecov patch-coverage drop.
Co-authored-by: Cursor <cursoragent@cursor.com>
@sfc-gh-mayliu thanks for the review. Pushed commit 1. Snowflake doc / scope of Yes — this syntax is the supported way to do snapshot-id time travel on unmanaged Iceberg tables in Catalog-Linked Databases (Unity / Glue). Spec lives in this design doc and is gated server-side behind the 2. Gating the Snowpark surface Because the underlying Snowflake feature is itself account-gated, this PR now also gates the Snowpark API behind a session flag, default ```python With the flag off, all three entry points ( 3. Integ tests in Added three E2E tests (
Skipped because they need a CLD ( 4. Codecov patch coverage Added |
sfc-gh-mayliu
left a comment
There was a problem hiding this comment.
Thanks for protecting the code with flag, it makes the change much safer. Have a few more thoughts below. Also, could you please attach a manual test result for the skipped integ test that requires CLD?
…enabled
Adds an umbrella session flag (default ``False``) that gates the
``version=`` / ``snapshot-id`` time-travel surface added by the previous
commit:
* ``Session.iceberg_features_enabled`` — public bool property + setter
(locking + bool coercion). Mirrors the existing pattern used by
``sql_simplifier_enabled`` / ``cte_optimization_enabled``.
* ``Session._require_iceberg_features_enabled(feature=...)`` — internal
helper that raises ``SnowparkClientException`` pointing users at the
flag.
Gates are placed at all three entry points so the kwarg and the Spark
Iceberg ``snapshot-id`` / ``snapshot_id`` reader option are both
rejected when the flag is off:
* ``Session.table(name, version=...)``
* ``DataFrameReader.table(name, version=...)`` and
``.option("snapshot-id", N).table(...)`` (also catches ``snapshot_id``)
* ``Table(name, ..., version=...)`` direct construction
Default ``False`` because the underlying Snowflake feature
(``AT(VERSION => N)`` on unmanaged Iceberg tables) is itself gated
behind the ``FEATURE_ICEBERG_TIME_TRAVEL`` account parameter and
currently scoped to Catalog-Linked Databases. Snowpark Connect (SAS)
flips the flag per-session when its
``snowpark.connect.iceberg.timeTravel.enabled`` config is set.
Tests
-----
``tests/unit/test_iceberg_features_flag.py`` (new, 13 tests):
* Default + setter + helper semantics.
* Each of the three entry-point gates raises when the flag is off and
is bypassed cleanly when the flag is on.
* Three AST-emission coverage tests patch ``with_src_position`` to
return a stand-in with a ``version`` field, so the
``hasattr(ast, "version")``-guarded ``ast.version.value = version``
line in ``session.py`` / ``dataframe_reader.py`` / ``table.py``
actually executes. This addresses the codecov gap (3 missing lines
flagged by the patch report on PR #4231).
``tests/integ/test_dataframe.py`` (3 new tests, all
``@pytest.mark.skip``):
* End-to-end ``Session.table(..., version=N)`` against a multi-snapshot
Iceberg table.
* End-to-end ``session.read.option('snapshot-id', N).table(...)``.
* End-to-end gate verification (flag off + ``version=`` → raises).
All three are skipped because they need a Catalog-Linked Database
(cldUnity / cldglue) with an unmanaged Iceberg table that has multiple
snapshots AND ``FEATURE_ICEBERG_TIME_TRAVEL`` enabled on the account
— neither of which the standard snowpark-python integ accounts have
today. They run manually against sfctest0. A TODO above the block
tracks wiring them into CI once an Iceberg-capable integ account
exists.
Addresses @sfc-gh-mayliu review comments on PR #4231 (integ test
coverage) and the codecov patch-coverage drop.
Co-authored-by: Cursor <cursoragent@cursor.com>
1c72e04 to
ce01976
Compare
|
Pushed 7563e82 to address review feedback from @sfc-gh-aling and @sfc-gh-mayliu. 1. Drop 2. Move 3. Duplicate copyright header in 4. Tests
Local: Diff is now |
sfc-gh-mayliu
left a comment
There was a problem hiding this comment.
LGTM, thanks for supporting the new time travel options for iceberg
|
CI note — 6 failing tests are unrelated to this PR The three failing
All six raise the same SQL compilation error: This is an environmental issue with the CI Snowflake account (missing / unauthorized
I've triggered a re-run of the three failed jobs in case the underlying object access was transient. If they fail the same way again, this is firmly a |
Adds a new ``version`` field to ``TimeTravelConfig`` that emits the
Snowflake ``AT(VERSION => <snapshot_id>)`` SQL clause for Iceberg snapshot
id time travel. The new kwarg is wired into:
- ``snowflake.snowpark.Session.table(version=...)``
- ``snowflake.snowpark.DataFrameReader.table(version=...)``
- ``snowflake.snowpark.Table(version=...)``
For Spark Iceberg compatibility, ``DataFrameReader.option("snapshot-id", N)``
(and the ``snapshot_id`` variant) is also accepted: the helper
``_extract_time_travel_from_options`` aliases it to the ``version`` kwarg
and auto-sets ``time_travel_mode="at"``. String snapshot ids are coerced
to int.
Validation:
- ``version`` requires ``time_travel_mode='at'`` (matches Snowflake
grammar and Spark Iceberg semantics of "snapshot N", not "before N")
- ``version`` must be ``int`` (bool explicitly rejected)
- exactly one of statement/offset/timestamp/stream/version
Mirrors the existing ``iceberg_tag`` pattern. Tests in
``tests/unit/test_utils.py`` cover SQL emission, validation errors, and
the Spark-compat option aliases.
Co-authored-by: Cursor <cursoragent@cursor.com>
…enabled
Adds an umbrella session flag (default ``False``) that gates the
``version=`` / ``snapshot-id`` time-travel surface added by the previous
commit:
* ``Session.iceberg_features_enabled`` — public bool property + setter
(locking + bool coercion). Mirrors the existing pattern used by
``sql_simplifier_enabled`` / ``cte_optimization_enabled``.
* ``Session._require_iceberg_features_enabled(feature=...)`` — internal
helper that raises ``SnowparkClientException`` pointing users at the
flag.
Gates are placed at all three entry points so the kwarg and the Spark
Iceberg ``snapshot-id`` / ``snapshot_id`` reader option are both
rejected when the flag is off:
* ``Session.table(name, version=...)``
* ``DataFrameReader.table(name, version=...)`` and
``.option("snapshot-id", N).table(...)`` (also catches ``snapshot_id``)
* ``Table(name, ..., version=...)`` direct construction
Default ``False`` because the underlying Snowflake feature
(``AT(VERSION => N)`` on unmanaged Iceberg tables) is itself gated
behind the ``FEATURE_ICEBERG_TIME_TRAVEL`` account parameter and
currently scoped to Catalog-Linked Databases. Snowpark Connect (SAS)
flips the flag per-session when its
``snowpark.connect.iceberg.timeTravel.enabled`` config is set.
Tests
-----
``tests/unit/test_iceberg_features_flag.py`` (new, 13 tests):
* Default + setter + helper semantics.
* Each of the three entry-point gates raises when the flag is off and
is bypassed cleanly when the flag is on.
* Three AST-emission coverage tests patch ``with_src_position`` to
return a stand-in with a ``version`` field, so the
``hasattr(ast, "version")``-guarded ``ast.version.value = version``
line in ``session.py`` / ``dataframe_reader.py`` / ``table.py``
actually executes. This addresses the codecov gap (3 missing lines
flagged by the patch report on PR #4231).
``tests/integ/test_dataframe.py`` (3 new tests, all
``@pytest.mark.skip``):
* End-to-end ``Session.table(..., version=N)`` against a multi-snapshot
Iceberg table.
* End-to-end ``session.read.option('snapshot-id', N).table(...)``.
* End-to-end gate verification (flag off + ``version=`` → raises).
All three are skipped because they need a Catalog-Linked Database
(cldUnity / cldglue) with an unmanaged Iceberg table that has multiple
snapshots AND ``FEATURE_ICEBERG_TIME_TRAVEL`` enabled on the account
— neither of which the standard snowpark-python integ accounts have
today. They run manually against sfctest0. A TODO above the block
tracks wiring them into CI once an Iceberg-capable integ account
exists.
Addresses @sfc-gh-mayliu review comments on PR #4231 (integ test
coverage) and the codecov patch-coverage drop.
Co-authored-by: Cursor <cursoragent@cursor.com>
- Make `option("version", ...)` auto-set `time_travel_mode="at"` so all
three aliases (snapshot-id / snapshot_id / version) share the same
semantics and the docstring claim "Automatically sets
time_travel_mode='at'" holds for every documented path. Reject
`option("version", N) + time_travel_mode="before"` for the same reason
Snowflake rejects it for snapshot-id: AT(VERSION => N) is the only
valid form.
- Extend the `iceberg_features_enabled` gate in `DataFrameReader.table`
to also cover `option("version", ...)`. Previously a user could
bypass the umbrella flag through the Snowpark-native reader option.
- Drop the dead `version` AST emission lines (and matching unit tests)
in `Session.table`, `DataFrameReader.table`, and `Table.__init__`.
The Table / ReadTable protos have no `version` field and the feature
is parameter-protected (gated behind `iceberg_features_enabled`,
consumed only by Snowpark Connect), so AST replay/telemetry is not
warranted today. Comments at each call site point to the one-line
restore when the proto is extended.
- Drop the Iceberg snapshot-id entry from `CHANGELOG.md` since the
feature is gated off by default and only Snowpark Connect flips it.
- Unskip `test_iceberg_snapshot_id_flag_gates_version_kwarg` — it only
exercises the client-side raise and needs neither a CLD nor an
Iceberg table. Also broaden it to cover the reader paths
(`session.read.table(version=...)`, `option("snapshot-id", ...)`,
`option("version", ...)`).
- Swap the `TODO(SNOW-NNNNNNN)` placeholder above the skipped CLD
integ tests for the actual parent JIRA, SNOW-3525585.
Co-authored-by: Cursor <cursoragent@cursor.com>
Addresses review feedback from May Lieu and Aling:
- Aling: drop the umbrella ``Session.iceberg_features_enabled`` property
and ``_require_iceberg_features_enabled`` helper. The flag added more
mental burden to direct Snowpark users without giving us a real
isolation benefit — the underlying ``AT(VERSION => N)`` syntax is
already account-gated by ``FEATURE_ICEBERG_TIME_TRAVEL`` server-side,
so any caller without it on will get a server error anyway.
- May Lieu: drop ``tests/unit/test_iceberg_features_flag.py`` entirely
(the only test for the removed flag), which also resolves the
duplicate copyright header she flagged in that file.
To keep the ``version=`` time-travel kwarg an internal-only surface
consumed by Snowpark Connect (rather than a first-class public API),
``version`` is no longer in the public signatures of
``Session.table`` / ``DataFrameReader.table`` / ``Table.__init__``.
Each method now accepts ``**kwargs`` and pops ``version`` inside, so
direct callers can still pass it but it's not advertised in the
docstring, doctest examples, or Sphinx-rendered signature.
For the same reason, the public ``option("version", N)`` alias is
dropped (``VERSION`` removed from ``_TIME_TRAVEL_OPTIONS_PARAMS_MAP``
and from ``_extract_time_travel_from_options``). The Spark Iceberg
``option("snapshot-id", N)`` / ``option("snapshot_id", N)`` aliases
remain — those are Spark-compat surface, not Snowpark-native, and they
auto-set ``time_travel_mode='at'`` exactly as before.
Tests
-----
- ``tests/unit/test_utils.py::test_extract_time_travel_snapshot_id_option``
is trimmed to cover only ``snapshot-id`` / ``snapshot_id``
(the ``option("version", ...)`` cases that no longer exist are
removed).
- ``tests/unit/test_utils.py::test_time_travel_version_snapshot_id``
is kept — ``version`` is still a valid ``TimeTravelConfig`` field,
just internal-only.
- Integ tests in ``tests/integ/test_dataframe.py``: the
``test_iceberg_snapshot_id_flag_gates_version_kwarg`` test is removed
(no flag to gate), and ``session.iceberg_features_enabled = True``
setup lines are dropped from the two remaining (skipped) CLD-required
E2E tests.
Local: ``tests/unit/test_utils.py`` (15 tests) and
``tests/unit/test_dataframe.py`` + ``tests/unit/test_session.py``
(88+1 skipped) all green.
Co-authored-by: Cursor <cursoragent@cursor.com>
7563e82 to
f519b75
Compare
Adds a new
versionfield toTimeTravelConfigthat emits the SnowflakeAT(VERSION => <snapshot_id>)SQL clause for Iceberg snapshot id time travel. The new kwarg is wired into:snowflake.snowpark.Session.table(version=...)snowflake.snowpark.DataFrameReader.table(version=...)snowflake.snowpark.Table(version=...)For Spark Iceberg compatibility,
DataFrameReader.option("snapshot-id", N)(and thesnapshot_idvariant) is also accepted: the helper_extract_time_travel_from_optionsaliases it to theversionkwarg and auto-setstime_travel_mode="at". String snapshot ids are coerced to int.Validation:
versionrequirestime_travel_mode='at'(matches Snowflake grammar and Spark Iceberg semantics of "snapshot N", not "before N")versionmust beint(bool explicitly rejected)Mirrors the existing
iceberg_tagpattern. Tests intests/unit/test_utils.pycover SQL emission, validation errors, and the Spark-compat option aliases.Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.
Fixes SNOW-NNNNNNN
Fill out the following pre-review checklist:
Please describe how your code solves the related issue.
Please write a short description of how your code change solves the related issue.