SNOW-3484790: initialize aggregation functions list during SCOS init #4217

Merged
sfc-gh-yuwang merged 15 commits into main from SNOW-3484790
May 22, 2026
Conversation

@sfc-gh-yuwang
Collaborator

@sfc-gh-yuwang sfc-gh-yuwang commented May 7, 2026

  1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.

    Fixes SNOW-NNNNNNN

  2. Fill out the following pre-review checklist:

    • I am adding a new automated test(s) to verify correctness of my new code
      • If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing
    • I am adding new logging messages
    • I am adding a new telemetry message
    • I am adding new credentials
    • I am adding a new dependency
    • If this is a new feature/behavior, I'm adding the Local Testing parity changes.
    • I acknowledge that I have ensured my changes to be thread-safe. Follow the link for more information: Thread-safe Developer Guidelines
    • If adding any arguments to public Snowpark APIs or creating new public Snowpark APIs, I acknowledge that I have ensured my changes include AST support. Follow the link for more information: AST Support Guidelines
  3. Please describe how your code solves the related issue.

    _retrieve_aggregation_function_list fires two blocking Snowflake queries on the first filter() call. The function was added in a previous PR to fix a bug in SCOS. This PR instead submits the query asynchronously at the beginning of Snowpark session init.
    This change also removes the redundant query that selects from information_schema.functions, because UDAFs are currently not supported in SCOS and the result would not be used.
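
The idea described above, starting the lookup in the background at init time and only waiting for it on first use, can be sketched roughly as follows. This is a minimal standard-library illustration, not the code from this PR: `FakeSession`, `_run_query`, and `aggregation_functions` are hypothetical names, the returned function names are made up, and a thread pool future stands in for Snowpark's async job.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Optional, Set


class FakeSession:
    """Hypothetical stand-in for a session; not the Snowpark Session class."""

    def __init__(self) -> None:
        self._executor = ThreadPoolExecutor(max_workers=1)
        self._agg_functions: Optional[Set[str]] = None
        # Kick off the lookup in the background so __init__ does not block on it.
        self._prefetch: Future = self._executor.submit(self._run_query)

    def _run_query(self) -> Set[str]:
        # Placeholder for the real round trip to the server (made-up values).
        return {"avg", "count", "sum"}

    def aggregation_functions(self) -> Set[str]:
        # The first caller waits for the prefetch, which has usually finished
        # by then; later callers reuse the cached set.
        if self._agg_functions is None:
            self._agg_functions = self._prefetch.result()
        return self._agg_functions

    def close(self) -> None:
        self._executor.shutdown(wait=True)


session = FakeSession()
functions = session.aggregation_functions()
session.close()
```

The point of the pattern is that the cost of the query overlaps with whatever the caller does between creating the session and the first call that needs the list.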

@sfc-gh-yuwang sfc-gh-yuwang requested review from a team as code owners May 7, 2026 18:35
@sfc-gh-yuwang sfc-gh-yuwang added the NO-CHANGELOG-UPDATES This pull request does not need to update CHANGELOG.md label May 7, 2026
@sfc-gh-yuwang
Collaborator Author

The updated system function list is fetched with this query:
SHOW FUNCTIONS ->> SELECT LISTAGG('"' || LOWER("name") || '"', ',\n') WITHIN GROUP (ORDER BY LOWER("name")) AS result FROM $1 WHERE "is_aggregate" = 'Y';
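
To make the shape of that query's result concrete, here is a small Python sketch that mimics the filter, lowercase, sort, and join steps on made-up rows. The rows are hypothetical and only carry the two columns the query reads; the real SHOW FUNCTIONS output has more columns and many more entries.

```python
# Hypothetical rows shaped like SHOW FUNCTIONS output (values are made up).
rows = [
    {"name": "SUM", "is_aggregate": "Y"},
    {"name": "ABS", "is_aggregate": "N"},
    {"name": "AVG", "is_aggregate": "Y"},
]

# Mirrors WHERE "is_aggregate" = 'Y', LOWER("name"), and the ORDER BY.
names = sorted(row["name"].lower() for row in rows if row["is_aggregate"] == "Y")

# Mirrors LISTAGG('"' || name || '"', ',\n'): quoted names, one per line.
result = ",\n".join(f'"{name}"' for name in names)
print(result)
```

The output is a block of quoted, comma-separated names that can be pasted into a Python collection literal.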

Comment thread src/snowflake/snowpark/session.py
@codecov-commenter

codecov-commenter commented May 7, 2026

Codecov Report

❌ Patch coverage is 93.33333% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 95.40%. Comparing base (0537409) to head (b951239).
⚠️ Report is 3 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/snowflake/snowpark/session.py | 93.10% | 2 Missing and 2 partials ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4217      +/-   ##
==========================================
+ Coverage   95.27%   95.40%   +0.13%     
==========================================
  Files         171      171              
  Lines       44158    44205      +47     
  Branches     7535     7548      +13     
==========================================
+ Hits        42071    42174     +103     
+ Misses       1295     1247      -48     
+ Partials      792      784       -8     


@sfc-gh-yuwang sfc-gh-yuwang changed the title SNOW-3484790: use local list instead of fetch from snowflake SNOW-3484790: initialize aggregation functions list during SCOS init May 11, 2026
Comment thread src/snowflake/snowpark/session.py Outdated
Comment thread src/snowflake/snowpark/session.py Outdated
Comment thread src/snowflake/snowpark/session.py Outdated
Comment thread src/snowflake/snowpark/session.py Outdated
self._client_telemetry = EventTableTelemetry(session=self)
self._agg_function_prefetch_job: Optional[AsyncJob] = None
# Guards the one-time atomic claim of _agg_function_prefetch_job.
self._agg_function_prefetch_lock = Lock()
Contributor

I don't think there's a way for the same thread to attempt to acquire this lock multiple times, but I think we should make this an RLock instead (which is already imported in this file) to be safe.
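
The difference the reviewer is pointing at can be shown with a short standard-library sketch. It is not tied to the session code; the variable names are illustrative.

```python
import threading

plain = threading.Lock()
reentrant = threading.RLock()

# A plain Lock cannot be re-acquired by the thread that already holds it;
# a blocking acquire here would deadlock, so probe without blocking.
with plain:
    second_plain = plain.acquire(blocking=False)  # False

# An RLock tracks its owning thread, so the same thread can acquire it again.
with reentrant:
    second_reentrant = reentrant.acquire(blocking=False)  # True
    if second_reentrant:
        reentrant.release()  # every acquire needs a matching release
```

With an RLock, an accidental nested acquisition on the same thread succeeds instead of hanging, which is the safety margin the comment asks for.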

Comment thread src/snowflake/snowpark/session.py Outdated

with context._aggregation_function_set_lock:
    context._aggregation_function_set.update(retrieved_set)
def _submit_internal_async_prefetch_query(self, query: str) -> Optional[AsyncJob]:
Contributor

nit: Can we inline this method since it's only called once, and pretty short?

Comment thread tests/unit/test_session.py Outdated
Comment on lines +935 to +937
ctx._is_snowpark_connect_compatible_mode = True
ctx._snowpark_connect_flatten_select_after_sort = True
ctx._aggregation_function_set = set()
Contributor

Why do we need this instead of mocking these fields like in the other test?
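
For comparison, a minimal sketch of the mocking approach the reviewer refers to, using `unittest.mock.patch.object`. The `ctx` object here is a hypothetical `SimpleNamespace` stand-in, not the real context module, and the initial values are made up.

```python
from types import SimpleNamespace
from unittest import mock

# Hypothetical stand-in for the module that holds these fields.
ctx = SimpleNamespace(
    _is_snowpark_connect_compatible_mode=False,
    _aggregation_function_set={"sum"},
)

# patch.object swaps the attribute for the duration of the block and
# restores the original afterwards, even if the body raises.
with mock.patch.object(ctx, "_is_snowpark_connect_compatible_mode", True):
    with mock.patch.object(ctx, "_aggregation_function_set", set()):
        inside = (
            ctx._is_snowpark_connect_compatible_mode,
            ctx._aggregation_function_set,
        )

after = (ctx._is_snowpark_connect_compatible_mode, ctx._aggregation_function_set)
```

Compared with assigning the fields directly and restoring them by hand, this avoids leaking modified state into other tests when an assertion fails midway.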

Comment thread tests/unit/test_session.py Outdated
ctx._aggregation_function_set = orig[2]


def test_retrieve_agg_concurrent_waiters_see_result_not_sync_query():
Contributor

How is this different from test_concurrent_retrieve_agg_waiters_no_sync_query in the other test file?

set()
) # lower cased names of aggregation functions, used in sql simplification
_aggregation_function_set_lock = threading.RLock()
_aggregation_function_prefetch_state: dict[str, Any] = {
Contributor

Does this need to be a dict? Can we just use 3 variables or a singleton class instead?

Collaborator Author

@sfc-gh-yuwang sfc-gh-yuwang May 20, 2026

Yes, we definitely can. I thought using a dictionary made it clearer that this is for the agg function prefetch.

Comment thread src/snowflake/snowpark/session.py Outdated
Comment thread src/snowflake/snowpark/session.py
@sfc-gh-yuwang sfc-gh-yuwang merged commit 1253e7e into main May 22, 2026
31 checks passed
@sfc-gh-yuwang sfc-gh-yuwang deleted the SNOW-3484790 branch May 22, 2026 18:03
@github-actions github-actions Bot locked and limited conversation to collaborators May 22, 2026

Labels

NO-CHANGELOG-UPDATES This pull request does not need to update CHANGELOG.md


4 participants