fix serialize_template_field handling callable value in dict #63871

Open
wjddn279 wants to merge 3 commits into apache:main from wjddn279:fix-serialize-template-field-handling-callable-in-dict
Contributor

@wjddn279 wjddn279 commented Mar 18, 2026

closed: #63334, #65705, #65674

Cause

The serialization result of the DAG is as follows.

{
  "__var": {
    "op_args": [],
    "task_id": "consume",
    "ui_color": "#ffefeb",
    "op_kwargs": "{'values': [3, 1, 2], 'sort_key': <function unstable_dag.<locals>.<lambda> at 0xf2dc11567e20>}",
    "task_type": "_PythonDecoratedOperator",
    "retry_delay": 300.0,
    "_task_module": "airflow.providers.standard.decorators.python",
    "_operator_name": "@task",
    "template_fields": [
      "templates_dict",
      "op_args",
      "op_kwargs"
    ],
    "_needs_expansion": false,
    "python_callable_name": "consume",
    "template_fields_renderers": {
      "op_args": "py",
      "op_kwargs": "py",
      "templates_dict": "json"
    }
  },

When a dict containing a lambda is serialized, the lambda's repr, including its memory address (at 0x…), is written into the output as-is, so the serialized DAG, and with it the DAG version, changes on every serialization. The root cause is that is_jsonable returns False for a dict that contains callable objects, and since the dict itself is not callable, it skips the callable handling and falls through to the str() fallback that produced the op_kwargs value above.
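The instability can be reproduced outside Airflow. The snippet below is a minimal illustration, not the Airflow code: `is_jsonable_sketch` is a hypothetical stand-in for `is_jsonable`, assumed here to do nothing more than attempt `json.dumps`.

```python
import json
import re


def is_jsonable_sketch(value) -> bool:
    """Hypothetical stand-in for is_jsonable: True if json.dumps accepts the value."""
    try:
        json.dumps(value)
    except (TypeError, OverflowError):
        return False
    return True


def make_kwargs() -> dict:
    # A new lambda object is created on every call, much like on every DAG parse.
    return {"values": [3, 1, 2], "sort_key": lambda x: x}


kwargs = make_kwargs()
rendered = str(kwargs)

# The lambda makes the dict non-JSON-encodable, while the dict itself is not callable.
jsonable = is_jsonable_sketch(kwargs)
dict_is_callable = callable(kwargs)

# str() embeds the function repr, which carries a memory address ("at 0x...").
has_address = re.search(r"<function .*<lambda> at 0x[0-9a-fA-F]+>", rendered) is not None
```

Because the address belongs to a specific function object, any string built this way is only as stable as the object's location in memory.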

Solution

I added logic to filter out this case and convert callable objects consistently within the existing dict handling (sorting) logic. I have confirmed locally that this resolves the issue.
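As a rough sketch of the idea (not the code in this PR), callables can be replaced by an address-free token while the dict is being sorted recursively. The names `callable_token` and `sort_and_sanitize` and the exact token layout are assumptions made for illustration.

```python
def callable_token(fn) -> str:
    """Render a callable without its memory address (token layout is an assumption)."""
    module = getattr(fn, "__module__", None) or "unknown"
    name = getattr(fn, "__qualname__", None) or type(fn).__name__
    return f"<callable {module}.{name}>"


def sort_and_sanitize(obj):
    """Recursively sort dict keys and replace callables with a stable token."""
    if callable(obj):
        return callable_token(obj)
    if isinstance(obj, dict):
        # This sketch assumes the keys are mutually comparable.
        return {k: sort_and_sanitize(v) for k, v in sorted(obj.items())}
    if isinstance(obj, (list, tuple)):
        return [sort_and_sanitize(item) for item in obj]
    return obj


def build() -> dict:
    # Each call creates a distinct lambda object at a different address.
    return {"values": [3, 1, 2], "sort_key": lambda x: x}


first = sort_and_sanitize(build())
second = sort_and_sanitize(build())
```

Two independent builds now produce equal results, because the token depends only on the module and qualified name of the callable.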


Was generative AI tooling used to co-author this PR?
  • Yes (please specify the tool below)

  • Read the Pull Request Guidelines for more information. Note: commit author/co-author name and email in commits become permanently public when merged.
  • For fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
  • When adding dependency, check compliance with the ASF 3rd Party License Policy.
  • For significant user-facing changes create newsfragment: {pr_number}.significant.rst, in airflow-core/newsfragments. You can add this file in a follow-up commit after the PR is created so you know the PR number.

@wjddn279 wjddn279 force-pushed the fix-serialize-template-field-handling-callable-in-dict branch from 4a10da9 to 6f6156f Compare March 18, 2026 12:30
@eladkal eladkal added this to the Airflow 3.1.9 milestone Mar 24, 2026
@potiuk potiuk added the ready for maintainer review Set after triaging when all criteria pass. label Apr 1, 2026
@kaxil kaxil requested a review from Copilot April 2, 2026 00:45
Contributor

Copilot AI left a comment

Pull request overview

Fixes non-deterministic DAG serialization when templated dict fields contain callable values (e.g., lambdas), preventing serialized output from changing on every run due to function object addresses.

Changes:

  • Sanitizes callable values during recursive dict sorting in serialize_template_field.
  • Refactors callable stringification into a helper for consistent formatting.
  • Adds unit tests intended to validate stable serialization when dicts contain callables.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

File | Description
airflow-core/src/airflow/serialization/helpers.py | Makes callable values in dicts serialize to a stable string during recursive sorting.
airflow-core/tests/unit/serialization/test_helpers.py | Adds regression tests around dicts containing callables to prevent unstable serialization.

Comment thread airflow-core/src/airflow/serialization/helpers.py
Comment thread airflow-core/src/airflow/serialization/helpers.py Outdated
Comment thread airflow-core/tests/unit/serialization/test_helpers.py Outdated
Comment thread airflow-core/src/airflow/serialization/helpers.py Outdated
@eladkal eladkal added backport-to-v3-2-test Mark PR with this label to backport to v3-2-test branch and removed backport-to-v3-1-test labels Apr 6, 2026
@eladkal eladkal modified the milestones: Airflow 3.1.9, Airflow 3.2.1 Apr 6, 2026
@wjddn279 wjddn279 force-pushed the fix-serialize-template-field-handling-callable-in-dict branch 6 times, most recently from 7c4b179 to 8eb05f8 Compare April 10, 2026 05:46
@wjddn279 wjddn279 requested a review from XD-DENG as a code owner April 10, 2026 05:46
@kaxil kaxil requested a review from Copilot April 10, 2026 19:55
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

Comment thread airflow-core/src/airflow/serialization/helpers.py Outdated
@wjddn279 wjddn279 force-pushed the fix-serialize-template-field-handling-callable-in-dict branch from e339121 to 6771c87 Compare April 16, 2026 00:10
@wjddn279 wjddn279 force-pushed the fix-serialize-template-field-handling-callable-in-dict branch from 6771c87 to aebce69 Compare April 30, 2026 06:33
Contributor

@ephraimbuddy ephraimbuddy left a comment

Thanks — root-cause analysis is right and the <callable …> token is the correct deterministic fix. One blocker, plus a few nits.

Blocker: sorted(obj.items()) regresses mixed-key dicts.

The new dict branch in not is_jsonable(...) calls sort_and_make_static_dict_recursively, which does sorted(obj.items()) at helpers.py:73. A non-jsonable dict with mixed key types like {1: "a", "b": lambda: None} previously fell through to str(template_field) and serialized fine; now it raises TypeError: '<' not supported between instances of 'str' and 'int', which propagates (the try only catches AttributeError) and breaks DAG serialization.

>>> serialize_template_field({1: "a", "b": lambda: None}, "op_kwargs")
TypeError: '<' not supported between instances of 'str' and 'int'

The pre-existing sort_dict_recursively (now line 107) has the same bug today for jsonable mixed-key dicts — {1: "a", "b": "c"} already raises on main. One fix can cover both call sites:

sorted(obj.items(), key=lambda kv: (type(kv[0]).__name__, repr(kv[0])))

This keeps the determinism the sort_* helpers exist for, without crashing on heterogeneous keys. A try/except TypeError fallback to unsorted iteration would also work but loses the consistency guarantee. Please add a regression test covering the mixed-key + callable case.
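A small standalone reproduction of the failure and of the suggested sort key, assuming nothing beyond the standard library (`sort_items_stably` is a hypothetical helper name, not part of the PR):

```python
def sort_items_stably(mapping: dict) -> list:
    """Sort items by (key type name, key repr) so mixed-type keys are never compared directly."""
    return sorted(mapping.items(), key=lambda kv: (type(kv[0]).__name__, repr(kv[0])))


mixed = {"b": "c", 1: "a"}

# Plain sorting compares the str key with the int key and raises TypeError.
try:
    sorted(mixed.items())
    naive_error = None
except TypeError as exc:
    naive_error = str(exc)

# The composite key only ever compares strings, so it cannot raise for this input.
stable = sort_items_stably(mixed)
```

With this key, all `int` keys sort before all `str` keys (by type name), and keys of the same type are ordered by their repr, which is deterministic but not numeric for integers.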

Comment thread airflow-core/src/airflow/serialization/helpers.py Outdated
Comment thread airflow-core/src/airflow/serialization/helpers.py Outdated
Comment thread airflow-core/tests/unit/serialization/test_helpers.py Outdated
Comment thread airflow-core/tests/unit/serialization/test_helpers.py Outdated
@wjddn279 wjddn279 force-pushed the fix-serialize-template-field-handling-callable-in-dict branch from 4d13583 to 6b6ecbd Compare May 4, 2026 07:23
Contributor

@amoghrajesh amoghrajesh left a comment

Sorry for the delayed review, some comments.

Comment thread airflow-core/src/airflow/serialization/helpers.py Outdated
Comment thread airflow-core/src/airflow/serialization/helpers.py Outdated
Comment thread airflow-core/src/airflow/serialization/helpers.py Outdated
Comment thread airflow-core/tests/unit/serialization/test_dag_serialization.py
Comment thread airflow-core/tests/unit/dags/test_dag_decorator_version.py
Contributor Author

wjddn279 commented May 5, 2026

@ephraimbuddy

I've written logic that handles those issues all at once (together with #65705). Could you take another look?

Contributor Author

wjddn279 commented May 7, 2026

ping @ephraimbuddy !

@ephraimbuddy
Contributor

> @ephraimbuddy
>
> I've written logic that handles those issues all at once (together with #65705). Could you take another look?

This has now regressed considerably compared with the earlier version.

Comment thread airflow-core/src/airflow/serialization/helpers.py Outdated
Comment thread airflow-core/src/airflow/serialization/helpers.py Outdated
Comment thread airflow-core/tests/unit/serialization/test_dag_serialization.py
@wjddn279 wjddn279 requested a review from kaxil as a code owner May 8, 2026 08:09
@wjddn279 wjddn279 force-pushed the fix-serialize-template-field-handling-callable-in-dict branch from 5640041 to 6f46715 Compare May 9, 2026 05:20
  pytest.param({}, {}, id="empty_dict"),
  pytest.param((), [], id="empty_tuple"),
- pytest.param(set(), "set()", id="empty_set"),
+ pytest.param(set(), [], id="empty_set"),
Contributor Author

@wjddn279 wjddn279 May 9, 2026

Update tests to reflect logic change: set and frozenset are now converted to list instead of being string-cast.

related https://github.com/apache/airflow/pull/63871/changes#r3212530026

  pytest.param({"foo": "bar"}, {"foo": "bar"}, id="dict"),
  pytest.param(("foo", "bar"), ["foo", "bar"], id="tuple"),
- pytest.param({"foo"}, "{'foo'}", id="set"),
+ pytest.param({"foo"}, ["foo"], id="set"),
  pytest.param(
      {"my_tup": (1, 2), "my_set": {1, 2, 3}},
-     {"my_tup": [1, 2], "my_set": "{1, 2, 3}"},
+     {"my_tup": [1, 2], "my_set": [1, 2, 3]},
if isinstance(obj, (list, tuple)):
    return [serialize_object(item) for item in obj]

if isinstance(obj, (set, frozenset)):
Contributor Author

Previously set/frozenset values were converted via str(), which leaked memory addresses for custom objects and destabilized the DAG hash on every parse. The new logic recursively serializes each element and returns a sorted list, producing deterministic, JSON-encodable output.
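The behavior described here can be approximated as follows. This is a simplified sketch, not the PR's implementation: `serialize_object_sketch` is a hypothetical stand-in for `serialize_object`, and the fallback ordering for mixed element types is an assumption.

```python
def serialize_object_sketch(obj):
    """Hypothetical, simplified stand-in for serialize_object (lists, tuples, sets only)."""
    if isinstance(obj, (list, tuple)):
        return [serialize_object_sketch(item) for item in obj]
    if isinstance(obj, (set, frozenset)):
        items = [serialize_object_sketch(item) for item in obj]
        try:
            # Natural ordering when the serialized elements are mutually comparable.
            return sorted(items)
        except TypeError:
            # Assumed fallback for mixed element types: order by type name, then repr.
            return sorted(items, key=lambda x: (type(x).__name__, repr(x)))
    return obj


numbers = serialize_object_sketch({3, 1, 2})
names = serialize_object_sketch(frozenset({"b", "a"}))
mixed = serialize_object_sketch({1, "a"})
nested = serialize_object_sketch([(1, 2), {2, 1}])
```

The result is a plain list in every case, so it is JSON-encodable and does not depend on set iteration order.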

Contributor

@ephraimbuddy ephraimbuddy left a comment

We also need to align this with the same function in the task runner.

Comment thread airflow-core/src/airflow/serialization/helpers.py Outdated
Comment thread airflow-core/src/airflow/serialization/helpers.py Outdated
@wjddn279 wjddn279 force-pushed the fix-serialize-template-field-handling-callable-in-dict branch from 6f46715 to 1c08c9c Compare May 12, 2026 08:37
@wjddn279 wjddn279 force-pushed the fix-serialize-template-field-handling-callable-in-dict branch from 1c08c9c to 8a4112c Compare May 12, 2026 08:41
# Serialize keys/values first so each key is a string and the output is hash-stable,
# then sort by the serialized key to prevent hash inconsistencies when dict ordering varies.
serialized_pairs = [(normalize_dict_key(k), serialize_object(v)) for k, v in obj.items()]
return dict(sorted(serialized_pairs, key=lambda kv: kv[0]))
Contributor Author

Since keys are now fixed as strings by normalize_dict_key, the logic that sorted by key type alongside the key value has been removed. Sorting is now performed solely by the key value.
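To illustrate why sorting on the normalized key alone is enough, here is a minimal sketch. `normalize_dict_key_sketch` is a hypothetical stand-in for `normalize_dict_key`, assumed to pass strings through and apply `str()` to everything else; values are left untouched for brevity.

```python
def normalize_dict_key_sketch(key) -> str:
    """Hypothetical stand-in for normalize_dict_key: keep strings, str() everything else."""
    return key if isinstance(key, str) else str(key)


def serialize_dict_sketch(obj: dict) -> dict:
    # Normalize keys first, then sort on the string form only.
    pairs = [(normalize_dict_key_sketch(k), v) for k, v in obj.items()]
    return dict(sorted(pairs, key=lambda kv: kv[0]))


a = serialize_dict_sketch({"b": "c", 1: "a"})
b = serialize_dict_sketch({1: "a", "b": "c"})
```

Both insertion orders yield the same key order, and no cross-type comparison can occur because every sort key is a string. One caveat of this simplified version is that keys such as `1` and `"1"` would collapse into the same string key.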

Labels

  • area:DAG-processing
  • backport-to-v3-2-test: Mark PR with this label to backport to v3-2-test branch
  • ready for maintainer review: Set after triaging when all criteria pass.
  • type:bug-fix: Changelog: Bug Fixes

Development

Successfully merging this pull request may close these issues.

DAG version churn when using unstable parse-time callable serialization

7 participants