Python: fix(core): coalesce streamed code_interpreter_tool_call chunks per call_id (fixes #5793) #6196

Open
hanhan761 wants to merge 1 commit into microsoft:main from hanhan761:fix-5793-coalesce-ci-calls

@hanhan761

Summary

Coalesce streamed code_interpreter_tool_call chunks per call_id so that the finalized response contains one content item per logical code interpreter call instead of hundreds of incremental deltas.

Changes

_types.py

  • Added _coalesce_code_interpreter_tool_calls() — groups code_interpreter_tool_call items by call_id, keeps the chunk with the highest sequence_number (or longest text when sequence metadata is absent), and removes duplicates.
  • Added _code_interpreter_chunk_is_more_complete() — compares two CI call chunks using sequence_number from additional_properties, falling back to input text length.
  • Called _coalesce_code_interpreter_tool_calls() from _finalize_response() alongside the existing text and text_reasoning coalescing.
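The grouping described above can be sketched in isolation. This is a minimal illustration, not the PR's implementation: `Chunk` is a hypothetical stand-in for the framework's `Content` type with a flat `text` field, and `coalesce` returns a new list instead of mutating its argument in place.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Chunk:
    """Hypothetical stand-in for the framework's Content type."""
    type: str
    call_id: Optional[str] = None
    text: str = ""
    additional_properties: dict[str, Any] = field(default_factory=dict)


def is_more_complete(a: Chunk, b: Chunk) -> bool:
    """Prefer the higher sequence_number; fall back to the longer text."""
    seq_a = a.additional_properties.get("sequence_number")
    seq_b = b.additional_properties.get("sequence_number")
    if seq_a is not None and seq_b is not None:
        return seq_a > seq_b
    return len(a.text) > len(b.text)


def is_ci_call(c: Chunk) -> bool:
    return c.type == "code_interpreter_tool_call" and bool(c.call_id)


def coalesce(contents: list[Chunk]) -> list[Chunk]:
    """Return a new list with one code interpreter chunk per call_id."""
    # First pass: pick the most complete chunk for each call_id.
    best: dict[str, Chunk] = {}
    for c in contents:
        if is_ci_call(c) and (c.call_id not in best or is_more_complete(c, best[c.call_id])):
            best[c.call_id] = c
    # Second pass: emit each winner once, at the position of the first chunk
    # seen for its call_id, and pass every other item through unchanged.
    result: list[Chunk] = []
    emitted: set[str] = set()
    for c in contents:
        if not is_ci_call(c):
            result.append(c)
        elif c.call_id not in emitted:
            emitted.add(c.call_id)
            result.append(best[c.call_id])
    return result


stream = [
    Chunk("text", text="before"),
    Chunk("code_interpreter_tool_call", "call_a", "imp", {"sequence_number": 1}),
    Chunk("code_interpreter_tool_call", "call_a", "import pandas", {"sequence_number": 3}),
    Chunk("code_interpreter_tool_call", "call_b", "b1"),
    Chunk("text", text="after"),
]
coalesced = coalesce(stream)
print([c.text for c in coalesced])  # ['before', 'import pandas', 'b1', 'after']
```

Returning a new list keeps the sketch easy to test; the function in the PR mutates `contents` in place.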

test_types.py

  • test_coalesce_code_interpreter_tool_calls_keeps_most_complete — streaming deltas + done collapse to the done event
  • test_coalesce_code_interpreter_tool_calls_groups_by_call_id — multiple distinct call_ids each keep their own winning chunk
  • test_coalesce_code_interpreter_tool_calls_preserves_non_ci_items — non-CI items are preserved
  • test_coalesce_code_interpreter_tool_calls_no_sequence_number — fallback to longest text
  • test_coalesce_code_interpreter_tool_calls_single_call_is_noop — single CI call unchanged

Issue

Fixes #5793

Verification

pytest packages/core/tests/core/test_types.py -k "test_coalesce_code_interpreter" -v

5/5 passed. Existing type tests also pass.

Copilot AI review requested due to automatic review settings May 30, 2026 07:13

Copilot AI left a comment

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

This PR adds post-processing to coalesce streamed code_interpreter_tool_call content chunks by call_id, keeping the most complete chunk, and introduces tests covering common coalescing scenarios.

Changes:

  • Add _coalesce_code_interpreter_tool_calls() and a completeness comparator for CI tool-call chunks.
  • Invoke CI coalescing during response finalization alongside existing text coalescing.
  • Add unit tests validating coalescing behavior across same/different call_ids, with and without sequence numbers.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 8 comments.

| File | Description |
| --- | --- |
| python/packages/core/agent_framework/_types.py | Adds CI tool-call coalescing and applies it during response finalization. |
| python/packages/core/tests/core/test_types.py | Adds tests for CI tool-call coalescing behavior. |

Comment on lines +1976 to +1997
def _coalesce_code_interpreter_tool_calls(contents: list[Content]) -> None:
    """Coalesce code_interpreter_tool_call items with the same call_id, keeping the most complete chunk."""
    best: dict[str, Content] = {}
    first_pos: dict[str, int] = {}
    drop_indices: set[int] = set()
    for i, content in enumerate(contents):
        if content.type != "code_interpreter_tool_call" or not content.call_id:
            continue
        cid = content.call_id
        if cid in best:
            if _code_interpreter_chunk_is_more_complete(content, best[cid]):
                best[cid] = content
            drop_indices.add(i)
        else:
            best[cid] = content
            first_pos[cid] = i
    if not drop_indices:
        return
    for cid, content in best.items():
        contents[first_pos[cid]] = content
    for idx in sorted(drop_indices, reverse=True):
        contents.pop(idx)
Comment on lines +2000 to +2008
def _code_interpreter_chunk_is_more_complete(a: Content, b: Content) -> bool:
    """Return True if 'a' is more complete than 'b'."""
    seq_a = a.additional_properties.get("sequence_number")
    seq_b = b.additional_properties.get("sequence_number")
    if seq_a is not None and seq_b is not None:
        return seq_a > seq_b
    len_a = len(a.inputs[0].text) if a.inputs else 0
    len_b = len(b.inputs[0].text) if b.inputs else 0
    return len_a > len_b
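
A few inputs make the comparator's behavior easier to reason about. The sketch below restates the same logic against a hypothetical stand-in built from `SimpleNamespace` (the real `Content` class is not shown on this page) and exercises three cases: integer sequence numbers, string-typed sequence numbers, and a pair where only one chunk carries a sequence number.

```python
from types import SimpleNamespace


def make_chunk(text, seq=None):
    """Hypothetical stand-in exposing only the attributes the comparator reads."""
    props = {} if seq is None else {"sequence_number": seq}
    return SimpleNamespace(
        inputs=[SimpleNamespace(text=text)],
        additional_properties=props,
    )


def is_more_complete(a, b):
    """Same logic as the quoted comparator, restated against the stand-in."""
    seq_a = a.additional_properties.get("sequence_number")
    seq_b = b.additional_properties.get("sequence_number")
    if seq_a is not None and seq_b is not None:
        return seq_a > seq_b
    len_a = len(a.inputs[0].text) if a.inputs else 0
    len_b = len(b.inputs[0].text) if b.inputs else 0
    return len_a > len_b


# Integer sequence numbers: the later chunk wins even though its text is shorter.
int_case = is_more_complete(make_chunk("x", seq=10), make_chunk("longer", seq=9))

# String sequence numbers compare lexicographically, so "10" sorts before "9".
str_case = is_more_complete(make_chunk("x", seq="10"), make_chunk("longer", seq="9"))

# Only one side carries a sequence number: the comparison falls back to text length.
mixed_case = is_more_complete(make_chunk("done", seq=5), make_chunk("partial delta"))

print(int_case, str_case, mixed_case)  # True False False
```

Whether string-typed or partially missing sequence numbers occur in practice depends on the upstream event source, which this page does not specify. Note also that the length fallback reads only `inputs[0]`.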
Comment on lines +2006 to +2007
    len_a = len(a.inputs[0].text) if a.inputs else 0
    len_b = len(b.inputs[0].text) if b.inputs else 0
    assert contents[1].text == "Thinking B1 B2"


def test_coalesce_code_interpreter_tool_calls_keeps_most_complete():
    assert contents[0].inputs[0].text == "import pandas"


def test_coalesce_code_interpreter_tool_calls_groups_by_call_id():
    assert contents[1].inputs[0].text == "b1"


def test_coalesce_code_interpreter_tool_calls_preserves_non_ci_items():
    assert contents[2].text == "after"


def test_coalesce_code_interpreter_tool_calls_no_sequence_number():
    assert contents[0].inputs[0].text == "longer_script"


def test_coalesce_code_interpreter_tool_calls_single_call_is_noop():
…ll_id (fixes microsoft#5793)

- _coalesce_code_interpreter_tool_calls() groups by call_id, keeps winner at its original position
- _code_interpreter_chunk_is_more_complete() prefers valid sequence_number, coerces to int
- _get_ci_chunk_content_length() sums text across all inputs
- _try_parse_seq() handles string-typed sequence_number safely
- 8 regression tests covering edge cases
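
The revised helpers are not shown on this page. As a rough sketch of what the bullets above describe, the following assumes that sequence numbers may arrive as strings or be missing, and that a chunk may carry several inputs. The function names `try_parse_seq`, `content_length` and `is_more_complete`, and the `make_chunk` stand-in, are hypothetical and only loosely modeled on the names in the commit message.

```python
from types import SimpleNamespace
from typing import Any, Optional


def try_parse_seq(value: Any) -> Optional[int]:
    """Return value as an int, or None when it is missing or not integer-like."""
    if value is None or isinstance(value, bool):
        return None
    try:
        return int(value)
    except (TypeError, ValueError, OverflowError):
        return None


def content_length(chunk: Any) -> int:
    """Sum the text length over all inputs, treating missing text as empty."""
    return sum(len(getattr(item, "text", None) or "") for item in (chunk.inputs or []))


def is_more_complete(a: Any, b: Any) -> bool:
    """Prefer a valid, higher sequence number; otherwise prefer more total text."""
    seq_a = try_parse_seq(a.additional_properties.get("sequence_number"))
    seq_b = try_parse_seq(b.additional_properties.get("sequence_number"))
    if seq_a is not None and seq_b is not None:
        return seq_a > seq_b
    return content_length(a) > content_length(b)


def make_chunk(texts, seq=None):
    """Hypothetical stand-in for a code interpreter content item."""
    props = {} if seq is None else {"sequence_number": seq}
    return SimpleNamespace(
        inputs=[SimpleNamespace(text=t) for t in texts],
        additional_properties=props,
    )


# String-typed sequence numbers are compared numerically after parsing.
print(is_more_complete(make_chunk(["x"], "10"), make_chunk(["x"], "9")))  # True

# An unparseable sequence number falls back to total text length (5 > 3).
print(is_more_complete(make_chunk(["ab", "cde"], "n/a"), make_chunk(["abc"], 7)))  # True
```

Excluding `bool` from `try_parse_seq` is a design choice in this sketch, made because `int(True)` would otherwise be accepted as sequence number 1.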
@hanhan761 hanhan761 force-pushed the fix-5793-coalesce-ci-calls branch from abd5f01 to bc6c61e Compare May 30, 2026 07:25
Development

Successfully merging this pull request may close these issues.

Python: CosmosHistoryProvider Code interpreter tool calls are saved chunk by chunk

3 participants