Python: fix(core): coalesce streamed code_interpreter_tool_call chunks per call_id (fixes #5793) #6196
Open
hanhan761 wants to merge 1 commit into
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
This PR adds post-processing to coalesce streamed code_interpreter_tool_call content chunks by call_id, keeping the most complete chunk, and introduces tests covering common coalescing scenarios.
Changes:
- Add `_coalesce_code_interpreter_tool_calls()` and a completeness comparator for CI tool-call chunks.
- Invoke CI coalescing during response finalization alongside existing text coalescing.
- Add unit tests validating coalescing behavior across same and different `call_id`s, with and without sequence numbers.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| python/packages/core/agent_framework/_types.py | Adds CI tool-call coalescing and applies it during response finalization. |
| python/packages/core/tests/core/test_types.py | Adds tests for CI tool-call coalescing behavior. |
Comment on lines +1976 to +1997
```python
def _coalesce_code_interpreter_tool_calls(contents: list[Content]) -> None:
    """Coalesce code_interpreter_tool_call items with the same call_id, keeping the most complete chunk."""
    best: dict[str, Content] = {}
    first_pos: dict[str, int] = {}
    drop_indices: set[int] = set()
    for i, content in enumerate(contents):
        if content.type != "code_interpreter_tool_call" or not content.call_id:
            continue
        cid = content.call_id
        if cid in best:
            if _code_interpreter_chunk_is_more_complete(content, best[cid]):
                best[cid] = content
            drop_indices.add(i)
        else:
            best[cid] = content
            first_pos[cid] = i
    if not drop_indices:
        return
    for cid, content in best.items():
        contents[first_pos[cid]] = content
    for idx in sorted(drop_indices, reverse=True):
        contents.pop(idx)
```
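To make the position-preserving behavior concrete, here is a small self-contained sketch that runs the same loop against hypothetical stand-in types. `TextInput` and `Chunk` are not agent_framework classes; they only carry the fields the function reads. The comparator mirrors `_code_interpreter_chunk_is_more_complete()`, except that it treats a missing `text` as empty.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class TextInput:
    """Hypothetical stand-in for an input item on a CI tool-call chunk."""
    text: str | None = None


@dataclass
class Chunk:
    """Hypothetical stand-in for Content, limited to the fields used here."""
    type: str
    call_id: str | None = None
    text: str | None = None
    inputs: list[TextInput] = field(default_factory=list)
    additional_properties: dict[str, Any] = field(default_factory=dict)


def is_more_complete(a: Chunk, b: Chunk) -> bool:
    # Prefer sequence_number when both chunks have one; otherwise compare first-input length.
    seq_a = a.additional_properties.get("sequence_number")
    seq_b = b.additional_properties.get("sequence_number")
    if seq_a is not None and seq_b is not None:
        return seq_a > seq_b
    len_a = len(a.inputs[0].text or "") if a.inputs else 0
    len_b = len(b.inputs[0].text or "") if b.inputs else 0
    return len_a > len_b


def coalesce(contents: list[Chunk]) -> None:
    best: dict[str, Chunk] = {}     # call_id -> most complete chunk seen so far
    first_pos: dict[str, int] = {}  # call_id -> index of its first chunk
    drop: set[int] = set()          # indices of every later chunk for an already-seen call_id
    for i, c in enumerate(contents):
        if c.type != "code_interpreter_tool_call" or not c.call_id:
            continue
        if c.call_id in best:
            if is_more_complete(c, best[c.call_id]):
                best[c.call_id] = c
            drop.add(i)
        else:
            best[c.call_id] = c
            first_pos[c.call_id] = i
    if not drop:
        return
    # The winner takes the slot of the call's first chunk, so call order is preserved.
    for cid, winner in best.items():
        contents[first_pos[cid]] = winner
    # Pop from the end so the remaining indices stay valid.
    for idx in sorted(drop, reverse=True):
        contents.pop(idx)


CI = "code_interpreter_tool_call"
contents = [
    Chunk("text", text="before"),
    Chunk(CI, "ci_1", inputs=[TextInput("imp")], additional_properties={"sequence_number": 1}),
    Chunk(CI, "ci_2", inputs=[TextInput("b1")], additional_properties={"sequence_number": 2}),
    Chunk(CI, "ci_1", inputs=[TextInput("import pandas")], additional_properties={"sequence_number": 3}),
    Chunk("text", text="after"),
]
coalesce(contents)
```

After the call, `ci_1` sits at index 1, where its first delta was, even though its winning chunk arrived after `ci_2`; non-CI items are untouched.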
Comment on lines +2000 to +2008
```python
def _code_interpreter_chunk_is_more_complete(a: Content, b: Content) -> bool:
    """Return True if 'a' is more complete than 'b'."""
    seq_a = a.additional_properties.get("sequence_number")
    seq_b = b.additional_properties.get("sequence_number")
    if seq_a is not None and seq_b is not None:
        return seq_a > seq_b
    len_a = len(a.inputs[0].text) if a.inputs else 0
    len_b = len(b.inputs[0].text) if b.inputs else 0
    return len_a > len_b
```
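The sequence-number branch compares the raw values with `>`, which is only meaningful when both are integers. If a `sequence_number` ever arrives as a string (an assumption about provider payloads, though the later `_try_parse_seq()` change suggests the case was considered), Python's comparison rules give a wrong answer or raise. A standalone illustration, where `raw_seq_is_newer` is a hypothetical helper reproducing only that comparison:

```python
from typing import Any


def raw_seq_is_newer(seq_a: Any, seq_b: Any) -> bool:
    # Same comparison as the sequence_number branch of
    # _code_interpreter_chunk_is_more_complete(): plain ">" on raw values.
    return seq_a > seq_b


int_case = raw_seq_is_newer(10, 9)      # numeric comparison
str_case = raw_seq_is_newer("10", "9")  # lexicographic: "1" sorts before "9"

try:
    raw_seq_is_newer("10", 9)           # str vs int is not orderable in Python 3
    mixed_raises = False
except TypeError:
    mixed_raises = True
```

With string sequence numbers, a later chunk `"10"` would lose to an earlier `"9"`, so coalescing would keep the stale chunk; with mixed types, finalization would fail outright.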
Comment on lines +2006 to +2007
```python
    len_a = len(a.inputs[0].text) if a.inputs else 0
    len_b = len(b.inputs[0].text) if b.inputs else 0
```
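These two lines measure only the first input, and `len()` would raise `TypeError` if its `text` were `None`. The later commit describes `_get_ci_chunk_content_length()` as summing text across all inputs; its body is not shown here, so the following is a hypothetical sketch of that idea, using a stand-in `TextInput` type and treating missing text as empty.

```python
from __future__ import annotations

from collections.abc import Sequence
from dataclasses import dataclass


@dataclass
class TextInput:
    """Hypothetical stand-in for one input item of a CI tool-call chunk."""
    text: str | None = None


def ci_chunk_content_length(inputs: Sequence[TextInput] | None) -> int:
    """Total text length across all inputs; missing inputs or text count as 0."""
    if not inputs:
        return 0
    # getattr also tolerates input items that have no text attribute at all.
    return sum(len(getattr(item, "text", None) or "") for item in inputs)


parts = [TextInput("abc"), TextInput("de")]
first_only = len(parts[0].text or "")  # what the reviewed lines measure
total = ci_chunk_content_length(parts)  # every input contributes
missing = ci_chunk_content_length([TextInput(None)])
```

Summing across inputs means a chunk whose code is split over several inputs is no longer undercounted against a chunk with one long input.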
```python
    assert contents[1].text == "Thinking B1 B2"


def test_coalesce_code_interpreter_tool_calls_keeps_most_complete():
```

```python
    assert contents[0].inputs[0].text == "import pandas"


def test_coalesce_code_interpreter_tool_calls_groups_by_call_id():
```

```python
    assert contents[1].inputs[0].text == "b1"


def test_coalesce_code_interpreter_tool_calls_preserves_non_ci_items():
```

```python
    assert contents[2].text == "after"


def test_coalesce_code_interpreter_tool_calls_no_sequence_number():
```

```python
    assert contents[0].inputs[0].text == "longer_script"


def test_coalesce_code_interpreter_tool_calls_single_call_is_noop():
```
…ll_id (fixes microsoft#5793)

- `_coalesce_code_interpreter_tool_calls()` groups by `call_id`, keeps winner at its original position
- `_code_interpreter_chunk_is_more_complete()` prefers valid `sequence_number`, coerces to int
- `_get_ci_chunk_content_length()` sums text across all inputs
- `_try_parse_seq()` handles string-typed `sequence_number` safely
- 8 regression tests covering edge cases
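The amended commit names two helpers whose bodies are not shown on this page. A minimal sketch of behavior consistent with the commit message follows; `try_parse_seq`, `content_length`, `Chunk`, and the choice to fall back to text length whenever either sequence number is unusable are assumptions, not the committed code.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


def try_parse_seq(value: Any) -> int | None:
    """Coerce a sequence_number to int; return None when it is absent or unusable."""
    if isinstance(value, bool):
        return None  # bool subclasses int; True/False are not meaningful sequence numbers
    if isinstance(value, int):
        return value
    if isinstance(value, str):
        try:
            return int(value)  # int() tolerates surrounding whitespace
        except ValueError:
            return None
    return None


@dataclass
class Chunk:
    """Hypothetical stand-in: input texts plus additional_properties."""
    texts: list[str | None] = field(default_factory=list)
    additional_properties: dict[str, Any] = field(default_factory=dict)


def content_length(chunk: Chunk) -> int:
    # Sum across all inputs, treating missing text as empty.
    return sum(len(t or "") for t in chunk.texts)


def is_more_complete(a: Chunk, b: Chunk) -> bool:
    seq_a = try_parse_seq(a.additional_properties.get("sequence_number"))
    seq_b = try_parse_seq(b.additional_properties.get("sequence_number"))
    if seq_a is not None and seq_b is not None:
        return seq_a > seq_b  # both valid: compare numerically
    return content_length(a) > content_length(b)  # otherwise fall back to length


newer = Chunk(["import pandas"], {"sequence_number": "10"})
older = Chunk(["imp"], {"sequence_number": 9})
unparsable = Chunk(["longer_script"], {"sequence_number": "n/a"})
short = Chunk(["short"], {"sequence_number": 5})
```

Rejecting `bool` is a judgment call: since `True > False` holds in Python, accepting booleans would silently rank chunks by a flag.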
Force-pushed from abd5f01 to bc6c61e.
Summary
Coalesce streamed `code_interpreter_tool_call` chunks per `call_id` so that the finalized response contains one content item per logical code interpreter call instead of hundreds of incremental deltas.

Changes

`_types.py`
- `_coalesce_code_interpreter_tool_calls()`: groups `code_interpreter_tool_call` items by `call_id`, keeps the chunk with the highest `sequence_number` (or the longest text when sequence metadata is absent), and removes duplicates.
- `_code_interpreter_chunk_is_more_complete()`: compares two CI call chunks using `sequence_number` from `additional_properties`, falling back to input text length.
- `_coalesce_code_interpreter_tool_calls()` is called from `_finalize_response()` alongside the existing text and text_reasoning coalescing.

`test_types.py`
- `test_coalesce_code_interpreter_tool_calls_keeps_most_complete`: streaming deltas plus the done event collapse to the done event
- `test_coalesce_code_interpreter_tool_calls_groups_by_call_id`: multiple distinct `call_id`s each keep their own winning chunk
- `test_coalesce_code_interpreter_tool_calls_preserves_non_ci_items`: non-CI items are preserved
- `test_coalesce_code_interpreter_tool_calls_no_sequence_number`: falls back to the longest text
- `test_coalesce_code_interpreter_tool_calls_single_call_is_noop`: a single CI call is left unchanged

Issue
Fixes #5793
Verification
5/5 new tests pass, and the existing type tests still pass.