Skip to content

fix: sanitize compacted tool history before Anthropic dispatch#64

Open
saxyguy81 wants to merge 1 commit into
CaddyGlow:mainfrom
saxyguy81:codex/tool-history-compaction-fix
Open

fix: sanitize compacted tool history before Anthropic dispatch#64
saxyguy81 wants to merge 1 commit into
CaddyGlow:mainfrom
saxyguy81:codex/tool-history-compaction-fix

Conversation

@saxyguy81
Copy link
Copy Markdown

Summary

  • sanitize both sides of Anthropic tool pairs before dispatch:
    • drop or convert orphaned tool_result blocks when their matching tool_use is no longer in the immediately preceding assistant message
    • drop or convert orphaned assistant tool_use blocks when the next user message no longer contains the corresponding tool_result
  • truncate oversized native Anthropic requests before the final sanitization pass, so truncation cannot leave invalid tool pairs in the provider payload
  • normalize Anthropic streaming errors so deferred stream failures still match the Anthropic SSE error envelope

Why

The earlier v0.2 sanitizer handles orphaned tool_result blocks, but current main can still send invalid compacted histories to Anthropic:

  • assistant tool_use left without a following tool_result
  • mixed user messages containing one valid tool_result plus another orphaned tool_result
  • context truncation after sanitization, which can create new orphaned blocks

I reproduced this against current main with local services:

  • a normal control /claude/v1/messages request returns 200 on both main and this branch
  • compacted history with orphaned assistant tool_use: main returns 400 (tool_use ids were found without tool_result blocks immediately after); this branch returns 200
  • mixed valid + orphaned tool_result: main returns 400 (unexpected tool_use_id found in tool_result blocks); this branch returns 200

Notes

This supersedes #33 for the current main branch. I rebased the change onto current main and removed the noisy/manual SDK investigation test from the older PR so this PR only contains deterministic unit/regression coverage.

Validation

  • uv run ruff check ccproxy/llms/formatters/openai_to_anthropic/requests.py ccproxy/llms/utils/__init__.py ccproxy/llms/utils/context_truncation.py ccproxy/llms/utils/token_estimation.py ccproxy/plugins/claude_api/adapter.py ccproxy/plugins/claude_sdk/adapter.py ccproxy/streaming/deferred.py tests/test_tool_result_sanitization.py tests/plugins/claude_api/unit/test_native_anthropic_sanitization.py tests/unit/llms/test_context_window_management.py tests/unit/llms/test_truncation_creates_orphans.py tests/unit/streaming/test_deferred_stream_errors.py
  • uv run ruff format --check ccproxy/llms/formatters/openai_to_anthropic/requests.py ccproxy/llms/utils/__init__.py ccproxy/llms/utils/context_truncation.py ccproxy/llms/utils/token_estimation.py ccproxy/plugins/claude_api/adapter.py ccproxy/plugins/claude_sdk/adapter.py ccproxy/streaming/deferred.py tests/test_tool_result_sanitization.py tests/plugins/claude_api/unit/test_native_anthropic_sanitization.py tests/unit/llms/test_context_window_management.py tests/unit/llms/test_truncation_creates_orphans.py tests/unit/streaming/test_deferred_stream_errors.py
  • uv run mypy ccproxy/llms/formatters/openai_to_anthropic/requests.py ccproxy/llms/utils/__init__.py ccproxy/llms/utils/context_truncation.py ccproxy/llms/utils/token_estimation.py ccproxy/plugins/claude_api/adapter.py ccproxy/plugins/claude_sdk/adapter.py ccproxy/streaming/deferred.py
  • uv run python scripts/check_import_boundaries.py
  • uv run pytest tests/test_tool_result_sanitization.py tests/plugins/claude_api/unit/test_native_anthropic_sanitization.py tests/unit/llms/test_context_window_management.py tests/unit/llms/test_truncation_creates_orphans.py tests/unit/streaming/test_deferred_stream_errors.py (69 passed)
  • uv run pytest --ignore=tests/plugins/copilot/integration -k 'not test_openai_chat_stream_to_anthropic_sample' (1249 passed, 6 skipped, 60 deselected, 1 warning)

@saxyguy81
Copy link
Copy Markdown
Author

Coordination note for reviewers:

This PR is intentionally parallel to #65 and #66.

There is no intended dependency between these PRs. They can be reviewed and merged independently; if one lands first, I will rebase the others only if GitHub reports a conflict.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant