fix: clamp Bedrock thinking effort to the LiteLLM-accepted ceiling (#737) #739

Open
ElegantLin wants to merge 1 commit into test/599-bedrock-effort-parity from fix/737-bedrock-effort-clamp

@ElegantLin

Contributor

Stacked on #736 (base = test/599-bedrock-effort-parity) — it reuses the _wire_output_config_effort end-to-end helper introduced there. Will rebase onto main once #736 merges.

Summary

BenchFlow accepts BENCHFLOW_BEDROCK_THINKING_EFFORT values xhigh/max, but LiteLLM 1.88.0rc1's Bedrock Converse transform rejects them for adaptive Claude 4.8+ with a BadRequestError (issue #737). So the #598 "MAX thinking" config errored at request time instead of running at the highest supported effort.

Fix

Clamp a requested effort to the LiteLLM-accepted ceiling (high) in both injection paths, matching the existing garbage→high default so a MAX-thinking run proceeds at the real maximum rather than crashing:

  • Route config (litellm_config._bedrock_thinking_effort) — bakes the clamped value into config.yaml.
  • Standalone proxy patch (litellm_bedrock_patch) — clamps the os.environ override too. It is deployed into the sandbox alone (cannot import benchflow), so it carries its own in-sync copy of the ladder; a comment in each points at the other.
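The clamp described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the names `EFFORT_LADDER`, `clamp_effort`, and `effort_from_env` are hypothetical, the `low`/`medium` rungs are assumed (only `high`, `xhigh`, and `max` are named here), and treating an unset variable like an unrecognized value is also an assumption.

```python
import os

# Assumed ladder, lowest to highest. Only high, xhigh and max are named in
# this PR; the lower rungs are an assumption made for illustration.
EFFORT_LADDER = ("low", "medium", "high", "xhigh", "max")
EFFORT_CEILING = "high"  # highest value LiteLLM accepts, per the Summary
DEFAULT_EFFORT = "high"  # unrecognized ("garbage") input falls back here


def clamp_effort(requested):
    """Return the requested effort, lowered to EFFORT_CEILING if it is above it."""
    value = (requested or "").strip().lower()
    if value not in EFFORT_LADDER:
        # Assumption: unset and unrecognized values both take the default.
        return DEFAULT_EFFORT
    ceiling_idx = EFFORT_LADDER.index(EFFORT_CEILING)
    return EFFORT_LADDER[min(EFFORT_LADDER.index(value), ceiling_idx)]


def effort_from_env(environ=None):
    """Read the override from the environment and clamp it before use."""
    env = os.environ if environ is None else environ
    return clamp_effort(env.get("BENCHFLOW_BEDROCK_THINKING_EFFORT"))
```

Because the proxy patch cannot import benchflow, a function like this would have to exist in both modules, which is why the two copies of the ladder need to stay in sync.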

Test plan

  • Route param clamp: xhigh/max → high; updated the two pre-existing tests that encoded the bug (max threaded through unchanged) to assert the clamp.
  • Proxy-env path: a run-level max override clamps in the real wire payload without raising.
  • Drift guard: every requestable effort, after the clamp, is accepted by the REAL litellm Converse transform — if a future litellm changes its accepted set, this fails loudly instead of shipping a rejected value mid-run.
  • Both clamps verified fail-then-pass when individually neutered.
  • 87 passed across the bedrock/litellm slice; unrelated agent-CLI reasoning_effort tests unaffected; ruff + format + ty clean.
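The shape of the drift guard and of the fail-then-pass check can be sketched as below. This is an illustration under stated assumptions only: `stand_in_transform` is a hypothetical substitute for the real LiteLLM Converse transform (which the actual test calls), and its accepted set, the ladder, and the returned payload shape are all assumed.

```python
# Assumed stand-in for the set the real transform accepts.
ACCEPTED_BY_TRANSFORM = {"low", "medium", "high"}
# Assumed list of every effort a user can request.
REQUESTABLE = ("low", "medium", "high", "xhigh", "max")


def stand_in_transform(effort):
    """Hypothetical replacement for the real transform: rejects unknown efforts."""
    if effort not in ACCEPTED_BY_TRANSFORM:
        raise ValueError(f"unsupported effort: {effort}")
    return {"effort": effort}


def clamp(effort):
    """Simplified clamp: anything the transform would reject becomes 'high'."""
    return effort if effort in ("low", "medium", "high") else "high"


def rejected_efforts(clamp_fn, transform):
    """Return every requestable effort the transform rejects after clamping."""
    rejected = []
    for effort in REQUESTABLE:
        try:
            transform(clamp_fn(effort))
        except ValueError:
            rejected.append(effort)
    return rejected


# With the clamp in place nothing is rejected; with the clamp neutered
# (identity function), the two values above the ceiling are rejected.
print(rejected_efforts(clamp, stand_in_transform))
print(rejected_efforts(lambda e: e, stand_in_transform))
```

Running the guard against the real transform rather than a hard-coded set is what lets it catch a future change in the accepted values.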

Closes #737

🤖 Generated with Claude Code

BenchFlow accepts BENCHFLOW_BEDROCK_THINKING_EFFORT values `xhigh`/`max`, but
LiteLLM 1.88.0rc1's Bedrock Converse transform rejects them for adaptive Claude
4.8+ with a BadRequestError. So the #598 "MAX thinking" config errored at
request time instead of running at the highest supported effort.

Clamp a requested effort to the accepted ceiling (`high`) in both injection
paths — the route config (litellm_config) and the standalone proxy patch
(litellm_bedrock_patch, which cannot import benchflow, so it carries its own
in-sync copy of the ladder). This matches the existing garbage->high default:
a MAX-thinking run now proceeds at the real maximum rather than crashing.

Tests:
- clamp asserted on the route param (xhigh/max -> high) and updated the two
  pre-existing tests that encoded the bug (effort `max` threaded through);
- proxy-env override `max` clamps in the real wire payload without raising;
- drift guard: every requestable effort, after the clamp, is accepted by the
  REAL litellm Converse transform — if a future litellm changes its accepted
  set this fails loudly instead of shipping a rejected value mid-run.
Both clamps verified to fail-then-pass when individually neutered.

Closes #737

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>