Bump confer / triangulate token budgets for OpenAI + Gemini #27

Merged: fxspeiser merged 1 commit into main from feature/budget-bump-gpt-gemini on May 31, 2026


@fxspeiser
Owner

Summary

Follow-up to PR #26. The same MAX_TOKENS truncation pattern is hitting `triangulate` on gpt-5 and both `confer` and `triangulate` on gemini-2.5-pro.

New `_PROVIDER_TOKEN_BUDGETS` entries (all 6144):

  • `openai.triangulate` (was 2048 reasoning-default)
  • `gemini.confer` (was 1500 — see follow-up below)
  • `gemini.triangulate` (was 1500 — see follow-up below)

Follow-up worth tracking: gemini-2.5-pro is NOT currently tagged in `PROVIDER_CAPS["gemini"]["reasoning_prefixes"]`, so non-overridden purposes (`debate`, `synth`, `audit`) fall through to the smaller non-reasoning ceilings (1500 / 1024 / 768) instead of the 2048 reasoning default. This PR's per-provider override short-circuits that for confer + triangulate, but if cap-hitting shows up on gemini debate/synth/audit, the right fix is adding `"reasoning_prefixes": ("gemini-2.5-pro",)` to the gemini caps entry.
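The lookup order described above can be sketched roughly as follows. This is an illustration under assumptions, not the repository's code: the dictionary shapes, the `NON_REASONING_CEILINGS` name, and the `budget_for_purpose` signature (including the `is_reasoning` flag) are hypothetical. Only the numeric values and the idea of a per-provider override come from this PR.

```python
# Reasoning-class default named in this PR.
REASONING_DEFAULT = 2048

# Assumed shape: non-reasoning ceilings per purpose (values quoted in this PR).
NON_REASONING_CEILINGS = {
    "confer": 1500,
    "triangulate": 1500,
    "debate": 1500,
    "synth": 1024,
    "audit": 768,
}

# Assumed shape: overrides keyed by (provider, purpose). The three entries
# below are the ones this PR adds.
PROVIDER_TOKEN_BUDGETS = {
    ("openai", "triangulate"): 6144,
    ("gemini", "confer"): 6144,
    ("gemini", "triangulate"): 6144,
}


def budget_for_purpose(provider: str, purpose: str, is_reasoning: bool) -> int:
    """Resolve a token budget: override first, then class-based fallback."""
    override = PROVIDER_TOKEN_BUDGETS.get((provider, purpose))
    if override is not None:
        return override  # the override short-circuits the reasoning check
    if is_reasoning:
        return REASONING_DEFAULT
    return NON_REASONING_CEILINGS[purpose]
```

With gemini-2.5-pro untagged (`is_reasoning=False`), `confer` and `triangulate` still resolve to 6144 through the override, while `debate`, `synth`, and `audit` fall through to 1500, 1024, and 768.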

Test plan

  • The updated `scripts/test_provider_token_budgets.py` asserts the new values and explicitly documents the gemini.debate fall-through gap
  • Full suite (38 scripts) passes locally

🤖 Generated with Claude Code

Follow-up to PR #26. The same MAX_TOKENS truncation pattern is hitting
triangulate on gpt-5 and both confer + triangulate on gemini-2.5-pro:
reasoning tokens count against `max_completion_tokens`, so the visible
answer gets cut off mid-emission.
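The arithmetic behind the truncation can be illustrated with a small sketch. The helper name `visible_token_room` and the figure of 1800 reasoning tokens are hypothetical, chosen only to show the effect; they are not measurements from this project.

```python
def visible_token_room(cap: int, reasoning_tokens: int) -> int:
    """Tokens left for the visible answer once reasoning is deducted from the cap."""
    return max(cap - reasoning_tokens, 0)


# Hypothetical reasoning usage, for illustration only.
ASSUMED_REASONING_TOKENS = 1800

room_at_old_cap = visible_token_room(2048, ASSUMED_REASONING_TOKENS)  # 248
room_at_new_cap = visible_token_room(6144, ASSUMED_REASONING_TOKENS)  # 4344
```

Under that assumed usage, the old 2048 cap leaves only a few hundred tokens for the answer, which is consistent with output being cut off partway through.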

New _PROVIDER_TOKEN_BUDGETS entries (all 6144):
- openai.triangulate    (was 2048 reasoning-default)
- gemini.confer         (was 1500 non-reasoning ceiling)
- gemini.triangulate    (was 1500 non-reasoning ceiling)

Notable side-finding (logged as a follow-up, NOT fixed here):
gemini-2.5-pro is NOT currently tagged in
`PROVIDER_CAPS["gemini"]["reasoning_prefixes"]`, so non-overridden
purposes (debate, synth, audit) fall through to the SMALLER
non-reasoning ceilings (1500 / 1024 / 768) instead of the reasoning
default of 2048. The provider-specific override in this PR
short-circuits that for confer + triangulate, but if the user starts
hitting caps on gemini debate / synth / audit, the right fix is
adding `"reasoning_prefixes": ("gemini-2.5-pro",)` to the gemini
PROVIDER_CAPS entry. Out of scope for this PR.

Tests:
- openai.triangulate = 6144
- gemini.confer = 6144, gemini.triangulate = 6144
- gemini.debate falls through to 1500 (the non-reasoning ceiling,
  documenting the follow-up gap explicitly)

Full suite (38 scripts) passes.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@fxspeiser fxspeiser merged commit 66e3bed into main May 31, 2026
1 check passed
@fxspeiser fxspeiser deleted the feature/budget-bump-gpt-gemini branch May 31, 2026 20:50
fxspeiser added a commit that referenced this pull request Jun 1, 2026
Closes the follow-up flagged in PR #27. gemini-2.5-pro burns reasoning
tokens just like gpt-5 and claude-opus-4-7 — it should have been in
PROVIDER_CAPS["gemini"]["reasoning_prefixes"] from day one.

Effects of the tag:
- `_is_reasoning_model("gemini", "gemini-2.5-pro")` now returns True.
- `_budget_for_purpose` picks the reasoning-class ceilings (2048) for
  every non-overridden purpose instead of falling through to the
  smaller non-reasoning ceilings:
    debate: 1500 -> 2048
    synth:  1024 -> 2048
    audit:   768 -> 2048
    moderator/orchestrate/plan/review/coordinate: already 2048
  confer + triangulate keep the PR #27 override at 6144.
- Prompt adapters now strip "think step by step" / "think out loud"
  preambles before sending to gemini-2.5-pro (already shipped in
  PR #19; matches behavior for the other reasoning families).
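As a rough illustration of the preamble stripping mentioned in the last bullet, the sketch below removes the two quoted phrases from a prompt. The function name `strip_cot_preamble` and the regular expression are assumptions; the adapters shipped in PR #19 may work quite differently.

```python
import re

# The two phrases come from the commit message; the pattern is an assumption.
_COT_PREAMBLE = re.compile(
    r"\b(?:think step by step|think out loud)\b[.:,]?\s*",
    re.IGNORECASE,
)


def strip_cot_preamble(prompt: str) -> str:
    """Remove chain-of-thought nudges that reasoning models do not need."""
    return _COT_PREAMBLE.sub("", prompt).strip()
```

For example, `strip_cot_preamble("Think step by step. Compare A and B.")` yields `"Compare A and B."`, and a prompt without either phrase is returned unchanged.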

Future-proof: the prefix is "gemini-2.5-pro" so a hypothetical
gemini-1.5-flash / gemini-2.0-flash variant correctly falls through
to the non-reasoning ceilings (verified in the test).
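The prefix behavior described above might look like the following sketch. It assumes that `reasoning_prefixes` is matched with `str.startswith`, which the name suggests but the commit message does not state; the `PROVIDER_CAPS` shape and the un-prefixed function name are likewise illustrative.

```python
# Assumed shape of the caps table after this commit.
PROVIDER_CAPS = {
    "gemini": {"reasoning_prefixes": ("gemini-2.5-pro",)},
}


def is_reasoning_model(provider: str, model: str) -> bool:
    """Return True when the model name starts with a tagged reasoning prefix."""
    prefixes = PROVIDER_CAPS.get(provider, {}).get("reasoning_prefixes", ())
    # str.startswith accepts a tuple and returns False for an empty one.
    return model.startswith(tuple(prefixes))
```

Under this assumption, `gemini-2.5-pro` and any name that extends it match, while `gemini-1.5-flash` and `gemini-2.0-flash` do not and therefore keep the non-reasoning ceilings.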

No change to the Gemini API call itself — `generationConfig.temperature`
+ `maxOutputTokens` are both accepted by gemini-2.5-pro.
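For reference, a request body carrying both settings could be assembled as in the sketch below. The `contents` and `generationConfig` field names follow the Gemini REST API; the helper `build_gemini_body` and its parameters are hypothetical and are not taken from this repository.

```python
def build_gemini_body(prompt: str, temperature: float, max_output_tokens: int) -> dict:
    """Build a generateContent-style JSON body (hypothetical helper)."""
    return {
        "contents": [{"parts": [{"text": prompt}]}],
        "generationConfig": {
            "temperature": temperature,
            "maxOutputTokens": max_output_tokens,
        },
    }


# 6144 is the confer / triangulate budget from PR #27; 0.2 is an arbitrary example.
body = build_gemini_body("Summarize the debate.", 0.2, 6144)
```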

Tests (scripts/test_provider_token_budgets.py):
- `_is_reasoning_model("gemini", "gemini-2.5-pro")` is True
- gemini debate / synth / audit all = 2048 (reasoning default)
- gemini confer / triangulate stay = 6144 (PR #27 override)
- A hypothetical non-reasoning gemini still falls through correctly

Full suite (38 scripts) passes.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>