Bump OpenAI confer/debate token budget; add per-provider override layer #26
Merged
Symptom: OpenAI gpt-5 routinely runs out of `max_completion_tokens`
mid-answer during `confer` and `debate`. Root cause: gpt-5 charges
reasoning tokens against `max_completion_tokens`. At the reasoning-
class default of 2048 it spends 1500-3000 tokens on internal thinking
and the visible reply gets MAX_TOKENS-truncated.
Fix: add a per-provider override layer in `_budget_for_purpose`. The
new precedence (highest to lowest) is:
1. `CFG.token_budgets[purpose]`: caller override
2. `CFG.token_budgets_by_provider[provider][purpose]`: per-provider
   operator override (no code changes needed)
3. `_PROVIDER_TOKEN_BUDGETS[provider][purpose]`: shipped per-provider
   overrides (this PR seeds openai.confer / openai.debate = 6144)
4. `_NON_REASONING_TOKEN_BUDGETS[purpose]`: ceiling for non-reasoning models
5. `_DEFAULT_TOKEN_BUDGETS[purpose]`: reasoning default
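The precedence above can be sketched as a small resolver. This is a minimal illustration, not the actual `_budget_for_purpose`: the function signature, the `is_reasoning` flag, and the table contents are assumptions, and only the numeric values quoted in this PR and its follow-up (6144, 2048, 1500) come from the source. How a non-positive caller override is treated is not stated, so the sketch only filters the two per-provider tiers.

```python
from typing import Mapping, Optional

# Illustrative tables (assumed shape); values are the ones quoted in this PR.
_DEFAULT_TOKEN_BUDGETS = {"confer": 2048, "debate": 2048}
_NON_REASONING_TOKEN_BUDGETS = {"confer": 1500, "debate": 1500}
_PROVIDER_TOKEN_BUDGETS = {"openai": {"confer": 6144, "debate": 6144}}


def budget_for_purpose(
    purpose: str,
    provider: Optional[str] = None,
    is_reasoning: bool = True,  # hypothetical flag; no model means reasoning-safe
    token_budgets: Optional[Mapping[str, int]] = None,
    token_budgets_by_provider: Optional[Mapping[str, Mapping[str, int]]] = None,
) -> int:
    # Tier 1: caller override always wins.
    caller = (token_budgets or {}).get(purpose)
    if caller is not None:
        return caller

    # Tiers 2 and 3: operator per-provider table, then shipped per-provider table.
    # Values <= 0 are ignored and fall through to the next tier.
    if provider:
        for table in (token_budgets_by_provider or {}, _PROVIDER_TOKEN_BUDGETS):
            value = table.get(provider, {}).get(purpose)
            if value is not None and value > 0:
                return value

    # Tier 4: smaller ceiling for non-reasoning models.
    if not is_reasoning and purpose in _NON_REASONING_TOKEN_BUDGETS:
        return _NON_REASONING_TOKEN_BUDGETS[purpose]

    # Tier 5: reasoning default.
    return _DEFAULT_TOKEN_BUDGETS[purpose]
```

For example, `budget_for_purpose("confer", "openai")` resolves at tier 3, while `budget_for_purpose("confer", "anthropic")` falls through to the reasoning default.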
Shipped seed values:
- openai.confer = 6144
- openai.debate = 6144
6144 = ~3k headroom for gpt-5 reasoning + ~3k room for the visible
answer. Other providers are unaffected (anthropic / xai / gemini /
mistral / groq / deepseek all fall through to the existing reasoning-
or non-reasoning ceilings).
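The sizing argument can be checked with simple arithmetic. The sketch below assumes the simplest possible model, in which reasoning tokens and visible tokens share one cap additively; `visible_room` is a hypothetical helper, not part of the codebase.

```python
def visible_room(max_completion_tokens: int, reasoning_tokens: int) -> int:
    """Tokens left for the visible reply once reasoning is charged to the cap."""
    return max(0, max_completion_tokens - reasoning_tokens)


# Reasoning spend range quoted above: 1500-3000 tokens.
for cap in (2048, 6144):
    for reasoning in (1500, 3000):
        print(f"cap={cap} reasoning={reasoning} visible={visible_room(cap, reasoning)}")
```

Under that assumption, the old 2048 cap leaves between 0 and 548 tokens for the answer, while 6144 leaves between 3144 and 4644.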
Anti-foot-gun: per-provider override values <= 0 are ignored (fall
through to the next tier). The override is keyed on the provider name
(not model class) because the issue is specifically how OpenAI's API
accounts for reasoning tokens. Even gpt-test (a non-reasoning OpenAI
stub) gets the higher ceiling, since the problem comes from provider
behavior rather than model class.
Tests (scripts/test_provider_token_budgets.py):
- Shipped openai confer/debate = 6144 (reasoning + non-reasoning models)
- Non-overridden purposes still respect tier ceilings
  (audit/synth fall to non-reasoning table for gpt-test, reasoning
  default for gpt-5)
- Other providers unaffected by the openai override
- CFG.token_budgets[purpose] beats the per-provider override
- CFG.token_budgets_by_provider beats the shipped defaults; can also
  add overrides for providers that don't have shipped ones
- Operator-supplied 0 / negative fall through to the next tier
- No provider/model -> reasoning-safe defaults unchanged
- End-to-end: openai confer wire payload carries
  max_completion_tokens=6144 (not 2048); anthropic confer still 2048
Full suite (38 scripts) passes.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
fxspeiser added a commit that referenced this pull request on May 31, 2026
Follow-up to PR #26. Same MAX_TOKENS truncation pattern is hitting triangulate on gpt-5 and both confer + triangulate on gemini-2.5-pro: reasoning tokens count against `max_completion_tokens`, the visible answer gets cut mid-emission.
New _PROVIDER_TOKEN_BUDGETS entries (all 6144):
- openai.triangulate (was 2048 reasoning-default)
- gemini.confer (was 1500 non-reasoning ceiling)
- gemini.triangulate (was 1500 non-reasoning ceiling)
Notable side-finding (logged as a follow-up, NOT fixed here): gemini-2.5-pro is NOT currently tagged in `PROVIDER_CAPS["gemini"]["reasoning_prefixes"]`, so non-overridden purposes (debate, synth, audit) fall through to the SMALLER non-reasoning ceilings (1500 / 1024 / 768) instead of the reasoning default of 2048. The provider-specific override in this PR short-circuits that for confer + triangulate, but if the user starts hitting caps on gemini debate / synth / audit, the right fix is adding `"reasoning_prefixes": ("gemini-2.5-pro",)` to the gemini PROVIDER_CAPS entry. Out of scope for this PR.
Tests:
- openai.triangulate = 6144
- gemini.confer = 6144, gemini.triangulate = 6144
- gemini.debate falls through to 1500 (the non-reasoning ceiling, documenting the follow-up gap explicitly)
Full suite (38 scripts) passes.
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
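The gemini side-finding can be illustrated with a sketch of prefix-based model classification. Everything here is an assumption for illustration: the real `PROVIDER_CAPS` contents are not shown in this PR, `is_reasoning_model` is a hypothetical helper, and matching by `str.startswith` is inferred only from the key name `reasoning_prefixes`.

```python
from typing import Mapping

# Assumed shape; the gemini entry is shown empty only to model "not tagged".
PROVIDER_CAPS = {
    "openai": {"reasoning_prefixes": ("gpt-5",)},  # hypothetical entry
    "gemini": {"reasoning_prefixes": ()},
}


def is_reasoning_model(provider: str, model: str,
                       caps: Mapping[str, Mapping] = PROVIDER_CAPS) -> bool:
    """True if the model name starts with one of the provider's reasoning prefixes."""
    prefixes = tuple(caps.get(provider, {}).get("reasoning_prefixes", ()))
    return model.startswith(prefixes)  # an empty tuple never matches


# Ceilings for debate quoted in the commit message above.
REASONING_DEFAULT, NON_REASONING_DEBATE = 2048, 1500

# Untagged: gemini-2.5-pro is treated as non-reasoning, so debate gets 1500.
untagged = is_reasoning_model("gemini", "gemini-2.5-pro")

# With the proposed follow-up fix applied, debate would get 2048 instead.
fixed_caps = {**PROVIDER_CAPS, "gemini": {"reasoning_prefixes": ("gemini-2.5-pro",)}}
tagged = is_reasoning_model("gemini", "gemini-2.5-pro", fixed_caps)

debate_before = REASONING_DEFAULT if untagged else NON_REASONING_DEBATE
debate_after = REASONING_DEFAULT if tagged else NON_REASONING_DEBATE
```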
Summary
Fixes the recurring OpenAI MAX_TOKENS truncation on `confer` and `debate`.
Root cause: gpt-5 charges reasoning tokens against `max_completion_tokens`. At the reasoning-class default of 2048 it spends 1500-3000 on internal thinking and the visible reply gets truncated mid-sentence.
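Fix: add a per-provider override layer in `_budget_for_purpose`. New precedence (highest to lowest):
1. `CFG.token_budgets[purpose]`: caller override
2. `CFG.token_budgets_by_provider[provider][purpose]`: per-provider operator override
3. `_PROVIDER_TOKEN_BUDGETS[provider][purpose]`: shipped per-provider overrides (this PR seeds openai.confer / openai.debate = 6144)
4. `_NON_REASONING_TOKEN_BUDGETS[purpose]`: ceiling for non-reasoning models
5. `_DEFAULT_TOKEN_BUDGETS[purpose]`: reasoning default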
6144 ≈ 3k headroom for gpt-5 reasoning + 3k room for the visible answer. Other providers (anthropic / xai / gemini / mistral / groq / deepseek) are unaffected.
Anti-foot-gun: per-provider override values <= 0 are ignored. The override is keyed on provider name (not model class) because the issue is specifically OpenAI's API accounting — every OpenAI model gets the higher ceiling.
Test plan
🤖 Generated with Claude Code