## Summary
When using `ProviderConfig` with `type: "anthropic"` (BYOK), the CLI binary sends `max_tokens: 8192` to the Anthropic API regardless of any configuration. This value is hardcoded internally and cannot be overridden from the SDK.
Claude Sonnet 4.6 supports up to 32,000 output tokens (per `models.list` capabilities), but the CLI caps it at 8,192. When the model generates a long response (e.g., writing a large file via the `create` tool), the response is silently truncated at 8,192 tokens with `stop_reason: "max_tokens"`, the tool call is incomplete, and the session transitions to `session.idle` without any error event: a silent failure.
## Environment
- Python SDK: `github-copilot-sdk>=0.2.1rc1`
- Provider: Anthropic via Azure AI Foundry (`type: "anthropic"`)
- Model: `claude-sonnet-4.6`
## Reproduction
```python
import asyncio

from copilot import CopilotClient
from copilot.client import SubprocessConfig
from copilot.session import PermissionHandler


async def main() -> None:
    client = CopilotClient(SubprocessConfig(cwd="/tmp/workspace", use_logged_in_user=False))
    await client.start()
    session = await client.create_session(
        model="claude-sonnet-4.6",
        provider={
            "type": "anthropic",
            "base_url": "https://your-endpoint.services.ai.azure.com/anthropic/",
            "api_key": "your-key",
        },
        streaming=True,
        on_permission_request=PermissionHandler.approve_all,
    )
    # Ask the model to generate a large document (triggers create tool with >8K tokens)
    await session.send(
        "Generate a comprehensive 5000-word analysis report and save it to output/report.md"
    )


asyncio.run(main())
```
Key observations from the attached log (`process-1774635480610-54170.log`):
- The `create` tool call is incomplete: it has `path` but is missing `file_text` (the content was truncated before the model could finish writing it)
- `stop_reason: "max_tokens"` with exactly `output_tokens: 8192`: a hard ceiling
- `[WARNING] Max tokens reached`: the CLI is aware of the truncation
- No `session.error` is emitted, only `assistant.message` → `session.idle`
- The session goes idle as if everything succeeded; the SDK consumer receives `session.idle` and has no indication the response was truncated
- No `tool.execution_start`: the CLI doesn't even attempt to execute the truncated tool call, but it also doesn't report the failure
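Until the CLI surfaces this state, a consumer that can see the raw Anthropic message (e.g. through its own logging or a proxy) could detect the failure pattern from the fields above. A minimal sketch operating on a message-shaped dict, using the field names observed in the log (`detect_silent_truncation` is a hypothetical helper, not part of the SDK):

```python
def detect_silent_truncation(message: dict) -> list[str]:
    """Flag the failure pattern from the log: stop_reason 'max_tokens'
    plus a `create` tool call whose input never finished streaming."""
    problems = []
    if message.get("stop_reason") == "max_tokens":
        problems.append("output hit the max_tokens ceiling")
        for block in message.get("content", []):
            if block.get("type") == "tool_use":
                args = block.get("input") or {}
                # In the attached log, the create call had `path` but no `file_text`
                if block.get("name") == "create" and "file_text" not in args:
                    problems.append("create tool call is missing file_text")
    return problems
```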
For comparison, the previous turn (turn 18) completed successfully with `output_tokens: 8147` and `stop_reason: "tool_use"`: just 45 tokens below the 8,192 ceiling. The model was generating progressively longer tool calls (file contents) across turns, and the ceiling was hit on the next one.
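One way to fix this would be to let the consumer override the ceiling through the provider config. A hypothetical shape, for illustration only (the `max_tokens` key does not exist in the current `ProviderConfig`):

```python
session = await client.create_session(
    model="claude-sonnet-4.6",
    provider={
        "type": "anthropic",
        "base_url": "https://your-endpoint.services.ai.azure.com/anthropic/",
        "api_key": "your-key",
        # Hypothetical: not supported today; the CLI hardcodes 8192
        "max_tokens": 32000,
    },
)
```

At a minimum, the CLI should emit a `session.error` (or equivalent) when it logs `Max tokens reached`, instead of going idle silently.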