Title
Expose cache_control.ttl as a configurable switch in ClaudeAgentOptions (currently hardcoded to 1h in bundled CLI)
Summary
The Anthropic API supports two ephemeral cache tiers — 5-minute (default per the API contract) and 1-hour — via cache_control: {type: "ephemeral", ttl: "5m" | "1h"}. The bundled Claude Code CLI (verified on v2.1.126) appears to send ttl: "1h" unconditionally for all cached prefixes, with no SDK-side way to override.
This request is not to add 1h support (it's already in use). It's to expose the choice as a configurable parameter on ClaudeAgentOptions, so SDK consumers can deliberately pick the TTL appropriate for their workload — and pay accordingly.
What we observed
Running a no-tool probe through claude-agent-sdk-python v0.1.72 / bundled CLI 2.1.126 against Haiku 4.5, the ResultMessage.usage payload returned:
{
"input_tokens": 4,
"output_tokens": 87,
"cache_creation_input_tokens": 44053,
"cache_creation": {
"ephemeral_5m_input_tokens": 0,
"ephemeral_1h_input_tokens": 44053
},
"cache_read_input_tokens": 0
}
Every cache write went to the 1-hour bucket. We could not find any ClaudeAgentOptions parameter, env var, or CLI flag that influences this — it appears to be hardcoded inside the bundled CLI subprocess.
Why a switch matters even though 1h is the better default for our case
Different workloads have different economics:
- Long-pause conversational agents (our case — Sanskrit philosophy Q&A bot): 1h tier saves money because natural conversation pauses exceed 5m. 1h cache write costs 2× the input rate vs 1.25× for 5m, but the avoided rewrite saves much more.
- Fast pipelines, batch jobs, short-lived workers: 5m is cheaper because cache is reused within the window and the higher write premium of 1h tier is wasted.
- Cost-sensitive prototyping: users may want to opt OUT of the 2× cache write premium to measure baseline cost.
Even consumers happy with the 1h default deserve to know they're paying for it. Currently it's invisible until you parse usage.cache_creation per turn (we've been doing this for cost-attribution observability) and notice the 5m bucket is always zero. That's surprising behavior for an SDK whose abstraction goal is to hide subprocess details.
Requested behavior
Any of these would solve the problem; ordered by preference:
Option A — ClaudeAgentOptions parameter (preferred)
options = ClaudeAgentOptions(
model="claude-sonnet-4-6",
cache_ttl="1h", # accepts "5m" | "1h"; default = current behavior (probably "1h")
# ...
)
Cleanest fit with the existing options surface; trivially testable. Default value can match current behavior so it's a non-breaking addition.
Option B — Env var passed to CLI subprocess
options = ClaudeAgentOptions(
env={"ANTHROPIC_PROMPT_CACHE_TTL": "1h"}, # or "5m"
# ...
)
Maps to a CLI-side environment variable that the bundled CLI honors when building the cache_control block. Works without changing the SDK's options TypedDict.
Option C — CLI flag (lower priority)
claude --cache-ttl 5m ... for direct CLI users; SDK could thread it through.
Suggestion regardless of which option lands
Document the current default behavior somewhere visible (SDK README, CLI --help, or release notes). Right now it's recoverable only by inspecting ResultMessage.usage.cache_creation. A one-line note ("Bundled CLI uses cache_control.ttl=1h by default for prefix caching") would prevent future cost-attribution head-scratching for other SDK consumers.
Why this matters for the SDK ecosystem
Multi-turn agents built on the SDK have wildly different cache-hit patterns depending on UX (long-pause Q&A vs fast back-and-forth). A fixed TTL choice optimizes for some and pessimizes for others. The Anthropic API surface already exposes the choice; the SDK is the right layer to forward it.
There's also a transparency angle. Cost observability tooling (we've built per-turn cost attribution на agent_turn_metrics) breaks subtly when the TTL is implicit — we ended up reverse-engineering the cache tier from the usage payload to explain a $0.15/day pricing gap between SDK self-reported cost and Anthropic dashboard. An explicit option would have made this trivially correct.
Workaround considered and rejected
Bypassing the bundled CLI and calling the Anthropic API directly to set cache_control.ttl explicitly. Rejected because it would undo the SDK's built-in advantages (MCP server integration via stdio, agent loop, tool use handling, structured streaming). The SDK is the right layer for this — we just need it to forward the TTL parameter.
Willing to contribute
Happy to file a PR if the SDK team confirms the desired API surface (Option A / B / C above) and that this is welcome. The change is small — primarily plumbing one optional parameter from ClaudeAgentOptions through to the CLI subprocess invocation, then from the CLI into the cache_control.ttl field.
Versions
claude-agent-sdk-python: 0.1.72
- Bundled Claude Code CLI: 2.1.126
- Python: 3.12
- Production model: claude-sonnet-4-6
Thanks for considering this — it would improve both production economics (for workloads where 5m is the better fit) and cost-attribution clarity (for everyone) on the SDK.
Title
Expose
cache_control.ttlas a configurable switch inClaudeAgentOptions(currently hardcoded to 1h in bundled CLI)Summary
The Anthropic API supports two ephemeral cache tiers — 5-minute (default per the API contract) and 1-hour — via
cache_control: {type: "ephemeral", ttl: "5m" | "1h"}. The bundled Claude Code CLI (verified on v2.1.126) appears to sendttl: "1h"unconditionally for all cached prefixes, with no SDK-side way to override.This request is not to add 1h support (it's already in use). It's to expose the choice as a configurable parameter on
ClaudeAgentOptions, so SDK consumers can deliberately pick the TTL appropriate for their workload — and pay accordingly.What we observed
Running a no-tool probe through
claude-agent-sdk-pythonv0.1.72 / bundled CLI 2.1.126 against Haiku 4.5, theResultMessage.usagepayload returned:{ "input_tokens": 4, "output_tokens": 87, "cache_creation_input_tokens": 44053, "cache_creation": { "ephemeral_5m_input_tokens": 0, "ephemeral_1h_input_tokens": 44053 }, "cache_read_input_tokens": 0 }Every cache write went to the 1-hour bucket. We could not find any
ClaudeAgentOptionsparameter, env var, or CLI flag that influences this — it appears to be hardcoded inside the bundled CLI subprocess.Why a switch matters even though 1h is the better default for our case
Different workloads have different economics:
Even consumers happy with the 1h default deserve to know they're paying for it. Currently it's invisible until you parse
usage.cache_creationper turn (we've been doing this for cost-attribution observability) and notice the 5m bucket is always zero. That's surprising behavior for an SDK whose abstraction goal is to hide subprocess details.Requested behavior
Any of these would solve the problem; ordered by preference:
Option A —
ClaudeAgentOptionsparameter (preferred)Cleanest fit with the existing options surface; trivially testable. Default value can match current behavior so it's a non-breaking addition.
Option B — Env var passed to CLI subprocess
Maps to a CLI-side environment variable that the bundled CLI honors when building the
cache_controlblock. Works without changing the SDK's options TypedDict.Option C — CLI flag (lower priority)
claude --cache-ttl 5m ...for direct CLI users; SDK could thread it through.Suggestion regardless of which option lands
Document the current default behavior somewhere visible (SDK README, CLI
--help, or release notes). Right now it's recoverable only by inspectingResultMessage.usage.cache_creation. A one-line note ("Bundled CLI usescache_control.ttl=1hby default for prefix caching") would prevent future cost-attribution head-scratching for other SDK consumers.Why this matters for the SDK ecosystem
Multi-turn agents built on the SDK have wildly different cache-hit patterns depending on UX (long-pause Q&A vs fast back-and-forth). A fixed TTL choice optimizes for some and pessimizes for others. The Anthropic API surface already exposes the choice; the SDK is the right layer to forward it.
There's also a transparency angle. Cost observability tooling (we've built per-turn cost attribution на
agent_turn_metrics) breaks subtly when the TTL is implicit — we ended up reverse-engineering the cache tier from the usage payload to explain a $0.15/day pricing gap between SDK self-reported cost and Anthropic dashboard. An explicit option would have made this trivially correct.Workaround considered and rejected
Bypassing the bundled CLI and calling the Anthropic API directly to set
cache_control.ttlexplicitly. Rejected because it would undo the SDK's built-in advantages (MCP server integration via stdio, agent loop, tool use handling, structured streaming). The SDK is the right layer for this — we just need it to forward the TTL parameter.Willing to contribute
Happy to file a PR if the SDK team confirms the desired API surface (Option A / B / C above) and that this is welcome. The change is small — primarily plumbing one optional parameter from
ClaudeAgentOptionsthrough to the CLI subprocess invocation, then from the CLI into thecache_control.ttlfield.Versions
claude-agent-sdk-python: 0.1.72Thanks for considering this — it would improve both production economics (for workloads where 5m is the better fit) and cost-attribution clarity (for everyone) on the SDK.