feat(gemini): context caching for the page prompt #50
Merged
The ~2-3K-token PAGE_EXTRACTION_PROMPT was previously re-billed on every page in the ~16K-page corpus. GeminiClient.create_cache now registers the prompt as a cachedContent resource (cached as system_instruction); subsequent extract_page calls reference the cache and skip re-sending the prompt.

The pipeline calls create_cache once at the start of process_pending; failures degrade silently to the un-cached path so a cache hiccup never blocks a corpus run.

The response schema cannot live in the cache (SDK limitation: CreateCachedContentConfig has no response_schema field) and still travels in the per-call config. Savings are on the prompt portion only, which is the dominant input-token cost.

Batch mode (the second half of #40's scope) is deferred to a follow-up: it requires a separate orchestration pattern (24h-SLA submit-poll-fetch) that doesn't compose with the per-page async pipeline and warrants its own state-machine PR.

Partial: #40
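The cache-or-fallback behavior described above could look roughly like the sketch below. It is not the PR's code: `GeminiClientSketch`, `build_request`, the placeholder prompt, schema, and model name are all hypothetical, requests are built as plain dicts, and the SDK is replaced by a fake object so nothing touches the network. How the un-cached path actually passes the prompt is also an assumption.

```python
from dataclasses import dataclass
from types import SimpleNamespace
from typing import Any, Optional

PAGE_EXTRACTION_PROMPT = "Extract the page fields as JSON."  # placeholder text
RESPONSE_SCHEMA = {"type": "object"}  # placeholder schema


@dataclass
class GeminiClientSketch:
    """Hypothetical stand-in for GeminiClient; `sdk` mimics an SDK client."""

    sdk: Any
    model: str = "example-model"  # placeholder model name
    cache_name: Optional[str] = None

    def create_cache(self) -> Optional[str]:
        """Register the prompt as cached content; return None on any failure."""
        try:
            cache = self.sdk.caches.create(
                model=self.model,
                config={"system_instruction": PAGE_EXTRACTION_PROMPT},
            )
            self.cache_name = cache.name
        except Exception:
            # Degrade silently: later calls take the un-cached path.
            self.cache_name = None
        return self.cache_name

    def build_request(self, image: bytes) -> dict:
        """Return the arguments an extract_page call would send for one page."""
        # The schema always travels per call, cached or not.
        config: dict = {"response_schema": RESPONSE_SCHEMA}
        if self.cache_name is not None:
            config["cached_content"] = self.cache_name
            contents: list = [image]  # image-only payload
        else:
            contents = [PAGE_EXTRACTION_PROMPT, image]  # prompt is re-sent
        return {"model": self.model, "contents": contents, "config": config}


def _unavailable(**kwargs):
    raise RuntimeError("cache unavailable")


# Fake SDK objects: one whose caches.create succeeds, one whose call raises.
ok_sdk = SimpleNamespace(caches=SimpleNamespace(
    create=lambda **kwargs: SimpleNamespace(name="cachedContents/demo")))
bad_sdk = SimpleNamespace(caches=SimpleNamespace(create=_unavailable))

cached_client = GeminiClientSketch(sdk=ok_sdk)
cached_client.create_cache()
cached_request = cached_client.build_request(b"page-image")

fallback_client = GeminiClientSketch(sdk=bad_sdk)
fallback_client.create_cache()
fallback_request = fallback_client.build_request(b"page-image")
```

In both branches the response schema stays in the per-call config; only the prompt moves between the cache and the payload.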
This was referenced May 11, 2026
Catches SDK-shape drift between the mock-based unit tests and the actual Gemini API surface. Marked `external_api` so the default test run skips it; opt in with `pytest -m external_api` and a valid `GEMINI_API_KEY`. Environment-level failures (UNAUTHENTICATED, PERMISSION_DENIED, RESOURCE_EXHAUSTED) skip rather than fail — they're billing/auth/quota signals, not shape drift. A 400 INVALID_ARGUMENT would still propagate (that IS shape drift), which is what this test exists to surface. First external_api test in the repo; `tests/golden/README.md` already references the marker but had no tests using it yet.
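The skip-versus-fail rule for the live test can be sketched as a small classifier. This is an illustration, not the test from the referenced PR: `FakeApiError`, `is_environment_failure`, and `run_live_check` are hypothetical names, and the sketch assumes the SDK's error object exposes the status as a `status` string. In a real test, `skip` would presumably be `pytest.skip` and the test function would carry `@pytest.mark.external_api`.

```python
# Statuses that signal billing, auth, or quota problems rather than shape drift.
ENVIRONMENT_STATUSES = frozenset(
    {"UNAUTHENTICATED", "PERMISSION_DENIED", "RESOURCE_EXHAUSTED"}
)


class FakeApiError(Exception):
    """Hypothetical stand-in for an SDK error that carries a status string."""

    def __init__(self, status: str, message: str = "") -> None:
        super().__init__(f"{status}: {message}")
        self.status = status


def is_environment_failure(exc: Exception) -> bool:
    """True when the error reflects the environment, not the API shape."""
    return getattr(exc, "status", None) in ENVIRONMENT_STATUSES


def run_live_check(call, skip):
    """Run call(); hand environment failures to skip(), re-raise the rest."""
    try:
        return call()
    except Exception as exc:
        if is_environment_failure(exc):
            return skip(str(exc))
        raise  # e.g. INVALID_ARGUMENT: this is the drift the test should surface


def _quota_exhausted():
    raise FakeApiError("RESOURCE_EXHAUSTED", "quota")


# Demo: record skip reasons in a list instead of calling pytest.skip.
skipped: list = []
run_live_check(_quota_exhausted, skipped.append)
```

Keeping the classification in one predicate means adding another environment-level status later is a one-line change to the set.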
Summary
- `GeminiClient.create_cache()` registers the `PAGE_EXTRACTION_PROMPT` as a Gemini `cachedContent` resource (cached as `system_instruction`).
- `extract_page` branches on cache presence: when a cache exists, the per-call payload is image-only; the prompt isn't re-sent. When caching is unavailable, the un-cached path is used unchanged.
- `core/pipeline.process_pending` calls `create_cache` once at the start of each run; failures degrade silently.
- `README.md` "Cost calibration" section updated to document the caching behavior.

Scope split
The original issue (#40) bundled batch mode + context caching. I'm shipping caching now and proposing batch mode as a follow-up. Reasons:
jobs.db).

A follow-up issue will track batch mode against #37 (Sprint 1 parent). If you'd rather see them combined in one PR, say so and I'll rebase and amend.
SDK reality check
`CreateCachedContentConfig` has no `response_schema` field; the schema can only travel in the per-call `GenerateContentConfig`. So caching saves on the prompt portion only, not on the prompt + schema as the issue body sketched. README and docstrings reflect this.

Partial: #40
Test plan
- `ruff check` clean.
- `ruff format --check` clean.
- `mypy core cli.py` clean.
- `caches.create` is called exactly once per run with 6 pages.
- `caches.create` raises.
- `caches.create` is NOT called when there are no pending jobs.
- `external_api`-marker test against real Gemini caching: needs an opt-in test run with `GEMINI_API_KEY`.
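The three `caches.create` checks in the plan above can be illustrated with `unittest.mock`. This is a sketch under assumptions, not the PR's test suite: `run_pipeline_sketch` is a hypothetical, heavily simplified stand-in for `process_pending` that collapses the client and pipeline layers into one function, and the model name is a placeholder.

```python
from unittest.mock import MagicMock


def run_pipeline_sketch(sdk, pending_pages):
    """Hypothetical stand-in for process_pending: one cache attempt per run."""
    if not pending_pages:
        return []  # no pending jobs, so caches.create is never called
    try:
        cache_name = sdk.caches.create(model="example-model", config={}).name
    except Exception:
        cache_name = None  # degrade silently to the un-cached path
    # Every page in the run shares the single cache (or the lack of one).
    return [(page, cache_name) for page in pending_pages]


# 1. Six pages, one cache creation.
sdk = MagicMock()
sdk.caches.create.return_value.name = "cachedContents/demo"
results = run_pipeline_sketch(sdk, [f"page-{i}" for i in range(6)])
sdk.caches.create.assert_called_once()

# 2. The run still completes when caches.create raises.
failing = MagicMock()
failing.caches.create.side_effect = RuntimeError("cache unavailable")
fallback = run_pipeline_sketch(failing, ["page-0"])

# 3. No pending jobs means no cache creation at all.
idle = MagicMock()
run_pipeline_sketch(idle, [])
idle.caches.create.assert_not_called()
```

Asserting on the mock's call count rather than on return values is what ties the checks to the once-per-run guarantee instead of to the extraction output.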