feat(gemini): context caching for the page prompt by jakebromberg · Pull Request #50 · WXYC/flowsheet-digitization

jakebromberg · 2026-05-11T02:17:40Z

Summary

New GeminiClient.create_cache() registers the PAGE_EXTRACTION_PROMPT as a Gemini cachedContent resource (cached as system_instruction).
extract_page branches on cache presence: when a cache exists, the per-call payload is image-only; the prompt isn't re-sent. When caching is unavailable, the un-cached path is used unchanged.
core/pipeline.process_pending calls create_cache once at the start of each run; failures degrade silently to the un-cached path.
6 new client tests (cache lifecycle, idempotency, fallback) + 3 new pipeline tests (cache-once-per-run, cache-failure-continues, no-pending-skips-cache-creation).
README.md "Cost calibration" section updated to document the caching behavior.
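Here is a minimal sketch of the create_cache contract that the client tests describe (lifecycle, idempotency, fallback). Everything in it is hypothetical: `CachingClient`, `create_fn`, and the error-swallowing policy are illustrative stand-ins, not the PR's code. `create_fn` takes the place of the SDK's cache-creation call so the sketch runs without network access. The sketch assumes "idempotency" means a second call reuses the existing cache instead of registering another one.

```python
from __future__ import annotations

import logging
from dataclasses import dataclass, field
from typing import Callable, Optional

logger = logging.getLogger(__name__)


@dataclass
class CachingClient:
    """Hypothetical wrapper: idempotent create_cache with silent fallback."""

    prompt: str
    # Stand-in for the SDK call that registers the prompt as a cachedContent
    # resource; takes the prompt text and returns the cache's resource name.
    create_fn: Callable[[str], str]
    cache_name: Optional[str] = None
    create_calls: int = field(default=0, init=False)

    def create_cache(self) -> Optional[str]:
        # Idempotent: once a cache exists, later calls reuse it.
        if self.cache_name is not None:
            return self.cache_name
        self.create_calls += 1
        try:
            self.cache_name = self.create_fn(self.prompt)
        except Exception:
            # Caching unavailable: log and leave cache_name unset so that
            # callers take the un-cached path.
            logger.warning(
                "context cache unavailable; prompt will be sent per call",
                exc_info=True,
            )
        return self.cache_name

    @property
    def cached(self) -> bool:
        return self.cache_name is not None
```

Because a failed creation leaves `cache_name` as `None`, callers only need to check `cached` (or the returned name) to decide between the image-only payload and the full prompt.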

Scope split

The original issue (#40) bundled batch mode + context caching. I'm shipping caching now and proposing batch mode as a follow-up. Reasons:

The features are independently valuable and the SDK constraints around each are different. Caching is a per-call payload optimization; batch mode is a 24h-SLA orchestration pattern that doesn't compose with the existing per-page async pipeline (it requires submit → poll → fetch with a new state in jobs.db).
The issue's own constraint section says "Don't conflate the two features in tests — they're independent and one should be debuggable without the other."
Caching delivers the majority of the Sprint 1 cost win: the ~2-3K-token prompt is the dominant input cost, and batch mode would only add a ~50% discount on top of whatever input rate remains.

A follow-up issue will track batch mode against #37 (Sprint 1 parent). If you'd rather see them combined in one PR, say so and I'll rebase and amend.
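To put rough numbers on the third reason above, the commit message cites a ~16K-page corpus and a ~2-3K-token prompt. The sketch below does that arithmetic and models the "~50% on top of whatever input rate remains" framing as multiplicative stacking of discounts. The function names are hypothetical, the discount values are placeholders rather than Gemini's actual pricing, and cache storage charges are ignored.

```python
def prompt_tokens_billed(pages: int, prompt_tokens: int) -> int:
    """Prompt tokens billed at the full input rate when the prompt rides on every page."""
    return pages * prompt_tokens


def effective_input_rate(
    base_rate: float, cache_discount: float = 0.0, batch_discount: float = 0.0
) -> float:
    """Per-token input rate after stacking fractional discounts multiplicatively.

    Discounts are placeholders in [0, 1); real values come from the model's
    pricing page. Cache storage charges are not modelled.
    """
    for d in (cache_discount, batch_discount):
        if not 0.0 <= d < 1.0:
            raise ValueError(f"discount must be in [0, 1), got {d}")
    return base_rate * (1.0 - cache_discount) * (1.0 - batch_discount)


if __name__ == "__main__":
    # Commit-message figures: ~16K pages, ~2-3K-token prompt.
    for tokens in (2_000, 3_000):
        billed = prompt_tokens_billed(16_000, tokens)
        print(f"{tokens}-token prompt: {billed:,} prompt tokens per full run")
```

At those figures the un-cached path re-bills 32M to 48M prompt tokens per full corpus run. Any batch discount then applies only to the rate that caching leaves behind.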

SDK reality check

CreateCachedContentConfig has no response_schema field, so the schema can only travel in the per-call GenerateContentConfig. Caching therefore saves on the prompt portion only, not on prompt plus schema as the issue body sketched. README and docstrings reflect this.
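A sketch of how the per-call payload might split once the prompt lives in the cache. The function name and dict shape are hypothetical. The keys borrow google-genai's GenerateContentConfig field names (`cached_content`, `system_instruction`, `response_schema`, `response_mime_type`), but plain dicts keep the sketch runnable without the SDK. The un-cached branch assumes the prompt goes in as `system_instruction`; the PR doesn't say how the existing path sends it.

```python
from typing import Any, Optional


def build_generate_request(
    image: bytes,
    prompt: str,
    response_schema: dict[str, Any],
    cache_name: Optional[str],
) -> dict[str, Any]:
    """Return the per-call pieces of a generate_content request (hypothetical shape)."""
    config: dict[str, Any] = {
        # The schema can't live in the cache, so it rides on every call.
        "response_mime_type": "application/json",
        "response_schema": response_schema,
    }
    if cache_name is not None:
        # Cached path: reference the cache instead of re-sending the prompt.
        # system_instruction is deliberately not set here; it is already cached.
        config["cached_content"] = cache_name
    else:
        # Un-cached fallback (assumed shape): send the prompt inline.
        config["system_instruction"] = prompt
    # Either way, the per-call contents are just the page image.
    return {"contents": [image], "config": config}
```

With the real SDK, `config` would presumably be passed as `types.GenerateContentConfig(**request["config"])`, and the image bytes would be wrapped with `types.Part.from_bytes(data=..., mime_type=...)`.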

Partial: #40

Test plan

All 376 default tests pass (1 deselected).
ruff check clean.
ruff format --check clean.
mypy core cli.py clean.
Pipeline test confirms caches.create is called exactly once per run with 6 pages.
Pipeline test confirms the run completes when caches.create raises.
Pipeline test confirms caches.create is NOT called when there are no pending jobs.
(Out of scope locally) The external_api-marked test against real Gemini caching needs an opt-in run with GEMINI_API_KEY.
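The three pipeline checks above (cache once per run, run completes when creation raises, no cache when nothing is pending) can be exercised against a small async sketch. The `process_pending` signature, the `PageClient` protocol, and `FakeClient` are hypothetical. Here the pipeline, rather than the client, swallows cache-creation errors, which is one way to get the "run completes when caches.create raises" behaviour. The sketch does not bound concurrency.

```python
import asyncio
import logging
from typing import Optional, Protocol, Sequence

logger = logging.getLogger(__name__)


class PageClient(Protocol):
    async def create_cache(self) -> Optional[str]: ...
    async def extract_page(self, page: str, cache_name: Optional[str]) -> dict: ...


async def process_pending(pending: Sequence[str], client: PageClient) -> list[dict]:
    """Hypothetical process_pending: one cache per run, silent fallback."""
    if not pending:
        # Nothing to extract: don't create a cache nobody will read.
        return []
    cache_name: Optional[str]
    try:
        cache_name = await client.create_cache()
    except Exception:
        logger.warning("cache creation failed; continuing un-cached", exc_info=True)
        cache_name = None
    # Every page in this run shares the same cache reference (or None).
    return list(
        await asyncio.gather(*(client.extract_page(p, cache_name) for p in pending))
    )


class FakeClient:
    """Test double that counts cache creations, like the pipeline tests."""

    def __init__(self, fail: bool = False) -> None:
        self.fail = fail
        self.create_calls = 0

    async def create_cache(self) -> Optional[str]:
        self.create_calls += 1
        if self.fail:
            raise RuntimeError("simulated caches.create failure")
        return "cachedContents/run-1"

    async def extract_page(self, page: str, cache_name: Optional[str]) -> dict:
        return {"page": page, "cached": cache_name is not None}
```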

The ~2-3K-token PAGE_EXTRACTION_PROMPT was previously re-billed on every page in the ~16K-page corpus. GeminiClient.create_cache now registers the prompt as a cachedContent resource (cached as system_instruction); subsequent extract_page calls reference the cache and skip re-sending the prompt.

The pipeline calls create_cache once at the start of process_pending; failures degrade silently to the un-cached path so a cache hiccup never blocks a corpus run.

The response schema cannot live in the cache (SDK limitation: CreateCachedContentConfig has no response_schema field) and still travels in the per-call config. Savings are on the prompt portion only, which is the dominant input-token cost.

Batch mode (the second half of #40's scope) is deferred to a follow-up: it requires a separate orchestration pattern (24h-SLA submit-poll-fetch) that doesn't compose with the per-page async pipeline and warrants its own state-machine PR.

Partial: #40

Catches SDK-shape drift between the mock-based unit tests and the actual Gemini API surface. Marked `external_api` so the default test run skips it; opt in with `pytest -m external_api` and a valid `GEMINI_API_KEY`. Environment-level failures (UNAUTHENTICATED, PERMISSION_DENIED, RESOURCE_EXHAUSTED) skip rather than fail, since they're billing/auth/quota signals, not shape drift. A 400 INVALID_ARGUMENT still propagates (that IS shape drift), which is what this test exists to surface. First external_api test in the repo; `tests/golden/README.md` already references the marker but had no tests using it yet.
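The skip-versus-fail rule above fits in a small helper. This is a sketch, not the repo's test: `call_or_skip` and `is_environment_failure` are hypothetical names. The helper reads a `status` string attribute from the exception (google-genai's `APIError` carries one, e.g. `"RESOURCE_EXHAUSTED"`), and `skip` is injected so the sketch runs without pytest.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")

# Billing/auth/quota signals: environment problems, not SDK-shape drift.
ENVIRONMENT_STATUSES = frozenset(
    {"UNAUTHENTICATED", "PERMISSION_DENIED", "RESOURCE_EXHAUSTED"}
)


def is_environment_failure(exc: BaseException) -> bool:
    """True when the error's status string signals auth, billing, or quota trouble."""
    return getattr(exc, "status", None) in ENVIRONMENT_STATUSES


def call_or_skip(call: Callable[[], T], skip: Callable[[str], object]) -> Optional[T]:
    """Run a live API call; route environment failures to `skip`, re-raise the rest.

    Anything else, including INVALID_ARGUMENT, propagates as real drift.
    """
    try:
        return call()
    except Exception as exc:
        if is_environment_failure(exc):
            skip(f"Gemini environment issue, not shape drift: {exc}")
            return None
        raise
```

Inside a `@pytest.mark.external_api` test, passing `pytest.skip` as `skip` turns quota or auth errors into skips, because `pytest.skip` raises its own outcome exception. An INVALID_ARGUMENT still surfaces as a failure.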

This was referenced May 11, 2026

PR A2: Gemini Batch API for one-shot corpus run (split from #40) #51 (Open)

Sprint 1: Gemini cost wins #37 (Open)

jakebromberg merged commit b7683c2 into main May 11, 2026
3 checks passed

jakebromberg deleted the worktree-sprint1-gemini-batch-cache branch May 11, 2026 02:40

jakebromberg merged 2 commits into main from worktree-sprint1-gemini-batch-cache
