feat(gemini): context caching for the page prompt #50

Merged
jakebromberg merged 2 commits into main from worktree-sprint1-gemini-batch-cache on May 11, 2026

@jakebromberg

Summary

  • New GeminiClient.create_cache() registers the PAGE_EXTRACTION_PROMPT as a Gemini cachedContent resource (cached as system_instruction).
  • extract_page branches on cache presence: when a cache exists, the per-call payload is image-only; the prompt isn't re-sent. When caching is unavailable, the un-cached path is used unchanged.
  • core/pipeline.process_pending calls create_cache once at the start of each run; if cache creation fails, the run degrades silently to the un-cached path.
  • 6 new client tests (cache lifecycle, idempotency, fallback) + 3 new pipeline tests (cache-once-per-run, cache-failure-continues, no-pending-skips-cache-creation).
  • README.md "Cost calibration" section updated to document the caching behavior.
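
The cache-presence branch described above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `GeminiClientSketch`, the placeholder prompt text, and the injected `sdk` object are hypothetical, and the un-cached path is assumed to send the prompt in `contents`. The dict-style `config` values mirror the `caches.create` and `models.generate_content` call shapes of the google-genai SDK, but the real client may build them differently.

```python
from typing import Any, Optional

# Placeholder text; the real PAGE_EXTRACTION_PROMPT is ~2-3K tokens.
PAGE_EXTRACTION_PROMPT = "Extract the structured fields from this page image."


class GeminiClientSketch:
    """Hypothetical stand-in for GeminiClient.

    `sdk` is any object exposing `caches.create` and
    `models.generate_content`, e.g. a google-genai client or a test fake.
    """

    def __init__(self, sdk: Any, model: str) -> None:
        self._sdk = sdk
        self._model = model
        self._cache_name: Optional[str] = None

    def create_cache(self) -> str:
        # Idempotent: a second call reuses the cache registered by the first.
        if self._cache_name is None:
            cache = self._sdk.caches.create(
                model=self._model,
                config={"system_instruction": PAGE_EXTRACTION_PROMPT},
            )
            self._cache_name = cache.name
        return self._cache_name

    def extract_page(self, image_part: Any) -> Any:
        if self._cache_name is not None:
            # Cached path: image-only payload; the prompt is referenced by name.
            contents = [image_part]
            config = {"cached_content": self._cache_name}
        else:
            # Un-cached path (assumed shape): the prompt travels with every call.
            contents = [PAGE_EXTRACTION_PROMPT, image_part]
            config = {}
        return self._sdk.models.generate_content(
            model=self._model, contents=contents, config=config
        )
```

Injecting the SDK object keeps the branch testable with a fake, which is the same property the mock-based client tests rely on.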

Scope split

The original issue (#40) bundled batch mode + context caching. I'm shipping caching now and proposing batch mode as a follow-up. Reasons:

  1. The features are independently valuable and the SDK constraints around each are different. Caching is a per-call payload optimization; batch mode is a 24h-SLA orchestration pattern that doesn't compose with the existing per-page async pipeline (it requires submit → poll → fetch with a new state in jobs.db).
  2. The issue's own constraint section says "Don't conflate the two features in tests — they're independent and one should be debuggable without the other."
  3. Caching delivers the majority of the Sprint 1 cost win: the ~2-3K-token prompt is the dominant input cost, whereas batch mode is a further ~50% on top of whatever input rate remains.

A follow-up issue will track batch mode against #37 (Sprint 1 parent). If you'd rather see them combined in one PR, say so and I'll rebase and amend.

SDK reality check

CreateCachedContentConfig has no response_schema field — the schema can only travel in the per-call GenerateContentConfig. So caching saves on the prompt portion only, not on the prompt + schema as the issue body sketched. README and docstrings reflect this.

Partial: #40

Test plan

  • All 376 default tests pass (1 deselected).
  • ruff check clean.
  • ruff format --check clean.
  • mypy core cli.py clean.
  • Pipeline test confirms caches.create is called exactly once per run with 6 pages.
  • Pipeline test confirms the run completes when caches.create raises.
  • Pipeline test confirms caches.create is NOT called when there are no pending jobs.
  • (Out of scope locally) external_api-marker test against real Gemini caching — needs an opt-in test run with GEMINI_API_KEY.
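
The three pipeline behaviours in the test plan can be sketched with `unittest.mock`. This is a simplified, synchronous illustration under stated assumptions: the real `process_pending` is part of an async per-page pipeline, and the PR's tests assert on `caches.create` rather than on a client-level `create_cache` mock. The function body, the logger name, and the `check_*` helpers are hypothetical.

```python
import logging
from typing import Any, Iterable, List
from unittest.mock import MagicMock

logger = logging.getLogger("pipeline_sketch")  # hypothetical logger name


def process_pending(client: Any, pending_pages: Iterable[Any]) -> List[Any]:
    """Simplified, synchronous stand-in for core/pipeline.process_pending."""
    pages = list(pending_pages)
    if not pages:
        return []  # nothing pending: skip cache creation entirely
    try:
        client.create_cache()  # once per run, before any page is processed
    except Exception:
        # Degrade silently: extract_page falls back to the un-cached path.
        logger.debug("cache creation failed; continuing un-cached", exc_info=True)
    return [client.extract_page(page) for page in pages]


def check_cache_created_once() -> None:
    client = MagicMock()
    process_pending(client, range(6))
    client.create_cache.assert_called_once()
    assert client.extract_page.call_count == 6


def check_cache_failure_continues() -> None:
    client = MagicMock()
    client.create_cache.side_effect = RuntimeError("cache unavailable")
    assert len(process_pending(client, range(6))) == 6


def check_no_pending_skips_cache() -> None:
    client = MagicMock()
    assert process_pending(client, []) == []
    client.create_cache.assert_not_called()
```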

The ~2-3K-token PAGE_EXTRACTION_PROMPT was previously re-billed on every page in the ~16K-page corpus. GeminiClient.create_cache now registers the prompt as a cachedContent resource (cached as system_instruction); subsequent extract_page calls reference the cache and skip re-sending the prompt. The pipeline calls create_cache once at the start of process_pending; failures degrade silently to the un-cached path so a cache hiccup never blocks a corpus run.

The response schema cannot live in the cache (SDK limitation — CreateCachedContentConfig has no response_schema field) and still travels in the per-call config. Savings are on the prompt portion only, which is the dominant input-token cost.
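
For scale, the prompt volume that was re-sent without caching follows directly from the two figures quoted above (~2-3K prompt tokens, ~16K pages). The snippet below only counts tokens; it does not estimate cost, since pricing for cached versus un-cached tokens is not stated in this PR.

```python
PAGES = 16_000               # approximate corpus size quoted above
PROMPT_TOKENS_LOW = 2_000    # lower bound of the ~2-3K-token prompt
PROMPT_TOKENS_HIGH = 3_000   # upper bound


def prompt_tokens_resent(pages: int, prompt_tokens: int) -> int:
    """Total prompt tokens sent when the prompt accompanies every page."""
    return pages * prompt_tokens


low = prompt_tokens_resent(PAGES, PROMPT_TOKENS_LOW)    # 32,000,000
high = prompt_tokens_resent(PAGES, PROMPT_TOKENS_HIGH)  # 48,000,000
print(f"{low:,} to {high:,} prompt tokens re-sent per corpus run without caching")
```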

Batch mode (the second half of #40's scope) is deferred to a follow-up: it requires a separate orchestration pattern (24h-SLA submit-poll-fetch) that doesn't compose with the per-page async pipeline and warrants its own state-machine PR.

Partial: #40

Catches SDK-shape drift between the mock-based unit tests and the actual Gemini API surface. Marked `external_api` so the default test run skips it; opt in with `pytest -m external_api` and a valid `GEMINI_API_KEY`.

Environment-level failures (UNAUTHENTICATED, PERMISSION_DENIED, RESOURCE_EXHAUSTED) skip rather than fail — they're billing/auth/quota signals, not shape drift. A 400 INVALID_ARGUMENT would still propagate (that IS shape drift), which is what this test exists to surface.
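
The skip-versus-fail rule can be sketched as below. `FakeApiError` and `call_or_skip` are hypothetical names used only for illustration; the sketch assumes the SDK's error object exposes the status string as an attribute, which should be checked against the real exception type. `unittest.SkipTest` is used because it needs only the standard library; a pytest suite would more likely call `pytest.skip`.

```python
import unittest
from typing import Callable, TypeVar

T = TypeVar("T")

# Billing/auth/quota signals: the environment is at fault, not the API shape.
ENVIRONMENT_STATUSES = frozenset(
    {"UNAUTHENTICATED", "PERMISSION_DENIED", "RESOURCE_EXHAUSTED"}
)


class FakeApiError(Exception):
    """Hypothetical stand-in for the SDK's API error (assumed `.status` attribute)."""

    def __init__(self, status: str) -> None:
        super().__init__(status)
        self.status = status


def call_or_skip(fn: Callable[[], T]) -> T:
    """Run `fn`; skip on environment-level errors, re-raise everything else."""
    try:
        return fn()
    except FakeApiError as exc:
        if exc.status in ENVIRONMENT_STATUSES:
            raise unittest.SkipTest(f"environment issue: {exc.status}") from exc
        raise  # e.g. INVALID_ARGUMENT: shape drift, which the test must surface
```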

First external_api test in the repo; `tests/golden/README.md` already references the marker but had no tests using it yet.
@jakebromberg merged commit b7683c2 into main on May 11, 2026
3 checks passed
@jakebromberg deleted the worktree-sprint1-gemini-batch-cache branch on May 11, 2026 02:40