Skip to content

modal-qwen-vl-quad: add footer-strip call to populate comments_raw #33

@jakebromberg

Description

@jakebromberg

Problem

The per-quadrant adapter (modal-qwen-vl-quad) assembles a PageResult from a header-strip call plus four quadrant calls. None of those crops sees the bottom Comments: band — _crop_quadrants explicitly stops at layout.body_bottom_y so the comments line doesn't bleed into the bottom-quadrant transcriptions (scripts/calibrate_models.py L605-L631). With GeminiPageResult.comments_raw now landed (#31, PR #32), the adapter therefore emits comments_raw=None on every page even when the band has real content. The Gemini production path picks the field up because it sees the whole page; only modal-qwen-vl-quad has the gap.

Desired end state

modal-qwen-vl-quad adds a 6th remote call against a footer-strip crop (image[body_bottom_y:, :]) and populates PageResult.comments_raw from the result. On parse failure or empty band, comments_raw falls back to None — consistent with how the header call handles missing page_date_raw.

Files

  • scripts/calibrate_models.pymake_modal_qwen_vl_quad_adapter (L411-L505), _crop_header_strip/_crop_quadrants (L599-L631). Add a _crop_footer_strip(image, layout) helper that crops (0, layout.body_bottom_y, w, h), a FOOTER_WIRE_SCHEMA (analogous to HEADER_WIRE_SCHEMA at L403, just {"comments_raw": str|null}), and a FOOTER_EXTRACTION_PROMPT in core/prompts.py that mirrors HEADER_EXTRACTION_PROMPT's scoping discipline (capture verbatim, JSON null for blank, don't transcribe row content that bled in from above).
  • core/prompts.py — add FOOTER_EXTRACTION_PROMPT. Three top-level prompts becomes four; the module docstring at the top needs a one-line update.
  • core/page_layout.py — read-only reference. PageLayout.body_bottom_y already exists (L106) and is the load-bearing boundary; no schema change needed there.
  • tests/unit/test_calibrate_models.py — extend the existing adapter dispatch tests (test_modal_qwen_vl_quad_*) so the mocked .remote() is called 6 times (was 5); add a footer-failure test analogous to the header-failure case.
  • tests/unit/test_prompts.py — contract tests for FOOTER_EXTRACTION_PROMPT (names comments_raw, says verbatim, says JSON null for blank, scopes to footer band, forbids invented content).

Suggested approach

  1. Failing test first for _crop_footer_strip — a generated PIL.Image.new with a painted marker in the footer band; assert the crop captures it and excludes the body grid.
  2. Add FOOTER_WIRE_SCHEMA + FOOTER_EXTRACTION_PROMPT. Mirror header-call style: short prompt, page-scoped instructions, "do not transcribe row content from above the Comments line."
  3. Wire the 6th .remote() call inside the existing with app.run(): block, between the header call and the quadrant loop (or after the loop — doesn't matter functionally, but placing it next to the header call keeps the "page-level fields" cluster together). Same exception-tolerant pattern as the header call.
  4. Set comments_raw=... on the constructed PageResult.

The Modal container is warm for calls 2-6, so the cost delta is one warm forward pass per page (~5-10s, fractions of a cent). No new cold-start risk.

Constraints

  • Don't regress the header / quadrant calls. Wire the new call so a footer parse failure leaves the rest of the page intact — same try/except pattern as the header call.
  • Keep prompt fences sharp. The footer crop will overlap the bottom-quadrant baseline by a few pixels in skewed scans (just like the header strip overlaps the top-quadrant baseline). The prompt must tell the model to ignore content above the Comments: line. Otherwise the model will helpfully transcribe the last row of the bottom quadrants into comments_raw.
  • Match the page-level adapter's behavior. Gemini's full-page path returns the comments contents verbatim and null for blank; modal-qwen-vl-quad must end up doing the same so calibration A/B's are apples-to-apples.

Acceptance criteria

  • _crop_footer_strip helper exists with a unit test.
  • FOOTER_EXTRACTION_PROMPT exists with contract tests parallel to HEADER_EXTRACTION_PROMPT.
  • modal-qwen-vl-quad makes 6 remote calls per page; the 6th populates PageResult.comments_raw.
  • A footer parse failure leaves the rest of the page valid (regression test).
  • One-page Modal smoke run produces a PageResult with comments_raw set when the band has content, None when blank.

Related

Follow-up to #31 (closed by #32). Sibling structural fix to #19 (comments-line boundary detection), which is what made adding the 6th call cheap — layout.body_bottom_y already separates the footer band from the body grid.

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or request

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions