Pipeline: src/version2/batch_vlm_analysis_lithops_v2.py
Model: google/gemini-2.5-flash-lite via OpenRouter
Infrastructure: AWS Lambda (Lithops), S3
Date: 2026-03-15
The production VLM pipeline processes comic-book page images through Gemini 2.5 Flash Lite and
expects a structured JSON response for each page (overall summary + per-panel bounding boxes,
descriptions, dialogue, etc.). Gemini frequently returns technically invalid JSON — unescaped
inner quotation marks, truncated responses, markdown formatting artefacts, invalid Unicode escapes,
and more. This document records every error class encountered during iterative batch testing
(~5 000 pages), the root cause of each, and the repair strategy implemented in inner_repair_json.
| Batch | Records | Success | Fail | Fail % | Notes |
|---|---|---|---|---|---|
| 1 | 100 | 93 | 7 | 7.0% | Baseline |
| 2 | 100 | 98 | 2 | 2.0% | After fix round 1 |
| 3 | 200 | 185 | 15 | 7.5% | New pages |
| 4 | 15 | 14 | 1 | 6.7% | Retry of 15 |
| 5 | 500 | 483 | 17 | 3.4% | New pages |
| 6 | 17 | 15 | 2 | 11.8% | Retry of 17 |
| 7 | 1 000 | 984 | 16 | 1.6% | After fix round 2 |
| 8 | 16 | 15 | 1 | 6.3% | Retry of 16 |
| 9 | 2 000 | 1 979 | 21 | 1.0% | After fix round 3 |
| 10 | 21 | 20 | 1 | 4.8% | Retry of 21 |
| 11 | 1 000 | 992 | 8 | 0.8% | After fix round 4 |
| 12 | 8 | 8 | 0 | 0.0% | Retry of 8 — all clear |
Effective production failure rate: ~0% when a single automatic retry pass is added (all observed failures resolved on retry with a fresh API call).
Token stats (n=1000): min=351, mean=2385, p95=4071, max=8223, truncated(>=8192)=19
- ~1.9% of pages hit the `max_tokens=8192` ceiling and are truncated mid-response.
- Truncation is reliably identified by `completion_tokens >= 8192` in the usage dict.
- Tokens slightly above 8192 (e.g. 8199, 8217, 8223) are returned by the API due to special-token accounting; these are still effectively truncated responses.
- The p95 of ~4 000 tokens means most pages complete well under the limit.
- Prompt tokens are ~3 657 for the current production prompt (image + text).
Frequency: Most common non-truncation failure (~60% of non-truncation errors)
Cause: The LLM embeds quoted words directly inside JSON string values without escaping:
"description": "She is about to "school" him on the matter"
The parser closes the outer string at "school, making him on the matter" outside the string.
Sub-patterns:

| `json_str[pos]` char | Meaning | Fix |
|---|---|---|
| `"` | Opening quote of the inner word (e.g. `"school`) | Escape directly: `\"` |
| Letter/punctuation | Outer string closed early; error lands after the closing `"` | Backwards search for premature closer |
| `.`, `?`, `!` (was missed) | Period/punctuation after a quoted word (e.g. `"Whatever it takes."`) | Covered by the `not in ':{}[],\\"'` condition |
Backwards search mechanic (_last_unescaped_quote):
Scan backwards from the error position up to 200 characters to find the last unescaped " and
escape it. The 200-char lookback cap prevents accidentally escaping structural quotes far from the
error site.
Frequency: ~15% of non-truncation errors
Cause: When the unescaped word is followed by , in the text (e.g. "Doc", is speaking):
- Outer string closes at the `"` before `Doc`.
- The literal `,` in `"Doc",` is consumed as a key-value separator.
- Parser looks for the next key but finds a plain letter → `Expecting property name`.
This is a two-step unescaped-quote fix — the first backward-search pass (triggered by an
earlier , error) escapes the opening \"Doc, but the string then closes at the second " of
Doc. The Expecting property name handler applies a further backwards search to escape that
second " as well.
Frequency: ~35% of all failures; reliably identified by completion_tokens >= 8192
Cause: Response hits the max_tokens=8192 ceiling mid-JSON, typically mid-string inside a
panel's description field.
Fix (stack-based closer):
A stack-based scanner tracks every { and [ encountered outside strings. On truncation:
- If `_in_s` is True (truncated inside a string), append `"` to close it.
- Emit `''.join(reversed(_stk))` to close all open brackets in the correct nesting order.
The old naive approach (`']'*N + '}'*M`) produced the wrong closing sequence for structures like `{"panels": [{"description": "...`, which needs `"}]}`, not `}]}`.
Step 6 — depth-based truncation:
After the stack closer, scan for the first position where bracket depth returns to 0 and return
only up to that point, discarding any duplicate/extra JSON objects.
Frequency: ~10% of failures in deeper panel content
Cause: The LLM writes a character name as a key:
"character "Lex" Luthor": "description"The key string closes at " before Lex, and the parser finds L where it expects :.
Fix:
Expecting ':' delimiter handler: if json_str[pos] is a letter/non-structural char AND the
error is more than 200 chars from EOF, apply backwards search to escape the premature closer.
Structural quote guard (_is_structural_quote):
A ':' error near a structural key opener (a " at the start of an indented line, preceded by
newline + whitespace) must not be escaped — it is a legitimate key opener. The guard checks
whether the quote found by backwards search is preceded (ignoring spaces/tabs) by \n, {, [,
or ,. If so, break and let the stack closer handle it.
This guard also applies to Expecting property name backwards searches but NOT to
Expecting ',' backwards searches (where over-escaping eventually hits a break-char and stops,
and the stack closer recovers).
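A sketch of the guard under the rules stated above (hypothetical stand-alone version; the production `_is_structural_quote` may handle additional cases):

```python
def is_structural_quote(json_str: str, qpos: int) -> bool:
    """True if the quote at qpos looks like a legitimate key/element opener:
    preceded (ignoring spaces and tabs) by a newline, '{', '[' or ','."""
    i = qpos - 1
    while i >= 0 and json_str[i] in ' \t':
        i -= 1
    return i < 0 or json_str[i] in '\n{[,'
```

A key opener at the start of an indented line is structural and must not be escaped; a quote in the middle of prose text is not.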
Frequency: Rare (~1–2 per 1 000 pages)
Cause: The model outputs \u followed by non-hex characters, e.g. \uROSC or \u (space).
Python's json module raises Invalid \uXXXX escape for these.
Fix (step 2.5):
```python
json_str = re.sub(r'\\u(?![0-9a-fA-F]{4})', r'\\\\u', json_str)
```

Negative lookahead: if `\u` is NOT followed by exactly 4 hex digits, double-escape the backslash so it becomes a literal `\u` string rather than a Unicode escape sequence.
Frequency: Rare
Cause: The model outputs invalid escapes like \x, \p, \d inside strings.
Fix (step 2.5):

```python
def fix_escapes(match):
    esc = match.group(1)
    valid = {'"', '\\', '/', 'b', 'f', 'n', 'r', 't'}
    if esc in valid or esc == 'u':
        return match.group(0)
    return '\\\\' + esc

json_str = re.sub(r'\\([^"\\\/bfnrtu])', fix_escapes, json_str)
```

Frequency: Occasional (~1–3%)
Cause: The model wraps its JSON response in ```json ... ``` fences despite being asked
for raw JSON via response_format: {type: json_object}.
Fix (step 1): Strip leading ```json or ``` and trailing ```.
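A minimal sketch of the fence stripping (hypothetical helper name; the production step may be more permissive about whitespace):

```python
import re


def strip_code_fences(text: str) -> str:
    """Remove a leading ```json / ``` fence and a trailing ``` fence."""
    text = text.strip()
    text = re.sub(r'^```(?:json)?\s*', '', text)
    text = re.sub(r'\s*```$', '', text)
    return text
```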
Frequency: Rare but distinctive
Cause: The model uses markdown bold formatting before a JSON key:
** "panels": [
The ** before "panels" breaks the JSON structure — the parser sees * where it expects a
property name.
Fix (step 1.5):

```python
json_str = re.sub(r'(?m)^(\s*)\*+\s*(?=")', r'\1', json_str)
```

Strips any `*` sequence at the start of an indented line when it appears immediately before a `"`.
Frequency: Rare, when OpenRouter returns "message": null in a choice
Cause: choice['message']['content'] raises AttributeError when message is null.
Fix:

```python
message = choice.get('message') or {}
content = message.get('content') or ''
```

Frequency: Very rare (~2 per 2 000 pages)
Cause: OpenRouter occasionally returns HTTP 200 with empty content and all-zero usage
counts. Possibly a cached error response or provider timeout served as success.
Fix: Detect not content and completion_tokens == 0 and retry up to 2 times with 5s/10s
backoff before failing.
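A sketch of the zero-token retry under these assumptions (`call_api` returning `(content, usage)` is invented for illustration; the backoff schedule mirrors the 5 s/10 s described above):

```python
import time


def call_with_empty_response_retry(call_api, backoffs=(5, 10)):
    """Retry when the API returns HTTP 200 but empty content and zero usage.

    call_api is assumed to return (content, usage_dict). Sleeps follow the
    backoff schedule (5 s, then 10 s by default) before each retry.
    """
    for pause in list(backoffs) + [None]:
        content, usage = call_api()
        if content or usage.get('completion_tokens', 0) > 0:
            return content, usage
        if pause is None:
            break                  # retries exhausted
        time.sleep(pause)
    raise RuntimeError('empty response after retries')
```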
Cause: Rate limits (429) and transient Lambda/OpenRouter server errors (5xx).
Fix: Retry loop with:
- 429 → `sleep(15 * attempt)` (generous back-off for rate limits)
- 5xx → `sleep(5 * attempt)` (brief back-off for transient errors)
- 4xx other → no retry (auth failure, bad request)
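The retry policy can be sketched as follows (hypothetical `do_request` returning `(status, body)`; the injectable `sleep` is for testability, not part of the production code):

```python
import time


def request_with_backoff(do_request, max_attempts=3, sleep=time.sleep):
    """Retry 429s and 5xx with the back-off schedule described above.

    do_request is assumed to return (status_code, body).
    """
    for attempt in range(1, max_attempts + 1):
        status, body = do_request()
        if status == 200:
            return body
        if status == 429:
            sleep(15 * attempt)    # generous back-off for rate limits
        elif status >= 500:
            sleep(5 * attempt)     # brief back-off for transient errors
        else:
            raise RuntimeError(f'non-retryable HTTP {status}')
    raise RuntimeError('retries exhausted')
```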
1. Strip markdown code fences (``` ... ```)
1.5. Strip markdown bold/italic markers before keys (** "key" → "key")
2. Find first { and discard prefix
2.5. Fix invalid escape sequences (\x → \\x, \uINVALID → \\uINVALID)
3-4. Fix missing commas between properties (regex)
4.5. Iterative unescaped-quote fixer (up to 100 parse-fix-reparse iterations):
- Expecting ',' delimiter:
pos == '"' → escape directly (inner word opening quote)
pos is letter/etc → backwards search for premature closer
- Expecting property name:
pos is letter/etc → backwards search (with structural-quote guard)
- Expecting ':' delimiter:
pos == '"' → escape directly
pos is letter/etc AND >200 chars from EOF AND not structural quote
→ backwards search
- Expecting value:
trailing comma → strip comma
None/undefined/NaN → replace with null
5. Stack-based closer (close open strings and {[ in correct nesting order)
6. Depth-based truncation (return first complete JSON object only)
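The iterative parse-fix-reparse structure of step 4.5 can be sketched as a generic driver (hypothetical; not the production `inner_repair_json`):

```python
import json


def repair_json(json_str, fixers, max_iterations=100):
    """Parse-fix-reparse loop: try to parse, hand the decode error to each
    fixer in turn, and retry with the fixed string.

    fixers are callables (json_str, error) -> new json_str, or None when the
    fixer does not apply. Stops when no fixer makes progress.
    """
    for _ in range(max_iterations):
        try:
            return json.loads(json_str, strict=False)
        except json.JSONDecodeError as err:
            for fixer in fixers:
                fixed = fixer(json_str, err)
                if fixed is not None and fixed != json_str:
                    json_str = fixed
                    break
            else:
                raise              # no fixer made progress; give up
    raise ValueError('repair did not converge')
```

For example, a trailing-comma fixer plugged into this driver turns `[1, 2,]` into a parseable list.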
Using `json.loads(json_str, strict=False)` permits control characters (real newlines) inside strings, so the parser can report errors deep inside multi-line strings (real-world panel descriptions span many lines). In strict mode, the parser would abort at the first newline inside a string with an unhelpful `Invalid control character` error at the wrong position.
Prevents the backwards search from finding structural quotes (key openers, array element quotes) that are far from the error site and should not be escaped.
Adding the structural guard to ',' caused regressions (Abbott-001 p002 style cases where a
subsequent retry would have succeeded but the guard caused early termination and step 5/6 produced
invalid output). For ',' errors, over-escaping a structural quote eventually hits : or another
break-character and stops naturally; step 5/6 then recovers. The guard is only applied where
it was confirmed necessary: ':' errors and Expecting property name errors.
The specific failure pattern (p063): a truncated response where a cross-line strict=False parse
caused the backwards search to find a key opener ("panel_number" at start of indented line),
and escaping it produced \"panel_number" — corrupting the key structure. The error position
was ~11 000 chars from EOF (clearly not truncation-at-EOF) but the " found was still structural.
The naive approach counted { and [ and appended ']'*open_brackets + '}'*open_braces. This
produced wrong nesting order for {"panels": [{"desc": "... → needs "}]} but naive produced
}]}. The stack-based approach tracks open tokens in order and reverses them, always producing
correct nesting.
| max_tokens | Truncation rate | Notes |
|---|---|---|
| 4 096 | ~8–10% estimated | Would trigger stack-closer much more frequently |
| 8 192 | ~1.9% observed | Current production setting |
| 16 384 | ~0.1% estimated | Would require larger Lambda memory |
The current 8 192 limit is a reasonable trade-off. Increasing to 16 384 would nearly eliminate truncation failures but at higher per-call cost and potentially requiring a Lambda memory increase.
All 8 remaining failures in the final batch resolved on a simple re-run. Adding an automatic
retry pass inside main() would bring the effective failure rate to ~0%:
```python
# After chunk results: re-submit failures for one retry pass
failed_ids = {r['canonical_id'] for r in results if r['status'] == 'error'}
if failed_ids:
    retry_tasks = [t for t in tasks if t['task_data']['canonical_id'] in failed_ids]
    retry_futures = fexec.map(process_page_vlm, retry_tasks)
    retry_results = fexec.get_result(retry_futures)
    # merge retry_results back into results
```

This handles:
- Nondeterministic LLM responses that happen to be cleaner on the second call
- Truncated responses where a shorter retry response avoids truncation
- Transient API issues that the inner retry loop didn't catch
| Chunk | Duration | Truncated | Fail | Note |
|---|---|---|---|---|
| 1 (records 5001–6000) | ~1 min | 21 (2.1%) | 11 | Normal |
| 2 (records 6001–7000) | ~3.5 min | 48 (4.8%) | 20 | Rate-limit degradation |
| 3 (records 7001–8000) | hung at 2/1000 | — | — | Full rate-limit saturation |
Root cause: OpenRouter has per-API-key rate limits (RPM and/or concurrent-request limits).
At --workers 100 with a 1 000-record chunk, Lithops can launch all 1 000 Lambdas nearly
simultaneously. After 2 000 API calls in ~4 minutes, OpenRouter throttles heavily.
- Chunk 2 degradation signals: 3.5× slower duration AND 2.3× more truncations (the model truncates under load, serving shorter responses more quickly).
- Chunk 3 hit the wall: all 1 000 Lambdas received 429s and entered their retry sleep loops simultaneously. With backoff of 15 → 30 → 45 seconds (90 s total), plus Lambda timeout constraints, most functions timed out before succeeding.
- `--jitter` flag (default 30 s): each Lambda sleeps `random.uniform(0, jitter_max)` seconds before its first API call. This spreads 1 000 concurrent API calls over a 30-second window (~33 req/s) instead of a sub-second burst.
- Jittered 429 backoff: retry sleeps are now `15*(attempt+1) + random(0,10)` and `5*(attempt+1) + random(0,5)` so rate-limited Lambdas do not all retry at the same moment.
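Both jitter mechanisms can be sketched as follows (hypothetical helper names; the injectable `sleep`/`rng` parameters are for illustration only):

```python
import random
import time


def jittered_start(jitter_max=30, sleep=time.sleep, rng=random.uniform):
    """Sleep a random 0..jitter_max seconds before the first API call so that
    N concurrent Lambdas spread their calls over a jitter_max-second window."""
    if jitter_max > 0:
        sleep(rng(0, jitter_max))


def jittered_429_backoff(attempt, rng=random.uniform):
    """429 retry sleep: 15*(attempt+1) plus up to 10 s of jitter."""
    return 15 * (attempt + 1) + rng(0, 10)
```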
```shell
--workers 50 --jitter 30    # ~33 req/s peak, well under typical OpenRouter limits
--workers 100 --jitter 60   # ~17 req/s peak, more cautious
```

Disable jitter only for small batches where a burst is not a concern:

```shell
--limit 50 --workers 10 --jitter 0
```
- 8 341 Comic Analysis API calls processed on this date (~0.7% of the 1.2M corpus)
- Single model: `google/gemini-2.5-flash-lite` via OpenRouter — 100% of calls
- Source: `openrouter_activity_2026-03-15.csv`
- Analysis script: `testcopilot/comic_report.py` → `comic_report_results.json`
| Token type | Google listed rate | OpenRouter actual rate | Markup |
|---|---|---|---|
| Input | $0.075 / M | $0.10 / M | ~33% |
| Output | $0.30 / M | $0.40 / M | ~33% |
OpenRouter's ~33% provider markup was absent from the March 1 benchmark doc, which used Google's listed rates.
| Metric | Value |
|---|---|
| Avg prompt tokens | 3 457 (mix of 3 657 Amazon + 2 109 Calibre) |
| Avg completion tokens | 2 499 |
| Truncated responses (completion ≥ 8 192) | 227 / 8 341 = 2.72% |
| `finish_reason = stop` | 8 108 (97.2%) |
| `finish_reason = length` | 229 (2.7%) |
| `finish_reason` empty | 4 (0.05%) |
| Metric | Value |
|---|---|
| Total spend | $11.19 |
| Mean | $0.001342 |
| Median (p50) | $0.001278 |
| p75 | $0.001553 |
| p90 | $0.001870 |
| p95 | $0.002204 |
| p99 | $0.003642 |
| Max | $0.003668 |
| Component | Cost |
|---|---|
| Model API ($0.001342 × 1 200 000) | $1 610 |
| AWS S3 egress + Lambda (fixed) | $608 |
| Total all-in | $2 218 |
vs. March 1 doc estimate of $1 988 — a +$230 (+11.6%) underestimate, almost entirely due to the OpenRouter ~33% provider markup not being in the original doc.
- Avg prompt drops 3 657 → 2 109 tokens (~42% reduction)
- Estimated cost: ~$0.00093/page vs $0.00134 for Amazon
- Savings: $0.00041/page × 400K pages ≈ **$164 saving**
- Blended 1.2M projection with Calibre mix: model ~$1 500, all-in ~$2 108
| Item | Estimate | Notes |
|---|---|---|
| Failure reruns | +~$16 | ~1% of pages make 2 API calls; 1.2M × 1% × $0.001342 |
| Internal retries | +~$5–10 | 429/5xx/zero-token retries inside Lambda — billed but invisible in success count |
| AWS egress + Lambda | $608 (estimated) | Pending verification via AWS Cost Explorer in coming days |
| Calibre tranche saving | −~$164 | ~400K pages at $0.00093/page vs $0.00134 |
Likely true all-in range: ~$2,075–$2,250 once Calibre savings and retry overhead are netted off and AWS actuals confirmed.