Parent: #563
Context
lol_html fragments text nodes across input chunk boundaries when processing HTML incrementally. Script rewriters (NextJsNextDataRewriter, GoogleTagManagerIntegration) currently expect complete text content — if a domain string like "googletagmanager.com" is split across chunks, the rewrite silently fails.
Phase 1 works around this with a dual-mode HtmlRewriterAdapter: streaming mode when no script rewriters are registered, buffered mode when they are. This means streaming only benefits configs without GTM/NextJS script rewriters.
Phase 3 makes the rewriters themselves fragment-safe, enabling streaming for ALL configurations.
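The failure mode described above can be reproduced without lol_html at all. The following is a minimal sketch, not the project's code: the function names and the replacement host proxy.example.com are hypothetical, and plain string replacement stands in for the real rewriters.

```rust
/// Naive approach: rewrite each chunk on its own, as a rewriter that
/// expects complete text effectively does when the text arrives in pieces.
fn rewrite_per_chunk(chunks: &[&str], from: &str, to: &str) -> String {
    chunks.iter().map(|chunk| chunk.replace(from, to)).collect()
}

/// Fragment-safe approach: join the chunks first, then rewrite once.
fn rewrite_joined(chunks: &[&str], from: &str, to: &str) -> String {
    chunks.concat().replace(from, to)
}
```

With the chunks "https://www.googletag" and "manager.com/gtm.js", the per-chunk version finds no match in either piece and returns the input unchanged, while the joined version rewrites the domain.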
Approach
Each script rewriter accumulates text fragments internally until is_last_in_text_node marks the final fragment, then operates on the complete text. Key considerations:
- Intermediate fragments must return Replace("") (not Keep) to suppress output, since the full accumulated text is emitted on the final fragment
- When the rewriter returns Keep on the full text but fragments were suppressed, must emit Replace(full_text) to restore the content
- When text is NOT fragmented (single fragment), return Keep as before, with no unnecessary replacement
- Multiple rewriters on the same selector (e.g., NextJsNextDataRewriter on script#__NEXT_DATA__ + NextJsRscPlaceholderRewriter on script) each accumulate independently; last text.replace() wins, same as current behavior
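The considerations above could be sketched roughly as follows. This is an illustration under assumptions, not the actual implementation: Action, FragmentBuffer, and on_fragment are hypothetical names, and the rewrite closure stands in for a rewriter's existing logic (Some(text) for a rewrite, None for Keep).

```rust
use std::sync::Mutex;

/// Hypothetical action type mirroring the Keep / Replace outcomes above.
#[derive(Debug, PartialEq)]
enum Action {
    Keep,
    Replace(String),
}

/// Hypothetical per-rewriter buffer holding fragments of the current text node.
#[derive(Default)]
struct FragmentBuffer {
    pending: Mutex<String>,
}

impl FragmentBuffer {
    /// `rewrite` returns Some(new_text) to rewrite, or None to keep the text.
    fn on_fragment<F>(&self, fragment: &str, is_last_in_text_node: bool, rewrite: F) -> Action
    where
        F: FnOnce(&str) -> Option<String>,
    {
        let mut pending = self.pending.lock().unwrap();

        if !is_last_in_text_node {
            // Intermediate fragment: stash it and suppress its output.
            pending.push_str(fragment);
            return Action::Replace(String::new());
        }

        if pending.is_empty() {
            // Nothing was suppressed earlier: behave exactly as before.
            return match rewrite(fragment) {
                Some(new_text) => Action::Replace(new_text),
                None => Action::Keep,
            };
        }

        // Final fragment of a split node: rebuild the full text and reset the buffer.
        pending.push_str(fragment);
        let full_text = std::mem::take(&mut *pending);
        match rewrite(&full_text) {
            Some(new_text) => Action::Replace(new_text),
            // Earlier fragments were suppressed, so the text must be re-emitted.
            None => Action::Replace(full_text),
        }
    }
}
```

Because each rewriter would own its own buffer, two rewriters matching the same script element accumulate independently, as the last bullet describes.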
Tasks
- Add Mutex<String> accumulation to NextJsNextDataRewriter
- Add Mutex<String> accumulation to GoogleTagManagerIntegration
- Remove new_buffered() from HtmlRewriterAdapter, always stream
- Remove has_script_rewriters gate from create_html_processor
- Test __NEXT_DATA__ rewrite with text split across chunk boundaries
Acceptance Criteria
- All script rewriters produce correct output regardless of chunk boundaries
- HtmlRewriterAdapter always streams (no buffered mode)
- Streaming benefits all configurations, not just those without script rewriters
- All existing tests pass
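One way to check the "regardless of chunk boundaries" criterion is to split an input at every possible position and compare against the unsplit result. The helper below is a hedged sketch: check_all_split_points is a hypothetical name, and the process closure is a stand-in for whatever feeds chunks through the real rewriter pipeline and collects its output.

```rust
/// Splits `input` into two chunks at every UTF-8 character boundary and
/// checks that `process` yields the same output as for the unsplit input.
/// Returns Err(byte_offset) for the first split point that changes the output.
fn check_all_split_points<F>(input: &str, process: F) -> Result<(), usize>
where
    F: Fn(&[&str]) -> String,
{
    let expected = process(&[input]);
    for idx in 1..input.len() {
        if !input.is_char_boundary(idx) {
            // Never split in the middle of a multi-byte character.
            continue;
        }
        let (head, tail) = input.split_at(idx);
        if process(&[head, tail]) != expected {
            return Err(idx);
        }
    }
    Ok(())
}
```

A processor that rewrites each chunk independently fails this check as soon as a split lands inside the target string, while one that accumulates fragments first passes for every offset.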