Phase 3: Make script rewriters fragment-safe for streaming#591
Phase 3: Make script rewriters fragment-safe for streaming#591aram356 wants to merge 8 commits intofeature/streaming-pipeline-phase2from
Conversation
prk-Jr
left a comment
There was a problem hiding this comment.
Summary
Phase 3 completes the streaming pipeline by making NextJsNextDataRewriter and GoogleTagManagerIntegration fragment-safe via Mutex<String> accumulation, removing the buffered adapter fallback from Phase 1, and introducing a 2xx streaming gate in publisher.rs. The implementation is correct, WASM-safe (Mutex is a no-op on single-threaded wasm32-wasip1), and the Keep-after-accumulation bug fix has a precisely targeted regression test. No blocking issues.
Non-blocking
🤔 thinking
- Streaming gate tests verify the formula, not the function — see inline comment at
publisher.rs
♻️ refactor
- GTM uninit-variable pattern —
let full_content; let text = if...(google_tag_manager.rs:503) - GTM indirect accumulation check —
text.len() != content.len()(google_tag_manager.rs:522)
📝 note
accumulated_texton the sharedArc— google_tag_manager.rs:138
👍 praise
- Keep-after-accumulation fix + targeted regression test — script_rewriter.rs:89
- Performance results in the PR description — 35% TTFB / 37% DOM Complete improvements backed by a named methodology, metric definitions, and causal explanation. Well above the usual PR description standard.
CI Status
- browser integration tests: PASS
- integration tests: PASS
- prepare integration artifacts: PASS
crates/trusted-server-core/src/integrations/google_tag_manager.rs
Outdated
Show resolved
Hide resolved
crates/trusted-server-core/src/integrations/google_tag_manager.rs
Outdated
Show resolved
Hide resolved
ChristianPavilonis
left a comment
There was a problem hiding this comment.
Review Summary
This is a well-executed PR. No blocking issues found. The P2 items below are genuine improvements but none are required for merge. CI is all green (3/3 checks passing).
The PR cleanly completes Phase 3 of the streaming pipeline — making NextJsNextDataRewriter and GoogleTagManagerIntegration fragment-safe for lol_html text node splitting, removing the buffered adapter fallback, and adding a 2xx streaming gate.
Highlights
- Keep-after-accumulation bug fix is precisely handled with a targeted regression test
- Small-chunk (32-byte) pipeline regression tests catch fragmentation bugs end-to-end
- Clean surgical removal of buffered mode
- 2xx streaming gate is a smart defensive measure
- Excellent engineering documentation of performance methodology and results
P2 — rsc_placeholders.rs fragmentation strategy undocumented
🤔 The RSC placeholder rewriter (crates/trusted-server-core/src/integrations/nextjs/rsc_placeholders.rs around lines 57-64) skips fragmented scripts entirely (returns Keep for intermediate fragments), deferring to the post-processor. This is a valid approach since it has a post-processor fallback. However, it's worth documenting this as a deliberate design choice vs. the accumulation pattern used by the GTM and NextData rewriters in this PR. A brief comment there would help future maintainers understand why this rewriter doesn't use the same accumulation approach.
P3 — Minor
- ⛏ Spec still references Phase 1 workaround:
docs/superpowers/specs/2026-03-25-streaming-response-design.mdaround lines 230-231 still references the Phase 1 workaround in the "Files Changed" section — should be updated to reflect that all script rewriters are now fragment-safe. - ⛏ Performance results duplicated in 3 places: PR description, plan doc, and spec doc all contain the same performance numbers. Minor maintenance concern if numbers are updated later.
crates/trusted-server-core/src/integrations/google_tag_manager.rs
Outdated
Show resolved
Hide resolved
- Extract streaming gate into can_stream_response() function so tests call production code instead of reimplementing the formula - Refactor GTM rewrite() to use Option<String> pattern instead of uninit variable, replacing indirect text.len() != content.len() accumulation check with explicit full_content.is_some() - Add cross-element safety doc comment on accumulated_text fields in GTM and NextJsNextDataRewriter - Document RSC placeholder deliberate non-accumulation strategy - Update spec to reflect script rewriters are now fragment-safe
|
Addressing review feedback: Fixed in this push:
Acknowledged (not changed):
|
ChristianPavilonis
left a comment
There was a problem hiding this comment.
PR Review: Phase 3 — Fragment-Safe Streaming
Good work on the fragment accumulation design and streaming gate. The RemoveNode / Replace(full) pattern is clean, the streaming gate has solid test coverage, and the performance results are impressive (35% TTFB improvement).
Key concerns:
- SSRF protection removed — The
allowed_domainsproxy allowlist and all redirect chain validation were removed. If intentional, the stale docs need updating. If not, this is a security regression. - Accept-Encoding not restricted — Removing
restrict_accept_encodingcan cause silently corrupted output for unsupported encodings (e.g., zstd). - Synthetic ID validation relaxed — No length bound on the character-only check means unbounded values accepted for cookies and KV store keys.
- Fragment accumulation duplication — The pattern is identical in GTM and NextJS rewriters; a shared helper would reduce maintenance risk.
Several doc comments were also lost when # Errors sections were removed — the summary lines should be restored.
See inline comments for details on each finding.
Findings not attached inline (files/lines outside this diff):
🔧 Synthetic ID validation relaxed — no length bound (crates/trusted-server-core/src/synthetic.rs:117)
is_valid_synthetic_id() (enforcing <64-hex>.<6-alphanum>, exactly 71 bytes) was removed. Now any string of [a-zA-Z0-9._-] of any length is accepted as a synthetic ID via validate_existing_synthetic_id. This means arbitrarily long values can be stored in cookies, used as KV store keys for deletion (delete_consent_from_kv), and logged. At minimum, add a length upper-bound check (e.g., synthetic_id.len() <= 256) to prevent abuse.
⛏ Missing doc on is_last_in_text_node field (crates/trusted-server-core/src/integrations/registry.rs:94)
This field drives the entire Phase 3 fragment accumulation protocol. Add a doc comment explaining that lol_html may deliver text in multiple fragments and rewriters needing complete text must accumulate until this is true.
crates/trusted-server-core/src/integrations/nextjs/script_rewriter.rs
Outdated
Show resolved
Hide resolved
crates/trusted-server-core/src/integrations/google_tag_manager.rs
Outdated
Show resolved
Hide resolved
Accumulate text fragments via Mutex<String> until last_in_text_node is true, then process the complete text. Intermediate fragments return RemoveNode to suppress output.
Accumulate text fragments via Mutex<String> until last_in_text_node is true, then match and rewrite on the complete text. Non-GTM scripts that were fragmented are emitted unchanged.
All script rewriters (NextJS __NEXT_DATA__, GTM) are now fragment-safe — they accumulate text internally until last_in_text_node. The buffered adapter workaround is no longer needed. Always use streaming mode in create_html_processor.
When rewrite_structured returns Keep on accumulated content, intermediate fragments were already removed via RemoveNode. Emit the full accumulated content via Replace to prevent silent data loss. Also updates spec to reflect Phase 3 completion.
- Add response.get_status().is_success() check to streaming gate so 4xx/5xx error pages stay buffered with complete status codes - Add streaming gate unit tests covering all gate conditions - Add stream_publisher_body gzip round-trip test - Add small-chunk (32 byte) pipeline tests for __NEXT_DATA__ and GTM that prove fragmented text nodes survive the real lol_html path
Phase 3 performance results: 35% TTFB improvement, 37% DOM Complete improvement on getpurpose.ai staging vs production. Phase 4 adds binary pass-through streaming via PublisherResponse::PassThrough.
- Extract streaming gate into can_stream_response() function so tests call production code instead of reimplementing the formula - Refactor GTM rewrite() to use Option<String> pattern instead of uninit variable, replacing indirect text.len() != content.len() accumulation check with explicit full_content.is_some() - Add cross-element safety doc comment on accumulated_text fields in GTM and NextJsNextDataRewriter - Document RSC placeholder deliberate non-accumulation strategy - Update spec to reflect script rewriters are now fragment-safe
- Document why Mutex<String> is used (Sync bound on trait, not concurrent access) in both NextJsNextDataRewriter and GoogleTagManagerIntegration - Add accumulation_buffer_drains_between_consecutive_script_elements test proving the buffer doesn't leak between two sequential <script> elements (fragmented GTM followed by fragmented non-GTM)
db597e6 to
ff05483
Compare
Summary
Make all script rewriters fragment-safe so streaming works even with GTM and NextJS active. This removes the buffered fallback introduced in Phase 1, enabling full streaming for all configurations. Also adds the 2xx streaming gate, publisher-level tests, and small-chunk pipeline regression tests.
Closes #586, closes #587, closes #588, closes #589, closes #590.
Part of epic #563. Depends on Phase 2 (#585).
Performance results (staging vs production, median over 5 runs, Chrome 1440x900)
Production (v135) buffers the entire response body before sending any bytes to the client. Staging (v136) streams processed chunks incrementally via
StreamingBody. The 35% TTFB improvement cascades into earlier paint metrics, and DOM Complete sees the largest absolute gain (-397ms) because the browser can parse/render while still receiving the body.Metric definitions
Problem
lol_html fragments text nodes across input chunk boundaries. When the streaming
HtmlRewriterAdapterfeeds chunks incrementally, a text node like"googletagmanager.com/gtm.js"can be split into"google"and"tagmanager.com/gtm.js"— neither fragment matches the full domain string, so the rewrite silently fails.Phase 1 worked around this with a buffered adapter mode. Phase 3 fixes the root cause.
Solution
Each script rewriter now accumulates text fragments via
Mutex<String>untillast_in_text_node()is true, then processes the complete text:RemoveNode(suppress output, accumulate)Replace(rewritten)orKeepWhat changed
script_rewriter.rsNextJsNextDataRewriteraccumulates fragmentsgoogle_tag_manager.rsGoogleTagManagerIntegrationaccumulates fragmentsstreaming_processor.rsnew_buffered(),bufferedflag,accumulated_input, buffered testhtml_processor.rshas_script_rewriterscheck, always use streaming adapterpublisher.rsstream_publisher_bodygzip testnextjs/mod.rs__NEXT_DATA__pipeline regression testgoogle_tag_manager.rsTests added
fragmented_next_data_is_accumulated_and_rewritten— splits__NEXT_DATA__mid-URLunfragmented_next_data_works_without_accumulation— fast path still worksfragmented_next_data_without_rewritable_urls_preserves_content— Keep-after-accumulation bugfragmented_gtm_snippet_is_accumulated_and_rewritten— splits GTM domain mid-stringnon_gtm_fragmented_script_is_passed_through— non-GTM scripts emitted unchangedsmall_chunk_next_data_rewrite_survives_fragmentation— 32-byte chunks through full HTML pipelinesmall_chunk_gtm_rewrite_survives_fragmentation— 32-byte chunks through full HTML pipelinestreaming_gate_allows_2xx_html_without_post_processors— gate unit teststreaming_gate_blocks_non_2xx_responses— 4xx/5xx stays bufferedstreaming_gate_blocks_html_with_post_processors— post-processors force bufferingstreaming_gate_allows_non_html_with_post_processors— non-HTML streams regardlessstreaming_gate_blocks_non_2xx_json— error JSON stays bufferedstream_publisher_body_preserves_gzip_round_trip— public API gzip testVerification
cargo test --workspace— 766 passed, 0 failedcargo clippy --workspace --all-targets --all-features -- -D warnings— cleancargo fmt --all -- --check— cleancargo build --release --target wasm32-wasip1— successTest plan
__NEXT_DATA__test passes