Skip to content

fix(fpm): guard response reads and strip bridge headers#62

Merged
sjinks merged 3 commits into
trunkfrom
pltfrm-2424-cron-mysteriously-stuck-10476
Jun 6, 2026
Merged

fix(fpm): guard response reads and strip bridge headers#62
sjinks merged 3 commits into
trunkfrom
pltfrm-2424-cron-mysteriously-stuck-10476

Conversation

@sjinks

@sjinks sjinks commented Jun 5, 2026

Copy link
Copy Markdown
Member

Summary

  • Add an opt-in -fpm-response-timeout flag that bounds the FPM response read/write phase and closes the response pipe on timeout.
  • Preserve existing default behavior with -fpm-response-timeout=0.
  • Clear customer-emitted headers in the devcontainer fpm-cron-runner.php bridge before returning the JSON envelope.
  • Add focused unit coverage for a hung FPM response path.

Why

PLTFRM-2424 traced stuck Cron Control Runner behavior to long customer-emitted response headers reaching the FPM/gofast path before normal watcher startup. The production bridge cleanup is being handled in Automattic/wpvip-dockerfiles#493; this PR keeps the runner devcontainer bridge in parity and adds runner-side fail-fast hardening for future FPM response stalls.

Validation

  • php -l .devcontainer/local-features/cron-runner-environment/fpm-cron-runner.php
  • go test ./...
  • git diff --check
  • Dev-container FastCGI validation confirmed a temporary mu-plugin emitting a 6000-byte Content-Security-Policy header returned only Content-Type: application/json and a valid { "buf", "stdout", "stderr" } JSON envelope.

Notes

  • Timeout remains disabled by default to avoid changing behavior for legitimate long-running cron commands.

Copilot AI review requested due to automatic review settings June 5, 2026 14:14
@sjinks sjinks changed the title Guard FPM response reads and strip bridge headers fix(fpm): guard response reads and strip bridge headers Jun 5, 2026
@sjinks sjinks self-assigned this Jun 5, 2026

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds an opt-in FastCGI (FPM) response timeout to prevent the runner from hanging indefinitely when the FPM bridge stalls, and aligns the devcontainer PHP bridge output to strip customer-emitted headers before returning the JSON envelope.

Changes:

  • Introduces -fpm-response-timeout (default 0, disabled) and threads it into the CLI performer to bound FPM response handling.
  • Adds a unit test covering a hung FPM response path using a gofast.ResponsePipe that never produces output.
  • Updates the devcontainer fpm-cron-runner.php bridge to remove all previously-set headers before sending the JSON response.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
readme.md Documents the new -fpm-response-timeout flag.
main.go Adds CLI flag plumbing and passes the timeout into the performer.
performer/cli.go Implements timeout-aware request creation + guarded response streaming.
performer/performer_test.go Adds a focused timeout test for a hung FPM response.
.devcontainer/local-features/cron-runner-environment/fpm-cron-runner.php Clears bridge-emitted headers before returning the JSON envelope.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread performer/performer_test.go
Comment thread performer/cli.go Outdated
Comment thread performer/performer_test.go Outdated
@sjinks sjinks requested a review from a team June 5, 2026 14:18
…wording

Addresses Copilot review comments on PR #62.

## What changed
- Bump the FPM response timeout test from 10ms to 1s to avoid CI flake
  from goroutine scheduling jitter on shared runners.
- Rename the timeout error wording from "response write timed out" to
  "response read timed out" so it reflects that the bound is on reading
  the FastCGI response stream (the in-memory `fakeHTTPResponseWriter`
  cannot block on writes), not on writing to a client.
- Update the test assertion to match the new wording.

## Why
- The 10ms threshold is below typical CI scheduler quanta; the test was
  fundamentally racy.
- "Response write timed out" misled the on-call about which leg of the
  FPM bridge was stuck during the incident referenced by this PR.

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated no new comments.

@sjinks sjinks merged commit 07bc1c3 into trunk Jun 6, 2026
1 check passed
@sjinks sjinks deleted the pltfrm-2424-cron-mysteriously-stuck-10476 branch June 6, 2026 19:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants