
Test flakiness: daemon webui port-bind race and community-create near-timeout on Windows #42

@Rinse12

Description


Two unrelated flakes surfaced during the CI runs on PR #41. Both are independent of that PR's code change: the same commit went green on rerun without any code changes. They are captured here so we can root-cause them rather than just live with the noise.

Flake 1 — Ubuntu: 5chan webui ECONNREFUSED

Test: `test/cli/daemon.test.ts:774` — `bitsocial daemon webui > 5chan webui does not contain the hash redirect script`

Symptom:
```
TypeError: fetch failed
Caused by: AggregateError
  connect ECONNREFUSED 127.0.0.1:39138
  connect ECONNREFUSED ::1:39138
```

Failing run: https://github.com/bitsocialnet/bitsocial-cli/actions/runs/26142249527/job/76889997036 (passed on rerun)

Likely cause: The `beforeAll` calls `startPkcDaemon(...)` and resolves, but the HTTP listener on `localhost:39138` (the PKC RPC port) isn't actually accepting connections yet when the `fetch` runs. So either:

  • `startPkcDaemon` returns before the HTTP server has finished binding (race between "daemon ready" signal and listener `.listen` completion), or
  • The HTTP listener crashed during startup on this run.
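In the failing run both `127.0.0.1` and `::1` refused the connection, so nothing was bound on either address at fetch time, which fits either bullet. For the first bullet, a minimal daemon-side sketch, assuming the daemon owns a plain `node:http` server (the real PKC RPC listener may be set up differently): emit the "ready" signal only after `server.listen` has reported `'listening'`. `listenReady` and `demo` are hypothetical names, not existing code in the repo.

```typescript
import { createServer, type Server } from "node:http";
import type { AddressInfo } from "node:net";

// Hypothetical helper: resolves only after the socket is bound (the
// 'listening' event) and rejects on bind errors such as EADDRINUSE.
// A daemon that signals "ready" after awaiting this cannot report
// readiness before connections would actually be accepted.
function listenReady(server: Server, port: number, host: string): Promise<AddressInfo> {
  return new Promise((resolve, reject) => {
    const onError = (err: Error) => reject(err);
    server.once("error", onError);
    server.listen(port, host, () => {
      server.off("error", onError);
      resolve(server.address() as AddressInfo);
    });
  });
}

async function demo(): Promise<void> {
  const server = createServer((_req, res) => res.end("ok"));
  // Port 0 lets the OS pick a free port; a real daemon would pass its RPC port.
  const { port } = await listenReady(server, 0, "127.0.0.1");
  console.log(`ready on 127.0.0.1:${port}`); // only now is it safe to signal "ready"
  const res = await fetch(`http://127.0.0.1:${port}/`);
  console.log(res.status, await res.text());
  server.closeAllConnections();
  server.close();
}

demo().catch((err) => {
  console.error(err);
  process.exitCode = 1;
});
```

The point of the design is ordering, not timing: no sleep or retry is needed on the daemon side once readiness is tied to the bind itself.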

Investigation suggestion: In `test/helpers/daemon-helpers.ts` (or wherever `startPkcDaemon` lives), make readiness wait for a `fetch(rpcUrl + '/')` probe to return OK, not just for a log line or pubsub-ready signal. The other daemon tests that probe HTTP after startup pass consistently; this one fetches only once and trusts `beforeAll`.
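A minimal sketch of the suggested probe, assuming Node 18+ (global `fetch`, `AbortSignal.timeout`) and that `GET /` on the RPC port returns a 2xx once the listener is up. `waitForHttpReady` and `demoLateBind` are hypothetical names, not existing helpers in `test/helpers/daemon-helpers.ts`. The demo deliberately binds its listener 300 ms after probing starts, reproducing the race described above.

```typescript
import { createServer } from "node:http";
import type { AddressInfo } from "node:net";

// Hypothetical readiness probe: poll `url` until it answers with a 2xx status
// or `timeoutMs` elapses. Connection errors such as ECONNREFUSED (listener not
// bound yet) count as "not ready yet" rather than failing immediately.
async function waitForHttpReady(
  url: string,
  { timeoutMs = 20_000, intervalMs = 100 }: { timeoutMs?: number; intervalMs?: number } = {},
): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  let lastError: unknown = "no attempt made";
  while (Date.now() < deadline) {
    try {
      // Per-attempt timeout so one hung request cannot eat the whole budget.
      const res = await fetch(url, { signal: AbortSignal.timeout(1_000) });
      await res.body?.cancel(); // discard the body; only the status matters
      if (res.ok) return;
      lastError = `HTTP ${res.status}`;
    } catch (err) {
      lastError = err;
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`${url} not ready after ${timeoutMs}ms (last error: ${String(lastError)})`);
}

// Demo: start probing first and bind the listener `delayMs` later, mimicking a
// daemon whose "ready" signal fires before its HTTP server is bound.
// Resolves with how long the probe had to wait.
function demoLateBind(delayMs: number): Promise<number> {
  const server = createServer((_req, res) => res.end("ok"));
  return new Promise((resolve, reject) => {
    // Reserve a free port, release it, then re-bind it after the delay.
    server.listen(0, "127.0.0.1", () => {
      const { port } = server.address() as AddressInfo;
      server.close(() => {
        const start = Date.now();
        setTimeout(() => server.listen(port, "127.0.0.1"), delayMs);
        waitForHttpReady(`http://127.0.0.1:${port}/`, { timeoutMs: 5_000, intervalMs: 50 })
          .then(() => resolve(Date.now() - start), reject)
          .finally(() => {
            server.closeAllConnections();
            server.close();
          });
      });
    });
  });
}

demoLateBind(300).then(
  (ms) => console.log(`listener became reachable after ~${ms}ms`),
  (err) => {
    console.error(err);
    process.exitCode = 1;
  },
);
```

In the flaky test this would run at the end of `beforeAll`, right after `startPkcDaemon(...)` resolves, e.g. `await waitForHttpReady(rpcUrl + '/')`. If the listener crashed during startup (the second bullet), the probe fails in `beforeAll` with its last error instead of surfacing later as an opaque `fetch failed` inside the test.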

Flake 2 — Windows: community create just barely times out

Test: `test/cli/command-completion-time.test.ts:125` — `CLI commands complete within 10s (real pkc instance) > community create completes within 10s`

Symptom: Timed out at 10018ms (baseline on healthy Windows runs: ~4455ms; on Ubuntu: ~1979ms).

Failing run: https://github.com/bitsocialnet/bitsocial-cli/actions/runs/26142249527/job/76889997061 (passed on rerun in 13m35s)

Likely cause: The 10s budget leaves too little headroom for Windows runners under load. We've previously seen Windows runs roughly 2-3x slower than Linux runs. A budget that the command meets in ~2s on Linux and ~4.5s on a fast Windows runner leaves only about 2x slack on Windows, which a slow runner can use up.
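Worked from the numbers in the symptom line: the budget is about 2.24x the healthy Windows baseline, the failing run took about 2.25x that baseline, and the healthy Windows baseline is itself about 2.25x the Ubuntu one (inside the 2-3x range seen before).

```math
\frac{10\,000\ \mathrm{ms}}{4\,455\ \mathrm{ms}} \approx 2.24,
\qquad
\frac{10\,018\ \mathrm{ms}}{4\,455\ \mathrm{ms}} \approx 2.25,
\qquad
\frac{4\,455\ \mathrm{ms}}{1\,979\ \mathrm{ms}} \approx 2.25
```

So a Windows runner slowed down by a bit more than 2.2x relative to a healthy one consumes the whole margin, while Ubuntu still has roughly 5x headroom (10 000 / 1 979 ≈ 5.05).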

Investigation suggestion: Either:

  • Bump the timeout for Windows specifically (`process.platform === 'win32' ? 20_000 : 10_000`), or
  • Measure where the 5-10s is actually spent and shave whatever's avoidable. If most of it is daemon-handshake / pubsub warmup that the test doesn't care about, the test could pre-warm before `it()` starts.
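A sketch covering both options, assuming only Node's `process.platform` and `performance.now()`. `commandBudgetMs`, `PhaseTimer`, and the phase names are hypothetical; the 20s Windows value is the one proposed above, not a measured requirement, and the sleeps stand in for the real steps of `community create`.

```typescript
import { performance } from "node:perf_hooks";

// Option 1 (hypothetical helper): 20s budget on Windows, 10s elsewhere.
function commandBudgetMs(platform: NodeJS.Platform = process.platform): number {
  return platform === "win32" ? 20_000 : 10_000;
}

// Option 2 (hypothetical helper): time each phase of the command so the slow
// phase shows up in CI logs instead of only the total.
class PhaseTimer {
  private readonly phases: { name: string; ms: number }[] = [];

  async time<T>(name: string, fn: () => Promise<T>): Promise<T> {
    const start = performance.now();
    try {
      return await fn();
    } finally {
      // Recorded even if the phase throws, so a failing phase still shows up.
      this.phases.push({ name, ms: performance.now() - start });
    }
  }

  report(): string {
    return this.phases.map((p) => `${p.name}: ${p.ms.toFixed(0)}ms`).join(", ");
  }

  totalMs(): number {
    return this.phases.reduce((sum, p) => sum + p.ms, 0);
  }
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function demo(): Promise<void> {
  const timer = new PhaseTimer();
  await timer.time("daemon handshake", () => sleep(50)); // stand-in step
  await timer.time("create", () => sleep(20)); // stand-in step
  console.log(`budget ${commandBudgetMs()}ms; ${timer.report()}`);
}

demo().catch((err) => {
  console.error(err);
  process.exitCode = 1;
});
```

With vitest, the budget could be passed as the per-test timeout, `it(name, fn, commandBudgetMs())`, though the "completes within 10s" title would then be inaccurate on Windows. If the phase report shows the time going to handshake or pubsub warmup, those steps could move into `beforeAll` so only `create` is measured.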

Why this isn't pkc-js

PR #41 had two commits: first the logs fix alone (green on all 3 platforms), then the logs fix plus the pkc-js 0.0.34 → 0.0.35 bump (which hit these two flakes). The same SHA passed on rerun with no code changes, ruling out the bump as the cause.
