
Server integration tests are flaky due to concurrent tests sharing mutable describe state #442

Description

@danshapiro

Summary

The server integration test suite is configured to run with sequence.shuffle: true and maxConcurrency: 10. Some test files keep describe-scoped shared mutable state (registry, WebSocket handler, HTTP server, ports, client state maps) but are not marked as sequential. This allows tests inside the same describe block to execute concurrently, which is unsafe for those suites and produces flaky failures.
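To make the scheduling concern concrete, here is a minimal TypeScript model of a concurrency-capped runner. It is not vitest's implementation; runWithConcurrency and the sleep-based tests are hypothetical. It reports how many tests were in flight at the same moment: with a cap of 10, every test in a small batch overlaps, while a cap of 1 behaves like a sequential suite.

```typescript
// Hypothetical model of a concurrency-capped test runner; not vitest's implementation.
type TestFn = () => Promise<void>;

async function runWithConcurrency(tests: TestFn[], maxConcurrency: number): Promise<number> {
  const queue = [...tests];
  let inFlight = 0;
  let maxInFlight = 0;

  // Each worker keeps pulling the next queued test until the queue is empty.
  async function worker(): Promise<void> {
    while (queue.length > 0) {
      const test = queue.shift();
      if (!test) break;
      inFlight += 1;
      maxInFlight = Math.max(maxInFlight, inFlight);
      try {
        await test();
      } finally {
        inFlight -= 1;
      }
    }
  }

  // At most `maxConcurrency` workers exist, so at most that many tests overlap.
  const workerCount = Math.max(1, Math.min(maxConcurrency, tests.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return maxInFlight;
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Three tests that each stay pending for about 5 ms.
const tests: TestFn[] = [() => sleep(5), () => sleep(5), () => sleep(5)];
```

With the three tests above, runWithConcurrency(tests, 10) resolves to 3 and runWithConcurrency(tests, 1) resolves to 1. Any describe-scoped variable those tests touch is therefore shared by three live tests at once under the first setting, and by one under the second.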

Observed failures

test/server/ws-terminal-create-session-repair.test.ts

Shared state per suite:

  • server, port
  • FakeSessionRepairService
  • FakeRegistry
  • WsHandler bound to the above

Failure:

FAIL test/server/ws-terminal-create-session-repair.test.ts > terminal.create session repair wait > does not create terminal if client disconnects during repair wait
AssertionError: expected 1 to be +0 // Object.is equality
 ❯ test/server/ws-terminal-create-session-repair.test.ts:697:37

The test expected the registry to be empty after a client disconnected during session repair; instead it observed a stray terminal record, likely created by an overlapping test mutating the same shared registry variable.
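The registry failure can be modeled outside the real suite. FakeRegistry below is a hypothetical stand-in (the real class is not shown in this issue), and the per-test reset is an assumption about the suite's setup. Even with that reset, two overlapping tests share one registry, so the disconnect test can observe a terminal created by its neighbor.

```typescript
// Hypothetical stand-in for the suite's FakeRegistry.
class FakeRegistry {
  private readonly terminals = new Map<string, { id: string }>();

  create(id: string): void {
    this.terminals.set(id, { id });
  }

  count(): number {
    return this.terminals.size;
  }

  reset(): void {
    this.terminals.clear();
  }
}

// One registry for the whole describe block, as in the failing suite.
const registry = new FakeRegistry();
const tick = () => new Promise<void>((resolve) => setTimeout(resolve, 0));

// Another test in the same block that legitimately creates a terminal.
async function createsTerminal(): Promise<number> {
  await tick();
  registry.create("terminal-1");
  return registry.count();
}

// Models "does not create terminal if client disconnects during repair wait".
async function disconnectDuringRepairWait(): Promise<number> {
  await tick(); // repair wait begins
  await tick(); // client has disconnected; repair wait ends
  return registry.count(); // the real test asserts this is 0
}

async function runConcurrently(): Promise<number> {
  registry.reset(); // assumed per-test setup, run before both tests start
  const [, observed] = await Promise.all([createsTerminal(), disconnectDuringRepairWait()]);
  return observed;
}

async function runSequentially(): Promise<number> {
  registry.reset();
  await createsTerminal();
  registry.reset(); // assumed per-test setup, run again before the next test
  return disconnectDuringRepairWait();
}
```

runConcurrently() resolves to 1, the same shape as the "expected 1 to be +0" failure, while runSequentially() resolves to 0.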

test/server/ws-sidebar-snapshot-refresh.test.ts

This file uses beforeAll to create one server and one WsHandler shared across all tests and asserts on wsHandler.clientStates (e.g. getSingleClientState expects exactly one client).

Failure:

FAIL test/server/ws-sidebar-snapshot-refresh.test.ts > ws sidebar snapshot refresh > normalizes legacy agent-chat panes before deriving sidebar locators
AssertionError: expected [ Array(2) ] to have a length of 1 but got 2
 ❯ test/server/ws-sidebar-snapshot-refresh.test.ts:79:18

Running tests concurrently means multiple sockets can be attached to the shared handler at once, violating the single-client assumption.
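The sidebar failure follows the same pattern. In this sketch, SharedHandler and getSingleClientState are hypothetical approximations of wsHandler.clientStates and the suite's helper. Each simulated test attaches one socket, waits a tick, checks that exactly one client is attached, and then detaches.

```typescript
// Hypothetical approximation of WsHandler.clientStates; the real handler is not shown here.
type ClientState = { socketId: string };

class SharedHandler {
  readonly clientStates = new Map<string, ClientState>();

  attach(socketId: string): void {
    this.clientStates.set(socketId, { socketId });
  }

  detach(socketId: string): void {
    this.clientStates.delete(socketId);
  }
}

// Hypothetical version of the helper: fails unless exactly one client is attached.
function getSingleClientState(handler: SharedHandler): ClientState {
  const states = Array.from(handler.clientStates.values());
  const [only] = states;
  if (states.length !== 1 || only === undefined) {
    throw new Error(`expected exactly 1 client state, got ${states.length}`);
  }
  return only;
}

// One handler for every test, like the suite's beforeAll.
const handler = new SharedHandler();
const tick = () => new Promise<void>((resolve) => setTimeout(resolve, 0));

// Returns the single client's socketId, or the error message if the check fails.
async function sidebarTest(socketId: string): Promise<string> {
  handler.attach(socketId);
  await tick(); // stands in for waiting on a sidebar snapshot message
  try {
    return getSingleClientState(handler).socketId;
  } catch (error) {
    return (error as Error).message;
  } finally {
    handler.detach(socketId);
  }
}
```

Run through Promise.all, the first test sees two attached clients and fails, mirroring the length-2 assertion above; awaited one after the other, each test sees exactly one client.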

Reproduction

Run the server integration suite repeatedly; the failures are non-deterministic because they depend on shuffled ordering and concurrent scheduling:

npm run test:integration
# or
npm run test:vitest -- --config vitest.server.config.ts test/server/ws-terminal-create-session-repair.test.ts --run --sequence.shuffle=true --maxConcurrency=20
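Because the failures depend on ordering, a single passing run proves little. A small helper that repeats a command and counts non-zero exits can help confirm both the flake and the fix. countFailures is hypothetical, not a script in this repo.

```typescript
import { spawnSync } from "node:child_process";

// Hypothetical helper: runs a command `runs` times and counts runs that did not exit with 0.
function countFailures(command: string, args: string[], runs: number): number {
  let failures = 0;
  for (let i = 0; i < runs; i += 1) {
    const result = spawnSync(command, args, { stdio: "inherit" });
    // status is null when the process was killed by a signal or could not be spawned.
    if (result.status !== 0) {
      failures += 1;
    }
  }
  return failures;
}
```

For example, from the repository root, countFailures("npm", ["run", "test:vitest", "--", "--config", "vitest.server.config.ts", "test/server/ws-terminal-create-session-repair.test.ts", "--run", "--sequence.shuffle=true", "--maxConcurrency=20"], 20) repeats the second command above 20 times. On Windows, spawning npm may require the shell: true option.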

Suggested fix

Mark these suites as describe.sequential(...) so their tests run one at a time, in order, and cannot interleave changes to the shared state. This is the same pattern already used by other long-lived-server suites in this repo, e.g. test/unit/server/terminal-registry.codex-recovery.test.ts and test/integration/real/coding-cli-session-contract.test.ts.

A fix is prepared on the find-bug branch (commit 8097a26193db8d28f3ecfcf9aa1f6c4fe16564d1).
