feat(test): unify e2e harness across local and kubernetes#1381
Conversation
Consolidate the three overlapping e2e harnesses — nix/k8s-tests and the standalone dev-scripts/test-cdc-lifecycle-e2e.py / test-inflight-staging-contention-e2e.py drivers — into one scenario-driven harness at nix/e2e-tests, exposed as `task test:e2e` / `nix run .#e2e`. A `--mode local|kubernetes` flag selects the substrate: local drives ncps through dev-scripts/run.py against the fixed-port `nix run .#deps` backends; kubernetes provisions a Kind cluster and installs the Helm chart. Scenarios are declared once in config.nix (the existing catalog, kept as the single source of truth and materialized via `nix eval`), with harness-only `phase` and `modes` keys derived or overridden per entry. A Deployment adapter (LocalDeployment) plus shared Client/DBAccess let the serve, cdc-lifecycle, and staging-contention phase drivers run unchanged across backends; kubernetes mode reuses the in-cluster NCPSTester validation. The two former drivers and their *-auto.sh wrappers are removed (parity verified live: cdc-lifecycle and staging-contention both pass in local mode); k8s_tests.py, k8s_tests_tester.py and config.nix move into nix/e2e-tests so the k8s backend is folded into the unified package and nix/k8s-tests is deleted (devshell now ships packages.e2e). Docs (Contributing, Testing, CLAUDE.md, README) point at the new entrypoint. The harness is manual/opt-in and intentionally stays out of `nix flake check`; no scenario is promoted unless proven under three minutes. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
|
This change is part of the following stack: Change managed by git-spice. |
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (8)
✅ Files skipped from review due to trivial changes (3)
🚧 Files skipped from review as they are similar to previous changes (5)
📝 WalkthroughSummary by CodeRabbit
WalkthroughA comprehensive consolidation of three overlapping E2E test harnesses into a single scenario-driven Python framework. The new unified harness supports both local ( ChangesUnified E2E Harness
Sequence DiagramsequenceDiagram
participant CLI
participant Runner
participant Catalog
participant Local
participant K8s
CLI->>Runner: run_scenario(--mode, --scenario, --verbose)
Runner->>Catalog: find_scenario(name)
Runner->>Runner: supports(mode)?
alt mode=local
Runner->>Local: provision + run phase
Local->>Runner: PASS/FAIL/ERROR
else mode=kubernetes
Runner->>K8s: run_kubernetes_scenario(scenario)
K8s->>Runner: rc (0/1)
end
Runner-->>CLI: exit code
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Suggested labels
Poem
✨ Finishing Touches📝 Generate docstrings
|
|
Pages deployed to https://2b092783.ncps-docs.pages.dev |
There was a problem hiding this comment.
Code Review
This pull request consolidates multiple end-to-end test harnesses into a single, unified scenario-driven e2e harness under nix/e2e-tests/ that supports both local and Kubernetes execution modes. The review feedback highlights several critical correctness and robustness improvements, including fixing an incorrect relative path calculation for the repository root, defensively handling potentially null values from the Nix configuration, decoding URL-encoded database credentials, ensuring the Deployment protocol declares all invoked methods, and explicitly specifying UTF-8 encoding when opening log files.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
There was a problem hiding this comment.
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
nix/e2e-tests/README.md (1)
81-99:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winAdd language specifier to fenced code block.
The fenced code block at line 81 is missing a language specifier, which is flagged by markdownlint (MD040). Add a language identifier for proper syntax highlighting and markdown compliance.
📝 Proposed fix
-``` +```text nix/e2e-tests/ flake-module.nix packages.e2e + apps.e2e (writeShellApplication)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@nix/e2e-tests/README.md` around lines 81 - 99, Add a language specifier to the fenced code block that shows the project tree (the block beginning with "nix/e2e-tests/ ..."), e.g., change the opening triple backticks to include a language like text or bash; update the README.md fenced block so markdownlint MD040 is satisfied and syntax highlighting is explicit.Source: Linters/SAST tools
🧹 Nitpick comments (2)
nix/e2e-tests/src/client.py (1)
78-110: ⚡ Quick winConsider extracting duplicate decompression logic.
The decompression logic in
served_nar_digest(lines 87-96) is duplicated in the standalonedecode_narfunction (lines 102-110). Consider refactoringserved_nar_digestto calldecode_narinstead.♻️ Proposed refactor
def served_nar_digest(self, narinfo_fields: Dict[str, str]) -> Tuple[str, int]: """(sha256 of DECOMPRESSED NAR, served byte length). Comparing the decompressed-content digest proves byte-identity across compression representations (xz whole-file vs none-from-chunks), which catches same-size corruption a length check would miss. """ raw = self.fetch_nar_bytes(narinfo_fields) comp = narinfo_fields.get("Compression", "none") - if comp in ("none", ""): - data = raw - elif comp == "xz": - data = lzma.decompress(raw) - elif comp in ("zst", "zstd"): - import zstandard - - data = zstandard.ZstdDecompressor().stream_reader(io.BytesIO(raw)).read() - else: - raise RuntimeError(f"served_nar_digest: unsupported compression {comp!r}") + data = decode_nar(raw, comp) return hashlib.sha256(data).hexdigest(), len(raw)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@nix/e2e-tests/src/client.py` around lines 78 - 110, The decompression logic is duplicated between served_nar_digest and decode_nar; update served_nar_digest to call decode_nar(raw, comp) instead of reimplementing branching logic: obtain comp via narinfo_fields.get(...), call data = decode_nar(raw, comp), then compute and return hashlib.sha256(data).hexdigest() and len(raw); preserve the existing RuntimeError behavior by relying on decode_nar for unsupported compressions and ensure any required imports used by decode_nar (lzma, zstandard, io) remain available.nix/e2e-tests/src/phases/cdc_lifecycle.py (1)
97-97: ⚡ Quick winClarify the log message format.
The message format
({before} -> after)is misleading because "after" appears to be a variable placeholder but is actually literal text. Either compute and include the actual after-count, or use clearer wording.📝 Suggested improvements
Option 1 - Show the actual count:
- check(_chunked_nar_count(db) > before, f"new chunked NAR recorded ({before} -> after)") + after = _chunked_nar_count(db) + check(after > before, f"new chunked NAR recorded ({before} -> {after})")Option 2 - Use clearer wording:
- check(_chunked_nar_count(db) > before, f"new chunked NAR recorded ({before} -> after)") + check(_chunked_nar_count(db) > before, f"new chunked NAR recorded (was {before})")🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@nix/e2e-tests/src/phases/cdc_lifecycle.py` at line 97, The log message passed to check is misleading because it uses the literal "after" instead of the actual post-count; compute the current count by calling _chunked_nar_count(db) into a variable (e.g., after) and use it in the message or change the wording to something like "new chunked NAR recorded (was {before})" to avoid implying a variable. Update the call in the check that currently references _chunked_nar_count(db) and f"new chunked NAR recorded ({before} -> after)" to include the computed after value or clearer wording so the message accurately reflects the state.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@nix/e2e-tests/src/catalog.py`:
- Around line 36-44: The repo_root calculation uses os.path.dirname three times
on __file__, which yields the package directory instead of the repository root;
update the repo_root expression to apply one more os.path.dirname (i.e., four
dirname() calls) so that repo_root points to the repository root before
constructing candidate paths (look for the repo_root variable assignment where
__file__ is wrapped with os.path.abspath and nested os.path.dirname calls).
In `@nix/e2e-tests/src/deployment.py`:
- Around line 36-38: Add a proper type annotation for the `extra` parameter on
the `run_subcommand` method: change its type to Optional[List[str]] (or
Sequence[str] if you prefer immutability) and import the needed typing symbols
(Optional, List or Sequence) at the top of the module; update the signature of
run_subcommand to reflect the new annotation so callers and static checkers know
`extra` is an optional list of CLI argument strings.
In `@nix/e2e-tests/src/phases/cdc_lifecycle.py`:
- Line 135: The second return value from
deployment.run_subcommand("migrate-chunks-to-nar", ["--force-reclaim"]) is
intentionally unused; replace the unused variable name `out` with a throwaway
underscore variable (e.g., `_`) so the call becomes `rc, _ =
deployment.run_subcommand(...)`, updating any references to `out` if present;
this makes intent explicit while keeping `rc` as the meaningful result.
In
`@openspec/changes/archive/2026-06-09-consolidate-test-harnesses/specs/unified-e2e-harness/spec.md`:
- Around line 24-25: The example phrase "a multi-replica scenario requested in a
single-process local mode" contradicts the PR (local mode can be used for
multi-replica scenarios); update the wording in the WHEN/THEN block to a
non-conflicting topology example such as "a multi-datacenter or
network-partitioned topology requested in single-process local mode" or remove
the example entirely; edit the quoted lines ("**WHEN** a scenario requires a
topology the selected mode cannot express ..." and the following example) so the
example accurately reflects a topology local mode cannot express (e.g.,
multi-datacenter or network-partitioned) and ensure the THEN clause remains
unchanged (report SKIPPED with reason, not PASSED).
---
Outside diff comments:
In `@nix/e2e-tests/README.md`:
- Around line 81-99: Add a language specifier to the fenced code block that
shows the project tree (the block beginning with "nix/e2e-tests/ ..."), e.g.,
change the opening triple backticks to include a language like text or bash;
update the README.md fenced block so markdownlint MD040 is satisfied and syntax
highlighting is explicit.
---
Nitpick comments:
In `@nix/e2e-tests/src/client.py`:
- Around line 78-110: The decompression logic is duplicated between
served_nar_digest and decode_nar; update served_nar_digest to call
decode_nar(raw, comp) instead of reimplementing branching logic: obtain comp via
narinfo_fields.get(...), call data = decode_nar(raw, comp), then compute and
return hashlib.sha256(data).hexdigest() and len(raw); preserve the existing
RuntimeError behavior by relying on decode_nar for unsupported compressions and
ensure any required imports used by decode_nar (lzma, zstandard, io) remain
available.
In `@nix/e2e-tests/src/phases/cdc_lifecycle.py`:
- Line 97: The log message passed to check is misleading because it uses the
literal "after" instead of the actual post-count; compute the current count by
calling _chunked_nar_count(db) into a variable (e.g., after) and use it in the
message or change the wording to something like "new chunked NAR recorded (was
{before})" to avoid implying a variable. Update the call in the check that
currently references _chunked_nar_count(db) and f"new chunked NAR recorded
({before} -> after)" to include the computed after value or clearer wording so
the message accurately reflects the state.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 7ab40503-4bdc-41b9-84f5-e20d8d187f62
📒 Files selected for processing (46)
CLAUDE.mdTaskfile.ymldev-scripts/profile-flake-checks.pydev-scripts/test-cdc-lifecycle-auto.shdev-scripts/test-cdc-lifecycle-e2e.pydev-scripts/test-inflight-staging-contention-auto.shdev-scripts/test-inflight-staging-contention-e2e.pydocs/docs/Developer Guide/Contributing.mddocs/docs/Developer Guide/Testing.mdflake.nixnix/checks/flake-module.nixnix/devshells/flake-module.nixnix/e2e-tests/README.mdnix/e2e-tests/config.nixnix/e2e-tests/flake-module.nixnix/e2e-tests/src/__init__.pynix/e2e-tests/src/catalog.pynix/e2e-tests/src/cli.pynix/e2e-tests/src/client.pynix/e2e-tests/src/db.pynix/e2e-tests/src/deployment.pynix/e2e-tests/src/deps.pynix/e2e-tests/src/harness_config.pynix/e2e-tests/src/k8s_tests.pynix/e2e-tests/src/k8s_tests_tester.pynix/e2e-tests/src/kubernetes_mode.pynix/e2e-tests/src/local.pynix/e2e-tests/src/phases/__init__.pynix/e2e-tests/src/phases/cdc_lifecycle.pynix/e2e-tests/src/phases/serve.pynix/e2e-tests/src/phases/staging_contention.pynix/e2e-tests/src/runner.pynix/k8s-tests/README.mdnix/k8s-tests/flake-module.nixopenspec/changes/archive/2026-06-09-consolidate-test-harnesses/.openspec.yamlopenspec/changes/archive/2026-06-09-consolidate-test-harnesses/design.mdopenspec/changes/archive/2026-06-09-consolidate-test-harnesses/proposal.mdopenspec/changes/archive/2026-06-09-consolidate-test-harnesses/specs/cdc-lifecycle-e2e/spec.mdopenspec/changes/archive/2026-06-09-consolidate-test-harnesses/specs/cdc-lifecycle-k8s-test/spec.mdopenspec/changes/archive/2026-06-09-consolidate-test-harnesses/specs/inflight-staging-contention-e2e/spec.mdopenspec/changes/archive/2026-06-09-consolidate-test-harnesses/specs/unified-e2e-harness/spec.mdopenspec/changes/archive/2026-06-09-consolidate-test-harnesses/tasks.mdopenspec/specs/cdc-lifecycle-e2e/spec.mdopenspec/specs/cdc-lifecycle-k8s-test/spec.mdopenspec/specs/inflight-staging-contention-e2e/spec.mdopenspec/specs/unified-e2e-harness/spec.md
💤 Files with no reviewable changes (9)
- dev-scripts/test-inflight-staging-contention-auto.sh
- nix/k8s-tests/README.md
- openspec/specs/inflight-staging-contention-e2e/spec.md
- nix/k8s-tests/flake-module.nix
- dev-scripts/test-cdc-lifecycle-auto.sh
- openspec/specs/cdc-lifecycle-e2e/spec.md
- dev-scripts/test-cdc-lifecycle-e2e.py
- openspec/specs/cdc-lifecycle-k8s-test/spec.md
- dev-scripts/test-inflight-staging-contention-e2e.py
Apply the valid CodeRabbit/Gemini review findings on the unified e2e
harness:
- catalog: fix the CONFIG_FILE-unset fallback, which resolved the repo
root one directory short (nix/ instead of the repo) and so looked for
config.nix under nix/nix/...; reuse harness_config.find_repo_root, which
walks up to flake.nix. Also harden the Nix-permutation accessors with
`or {}` / `or []` so an explicit null in config.nix cannot raise.
- db: url-decode percent-encoded MySQL/MariaDB credentials before passing
them to pymysql.connect.
- deployment: declare stop() and start() on the Deployment protocol (the
cdc-lifecycle phase calls them) and type run_subcommand's `extra` as
Optional[List[str]].
- local: open the harness and replica log files with encoding="utf-8".
- cdc_lifecycle: discard the unused migrate-chunks-to-nar output.
- docs/spec: use a non-contradictory topology example (local mode does
support multi-replica) and add a language to the README layout block.
The remaining review threads are devskim/CodeQL false positives —
intentional fixed-dev-port (127.0.0.1) access and the well-known dev
Garage fixture credentials (identical to dev-scripts/run.py), which are
redacted before logging.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
|
Pages deployed to https://d196ca0b.ncps-docs.pages.dev |
Summary
Consolidates the three overlapping e2e harnesses —
nix/k8s-testsand the standalonedev-scripts/test-cdc-lifecycle-e2e.py/test-inflight-staging-contention-e2e.pydrivers — into one scenario-driven harness atnix/e2e-tests, exposed astask test:e2e/nix run .#e2e.--mode local|kubernetesselects the substrate.localdrives ncps throughdev-scripts/run.pyagainst the fixed-portnix run .#depsbackends;kubernetesprovisions a Kind cluster and installs the Helm chart.config.nix(kept as the source of truth, materialized vianix eval), with harness-onlyphase/modeskeys derived per entry. Runnix run .#e2e -- --list.Deploymentadapter (LocalDeployment) plus sharedClient/DBAccessrun theserve,cdc-lifecycle, andstaging-contentionphases unchanged across backends; kubernetes mode reuses the in-clusterNCPSTestervalidation.*-auto.shwrappers are deleted (parity verified live);k8s_tests.py/k8s_tests_tester.py/config.nixmove intonix/e2e-tests, folding the k8s backend into the unified package and deletingnix/k8s-tests(devshell now shipspackages.e2e).The harness is manual / opt-in and intentionally stays out of
nix flake check; no scenario is promoted unless proven under three minutes.Test plan
task fmt(0 changed),task lint(0 issues),task test(all ok)openspec validate— valid; change archived, specs syncedserve,cdc-lifecycle(5 phases + cross-cutting, 20 chunked NARs drained),staging-contention(download + chunking windows, staging activated, byte-identical) all PASSsingle-local-sqliteon Kind — cluster + infra + image build/push + Helm install +NCPSTester6/6 checks PASS