feat: replace coding-agent hero demo#105
Conversation
|
/coder-agents-review |
There was a problem hiding this comment.
This is a first-pass (Netero) mechanical review only. The full review panel has not yet reviewed this PR. The panel will review after these findings are addressed.
The change replaces the recursive coding-agent demo with a VHS-based Hero Demo Generator, adds deterministic unit tests, pins demo tools, and promotes a curated artifact set. The artifact structure is well-organized, the leak-check pipeline is thoughtful, and the test coverage on the deterministic boundary is solid.
Severity breakdown: 1 P2, 2 P3, 1 Nit, 1 Note.
The P2 is the main item: RunRecord.passed promises graceful partial-failure handling, but the implementation hard-aborts on any run failure, making the field and its downstream filters dead logic. Either make partial failure real (try/catch in the run loop) or remove the dead field.
"Every
RunRecordthat reaches therecordsarray haspassed: true. The filter can never exclude anything. The summary renderer formats'pass'/'fail'per record, but'fail'is unreachable." (Netero)
🤖 This review was automatically generated with Coder Agents.
|
/coder-agents-review |
|
/coder-agents-review |
There was a problem hiding this comment.
AGTTY-1 (P2, RunRecord.passed dead logic) was addressed in 3d3958a with try/catch in the run loop and a failedRunRecord() factory. Good fix.
However, further review is blocked until the author responds to or pushes fixes for the remaining findings:
- AGTTY-2 (P3) Local
invariantat line 97 duplicatessrc/util/assert.tsexport. Still present, no response. - AGTTY-3 (P3)
mkDebugRootunused_bundleDirparameter at line 1061. Still present, no response. - AGTTY-4 (Nit)
CuratedArtifactandRunRecordexported with no external consumers. Still present, no response. - AGTTY-5 (Note) Post-promotion leak check tautological for pre-sanitized text artifacts. No response.
For each: fix in this PR, or respond on the thread explaining why the current code is acceptable. The full review panel will run once all findings have a response.
🤖 This review was automatically generated with Coder Agents.
There was a problem hiding this comment.
First full panel review (11 reviewers). All five prior findings (AGTTY-1 through AGTTY-5) are resolved. The architecture is sound: VHS as the outer camera, the claim boundary is well-documented, and the generator's pure/IO split is clean.
Severity breakdown: 1 P2, 8 P3, 3 Nit (Nits not posted as inline comments).
The P2 is a sanitizer/leak-checker regex divergence that will cause spurious promotion failures with real agent greetings. The P3s cover test gaps, magic numbers, usage string errors, type duplication, and missing schema validation.
"The sanitizer demands a trailing ! that the leak detector does not. Real agent greetings without ! survive sanitization, then the leak check flags them, and promotion aborts for a 'leak' the sanitizer was supposed to have removed." (Hisoka)
Nits not posted inline: AGTTY-6 (safeDemoToolVersion duplicates safeToolVersion), AGTTY-16 (no doc comments on exported functions), AGTTY-17 (duplicate mise.toml comment about mise install --locked), AGTTY-18 (sentence vs expectedText naming inconsistency).
🤖 This review was automatically generated with Coder Agents.
|
/coder-agents-review |
There was a problem hiding this comment.
Second panel round (8 reviewers). All 9 prior findings (AGTTY-7 through AGTTY-15) independently verified fixed by 4+ reviewers. The sanitizer regex fix, named constants, schema validation, type deduplication, and usage string correction are all solid.
No P0-P2 findings remain. 4 new P3 observations noted inline for future improvement; none block merge.
The architecture is clean: VHS as outer camera, agent-tty as inner proof, hero claim boundary well-documented. The generator's pure/IO split is thoughtful, and the deterministic test boundary covers all planning and validation logic. The fix-chain quality across rounds has been strong: each commit targeted root causes, no regressions introduced.
"Non-promote runs never clean debug roots. If you run 50 debug iterations, /tmp grows by 50 disposable workspaces. The flag exists, but it only fires in the code path where you'd want to keep them." (Hisoka)
🤖 This review was automatically generated with Coder Agents.
|
/coder-agents-review |
There was a problem hiding this comment.
Third panel round (5 reviewers). All 4 prior findings (AGTTY-19 through AGTTY-22) verified fixed by 3+ reviewers. No P0-P2 findings remain. No new P3+ from the panel.
One Netero observation (AGTTY-25): quote() has a latent shell-quoting bug for inputs containing single quotes. No current caller triggers it (all callers pass paths and model names without single quotes), so it's noted inline but does not block.
The PR has gone through 5 rounds across 25 findings. 17 fixed by author, 1 acknowledged, 6 dropped by orchestrator (Nits/Notes below posting threshold), 1 latent observation noted. Fix quality has been consistently strong throughout.
🤖 This review was automatically generated with Coder Agents.
|
/coder-agents-review |
There was a problem hiding this comment.
Fourth panel round (5 reviewers). AGTTY-25 (quote() shell-quoting bug) verified fixed in f208273. Netero and 4 of 5 panel reviewers found no new issues. Bisky flagged 2 P3 test coverage observations inline; neither blocks merge.
All 25 prior findings resolved across 6 rounds (18 fixed, 1 acknowledged, 6 dropped). The fix chain has been clean throughout, with each commit targeting root causes and no regressions introduced.
The two new P3s are about test assertions that don't cover key behavioral lines: the tape's runner invocation and the runner's artifact-path env vars. Both gaps mean you could delete the critical lines and tests would still pass. Worth adding, but the live-demo workflow catches both immediately.
🤖 This review was automatically generated with Coder Agents.
|
/coder-agents-review |
There was a problem hiding this comment.
Fifth panel round (5 reviewers). AGTTY-26/27 (test assertion gaps) verified fixed in 2cc286b. Netero and all 5 panel reviewers found zero new findings.
All 27 findings across 7 rounds are resolved: 20 fixed by author, 1 acknowledged, 6 dropped by orchestrator. No open items remain. Fourth consecutive APPROVE.
🤖 This review was automatically generated with Coder Agents.
Summary
agent-ttycast/WebM proofs, prompts, final file proofs, and manifest hashes.gpt-5.5and Claudeclaude-opus-4-7defaults for the promoted demo, with CLI/env overrides for maintainers.User-facing / automation-facing behavior
dogfood/agent-uses-agent-tty/Hero Demo bundle.mise run demo:agent-uses-agent-tty -- --agent both --runs 3 --record-seconds 180 --promoteregenerates and promotes the real-agent demo when local Codex/Claude auth is available.Validation
mise run format-checkmise run lintmise run typechecknpx vitest run test/unit/tools/hero-demo.test.tsnpm run validate-bundle:canonicalfile dogfood/agent-uses-agent-tty/artifacts/*-thumbnail.png dogfood/agent-uses-agent-tty/artifacts/*-outer.webm dogfood/agent-uses-agent-tty/artifacts/*-inner-nvim.webmmise run buildmise run install-smokeGIT_CONFIG_GLOBAL=/dev/null mise run testNote: plain
mise run testin this workspace hits the user's global Git SSH signing config, which points at a missing 1Password signing binary. The full test suite passed withGIT_CONFIG_GLOBAL=/dev/null.Dogfood / visual artifacts
dogfood/agent-uses-agent-tty/README.mddogfood/agent-uses-agent-tty/manifest.jsondogfood/agent-uses-agent-tty/promoted-run-summary.mddogfood/agent-uses-agent-tty/artifacts/codex-outer.webm,dogfood/agent-uses-agent-tty/artifacts/claude-outer.webmdogfood/agent-uses-agent-tty/artifacts/codex-thumbnail.png,dogfood/agent-uses-agent-tty/artifacts/claude-thumbnail.pngdogfood/agent-uses-agent-tty/artifacts/*-inner-nvim.cast,dogfood/agent-uses-agent-tty/artifacts/*-inner-nvim.webm,dogfood/agent-uses-agent-tty/artifacts/*-final-file-proof.txtDesign-doc deviations
None. The implementation follows the PRD/ADR direction: VHS is the external Outer Camera,
agent-ttyremains responsible for the inner proof artifacts, and the old recursive README bundle is removed.📋 Implementation Plan
PRD: External Outer Camera for the Coding-Agent Hero Demo
Problem Statement
The current coding-agent demo is valuable self-dogfood, but it is too unstable and visually noisy for the README-facing story. It asks
agent-ttyto record an outer Codex or Claude Code TUI while that real coding agent creates an inneragent-ttysession and exports proof artifacts. This recursive setup couples too many unstable layers: live coding-agent UI behavior, trust prompts, update notices, nested terminal sessions, renderer timing, exit behavior, thumbnail timing, and recording post-processing.As a maintainer or reviewer, I want the README demo to be a polished, stable Hero Demo that shows real coding-agent TUIs using
agent-ttyto produce inner proof artifacts. I do not need the README visual to prove thatagent-ttyrecorded the outer coding-agent TUI. That stronger self-dogfood claim makes the demo harder to regenerate and harder to review.Solution
Replace the recursive README-facing demo with a Promoted Hero Demo that uses an external Outer Camera for the visible Codex and Claude Code TUIs, while keeping
agent-ttyresponsible for the inner terminal proof artifacts.The new flow will use a Node/TypeScript Hero Demo Generator. The generator prepares isolated inputs for Manual Demo Regeneration, creates runner scripts and raw VHS tapes, invokes VHS as the Outer Camera, validates the resulting Curated Hero Artifact Set, runs automated text leak checks, records tool/model settings, and writes a Promoted Hero Run Summary.
The scenario is an Exploratory Hero Demo: the real coding-agent TUI loads the packaged
agent-ttyskill, inspects the CLI as needed, chooses its own command flow, drives an inner Neovim workflow throughagent-tty, and exports inner proof artifacts. The prompt supplies success criteria, required output paths, final text, and a configurable fixed review window, but no prewritten helper script or exact command sequence. The generated VHS tape owns startup waits, the configurable fixed review window, and exit keypresses during recording. Raw VHS tapes, recorder logs, and disposable workspaces are Debug-Only Raw Demo Files and are ignored by default.The existing recursive bundle is removed once the new Hero Demo is promoted. The README claim is narrowed through the Hero Claim Boundary: the outer TUI is presentation, and the inner
agent-ttyartifacts are the product proof.User Stories
agent-tty, so that I quickly understand the coding-agent workflow.agent-tty, so that I understand the workflow is not tied to one coding-agent CLI.agent-ttyproduced the inner proof artifacts, so that I understand what the product is proving.agent-ttyrecorded the outer coding-agent TUI, so that I do not misunderstand the proof boundary.agent-ttycast artifacts, so that reviewers can inspect terminal replay data from the product-generated proof.agent-ttyWebM artifacts, so that reviewers can visually review the inner terminal workflow.agent-tty, so that I do not reintroduce the recursive flow accidentally.agent-ttyhome or editor state.agent-ttysessions it does not own, so that demo regeneration remains safe.Implementation Decisions
agent-ttyskill and CLI, chooses its own command flow, drives an inner Neovim workflow throughagent-tty, and exports the inner proof artifacts to required paths.agent-tty-records-the-agent flow after the new Hero Demo is promoted.agent-ttyartifact set is the product proof.agent-ttycast, inneragent-ttyWebM, final file proof, prompt, summary, and canonical manifest entries.agent-tty. Those tools are for Manual Demo Regeneration.Testing Decisions
Out of Scope
agent-ttycoding-agent recording as a maintained parallel proof path.agent-ttycan record the outer coding-agent TUI.agent-ttyCLI JSON contracts, protocol schemas, or artifact formats for end users.agent-ttyrecording/export behavior beyond what the demo generator needs.Further Notes
The VHS prototype passed a one-run-per-agent smoke: VHS launched real Codex and Claude Code TUIs, waited on screen regexes, captured PNG screenshots, and rendered compact WebM/ASCII evidence. That prototype proved feasibility, not promotion-level reliability. The promotion bar remains three successful local regenerations per agent plus automated leak checks and human visual review.
The implementation should keep the current ADR and glossary terminology aligned with the resulting code and docs. In particular, use Hero Demo Generator rather than “controller”, and use Exploratory Hero Demo rather than “Nested Helper Proof” unless the implementation intentionally returns to a deterministic helper design.
If raw VHS tapes fail the promotion bar due to optional prompts or UI drift, the design can evolve to an active PTY proxy controller while preserving the same Hero Claim Boundary and curated artifact policy.
Generated with
mux• Model:openai:gpt-5.5• Thinking:xhigh