Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
94 changes: 94 additions & 0 deletions CONTEXT.md
Original file line number Diff line number Diff line change
Expand Up @@ -79,6 +79,74 @@ A persisted PNG file plus its manifest entry recording the **Screenshot Result**
The operation that invokes the renderer to produce the PNG, builds the **Screenshot Result**, and records the matching **Screenshot Artifact**.
_Avoid_: Renderer capture

**Hero Demo**:
A README-facing demonstration optimized for stable, polished visualization of real coding-agent TUI behavior. It may use an external outer recorder while keeping `agent-tty` responsible for the inner terminal proof artifacts.
_Avoid_: Treating it as the strongest self-dogfood proof.

**Outer Camera**:
The recorder responsible for the visible coding-agent TUI in a **Hero Demo**. It is presentation infrastructure and does not define the product proof by itself.
_Avoid_: Product proof

**Hero Demo Generator**:
The Node/TypeScript workflow that prepares **Manual Demo Regeneration** inputs, generates raw VHS tapes and runner scripts, invokes the **Outer Camera**, validates the **Curated Hero Artifact Set**, and records the run summary. The generated VHS tape owns screen waits and keypresses during recording.
_Avoid_: PTY proxy controller

**Real-Agent Hero Artifact**:
A checked-in **Hero Demo** artifact captured from a real coding-agent CLI such as Codex or Claude Code, rather than from a fixture or mock TUI. It is regenerated manually because it depends on local authentication and live service behavior.
_Avoid_: CI-generated demo, fixture demo

**Manual Demo Regeneration**:
An auth-gated maintainer workflow for refreshing **Real-Agent Hero Artifacts** outside CI. It should be reproducible from documented commands, but not required for ordinary test or release automation.
_Avoid_: CI gate

**Exploratory Hero Demo**:
A **Hero Demo** scenario where the real coding-agent TUI discovers the `agent-tty` skill and CLI, chooses its own command flow, and uses `agent-tty` to drive an inner Neovim workflow and export proof artifacts. The prompt supplies success criteria, required output paths, and a configurable fixed review window, but no prewritten helper script or exact command sequence.
_Avoid_: Nested Helper Proof, scripted helper run

**Curated Hero Artifact Set**:
The reviewer-facing artifact set for a **Hero Demo**: outer WebM, outer thumbnail or screenshot, outer transcript, inner `agent-tty` cast/WebM, final file proof, prompt, and summary. Raw logs and workspaces are debugging aids, not part of the curated set by default.
_Avoid_: Raw artifact dump

**Hero Demo Promotion Bar**:
The evidence threshold for replacing the README-facing demo with a new **Hero Demo**. For real coding-agent TUIs, this requires repeated local regeneration, visual review, and secret-leakage review before promotion.
_Avoid_: One-off smoke

**Hero Claim Boundary**:
The product claim made by the **Hero Demo**. The claim is that real coding-agent TUIs can use `agent-tty` to produce inner proof artifacts, not that `agent-tty` itself recorded the outer coding-agent TUI.
_Avoid_: Outer recording proof

**Promoted Hero Demo**:
A **Hero Demo** that has passed the **Hero Demo Promotion Bar** and replaced the README-facing demo. Once promoted, the old recursive bundle is removed rather than maintained as a separate proof path.
_Avoid_: Parallel recursive demo

**Hero Demo Leak Check**:
The promotion review for **Real-Agent Hero Artifacts** that combines automated text scanning of transcripts, logs, summaries, and generated text artifacts with human visual review of PNG/WebM outputs.
_Avoid_: Text-only secret scanning

**Debug-Only Raw Demo Files**:
Generated VHS tapes, recorder logs, and disposable workspaces from **Manual Demo Regeneration**. They are useful for debugging failed runs but are not part of the **Curated Hero Artifact Set** and should stay ignored by default.
_Avoid_: Promoted artifacts

**Promoted Hero Run Summary**:
The checked-in summary proving the **Hero Demo Promotion Bar** passed. It records three successful local regenerations for Codex and three for Claude, while only one selected **Curated Hero Artifact Set** per agent is promoted for README review.
_Avoid_: Checking in all trial runs

**Hero Demo Promotion Command**:
The maintainer-facing named mise task that runs **Manual Demo Regeneration** through the **Hero Demo Generator**, including smoke/debug modes and the promotion mode that requires the full **Hero Demo Promotion Bar**.
_Avoid_: Direct tsx-only workflow

**Hero Demo Partial Pass**:
A failed promotion state where one coding-agent TUI passes the required regeneration count and another does not. It does not produce a **Promoted Hero Demo**; both Codex and Claude must pass before README and canonical artifact replacement.
_Avoid_: Single-agent promotion

**Hero Demo UI Noise Policy**:
The rule for real coding-agent UI noise in **Real-Agent Hero Artifacts**: suppress or isolate known noise where practical, fail promotion on secrets or account-sensitive details, and tolerate benign product notices that do not dominate the recording.
_Avoid_: Post-hoc cosmetic editing as the default

**Recursive Dogfood Proof**:
A dogfood scenario where an outer `agent-tty` **Session** records a coding-agent TUI that creates an inner `agent-tty` **Session** and exports proof artifacts. It proves `agent-tty` can capture coding-agent TUIs, but is not required to be the primary polished demo.
_Avoid_: Hero Demo, README demo

**Release Prep Workflow**:
The maintainer-facing process for choosing the next release version and preparing release changes for review before they land on the default branch.

Expand Down Expand Up @@ -132,6 +200,29 @@ _Avoid_: bare "agent", "Coder agent"
- A **Waited Run** may produce one **Run Completion**, time out for its caller, or be interrupted by **Session** exit.
- Caller timeout does not cancel the underlying **Run Completion**; it may still be observed later to keep internal completion bytes out of artifacts.
- After **Session** exit, an unobserved **Run Completion** can no longer arrive.
- A **Promoted Hero Demo** replaces the existing recursive README demo entirely; the old recursive bundle is deleted rather than maintained in parallel.
- The **Hero Claim Boundary** narrows the README claim after that deletion: the outer TUI is presentation, while inner `agent-tty` artifacts are the product proof.
- An **Exploratory Hero Demo** is the preferred **Hero Demo** scenario because it shows the coding-agent TUI discovering the `agent-tty` skill and CLI before producing inner `agent-tty` proof artifacts.
- A **Curated Hero Artifact Set** is what README and reviewer docs link to; raw recorder logs remain local/debug artifacts unless explicitly promoted.
- The **Hero Demo Promotion Bar** for real Codex and Claude recordings is three successful local regenerations per agent, plus visual and secret-leakage review.
- VHS, ttyd, and ffmpeg are pinned in the named demo task, but ordinary CI does not install those live-demo-only recorder tools or regenerate real-agent **Hero Demo** artifacts.
- The **Hero Demo Generator** relies on task-pinned VHS, ttyd, and ffmpeg tools rather than transient unpinned local installations.
- The refactor to a **Promoted Hero Demo** is intended to land as one coherent change: generator, tool pins, promoted artifacts, README/catalog updates, manifest updates, and recursive-bundle deletion.
- A **Hero Demo Promotion Command** is exposed as a named mise task and delegates to the Node/TypeScript **Hero Demo Generator**.
- The **Hero Demo Generator** has default Codex/Claude model and effort settings, supports overrides, and records the resolved versions and settings in the **Promoted Hero Run Summary**.
- A **Hero Demo Partial Pass** fails promotion for the whole **Hero Demo**; Codex and Claude must both pass the promotion bar before replacement.
- A **Promoted Hero Run Summary** is checked in with the selected Codex and Claude **Curated Hero Artifact Sets**; extra successful trial outputs remain **Debug-Only Raw Demo Files**.
- A **Promoted Hero Demo** uses WebM as its primary outer video format and PNG thumbnails for README links.
- `dogfood/agent-uses-agent-tty/manifest.json` remains the canonical manifest for the **Curated Hero Artifact Set**, including sha256 and byte counts for promoted artifacts.
- **Debug-Only Raw Demo Files** remain ignored by default; promoted review links should point to the **Curated Hero Artifact Set**.
- A **Hero Demo Leak Check** combines automated text scanning with human visual review before promotion.
- A **Hero Demo UI Noise Policy** is applied during promotion review: names, emails, billing/account lines, auth warnings, tokens, and absolute home paths block promotion; generic update or product notices may remain if they do not dominate.
- A **Real-Agent Hero Artifact** is produced by **Manual Demo Regeneration**, not by CI, because real coding-agent CLIs require local authentication and may show account-, update-, or service-dependent UI.
- A **Hero Demo Generator** generates raw VHS tapes and runner scripts, invokes VHS as the **Outer Camera**, and validates the resulting **Curated Hero Artifact Set**.
- In a **Hero Demo**, the generated VHS tape owns screen waits and keypresses during recording; the **Hero Demo Generator** owns setup, invocation, artifact validation, and summary reporting around that tape.
- A **Hero Demo** may use a recorder outside `agent-tty` for the outer coding-agent TUI, but the demonstrated product proof remains the inner `agent-tty` artifact set.
- A **Recursive Dogfood Proof** uses `agent-tty` for both the outer coding-agent recording and the inner terminal proof artifacts.
- A **Hero Demo** and a **Recursive Dogfood Proof** may cover the same scenario, but they optimize for different review questions: presentation stability versus self-dogfood coverage.
- A **Release Prep Workflow** produces local, reviewable release changes before any release tag exists.
- A **Release Finalization Step** happens after the **Release Prep Workflow** has landed on the default branch.
- A **Release Finalization Step** produces the release tag consumed by the **Publish Pipeline**.
Expand Down Expand Up @@ -170,4 +261,7 @@ _Avoid_: bare "agent", "Coder agent"

- "Active" and "offline replay eligible" are independent classifications: `destroying` is both **Active** and **Offline Replay Eligible**.
- "Running Session Target" was used during design discussion, but the canonical term is **Command Target** because commandability is the policy being resolved.
- "controller" was used during design discussion for the Hero Demo, but the canonical term is **Hero Demo Generator** because the settled design generates raw VHS tapes rather than actively proxying PTY control during recording.
- "helper proof" was used during design discussion, but the canonical scenario is now **Exploratory Hero Demo**: success criteria and output paths are fixed, while the coding agent chooses the command flow inside a configurable fixed review window.
- "demo" and "proof" are not interchangeable for coding-agent recordings: a **Hero Demo** optimizes for stable presentation, while a **Recursive Dogfood Proof** optimizes for self-dogfood coverage.
- "agent" is overloaded across three referents: this project's **Triage Agent** (a Claude Code instance), Coder's **Coder workspace agent** (the SSH/exec daemon), and a generic AFK implementation agent (the actor on `ready-for-agent` issues — Phase 2 of the triage pipeline). Always qualify in code comments and docs.
4 changes: 2 additions & 2 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -72,13 +72,13 @@ For prerelease channels, tarball installs, authenticated GitHub Release installs

## Agent Demo

This dogfood bundle records Codex and Claude interactive TUIs using `agent-tty` to drive `nvim --clean`, write a file, export proof artifacts, and exit cleanly. The linked WebMs are trimmed review cuts; the bundle also keeps untrimmed outer recordings.
This dogfood bundle uses VHS as the outer camera for real Codex and Claude interactive TUIs while each agent explores the `agent-tty` skill/CLI, drives `nvim --clean`, writes a file, and exports inner proof artifacts.

| Codex | Claude |
| -------------------------------------------------------------------------------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------- |
| [![Codex agent-tty demo](./dogfood/agent-uses-agent-tty/artifacts/codex-thumbnail.png)](./dogfood/agent-uses-agent-tty/artifacts/codex-outer.webm) | [![Claude agent-tty demo](./dogfood/agent-uses-agent-tty/artifacts/claude-thumbnail.png)](./dogfood/agent-uses-agent-tty/artifacts/claude-outer.webm) |

See [`dogfood/agent-uses-agent-tty/`](./dogfood/agent-uses-agent-tty/) for the reproducer, inner Neovim recordings, transcripts, and final file proofs.
See [`dogfood/agent-uses-agent-tty/`](./dogfood/agent-uses-agent-tty/) for the Hero Demo reproducer, outer transcripts, inner Neovim recordings, and final file proofs.

## Common Usage

Expand Down
52 changes: 52 additions & 0 deletions docs/adr/0005-use-external-outer-camera-for-agent-hero-demo.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,52 @@
# ADR 0005: Use an external outer camera for the coding-agent Hero Demo

Date: 2026-05-20

## Status

Accepted

## Context

The current README-facing coding-agent demo is the `dogfood/agent-uses-agent-tty/` recursive flow: an outer `agent-tty` session records Codex or Claude Code, and the real coding agent creates an inner `agent-tty` session to drive Neovim and export proof artifacts.

That flow is strong self-dogfood, but it couples the README visual to multiple unstable layers at once: live coding-agent TUI behavior, startup/trust prompts, update notices, nested terminal sessions, renderer timing, exit behavior, thumbnail timing, and WebM post-processing. The resulting artifact proves a lot, but it is not the most stable or polished way to show the user-facing story.

A VHS prototype showed that VHS can launch and record real Codex and Claude TUIs, wait on screen regexes, capture PNG screenshots, and render compact WebM/ASCII artifacts. The first implementation pass used a deterministic helper script, but review of the resulting recordings showed that it hid the most interesting part of the story: how a coding agent discovers and uses `agent-tty` in the wild. The chosen scenario is therefore exploratory, with fixed success criteria, artifact paths, and a configurable fixed review window but no prewritten command script.

## Decision

The README-facing coding-agent demo will become a **Hero Demo** that uses an external **Outer Camera** for the visible coding-agent TUI, while keeping `agent-tty` responsible for the inner proof artifacts.

Concretely:

- `dogfood/agent-uses-agent-tty/` remains the canonical path, but its existing recursive contents are replaced wholesale by the new **Promoted Hero Demo**.
- The old recursive `agent-tty`-records-the-agent bundle is deleted after replacement, rather than maintained as a parallel proof path.
- The README claim is narrowed: the outer Codex/Claude TUI is presentation; the product proof is that the real coding-agent TUI uses `agent-tty` to produce inner terminal proof artifacts.
- The scenario is an **Exploratory Hero Demo**: the real agent loads the `agent-tty` skill, inspects the CLI as needed, chooses its own command flow, drives Neovim through `agent-tty`, and exports inner artifacts to required paths.
- A Node/TypeScript **Hero Demo Generator** prepares workspaces, prompts, runner scripts, and raw VHS tapes; invokes VHS as the **Outer Camera**; validates artifacts; performs automated leak checks; and writes the run summary.
- The generated VHS tape owns startup waits, a configurable fixed review window, and exit keypresses during recording. Raw tapes, logs, and disposable workspaces stay debug-only and ignored by default.
- VHS, ttyd, and ffmpeg are pinned in the named demo task, not installed by ordinary CI or used to regenerate real-agent artifacts there.
- The refactor lands as one coherent change: generator, tool pins, promoted artifacts, README/catalog updates, manifest updates, and recursive-bundle deletion.
- Regeneration is exposed through a named mise demo task that delegates to the Node/TypeScript generator.
- Codex/Claude model and effort settings have defaults, support maintainer overrides, and are recorded with tool versions in the promoted run summary.
- The promoted artifact set is curated: one selected run per agent, WebM outer recording, PNG thumbnail/screenshot, outer transcript, inner `agent-tty` cast/WebM, final file proof, prompt, summary, and canonical `manifest.json` entries with sha256 and byte counts.
- A partial pass is not promotable: Codex and Claude must both pass before README and canonical artifacts are replaced.
- Promotion requires three successful local regenerations for Codex and three for Claude, automated text leak checks, and human visual review of PNG/WebM outputs.

## Considered options

- **Keep the recursive `agent-tty` outer recording as the README demo.** This preserved the strongest self-dogfood story, but kept the README artifact exposed to the most timing and UI flake. Rejected because the README demo should optimize for stable presentation.
- **Maintain both a Hero Demo and a recursive dogfood proof.** This preserved self-dogfood coverage, but doubled maintenance for a live-agent scenario and kept the repo carrying two similar bundles. Rejected in favor of deleting the recursive bundle and narrowing the README claim.
- **Use an active PTY proxy controller with VHS only as the camera.** This would handle optional prompts and cleanup more robustly, but adds more implementation complexity than the current goal requires. Rejected for the first refactor; repeated-run promotion is the guard against raw-tape brittleness.
- **Use fixture/mock TUIs for deterministic visuals.** This would produce the cleanest recordings, but would no longer show real Codex or Claude using `agent-tty`. Rejected because the Hero Demo should remain a real-agent artifact.
- **Use a deterministic helper script for the inner proof.** This was more reliable and easy to validate, but made the outer recording look like an agent executing a canned script instead of discovering the skill and CLI. Rejected after prototype review because the Hero Demo should show the workflow in the wild.
- **Use asciinema/agg instead of VHS.** The tuicr reference demonstrates this shape well. Rejected for now because the VHS prototype proved sufficient for real TUIs and provides a convenient path to polished WebM/PNG outputs.

## Consequences

- Future readers should not interpret the README coding-agent demo as proof that `agent-tty` recorded the outer coding-agent TUI. It demonstrates that real coding-agent TUIs can use `agent-tty` to generate inner proof artifacts.
- The canonical `agent-uses-agent-tty` manifest remains important: it locks the promoted curated artifacts even though raw regeneration files stay ignored.
- Real-agent demo regeneration remains auth-gated and manual. CI can validate checked-in metadata and normal package behavior, but it must not require Codex or Claude credentials.
- Strict leak review becomes part of promotion. Names, emails, billing/account lines, auth warnings, tokens, and absolute home paths block promotion; generic update or product notices may remain if they do not dominate the recording.
- If raw VHS tapes fail the three-run promotion bar due to optional prompts or UI drift, the design can evolve to an active PTY proxy controller without changing the Hero Demo claim boundary.
Loading
Loading