diff --git a/CHANGELOG.md b/CHANGELOG.md
index 1be8060..c309bc7 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -5,6 +5,8 @@
 ## [Unreleased]
 
 ### Added
+- `docs/system-design.md` — Harness architecture maps for current Claude Code runtime boundaries, data flow, agent topology, model policy, and extension points
+- `docs/hermes-skill-translation-plan.md` — adapter plan for translating Harness concepts into Hermes skills, profiles, delegation patterns, and validation contracts
 - 신규 에이전트/스킬 생성 전 중복 검토 단계 (Phase 3-0, Phase 4-0)
 - `references/agent-design-patterns.md` "에이전트 재사용 설계" 섹션
 - `references/skill-writing-guide.md` §9 "스킬 재사용 설계"
diff --git a/README.md b/README.md
index b64e786..c38c429 100644
--- a/README.md
+++ b/README.md
@@ -115,6 +115,14 @@ harness/
 └── README.md
 ```
 
+## Architecture Docs
+
+For contributors reviewing the system boundary rather than running the 5-minute
+quickstart:
+
+- [`docs/system-design.md`](docs/system-design.md) — current Harness architecture maps: distribution, trigger, generated project harness, Claude Code runtime boundary, data flow, model policy, and extension points.
+- [`docs/hermes-skill-translation-plan.md`](docs/hermes-skill-translation-plan.md) — adapter plan for translating Harness concepts into Hermes skills, profiles, delegation patterns, and validation contracts.
+
 ## Usage
 
 Trigger in Claude Code with prompts like:
diff --git a/docs/hermes-skill-translation-plan.md b/docs/hermes-skill-translation-plan.md
new file mode 100644
index 0000000..ecf24a4
--- /dev/null
+++ b/docs/hermes-skill-translation-plan.md
@@ -0,0 +1,386 @@
+
+
+# Hermes Skill Translation Plan for Harness
+
+> Status: adapter plan, not current Harness runtime behavior.
+> Scope: translate the Harness concept into Hermes-compatible skills, profiles,
+> and orchestration patterns without smuggling Claude Code assumptions into
+> Hermes.
+
+## 1. Working Decision
+
+Port Harness to Hermes as a design-time skill factory, not as a literal
+`TeamCreate` clone.
+
+The Claude Code version writes `.claude/agents`, `.claude/skills`, and a
+`CLAUDE.md` pointer. The Hermes version should write Hermes skills, profile or
+agent specs, orchestration plans, and workspace artifacts using Hermes-native
+contracts.
+
+| Decision | Rationale |
+|---|---|
+| Use one umbrella Hermes skill first. | Avoid ten cute micro-skills and keep the workflow coherent. |
+| Keep runtime adapters explicit. | Claude Code, Hermes, Codex, and DeepSeek-backed workers do not share the same primitives. |
+| Prefer parent-orchestrated fan-out/fan-in for Hermes. | Hermes `delegate_task` children are isolated and do not freely message each other. |
+| Preserve `_workspace/`. | File-based handoff is runtime-independent and auditable. |
+| Replace `model: "opus"` with capability/model policy. | Hermes routes through profiles/providers such as OpenAI Codex and DeepSeek. |
+
+## 2. Confirmed Current State
+
+| Harness concept | Current implementation |
+|---|---|
+| Distribution | Claude plugin manifests in `.claude-plugin/` |
+| Trigger skill | `skills/harness/SKILL.md` with Korean-first description/body |
+| Agent definitions | Generated under `.claude/agents/*.md` |
+| Skill definitions | Generated under `.claude/skills/*/SKILL.md` |
+| Team runtime | Claude Code Agent Teams API: `TeamCreate`, `SendMessage`, `TaskCreate` |
+| Lightweight subagents | Claude Code `Agent` tool with `run_in_background` |
+| Project pointer | `CLAUDE.md` trigger + change history |
+| Durable handoff | `_workspace/phase_agent_artifact.md` |
+| Model policy | Hard-coded guidance to use `model: "opus"` |
+
+## 3. Hermes Translation Map
+
+| Harness / Claude Code | Hermes equivalent | Gap | Recommendation |
+|---|---|---|---|
+| `.claude-plugin/plugin.json` | Hermes Skills Hub/tap metadata or profile distribution metadata | Hermes does not load Claude plugin manifests. | Preserve metadata in skill frontmatter and optional distribution docs.
|
+| `skills/harness/SKILL.md` | `skills/autonomous-ai-agents/harness/SKILL.md` or user-local `~/.hermes/skills/...` | Current repo path lacks Hermes category placement. | For Hermes core/tap usage, add category layout and peer metadata. |
+| Minimal frontmatter | Hermes peer frontmatter | Hermes expects richer conventions even if validator only requires name/description. | Add `version`, `author`, `license`, `metadata.hermes.tags`, `related_skills`. |
+| Korean-only body | English Hermes umbrella skill with multilingual triggers | Discoverability suffers for non-Korean Hermes users. | Translate core body; preserve Korean/Japanese trigger phrases. |
+| `.claude/agents/*.md` | Hermes profile specs, `SOUL.md` fragments, or delegate prompt templates | `delegate_task` does not load `.claude/agents`. | Generate persistent profile templates or ephemeral child prompts. |
+| `.claude/skills/*` | Hermes skills under profile/user/repo skill tree | Different layout and validation rules. | Generate Hermes `SKILL.md` files with category and metadata. |
+| `TeamCreate` | `delegate_task(tasks=[...])`, profile fleet, or Kanban workers | No native child-to-child team chat in simple delegation. | Default to parent-mediated fan-out/fan-in. Use Kanban/profile workers for durable teams. |
+| `SendMessage` | Parent synthesis, shared files, Kanban/task comments | Leaf subagents cannot message each other. | Require file artifacts and parent aggregation. |
+| `TaskCreate` / `TaskUpdate` | `todo`, Kanban, or `_workspace/tasks.json` | `delegate_task` batch has no shared board. | Use parent-owned task state or Kanban for long-lived lanes. |
+| `Agent(..., run_in_background)` | `delegate_task` batch or `terminal(background=True)` for shell jobs | Delegate children are synchronous to the parent turn. | Use delegate batch for bounded reasoning; background terminal/cron for durable jobs.
|
+| `CLAUDE.md` pointer | Hermes project docs, profile distribution, profile description, optional `SOUL.md` | Hermes will not automatically treat `CLAUDE.md` as control plane. | Use profile/distribution metadata; avoid mutating `SOUL.md` unless creating a profile. |
+| `model: "opus"` | Hermes provider/model/profile config | Hard-coded Anthropic names are wrong for Codex/DeepSeek. | Use model classes: orchestrator, builder, evaluator, researcher. Bind via profile config. |
+| `Explore` / `Plan` / `general-purpose` | Hermes toolsets + profile personas | Tool access model differs. | Map to `file` read-only, `web` research, `terminal+file` implementation/QA. |
+| QA agent | Hermes QA child with `terminal` + `file` toolsets | Same methodology, different tool syntax. | Parent must verify QA results because subagent summaries are self-reports. |
+| Trigger evals | Fresh Hermes sessions + direct validator scripts | Current session skill loader is cached. | Test in a new Hermes session or validate files directly. |
+
+## 4. Proposed Hermes Skill Layout
+
+The first useful port should be one umbrella skill with references, not a swarm
+of underbaked children.
+
+```text
+skills/autonomous-ai-agents/harness/
+├── SKILL.md
+├── references/
+│   ├── hermes-runtime-mapping.md
+│   ├── architecture-patterns.md
+│   ├── orchestrator-templates.md
+│   ├── generated-skill-authoring.md
+│   ├── testing-and-evaluation.md
+│   └── qa-agent-guide.md
+├── templates/
+│   ├── generated-skill.md
+│   ├── delegate-task-plan.md
+│   ├── profile-spec.yaml
+│   └── run-manifest.json
+└── scripts/
+    └── validate_generated_harness.py
+```
+
+### `SKILL.md` frontmatter shape
+
+```yaml
+---
+name: harness
+description: Use when designing or maintaining a Hermes agent harness: agent teams, specialist profiles, generated skills, orchestration workflows, delegate_task plans, profile distributions, QA loops, or translations from Claude Code Harness.
Trigger on "build a harness", "agent team", "skill factory", "하네스 구성", and "ハーネス" requests.
+version: 1.0.0
+author: Hermes Agent
+license: Apache-2.0
+metadata:
+  hermes:
+    tags: [harness, multi-agent, orchestration, skill-authoring, profiles]
+    related_skills: [hermes-agent, hermes-agent-skill-authoring, dispatching-parallel-agents]
+---
+```
+
+Keep the body around 8-15k characters. Put the heavy Claude comparison and
+pattern catalog in `references/`.
+
+## 5. Hermes Adapter Architecture
+
+```mermaid
+flowchart TB
+    User["user request\nbuild/maintain a harness"]:::sensor
+
+    subgraph HermesSkill["Hermes harness umbrella skill"]
+        Trigger["Use when... trigger description"]:::truth
+        Workflow["audit → design → generate → validate"]:::agent
+        Mapping["Hermes runtime mapping reference"]:::recall
+        Templates["skill/profile/delegate templates"]:::recall
+    end
+
+    subgraph HermesRuntime["Hermes runtime boundary"]
+        Parent["parent orchestrator agent\nOpenAI Codex or DeepSeek-backed profile"]:::agent
+        Delegate["delegate_task batch\nbounded parallel children"]:::worker
+        Profiles["Hermes profiles\npersistent specialists"]:::agent
+        Kanban["Kanban / task lane\nlong-lived routing"]:::bus
+        Tools["file / terminal / web / skills toolsets"]:::worker
+    end
+
+    subgraph Generated["generated Hermes harness"]
+        Skills["Hermes SKILL.md files\ncategory + metadata"]:::agent
+        ProfileSpecs["profile specs / SOUL fragments"]:::agent
+        Plan["delegate-task-plan.md"]:::truth
+        Workspace["_workspace/ artifacts + run manifest"]:::store
+        Reports["QA + trigger eval reports"]:::audit
+    end
+
+    User --> Trigger
+    Trigger --> Workflow
+    Workflow --> Mapping
+    Workflow --> Templates
+    Workflow --> Parent
+    Parent --> Delegate
+    Parent --> Profiles
+    Parent --> Kanban
+    Delegate --> Tools
+    Profiles --> Tools
+    Kanban --> Tools
+    Tools --> Skills
+    Tools --> ProfileSpecs
+    Tools --> Plan
+    Tools --> Workspace
+    Tools --> Reports
+
+    classDef sensor fill:#dbeafe,stroke:#2563eb,color:#111827;
+
classDef truth fill:#ffe4e6,stroke:#e11d48,color:#111827;
+    classDef agent fill:#fce7f3,stroke:#db2777,color:#111827;
+    classDef recall fill:#ede9fe,stroke:#7c3aed,color:#111827;
+    classDef worker fill:#fef9c3,stroke:#ca8a04,color:#111827;
+    classDef bus fill:#ffedd5,stroke:#f97316,color:#111827;
+    classDef store fill:#dcfce7,stroke:#16a34a,color:#111827;
+    classDef audit fill:#e5e7eb,stroke:#374151,color:#111827;
+```
+
+## 6. Hermes Workflow Translation
+
+```mermaid
+flowchart TD
+    A["Phase 0\naudit current Hermes skills/profiles/workspace"]:::worker
+    A --> B["Phase 1\ndomain + project analysis"]:::worker
+    B --> C["Phase 2\nchoose runtime pattern"]:::truth
+    C --> D{"runtime pattern"}:::truth
+
+    D -->|bounded parallel| E["delegate_task fan-out/fan-in"]:::worker
+    D -->|persistent specialists| F["profile distribution"]:::agent
+    D -->|long-lived lanes| G["Kanban/profile workers"]:::bus
+    D -->|single specialist| H["direct skill/profile prompt"]:::agent
+
+    E --> I["generate Hermes SKILL.md files"]:::agent
+    F --> I
+    G --> I
+    H --> I
+
+    I --> J["generate profiles or prompt templates"]:::agent
+    J --> K["write orchestration plan + run manifest"]:::truth
+    K --> L["validate frontmatter, links, triggers, dry run"]:::worker
+    L --> M["commit/version/distribute"]:::action
+
+    classDef worker fill:#fef9c3,stroke:#ca8a04,color:#111827;
+    classDef truth fill:#ffe4e6,stroke:#e11d48,color:#111827;
+    classDef agent fill:#fce7f3,stroke:#db2777,color:#111827;
+    classDef bus fill:#ffedd5,stroke:#f97316,color:#111827;
+    classDef action fill:#fef3c7,stroke:#d97706,color:#111827;
+```
+
+## 7. Runtime Pattern Mapping
+
+| Harness pattern | Hermes default | Hermes advanced | Notes |
+|---|---|---|---|
+| Pipeline | Sequential parent steps; pass files between steps. | Dedicated profile per stage. | Keep artifacts in `_workspace/`. |
+| Fan-out / fan-in | `delegate_task(tasks=[...])` batch. | Kanban worker lane per perspective. | Parent synthesizes; children do not chat.
|
+| Expert pool | Conditional delegate/profile route. | Profile fleet with skill tags. | Route only the needed experts. |
+| Producer-reviewer | Producer delegate then reviewer delegate. | Iterative evaluator/evolver profile. | Parent gates retries. |
+| Supervisor | Parent orchestrator owns task state. | Kanban/profile control plane. | Avoid pretending children share a task board. |
+| Hierarchical delegation | Avoid by default. | `delegate_task(role="orchestrator")` only when nested delegation is intentionally enabled. | Cost and context blow up fast. |
+
+## 8. Model and Provider Policy
+
+The Hermes port should not hard-code `opus`. It should specify capability tiers
+and bind them through profiles/config.
+
+```mermaid
+flowchart LR
+    Task["agent task spec"]:::truth --> Policy["model policy"]:::model
+    Policy --> Codex["OpenAI Codex\ngpt-5.5 / coding tasks"]:::modelCloud
+    Policy --> DeepSeek["DeepSeek\nresearch / coding alternative"]:::modelCloud
+    Policy --> Cheap["cheap/fast model\ntrigger evals, lint summaries"]:::model
+    Policy --> None["no LLM\nfrontmatter/link/schema checks"]:::worker
+
+    Codex --> Audit["model call note\nprofile + purpose"]:::audit
+    DeepSeek --> Audit
+    Cheap --> Audit
+    None --> Audit
+
+    classDef truth fill:#ffe4e6,stroke:#e11d48,color:#111827;
+    classDef model fill:#e0e7ff,stroke:#4f46e5,color:#111827;
+    classDef modelCloud fill:#fae8ff,stroke:#9333ea,color:#111827;
+    classDef worker fill:#fef9c3,stroke:#ca8a04,color:#111827;
+    classDef audit fill:#e5e7eb,stroke:#374151,color:#111827;
+```
+
+Recommended tiers:
+
+| Tier | Use | Example binding |
+|---|---|---|
+| Orchestrator | Decomposition, synthesis, final decisions | `orchestrator` profile on OpenAI Codex. |
+| Builder | File creation, code/doc implementation | OpenAI Codex or DeepSeek coding model. |
+| Researcher | Repo/web/doc reconnaissance | DeepSeek or Codex with `web`/`file` tools.
|
+| Evaluator | QA, trigger tests, schema checks | Codex/DeepSeek plus deterministic scripts. |
+| Deterministic | Frontmatter, links, file existence, JSON schema | Python scripts; no model. |
+
+## 9. Generated Hermes Contracts
+
+### Generated skill contract
+
+```yaml
+generated_skill:
+  path: skills///SKILL.md
+  frontmatter:
+    required:
+      - name
+      - description
+    peer_recommended:
+      - version
+      - author
+      - license
+      - metadata.hermes.tags
+      - metadata.hermes.related_skills
+  limits:
+    description_max_chars: 1024
+    file_max_chars: 100000
+  body_required:
+    - overview
+    - when_to_use
+    - workflow
+    - pitfalls
+    - verification_checklist
+  resources_allowed:
+    - references/
+    - templates/
+    - scripts/
+    - assets/
+```
+
+### Agent/profile spec contract
+
+```yaml
+agent_spec:
+  id:
+  runtime: hermes-profile | delegate-template | kanban-worker
+  purpose:
+  reads:
+    -
+  writes:
+    -
+  toolsets:
+    - file
+    - terminal
+    - web
+  model_policy:
+    class: builder | researcher | evaluator | orchestrator
+    provider_preference: openai-codex | deepseek | inherit
+  authority:
+    can_edit_files: true|false
+    can_run_commands: true|false
+    can_publish_or_push: false
+```
+
+### Run manifest contract
+
+```json
+{
+  "run_id": "2026-05-30-example",
+  "domain": "example-domain",
+  "runtime": "hermes-delegate-task",
+  "orchestrator_profile": "orchestrator",
+  "model_policy": {
+    "orchestrator": "openai-codex/gpt-5.5",
+    "allowed_child_families": ["openai-codex", "deepseek"]
+  },
+  "artifacts": [
+    {
+      "phase": "01",
+      "agent": "researcher",
+      "path": "_workspace/01_researcher_findings.md",
+      "status": "complete"
+    }
+  ],
+  "validation": {
+    "frontmatter": "passed",
+    "links": "passed",
+    "trigger_eval": "pending"
+  }
+}
+```
+
+## 10.
Validation Loop
+
+```mermaid
+flowchart LR
+    Draft["draft generated skill/profile"]:::agent --> Static["static validation\nfrontmatter, size, paths"]:::worker
+    Static --> Trigger["trigger eval\nshould / should-not load"]:::truth
+    Trigger --> DryRun["dry run\nparent + child plan"]:::worker
+    DryRun --> QA["QA review\nclaims, outputs, edge cases"]:::agent
+    QA --> Decision{"passes?"}:::truth
+    Decision -->|no| Repair["generalize fix\ndo not overfit"]:::action
+    Repair --> Static
+    Decision -->|yes| Package["commit / publish / profile distribution"]:::export
+
+    classDef agent fill:#fce7f3,stroke:#db2777,color:#111827;
+    classDef worker fill:#fef9c3,stroke:#ca8a04,color:#111827;
+    classDef truth fill:#ffe4e6,stroke:#e11d48,color:#111827;
+    classDef action fill:#fef3c7,stroke:#d97706,color:#111827;
+    classDef export fill:#f5d0fe,stroke:#c026d3,color:#111827;
+```
+
+Minimum checks for a Hermes port:
+
+- `SKILL.md` starts at byte 0 with `---`.
+- `name` and `description` exist.
+- Description is <= 1024 chars and starts with `Use when ...`.
+- Body is non-empty and under 100,000 chars.
+- Skill path uses a category directory when bundled in a repo.
+- Related skills resolve or are explicitly optional/local.
+- New skills are tested in a fresh Hermes session, not only the session that
+  created them.
+- Parent verifies subagent outputs with real file reads or command output.
+
+## 11. Implementation Phases
+
+| Phase | Build now? | Output | Notes |
+|---|---:|---|---|
+| 0. Architecture docs | Yes | This plan + system maps | Establish boundary before writing a port. |
+| 1. Umbrella skill | Yes | `skills/autonomous-ai-agents/harness/SKILL.md` | English body, multilingual triggers. |
+| 2. Reference migration | Yes | `references/*.md` | Keep Claude-specific details behind runtime mapping. |
+| 3. Validation script | Yes | `scripts/validate_generated_harness.py` | Deterministic checks beat vibes. |
+| 4.
Profile templates | Soon | `templates/profile-spec.yaml` | Needed for persistent specialists. |
+| 5. Full profile distribution | Later | Hermes distribution package | Build after one generated harness works. |
+| 6. Kanban worker integration | Later | Durable worker lane example | Only after simple delegate flow proves useful. |
+
+## 12. Open Questions
+
+| Question | Why it matters | Default answer |
+|---|---|---|
+| Should the Hermes port live in this repo? | Could confuse Claude Code users. | Start as docs/adapter plan here; prototype in Hermes skills repo or a fork. |
+| Should generated agents become profiles or prompt templates? | Profiles are durable but heavier. | Prompt templates first; profiles for recurring teams. |
+| Should children use OpenAI Codex or DeepSeek by default? | Cost, latency, and auth differ by environment. | Inherit parent unless the profile explicitly binds a model. |
+| Should `SOUL.md` be edited by the generator? | Dangerous global behavior mutation. | No, unless creating a named profile distribution. |
+| Should generated skills be user-local or repo-bundled? | Distribution and review path differ. | Repo-bundled for shared/public harnesses; user-local for personal scratch. |
+
+## 13. Design Verdict
+
+The translation is viable, but only if Harness is treated as a compiler for
+agent-team architecture, not a Claude Code primitive emulator.
+
+Best next move: build a Hermes umbrella skill plus `hermes-runtime-mapping.md`,
+then test it by generating one tiny harness with three roles:
+researcher, builder, evaluator. If that loop cannot produce validated skill
+files and a run manifest, do not scale it. Scaling broken choreography just makes
+more expensive mud.
diff --git a/docs/system-design.md b/docs/system-design.md
new file mode 100644
index 0000000..8877d72
--- /dev/null
+++ b/docs/system-design.md
@@ -0,0 +1,491 @@
+
+
+# Harness System Design Maps
+
+> Status: draft architecture reference, not runtime canon.
+> Scope: current `revfactory/harness` repository as of 2026-05-30.
+> Audience: contributors who need to understand what Harness is, where the
+> runtime boundary is, and what should be built or documented next.
+
+## 1. Working Decision
+
+Harness is a Claude Code meta-skill and plugin that turns a domain request into
+project-local agent-team architecture.
+
+It is not a standalone multi-agent runtime. The repo ships instructions,
+reference patterns, and plugin metadata; Claude Code Agent Teams executes the
+result.
+
+| Type | Fact |
+|---|---|
+| Product/system | Harness — team-architecture factory for Claude Code |
+| Kernel | `skills/harness/SKILL.md` |
+| Reference modules | `skills/harness/references/*.md` |
+| Distribution | `.claude-plugin/plugin.json`, `.claude-plugin/marketplace.json` |
+| Generated outputs | `.claude/agents/`, `.claude/skills/`, orchestrator skill, `CLAUDE.md` pointer, `_workspace/` artifacts |
+| Primary runtime dependency | Claude Code v2.x Agent Teams with `CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1` |
+
+Assumptions and boundaries:
+
+| Assumption | Confidence | How to verify |
+|---|---:|---|
+| Harness itself does not execute teams; Claude Code does. | High | Repo contains no standalone runtime code or workflow engine. |
+| `_workspace/` is the durable run/audit surface. | High | `SKILL.md` and orchestrator template preserve it for handoff and reruns. |
+| The six architecture patterns are conceptual templates, not compiled graph specs. | High | Patterns live as Markdown guidance. |
+| Future cross-runtime support should be an adapter layer, not a rewrite of the concept. | Medium | Validate by prototyping a Hermes adapter from the existing skill. |
+
+Non-goals for this document:
+
+- Do not redesign Harness into a new product.
+- Do not treat dashboards as value by themselves.
+- Do not imply deterministic generation where the current repo relies on model
+  instruction following.
+- Do not promote Hermes translation work as current Harness functionality.
+
+## 2. Product Definition
+
+Harness is a design-time factory for building agent teams and their skills from
+a compact domain description.
+
+This is for:
+
+- Claude Code users who want reusable, project-local specialist teams.
+- Contributors reviewing or extending the Harness meta-skill.
+- Runtime-adapter authors who need to port the concept to another agent system.
+
+This is not:
+
+- A generic LangGraph-style state machine runtime.
+- A hosted orchestration service.
+- A deterministic code generator with a typed compiler pipeline.
+
+## 3. Core Properties and Planes
+
+| Property | Requirement |
+|---|---|
+| Design-time factory | Generate durable project artifacts before repeated execution. |
+| Runtime boundary clarity | Keep Harness distinct from Claude Code Agent Teams. |
+| File-first memory | Preserve generated definitions and `_workspace/` artifacts. |
+| Progressive disclosure | Keep the trigger skill lean; load references only when needed. |
+| Drift-aware maintenance | Audit existing agents, skills, pointers, and workspace state first. |
+| Explicit orchestration | Name agents, roles, inputs, outputs, and handoff paths. |
+| Validation loop | Test structure, trigger behavior, dry-runs, and with-skill deltas. |
+| Model policy visibility | State model assumptions and cost/runtime implications directly. |
+
+| Plane | Components | Job |
+|---|---|---|
+| Data plane | Project files, user prompt, `_workspace/`, generated Markdown | Carry input, intermediate artifacts, and outputs. |
+| Truth/control plane | `CLAUDE.md` pointer, change history, Phase 0 audit, validation checklist | Decide what exists, what changed, and what is safe to run. |
+| Agent plane | Generated `.claude/agents/*.md`, orchestrator skill, QA agent guidance | Define who acts and what each agent may produce.
|
+| Model plane | Claude Code model calls, current `model: "opus"` policy | Route reasoning/execution through capable Claude models. |
+| Product plane | Claude plugin, global skill install, README/quickstart | Let users install, trigger, and understand Harness. |
+| Export/extension plane | Generated skills, references, future runtime adapters | Package the pattern for other domains and runtimes. |
+
+## 4. Master Map
+
+```mermaid
+flowchart TB
+    User["user prompt\n'build a harness for X'"]:::sensor
+
+    subgraph Dist["distribution boundary"]
+        Plugin["Claude plugin manifest\n.claude-plugin/plugin.json"]:::export
+        Market["marketplace manifest\n.claude-plugin/marketplace.json"]:::export
+        PublicDocs["README + docs"]:::surface
+    end
+
+    subgraph Harness["Harness design-time factory"]
+        Skill["meta-skill\nskills/harness/SKILL.md"]:::agent
+        Patterns["architecture patterns\nagent-design-patterns.md"]:::recall
+        OrchTpl["orchestrator templates\norchestrator-template.md"]:::recall
+        SkillGuide["skill-writing guide"]:::recall
+        TestGuide["skill-testing guide"]:::recall
+        QAGuide["QA agent guide"]:::recall
+    end
+
+    subgraph Claude["Claude Code runtime boundary"]
+        Trigger["skill trigger\nname + description"]:::truth
+        TeamAPI["Agent Teams API\nTeamCreate / SendMessage / TaskCreate"]:::bus
+        AgentTool["Agent tool\nsubagent invocation"]:::worker
+        ClaudeTools["Claude tools\nRead / Write / Edit / shell / web"]:::worker
+        CloudModel["Claude model calls\ncurrent policy: opus"]:::modelCloud
+    end
+
+    subgraph Generated["generated project harness"]
+        Agents[".claude/agents/*.md"]:::agent
+        Skills[".claude/skills/*/SKILL.md"]:::agent
+        Orchestrator["orchestrator skill"]:::truth
+        Pointer["CLAUDE.md pointer\ntrigger + change history"]:::audit
+        Workspace["_workspace/\nrun artifacts + handoffs"]:::store
+    end
+
+    User --> Trigger
+    Dist --> Trigger
+    Trigger --> Skill
+    Skill --> Patterns
+    Skill --> OrchTpl
+    Skill --> SkillGuide
+    Skill --> TestGuide
+    Skill --> QAGuide
+
Skill --> ClaudeTools
+    ClaudeTools --> Agents
+    ClaudeTools --> Skills
+    ClaudeTools --> Orchestrator
+    ClaudeTools --> Pointer
+    Orchestrator --> TeamAPI
+    Orchestrator --> AgentTool
+    TeamAPI --> CloudModel
+    AgentTool --> CloudModel
+    TeamAPI --> Workspace
+    AgentTool --> Workspace
+
+    classDef sensor fill:#dbeafe,stroke:#2563eb,color:#111827;
+    classDef adapter fill:#cffafe,stroke:#0891b2,color:#111827;
+    classDef bus fill:#ffedd5,stroke:#f97316,color:#111827;
+    classDef worker fill:#fef9c3,stroke:#ca8a04,color:#111827;
+    classDef store fill:#dcfce7,stroke:#16a34a,color:#111827;
+    classDef raw fill:#f1f5f9,stroke:#64748b,color:#111827;
+    classDef audit fill:#e5e7eb,stroke:#374151,color:#111827;
+    classDef truth fill:#ffe4e6,stroke:#e11d48,color:#111827;
+    classDef recall fill:#ede9fe,stroke:#7c3aed,color:#111827;
+    classDef security fill:#fee2e2,stroke:#dc2626,color:#111827;
+    classDef agent fill:#fce7f3,stroke:#db2777,color:#111827;
+    classDef model fill:#e0e7ff,stroke:#4f46e5,color:#111827;
+    classDef modelCloud fill:#fae8ff,stroke:#9333ea,color:#111827;
+    classDef surface fill:#ccfbf1,stroke:#0f766e,color:#111827;
+    classDef action fill:#fef3c7,stroke:#d97706,color:#111827;
+    classDef external fill:#f8fafc,stroke:#475569,color:#111827;
+    classDef export fill:#f5d0fe,stroke:#c026d3,color:#111827;
+```
+
+## 5.
Layer Responsibilities
+
+| Layer | Name | Owns | Does not own |
+|---|---|---|---|
+| L0 | Packaging | Plugin and marketplace metadata | Runtime behavior |
+| L1 | Trigger skill | When Harness activates and which references to load | Agent execution state |
+| L2 | Pattern library | Six team architecture patterns and selection guidance | Generated project truth |
+| L3 | Generator workflow | Audit, analysis, agent/skill/orchestrator creation | Deterministic compilation guarantees |
+| L4 | Generated harness | Project-local agents, skills, pointer, workspace | Global plugin distribution |
+| L5 | Claude Code runtime | Team creation, message passing, task state, model calls | Harness documentation upkeep |
+| L6 | Validation/evolution | Trigger checks, dry-runs, QA, change history | Hidden autonomous mutation |
+
+## 6. Generation Flow
+
+```mermaid
+flowchart TD
+    A["user asks for a harness"]:::sensor --> B["Harness skill triggers"]:::truth
+    B --> C["Phase 0\naudit existing .claude/agents, .claude/skills, CLAUDE.md"]:::worker
+    C --> D{"existing state?"}:::truth
+    D -->|none| E["new build"]:::action
+    D -->|existing + add/change| F["extension path"]:::action
+    D -->|audit/sync/maintain| G["maintenance path"]:::action
+
+    E --> H["Phase 1\ndomain + codebase analysis"]:::worker
+    F --> H
+    G --> Z["audit, sync, validate, update change history"]:::truth
+
+    H --> I["Phase 2\nexecution mode + architecture pattern"]:::truth
+    I --> J["Phase 3\ngenerate/reuse agent definitions"]:::agent
+    J --> K["Phase 4\ngenerate/reuse skills"]:::agent
+    K --> L["Phase 5\ngenerate/update orchestrator skill"]:::truth
+    L --> M["register CLAUDE.md pointer"]:::audit
+    M --> N["Phase 6\nstructure, trigger, dry-run validation"]:::worker
+    N --> O["Phase 7\nfeedback and evolution loop"]:::truth
+
+    classDef sensor fill:#dbeafe,stroke:#2563eb,color:#111827;
+    classDef worker fill:#fef9c3,stroke:#ca8a04,color:#111827;
+    classDef truth fill:#ffe4e6,stroke:#e11d48,color:#111827;
+    classDef
agent fill:#fce7f3,stroke:#db2777,color:#111827;
+    classDef action fill:#fef3c7,stroke:#d97706,color:#111827;
+    classDef audit fill:#e5e7eb,stroke:#374151,color:#111827;
+```
+
+## 7. Runtime Execution Flow
+
+```mermaid
+flowchart TD
+    A["later user task"]:::sensor --> B["CLAUDE.md pointer nudges orchestrator"]:::audit
+    B --> C["orchestrator checks _workspace"]:::truth
+    C --> D{"run mode"}:::truth
+    D -->|initial| E["create fresh _workspace"]:::store
+    D -->|partial rerun| F["reuse targeted prior artifacts"]:::store
+    D -->|new input| G["archive old _workspace\nthen recreate"]:::store
+
+    E --> H{"execution mode"}:::truth
+    F --> H
+    G --> H
+
+    H -->|Agent Teams| T["TeamCreate members"]:::bus
+    T --> U["TaskCreate shared tasks"]:::bus
+    U --> V["agents coordinate\nSendMessage + TaskUpdate"]:::agent
+    V --> W["agents write artifacts"]:::store
+    W --> X["leader reads artifacts\nand integrates"]:::truth
+    X --> Y["TeamDelete / cleanup"]:::action
+
+    H -->|Subagents| S["Agent tool calls\nrun_in_background optional"]:::worker
+    S --> S2["collect returns\nread file artifacts"]:::worker
+    S2 --> X
+
+    H -->|Hybrid| H2["alternate team and subagent phases"]:::truth
+    H2 --> T
+    H2 --> S
+
+    Y --> Z["final deliverable + user summary"]:::surface
+
+    classDef sensor fill:#dbeafe,stroke:#2563eb,color:#111827;
+    classDef bus fill:#ffedd5,stroke:#f97316,color:#111827;
+    classDef worker fill:#fef9c3,stroke:#ca8a04,color:#111827;
+    classDef store fill:#dcfce7,stroke:#16a34a,color:#111827;
+    classDef truth fill:#ffe4e6,stroke:#e11d48,color:#111827;
+    classDef agent fill:#fce7f3,stroke:#db2777,color:#111827;
+    classDef audit fill:#e5e7eb,stroke:#374151,color:#111827;
+    classDef action fill:#fef3c7,stroke:#d97706,color:#111827;
+    classDef surface fill:#ccfbf1,stroke:#0f766e,color:#111827;
+```
+
+## 8.
Data and Truth Boundaries
+
+```mermaid
+flowchart LR
+    Input["user prompt + project files"]:::sensor --> Analysis["domain/codebase analysis"]:::worker
+    Analysis --> Plan["team pattern + execution plan"]:::truth
+
+    Plan --> AgentDefs["agent definitions\n.claude/agents/*.md"]:::agent
+    Plan --> SkillDefs["skill definitions\n.claude/skills/*/SKILL.md"]:::agent
+    Plan --> Orch["orchestrator skill"]:::truth
+    Orch --> Pointer["CLAUDE.md pointer\nminimal trigger + change log"]:::audit
+
+    Input --> WS0["_workspace/00_input"]:::store
+    WS0 --> Work["agent work"]:::worker
+    Work --> Msg["SendMessage\nlightweight exchange"]:::bus
+    Work --> Tasks["TaskCreate/TaskUpdate\nshared task state"]:::bus
+    Work --> Artifacts["_workspace/phase_agent_artifact.md"]:::store
+    Artifacts --> Integrate["orchestrator reads + integrates"]:::truth
+    Integrate --> Final["final deliverable"]:::surface
+    Integrate --> Trail["_workspace preserved\naudit / debug / partial rerun"]:::audit
+
+    classDef sensor fill:#dbeafe,stroke:#2563eb,color:#111827;
+    classDef worker fill:#fef9c3,stroke:#ca8a04,color:#111827;
+    classDef truth fill:#ffe4e6,stroke:#e11d48,color:#111827;
+    classDef agent fill:#fce7f3,stroke:#db2777,color:#111827;
+    classDef audit fill:#e5e7eb,stroke:#374151,color:#111827;
+    classDef store fill:#dcfce7,stroke:#16a34a,color:#111827;
+    classDef bus fill:#ffedd5,stroke:#f97316,color:#111827;
+    classDef surface fill:#ccfbf1,stroke:#0f766e,color:#111827;
+```
+
+Truth rules:
+
+| Object | Truth status | Notes |
+|---|---|---|
+| `skills/harness/SKILL.md` | Source for the current factory workflow | Human-authored meta-skill, not generated state. |
+| Reference files | Source for pattern/detail guidance | Loaded conditionally by the skill. |
+| `.claude/agents/*.md` | Generated project-local role specs | Must be audited before extension to avoid duplicates. |
+| `.claude/skills/*/SKILL.md` | Generated project-local operating procedures | Skills are the reusable "how" for each agent.
| +| Orchestrator skill | Generated project-local control plane | Owns who runs when and how artifacts move. | +| `CLAUDE.md` pointer | Minimal project entrypoint | Should not duplicate the whole team inventory. | +| `_workspace/` | Run evidence and handoff staging | Preserved for audit, partial reruns, and debugging. | +| SendMessage/task state | Runtime coordination state | Useful but less durable than files. | + +## 9. Agent and Worker Contracts + +Agents make judgments. Workers transform or validate artifacts. + +| Agent/worker | Purpose | Reads | Writes | Authority | +|---|---|---|---|---| +| Harness meta-skill | Design a project harness | User prompt, project files, existing harness state, references | Generated agent/skill/orchestrator files | May create/update design artifacts after audit. | +| Generated orchestrator | Coordinate a domain team | `CLAUDE.md`, `_workspace/`, generated agent/skill files | `_workspace/`, final output, change history | Owns workflow sequencing and fallback. | +| Generated specialist agent | Perform a bounded domain role | Assigned input, prior artifacts, role skill | Assigned artifact path | Writes only its assigned artifact unless told otherwise. | +| QA agent | Cross-check output and boundary assumptions | Source files, API/UI shapes, generated output, tests | QA report, issue list, optional fixes if authorized | Verifies, does not rubber-stamp existence checks. | +| Trigger evaluator | Test skill descriptions and near misses | Candidate descriptions and eval prompts | Trigger pass/fail report | Recommends description changes. 
| + +### Core generated artifact schemas + +```yaml +agent_definition: + path: .claude/agents/.md + required_sections: + - core_role + - working_principles + - input_output_protocol + - error_handling + - collaboration + - team_communication_protocol_if_team_mode +``` + +```yaml +skill_definition: + path: .claude/skills//SKILL.md + frontmatter: + name: + description: + body: + - overview + - workflow + - outputs + - pitfalls + - verification + optional_resources: + - references/ + - scripts/ + - assets/ +``` + +```yaml +run_artifact: + root: _workspace/ + input_dir: _workspace/00_input/ + artifact_name: __.md + final_output: + preserve: true +``` + +## 10. Model Layer + +Current Harness policy is Claude-specific: generated agent calls should use +`model: "opus"`. That is simple and quality-biased, but expensive and not +portable. + +```mermaid +flowchart LR + Context["agent prompt + project context"]:::recall --> Router["Claude Code model route"]:::model + Router --> Opus["Claude model\ncurrent policy: opus"]:::modelCloud + Opus --> Result["agent output"]:::agent + Result --> Audit["artifact / summary / workspace trail"]:::audit + + Router --> Risk["cost + availability risk\n50K-200K token tasks possible"]:::security + + classDef recall fill:#ede9fe,stroke:#7c3aed,color:#111827; + classDef model fill:#e0e7ff,stroke:#4f46e5,color:#111827; + classDef modelCloud fill:#fae8ff,stroke:#9333ea,color:#111827; + classDef agent fill:#fce7f3,stroke:#db2777,color:#111827; + classDef audit fill:#e5e7eb,stroke:#374151,color:#111827; + classDef security fill:#fee2e2,stroke:#dc2626,color:#111827; +``` + +| Model class | Current use | Sensitive data policy | Gap | +|---|---|---|---| +| Claude Opus-class | All generated agents by policy | Governed by Claude Code runtime and user project context | No per-agent cost/fallback routing. 
| +| No-LLM deterministic checks | Markdown/YAML/frontmatter validation when scripted | Safe for local validation | Scripts are mostly methodology, not bundled automation. | +| Future adapter models | Hermes/Codex/DeepSeek/local routes | Must be explicit per adapter | Requires capability tiers, not hard-coded Claude names. | + +## 11. Architecture Patterns + +```mermaid +flowchart LR + P1["Pipeline\nsequential dependency"]:::worker --> M1["Use when outputs feed next stage"]:::truth + P2["Fan-out / Fan-in\nparallel perspectives"]:::worker --> M2["Best native Agent Teams fit"]:::truth + P3["Expert Pool\nconditional expert routing"]:::worker --> M3["Often cheaper as subagents"]:::truth + P4["Producer / Reviewer\ngenerate then verify"]:::worker --> M4["Team useful for feedback loop"]:::truth + P5["Supervisor\ndynamic task allocation"]:::worker --> M5["Maps to shared task list"]:::truth + P6["Hierarchical Delegation\nrecursive decomposition"]:::worker --> M6["Constraint: no nested teams\nflatten or team + subagents"]:::security + + classDef worker fill:#fef9c3,stroke:#ca8a04,color:#111827; + classDef truth fill:#ffe4e6,stroke:#e11d48,color:#111827; + classDef security fill:#fee2e2,stroke:#dc2626,color:#111827; +``` + +## 12. 
External Dependency Loop + +```mermaid +flowchart TD + Harness["Harness v1.2.x"]:::agent --> ClaudeCode["Claude Code v2.x+"]:::external + ClaudeCode --> Flag["CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1"]:::security + Flag --> TeamCreate["TeamCreate"]:::bus + Flag --> SendMessage["SendMessage"]:::bus + Flag --> TaskCreate["TaskCreate / TaskUpdate"]:::bus + ClaudeCode --> AgentTool["Agent tool"]:::worker + ClaudeCode --> Anthropic["api.anthropic.com"]:::modelCloud + Install["plugin install"]:::surface --> GitHub["GitHub marketplace repo"]:::external + + Flag --> Future{"upstream future"}:::truth + Future --> A["Agent Teams GA\nremove flag path"]:::action + Future --> B["Managed Agents GA\nadapter/export path"]:::action + Future --> C["breaking API change\nhotfix + matrix"]:::security + + classDef agent fill:#fce7f3,stroke:#db2777,color:#111827; + classDef external fill:#f8fafc,stroke:#475569,color:#111827; + classDef security fill:#fee2e2,stroke:#dc2626,color:#111827; + classDef bus fill:#ffedd5,stroke:#f97316,color:#111827; + classDef worker fill:#fef9c3,stroke:#ca8a04,color:#111827; + classDef modelCloud fill:#fae8ff,stroke:#9333ea,color:#111827; + classDef surface fill:#ccfbf1,stroke:#0f766e,color:#111827; + classDef truth fill:#ffe4e6,stroke:#e11d48,color:#111827; + classDef action fill:#fef3c7,stroke:#d97706,color:#111827; +``` + +## 13. Pluggability Model + +Harness already has the right conceptual split: patterns, generated agents, +generated skills, and orchestration. The missing abstraction is a runtime +adapter. 
+ +```yaml +runtime_adapter: + id: claude-code-agent-teams + status: current + input: + - domain_request + - project_context + - existing_harness_state + outputs: + - agent_definitions + - skill_definitions + - orchestrator_skill + - project_pointer + - workspace_manifest_optional + primitives: + create_team: TeamCreate + message_agent: SendMessage + create_task: TaskCreate + invoke_subagent: Agent + model_policy: opus + audit_surface: _workspace/ +``` + +A future adapter should implement the same conceptual contract without copying +Claude-specific tool names into the core design. + +## 14. MVP Spine for the Next Build Slice + +The next useful module is not a dashboard. It is a manifest-backed generated +harness contract. + +Closest time-to-value: + +| Feature | Why now | Depends on | +|---|---|---| +| `_workspace/run-manifest.json` | Makes reruns, audits, and partial execution deterministic. | Artifact naming contract. | +| Adapter contract doc | Separates Harness from Claude Code primitive names. | This system map. | +| Cost/model policy table | Makes `model: "opus"` explicit and overridable later. | Runtime adapter boundary. | +| Validation script | Turns methodology into repeatable checks. | Markdown/frontmatter schema. | + +Deferred: + +- Hosted dashboard. +- Multi-user permission system. +- Runtime state database. +- Full cross-runtime implementation before the adapter contract is proven. + +## 15. Open Questions + +| Question | Why it matters | Default assumption | +|---|---|---| +| Should generated runs include a machine-readable manifest? | Without it, drift detection depends on the model or on manual review. | Yes, add as a docs-backed convention first. | +| Should model policy stay hard-coded to Opus? | Cost and portability suffer. | Keep for Claude path, but add model class language. | +| Should Hermes/Codex support live in this repo or a sibling repo? | Avoid confusing current users. | Start as an adapter design doc, then prototype separately.
| +| Should README link architecture docs in all languages? | Discoverability vs i18n maintenance. | Add docs first; update localized READMEs only if maintainers want parity. | + +## 16. Design Verdict + +Harness is strongest as a team-architecture compiler written in Markdown: +pattern selection, role definition, skill authoring, orchestration, and +validation are all legible. + +The weak spot is pretending runtime details are generic. They are not. Today the +runtime is Claude Code Agent Teams. Future ports should keep the Harness factory +abstraction and swap the runtime adapter underneath it. + +Build next: a manifest-backed adapter contract plus a validation script. That +turns the useful methodology into something contributors can test instead of +just admire.
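
The validation script named in the verdict could start as small as the sketch below. It is an illustration, not part of Harness today: it checks only the two frontmatter keys listed in the `skill_definition` schema (`name`, `description`). It assumes flat `key: value` frontmatter delimited by `---` lines. The function names `parse_frontmatter`, `validate_skill`, and `validate_tree` are hypothetical.

```python
from pathlib import Path

# Keys taken from the skill_definition schema in section 9.
REQUIRED_KEYS = ("name", "description")


def parse_frontmatter(text: str) -> dict[str, str]:
    """Return top-level `key: value` pairs from a leading `---` block.

    Assumption: frontmatter is flat key/value YAML. Nested or multi-line
    values are ignored by this sketch.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        if line[:1].isspace() or ":" not in line:
            continue  # skip nested or continuation lines
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip().strip("\"'")
    return {}  # closing delimiter missing: treat as no frontmatter


def validate_skill(text: str) -> list[str]:
    """List the required frontmatter keys that are missing or empty."""
    fields = parse_frontmatter(text)
    return [
        f"missing or empty frontmatter key: {key}"
        for key in REQUIRED_KEYS
        if not fields.get(key)
    ]


def validate_tree(root: Path) -> dict[str, list[str]]:
    """Check every .claude/skills/*/SKILL.md under root; return failures only."""
    report: dict[str, list[str]] = {}
    for path in sorted(root.glob(".claude/skills/*/SKILL.md")):
        problems = validate_skill(path.read_text(encoding="utf-8"))
        if problems:
            report[str(path)] = problems
    return report
```

Calling `validate_tree(Path("."))` from a generated project root would return an empty dict when every skill passes. Checks for agent sections or artifact naming could follow the same shape once those contracts are written down.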