diff --git a/ts/docs/design/workflowSystem/ir/decisions/0001-bound-outputs.md b/ts/docs/design/workflowSystem/ir/decisions/0001-bound-outputs.md index c4c312c5a..adcf3ddee 100644 --- a/ts/docs/design/workflowSystem/ir/decisions/0001-bound-outputs.md +++ b/ts/docs/design/workflowSystem/ir/decisions/0001-bound-outputs.md @@ -7,7 +7,11 @@ future extensions. Related: [0002-cfg-ddg-separation.md](0002-cfg-ddg-separation.md) (C2, C3, C4, C5), [../future/block-scope.md](../future/block-scope.md) (blocks remain post-v1 for the multi-statement try and the regional -grouping cases that bound outputs alone don't address). +grouping cases that bound outputs alone don't address), +[0010-copilot-task-family.md](0010-copilot-task-family.md) (the +deferred `copilot.session.fork` pattern carries session/event IDs +across nodes through the `bind`/`$from "scope"` mechanism this +decision establishes). ## 1. The proposal diff --git a/ts/docs/design/workflowSystem/ir/decisions/0003-task-schema-source.md b/ts/docs/design/workflowSystem/ir/decisions/0003-task-schema-source.md index 3c3c6aa2b..eca1ed56f 100644 --- a/ts/docs/design/workflowSystem/ir/decisions/0003-task-schema-source.md +++ b/ts/docs/design/workflowSystem/ir/decisions/0003-task-schema-source.md @@ -388,3 +388,10 @@ of it. reduces duplication within a single IR; orthogonal to this decision. - ir-v0.1.md §8.1 notes (post-v1) - the DSL layer that would handle the authoring-friction side of this trade-off. +- [0010-copilot-task-family.md](0010-copilot-task-family.md) §4 - + schema-guided design for `copilot.invoke` relies on the Option 1' + drift check to reject non-object IR `outputSchema`s at IR + validation time. +- [0011-task-context-schema-awareness.md](0011-task-context-schema-awareness.md) + exposes the IR-declared schemas this decision makes authoritative + to task implementers via `TaskContext`. 
diff --git a/ts/docs/design/workflowSystem/ir/decisions/0010-copilot-task-family.md b/ts/docs/design/workflowSystem/ir/decisions/0010-copilot-task-family.md new file mode 100644 index 000000000..ce05e691e --- /dev/null +++ b/ts/docs/design/workflowSystem/ir/decisions/0010-copilot-task-family.md @@ -0,0 +1,354 @@ +# Copilot SDK task family (decision 0010) + +Status: **Adopted (v1).** Design-complete for the full family; v1 ships +only `copilot.invoke` (the rest is documented and approved but +deferred to a later rev). The IR schema does **not** change — every +member of the family is just a registered task. + +Related: + +- [../../principles/design-principles.md](../../principles/design-principles.md) — P1-P5 and the "fewest concepts / behavioral variance" discipline. +- [0001-bound-outputs.md](0001-bound-outputs.md) — `bind`/`$from` mechanism the deferred fork pattern relies on. +- [0003-task-schema-source.md](0003-task-schema-source.md) — Option 1' drift check: the rule that rejects non-object IR `outputSchema`s for `copilot.invoke`. +- [0011-task-context-schema-awareness.md](0011-task-context-schema-awareness.md) — engine extension that exposes a node's declared schemas to the task implementer (used by `copilot.invoke` to drive its schema-guided turn loop). + +## 1. Problem + +The workflow engine ships an `llm.generate` builtin task that calls a +chat model via the in-repo `aiclient` package. We want sibling tasks +that drive **agentic** turns through the GitHub Copilot CLI via +[`@github/copilot-sdk`](https://github.com/github/copilot-sdk). + +The SDK is qualitatively different from `aiclient`: + +- **Stateful.** A long-lived `CopilotClient` (which spawns a CLI + subprocess over JSON-RPC) hosts one or more `CopilotSession`s. +- **Agentic.** The model calls tools (read/write file, shell, web, + MCP servers, custom JS handlers) inside the session's tool-use + loop. 
+- **Permissioned.** Every tool call is gated through an + `onPermissionRequest` callback the host must supply. +- **Sub-agent capable.** Custom agents declared at session creation + (`customAgents: [{name, prompt, tools, ...}]`) are auto-delegated + to by the runtime. +- **State-rich.** `sessions.fork`, `session.history.compact`, and + `session.history.truncate` (all marked `@experimental` on the SDK's + RPC surface) provide non-destructive forks from a given event ID, + forced compaction, and destructive rollback respectively. +- **Event-streaming.** `assistant.message_delta`, `tool.execution_*`, + `subagent.*`, `session.idle`, `session.compaction_*`, and more. + +Mapping all of this naively into the IR would create several new +IR-level concepts (sessions as IR-visible resources, sub-agent +topology as IR structure, tool surfaces as wired data flow). The IR +discipline (§principles preamble: "fewest concepts / behavioral +variance"; "principles govern the boundary, not the interior") says +we should NOT pull those into the IR unless they earn it. + +## 2. Key insight: the family fits with zero new IR concepts + +The Copilot SDK's session/fork/compact APIs operate on two opaque +identifiers: **session ID** (string) and **event ID** (string). +These are pure data. 
They flow through the IR's existing reference +mechanism — `bind` (decision 0001) and `$from: "scope"` — +without requiring any new IR concept: + +``` +research: copilot.session.send → bind: "research" = { sessionId, lastEventId, text } + ┌─────────────────────────────────────────┐ + ▼ ▼ +forkA: copilot.session.fork forkB: copilot.session.fork + { sessionId: $from research.sessionId, { sessionId: $from research.sessionId, + toEventId: $from research.lastEventId } toEventId: $from research.lastEventId } + bind: "branchA" = { sessionId } bind: "branchB" = { sessionId } + │ │ + ▼ ▼ +sumA: copilot.session.send (parallel) sumB: copilot.session.send (parallel) +``` + +Principle scorecard: + +- **P2** ("All data flow is traceable through the IR alone" — _for any + task input, can I trace it back to its origin by reading the IR?_): + every consumer reads who forked from what. +- **P3** ("IR structure corresponds to computational structure" — _does + the IR reveal the pattern, or must you analyze the graph to discover + it?_): a fork in the conversation is a fork in the IR. +- **P5** ("A reader of the IR can predict engine behavior" — _would a + reader be surprised by the behavior?_): which nodes share session + state is visible from the IR. +- **Fewest concepts.** Zero new IR concepts. Sessions and event IDs are + opaque values; the existing reference machinery moves them. Each + task is just typed-in, typed-out (P4 boundary). + +## 3. The full task family + +Each entry has a one-line "earns its place by exposing a distinct SDK +behavior the existing tasks cannot reproduce" justification. 
+ +| Task | Earns its place by | Implementation phase | +| ------------------------- | ------------------------------------------------------------------------------------------------ | -------------------- | +| `copilot.invoke` | Convenience: create+send+close in one call; no session-ID plumbing; matches `llm.generate` shape | **v1 (now)** | +| `copilot.session.create` | Produce a session ID; configure model/agents/tools once | Deferred | +| `copilot.session.send` | Send a turn; return `{sessionId, lastEventId, text}` so downstream can fork or rewind | Deferred | +| `copilot.session.fork` | Non-destructive branch from a remembered event ID: the parallel-continuation primitive | Deferred | +| `copilot.session.compact` | Force compaction; expose `tokensRemoved` / `contextWindow` for IR-visible decisions | Deferred | +| `copilot.session.close` | Release in-memory resources (data preserved on disk for resume) | Deferred | + +`copilot.session.truncate` (destructive in-place rollback) is +**rejected** from the family; see §9 alternative D. + +## 4. Schema-guided design (applies to every member of the family that returns a value) + +`copilot.invoke` (and the deferred `copilot.session.send`) is "JSON in, +JSON out": the registered output schema is `{ "type": "object" }`, and +the _actual_ per-call output shape is whatever the IR node declares as +its `outputSchema`. Decision 0003 (Option 1') already makes the IR +node's `outputSchema` authoritative; this decision adds the rule that +the task **uses** that schema to drive the agent's response, rather than +only having it validated post hoc. + +The pattern is TypeChat-shaped: + +1. **The task reads `ctx.outputSchema`**, exposed by decision 0011 + (engine API extension; not an IR change). +2. **It registers a synthetic `submit_response` tool** whose + parameters JSON Schema _is_ the node's `outputSchema`. The system + prompt nudges the agent to call it exactly once when finished. +3. 
**It runs the agent** via `session.sendAndWait`. +4. **It captures the validated tool arguments** from the + `submit_response` handler. The SDK validates the tool args against + the schema before our handler runs, so most of the repair work is + free. +5. **On failure** — either the agent called `submit_response` with + schema-invalid arguments (the SDK's tool-validation message + becomes the next-turn instruction), or the session reached `idle` + without `submit_response` being called (the task sends a follow-up + nudge) — **it retries** within a bounded budget. + +Default repair budget: **3 attempts** (initial + 2 repairs). Override +via optional `repairBudget?: integer` input; budget ≥ 1; task-internal +cap at 10 to prevent runaway loops. On budget exhaustion, +`copilot.invoke` returns +`{ kind: "fail", error: { message, data: { lastResponse, ajvErrors, attempts } } }` +so the workflow's existing `onError` mechanism can react. + +Non-object IR `outputSchema`s are **rejected at IR validation time** +by decision 0003's drift check (Option 1') — `copilot.invoke`'s +registered output is `{"type":"object"}`, and any narrower IR-side +declaration must be a subtype of that. Free-text returns require +explicit wrapping (e.g. +`{type:"object", required:["text"], properties:{text:{type:"string"}}}`). +Primitive returns are not supported in v1; if a real workflow needs +them, the answer is a separate task (e.g. `copilot.invokeText`), not +blurring this one. + +The IR author's optional `systemMessage` input is **appended** to the +SDK's system prompt scaffolding (mode `append`), never replaces it. + +## 5. 
v1 task: `copilot.invoke` + +### 5.1 Input schema + +| Field | Type | Required | Notes | +| ----------------- | ------------------------------------------------------------ | -------- | ----------------------------------------------------------------- | +| `prompt` | string | yes | The user-turn message | +| `model` | string | no | e.g. `"gpt-5"`. Defaults to engine config / SDK default | +| `systemMessage` | string | no | Appended to SDK system prompt scaffolding (mode `append`) | +| `customAgents` | array of `{name, displayName?, description, prompt, tools?}` | no | Pure-data sub-agent definitions | +| `allowedTools` | string[] | no | Allow-list of CLI built-in tool names (`view`, `edit`, `bash`, …). The engine always merges the synthetic `submit_response` tool into the SDK's `availableTools` allow-list so an empty `allowedTools: []` (deny all built-ins) still leaves the termination contract intact. | +| `attachments` | array of `{path}` | no | Paths validated against the `validateFilePath` allowed roots | +| `timeoutMs` | integer | no | Hard cap on session run time | +| `reasoningEffort` | `"low" \| "medium" \| "high" \| "xhigh"` | no | For models that support it | +| `repairBudget` | integer | no | Schema-repair attempts; default 3, range 1–10 | + +### 5.2 Output schema (registered) + +`{ "type": "object" }`. Per-call shape comes from the IR node's +declared `outputSchema` per §4. + +### 5.3 Side-effects and permissions + +`sideEffects: true`. The engine's existing per-task policy +(`allow|prompt|deny`) gates the entire invocation, as it does today. **Inside** +the session, the SDK's `onPermissionRequest` is wired to `approveAll` +in v1: the agent may freely call any tool the session exposes +(read/write file, shell, web, MCP, custom), subject to any +`allowedTools` allow-list from §5.1. This is deliberately +temporary; see §7. 
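As a hedged illustration of the §4 turn loop combined with the `repairBudget` range from §5.1, the sketch below shows one way a bounded repair loop could be structured. Every name in it (`AttemptOutcome`, `LoopResult`, `resolveRepairBudget`, `runWithRepair`, the `runTurn` callback, the nudge text, and the simplified `data` payload) is hypothetical and stands in for the real SDK session calls; it is not the engine's implementation. The clamp is a defensive assumption, since the task's input schema already bounds the value.

```typescript
// Hypothetical sketch: none of these names are the engine's real API.
type AttemptOutcome =
    | { kind: "submitted"; value: Record<string, unknown> }
    | { kind: "invalid"; validationMessage: string } // submit_response args failed the schema
    | { kind: "idle" }; // session went idle without calling submit_response

type LoopResult =
    | { kind: "ok"; value: Record<string, unknown>; attempts: number }
    | { kind: "fail"; error: { message: string; data: { attempts: number; lastFailure: string } } };

const DEFAULT_REPAIR_BUDGET = 3; // initial attempt + 2 repairs
const MAX_REPAIR_BUDGET = 10; // task-internal cap

// Assumption: clamp defensively even though input validation bounds the value.
function resolveRepairBudget(requested?: number): number {
    if (requested === undefined || !Number.isFinite(requested)) {
        return DEFAULT_REPAIR_BUDGET;
    }
    return Math.min(MAX_REPAIR_BUDGET, Math.max(1, Math.floor(requested)));
}

async function runWithRepair(
    initialPrompt: string,
    runTurn: (instruction: string) => Promise<AttemptOutcome>,
    requestedBudget?: number,
): Promise<LoopResult> {
    const budget = resolveRepairBudget(requestedBudget);
    let instruction = initialPrompt;
    let lastFailure = "";
    for (let attempt = 1; attempt <= budget; attempt++) {
        const outcome = await runTurn(instruction);
        if (outcome.kind === "submitted") {
            return { kind: "ok", value: outcome.value, attempts: attempt };
        }
        // The failure description becomes the next turn's instruction.
        lastFailure =
            outcome.kind === "invalid"
                ? outcome.validationMessage
                : "Call submit_response exactly once with your final answer.";
        instruction = lastFailure;
    }
    return {
        kind: "fail",
        error: {
            message: `no valid submit_response after ${budget} attempt(s)`,
            data: { attempts: budget, lastFailure },
        },
    };
}
```

Keeping the loop independent of the SDK, with the turn injected as a callback, is one way to make the budget logic testable without a live Copilot session.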
+ +### 5.4 Authentication + +Environment variables only (`COPILOT_GITHUB_TOKEN` / `GH_TOKEN` / +`GITHUB_TOKEN`), falling back to the `copilot` CLI's stored OAuth +login. No `gitHubToken` or `provider` field appears in the IR. Same +posture as `llm.generate`. Per-task BYOK is purely additive and can +be added later without revisiting this record. + +## 6. Deferred tasks (design captured; not in v1 implementation) + +All `sideEffects: true`. All schema-guided per §4 (where applicable). + +- **`copilot.session.create`** + - In: same as `copilot.invoke` minus `prompt`/`repairBudget`. + - Out: `{ sessionId: string }`. +- **`copilot.session.send`** + - In: `{ sessionId, prompt, attachments?, timeoutMs?, repairBudget? }`. + - Out: `{ sessionId, lastEventId, ... }` where `...` + is whatever the IR node declares; `sessionId` and `lastEventId` + are added by the task on top of the IR-declared object so + downstream nodes can `bind` and chain. (The IR author declares + these fields on their node's `outputSchema`.) +- **`copilot.session.fork`** + - In: `{ sessionId, toEventId? }`. + - Out: `{ sessionId }` (the new fork's ID). +- **`copilot.session.compact`** + - In: `{ sessionId }`. + - Out: `{ sessionId, tokensRemoved, messagesRemoved, contextTokens, contextLimit }`. +- **`copilot.session.close`** + - In: `{ sessionId, deletePersistent?: boolean }`. + - Out: `{}`. (`deletePersistent: true` calls the SDK's + `deleteSession` instead of `disconnect`.) + +## 7. Permission posture and the longer-term direction + +v1's `approveAll` posture is **deliberately temporary**. The durable +answer is a **capability-based security model** in which: (a) each +task declares the capabilities it needs (file-write, shell, network, +outbound-domain), (b) a workflow declares the capability budget it +grants, and (c) the engine enforces the intersection at task +boundaries. 
This aligns with the existing aspiration in +[../../principles/design-principles.md](../../principles/design-principles.md): + +> "The design should remain open to expanding what tasks declare +> about themselves … capability and side-effect declarations." + +That work is engine/IR-wide, not Copilot-specific, so it gets its own +decision record when it begins. The Copilot family will be one of its +first consumers — its `onPermissionRequest` is the natural enforcement +point. + +## 8. Engine-side concerns (not IR concepts) + +### 8.1 SDK client lifecycle + +The `@github/copilot-sdk` `CopilotClient` spawns a CLI subprocess and +holds a JSON-RPC connection. Spawning per task call is too expensive. +v1 holds a **lazy module-singleton** in +`engine/src/copilotClientHost.ts`, lazily started on first call to any +`copilot.*` task and disposed at engine shutdown. The SDK import +itself is dynamic (`import("@github/copilot-sdk")`) so consumers who +never invoke any `copilot.*` task don't pay the bundled-CLI install +cost on first use. + +### 8.2 Session-leak safety net + +`copilot.invoke` creates an internal session per call and disposes it +in a `finally`. The deferred `copilot.session.*` family creates +session IDs that cross node boundaries; without a safety net, long +workflows leak. Recommended approach for the deferred tasks (already +plumbed into `copilotClientHost.ts` for v1 so it's ready): + +- **Explicit close is the contract.** Authors `bind` a session ID + and pair it with a `copilot.session.close` consumer. +- **Best-effort safety net.** The host module maintains a per-run + set of Copilot session IDs created by `copilot.session.create` / + `copilot.session.fork`, and disconnects any not closed when the + run ends (success or failure). + +This is **not** a new IR concept — it's an engine concern analogous +to `AbortSignal` cleanup. 
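The best-effort safety net described in §8.2 might be sketched as follows. This is a minimal illustration under assumed names: `RunSessionTracker`, its methods, and the `disconnect` callback are hypothetical and are not taken from `copilotClientHost.ts`.

```typescript
// Hypothetical sketch of a per-run session registry; not the real host module.
class RunSessionTracker {
    private readonly openByRun = new Map<string, Set<string>>();

    /** Record a session ID produced by a create/fork task during `runId`. */
    track(runId: string, sessionId: string): void {
        let open = this.openByRun.get(runId);
        if (open === undefined) {
            open = new Set<string>();
            this.openByRun.set(runId, open);
        }
        open.add(sessionId);
    }

    /** Called when the author's explicit close node has run. */
    markClosed(runId: string, sessionId: string): void {
        this.openByRun.get(runId)?.delete(sessionId);
    }

    /**
     * Run-end cleanup (success or failure). Disconnects whatever was never
     * closed and returns those IDs so the caller can log them.
     */
    async sweep(
        runId: string,
        disconnect: (sessionId: string) => Promise<void>,
    ): Promise<string[]> {
        const leaked = Array.from(this.openByRun.get(runId) ?? []);
        this.openByRun.delete(runId);
        for (const sessionId of leaked) {
            try {
                await disconnect(sessionId);
            } catch {
                // Best effort: one failed disconnect must not block the rest.
            }
        }
        return leaked;
    }
}
```

Explicit close stays the contract in this sketch; `sweep` only catches what the workflow author missed.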
+ +### 8.3 Concurrency + +The SDK explicitly states "no built-in session locking; concurrent +access to the same session is undefined." When fork lands, the engine +MUST ensure two `copilot.session.send` nodes never share the same +session ID. Forking creates a new ID, so the IR-correct pattern (fork +before parallel send) makes this fall out automatically. A future +validation warning could flag two sends in concurrent regions +referencing the same `sessionId`. + +### 8.4 Experimental SDK surface + +`sessions.fork`, `session.history.compact`, and +`session.history.truncate` are marked `@experimental` on the SDK's +RPC layer. **`copilot.invoke` does NOT use any experimental RPCs** — +it only uses stable `createSession` / `sendAndWait` / `disconnect` +plus the stable `defineTool` mechanism for `submit_response`. When the +deferred `copilot.session.*` family lands, the experimental calls +will be isolated in `copilotClientHost.ts` so the surface area is one +well-named adapter. + +## 9. Alternatives considered + +### A. Single-shot only, opaque session per call (no session.\* family) + +Reject. Forecloses the fork/rewind/parallel-continuation patterns the +SDK specifically supports, with no IR-side justification. Leaves +genuine capability on the floor. + +### B. IR-visible "resource handle" type for sessions + +Reject. Session IDs are already opaque strings; introducing a new IR +concept (handle/resource value with engine-managed lifetime) earns no +behavioral variance the existing reference mechanism cannot already +express. Cleanup is an engine concern (§8.2), not an IR concern. The +"fewest concepts" discipline rejects new concepts that only relabel +existing mechanisms. + +### C. Non-IR-visible "session context" carried in `TaskContext` + +Reject. Violates P2 (data flow happens outside the IR — readers +can't see which nodes share a session) and P5 (reader can't predict +which sessions are shared without consulting engine internals). 
The +fact that session IDs cross node boundaries via `bind`/`$from` is +exactly what makes the family principle-aligned. + +### D. Include `copilot.session.truncate` in the family + +Reject. Destructive in-place mutation of a session referenced by other +nodes violates P5 ("would a reader be surprised by the behavior, +including by what the engine keeps alive?"). `fork` covers the same +use cases non-destructively; truncate's only edge over fork is "saves +the cost of duplicating the session prefix," which is not a workflow +author concern. + +### E. Free-text output (fixed `{text}` shape) for `copilot.invoke` + +Reject. Schema-guided structured output is the headline value of +running a tool-using agent inside a workflow — downstream nodes can +reference structured fields via `$from … path: […]` with full P1 +type-checking. `llm.generate` already exists for free-text use; that +is the right destination for callers who want a string. + +### F. Schema-guidance via system-prompt-only or wrapping primitives + +Reject system-prompt-only: brittle parsing of the agent's last message +(must strip code fences, narrative text), no SDK-side validation, +hits the repair loop more often. Reject primitive wrapping: creates +two ways to express one thing in the IR (`{type:"string"}` vs +`{type:"object", properties:{value:{type:"string"}}}`) which would +behave differently — P5 violation. The chosen design uses the SDK's +typed-tool surface (`defineTool`) as a clean termination contract. + +## 10. Risks and gotchas + +- **Bundled CLI install size.** `@github/copilot-sdk` bundles the + Copilot CLI binary. Mitigated by dynamic import in + `copilotClientHost.ts`. +- **CI cannot make real Copilot calls.** Tests mock the client + factory. Per repo policy, `pnpm run test:live` is not run. +- **Experimental RPCs in deferred tasks.** When the + `copilot.session.*` family lands, the experimental surface is + isolated in `copilotClientHost.ts` so a future SDK churn affects + one adapter. 
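Two of the mitigations above lean on the host-module pattern from §8.1: a lazily started singleton whose factory tests can replace. A rough sketch of that pattern follows, under stated assumptions. `AgentClient`, `ClientFactory`, `setClientFactory`, `getClient`, and `shutdownClient` are hypothetical names, and the default factory simply fails so the sketch needs no SDK; the real host would instead load the SDK with a dynamic `import()` inside its factory.

```typescript
// Hypothetical sketch of a lazy module-singleton with a swappable factory.
interface AgentClient {
    stop(): Promise<void>;
}

type ClientFactory = () => Promise<AgentClient>;

// Placeholder default: a real host would dynamically import the SDK here.
let clientFactory: ClientFactory = async () => {
    throw new Error("no client factory configured");
};

let clientPromise: Promise<AgentClient> | undefined;

/** Test hook: swap in a mock factory and drop any cached client. */
function setClientFactory(next: ClientFactory): void {
    clientFactory = next;
    clientPromise = undefined;
}

/** Starts the client on first use; concurrent first callers share one startup. */
function getClient(): Promise<AgentClient> {
    if (clientPromise === undefined) {
        const started = clientFactory();
        clientPromise = started;
        // Forget a failed startup so a later call can retry.
        started.catch(() => {
            if (clientPromise === started) {
                clientPromise = undefined;
            }
        });
    }
    return clientPromise;
}

/** Engine-shutdown hook: stops the client if one was ever started. */
async function shutdownClient(): Promise<void> {
    const pending = clientPromise;
    clientPromise = undefined;
    if (pending !== undefined) {
        const client = await pending;
        await client.stop();
    }
}
```

Caching the promise instead of the resolved client is a design choice in this sketch: it avoids spawning two subprocesses when two tasks race on the first call.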
+ +## 11. Cross-references + +- [../../principles/design-principles.md](../../principles/design-principles.md) — P1-P5 and the "fewest concepts" discipline. +- [0001-bound-outputs.md](0001-bound-outputs.md) — `bind`/`$from` mechanism the deferred fork pattern relies on. +- [0003-task-schema-source.md](0003-task-schema-source.md) — Option 1' drift check that rejects non-object IR `outputSchema`s for `copilot.invoke`. +- [0011-task-context-schema-awareness.md](0011-task-context-schema-awareness.md) — engine extension exposing the node's declared schemas to the task implementer; what makes §4 possible. +- [../ir-v1.md](../ir-v1.md) §3.5 (task node), §5.2 (runtime output schema validation). diff --git a/ts/docs/design/workflowSystem/ir/decisions/0011-task-context-schema-awareness.md b/ts/docs/design/workflowSystem/ir/decisions/0011-task-context-schema-awareness.md new file mode 100644 index 000000000..605e8b217 --- /dev/null +++ b/ts/docs/design/workflowSystem/ir/decisions/0011-task-context-schema-awareness.md @@ -0,0 +1,141 @@ +# TaskContext schema awareness (decision 0011) + +Status: **Adopted (v1).** Engine API extension; **not** an IR change. +Adds `outputSchema` to the `TaskContext` value the engine passes to +`task.execute`. The schema is populated from the dispatching node's +IR-declared `outputSchema` — i.e. existing IR data, made visible to +the task implementer. + +Related: + +- [../../principles/design-principles.md](../../principles/design-principles.md) — P4 ("each part can be understood / validated / tested without the whole, given only its declared boundary contract"). +- [0003-task-schema-source.md](0003-task-schema-source.md) — establishes that the IR node's `inputSchema`/`outputSchema` are authoritative (Option 1'). +- [0010-copilot-task-family.md](0010-copilot-task-family.md) — the first consumer; `copilot.invoke` reads `ctx.outputSchema` to drive its schema-guided turn loop. + +## 1. 
Problem + +Decision 0003 establishes that for every task node the IR's declared +`inputSchema`/`outputSchema` is authoritative — it either restates the +registered task's contract verbatim or narrows it. The engine already +validates a task's return value against the IR-declared `outputSchema` +at runtime (`ir-v1.md` §5.2). + +But today, the task implementation cannot **read** its own node's +declared `outputSchema`. `TaskContext` carries `runId`, `nodeId`, +`scopePath`, `signal`, `constraints` — but not the output schema. So +a task that wants to _use_ the schema as part of its computation (for +example, instructing an LLM agent to produce a value of a specific +shape, or driving a schema-aware transform) has no first-class access +to it. + +## 2. Decision + +Add one field to `TaskContext`: + +```typescript +export interface TaskContext { + runId: string; + nodeId: string; + scopePath: string[]; + signal: AbortSignal; + constraints?: TaskConstraints; + /** + * The dispatching node's declared output schema, per IR §3.5. + * Authoritative for this call: equal to or a narrowing of the + * registered task's outputSchema (decision 0003 Option 1'). + * The engine validates the task's return value against this + * schema after execution (IR §5.2); tasks may also use it to + * shape their computation (e.g. schema-guided LLM responses). + */ + outputSchema: JSONSchema; +} +``` + +The engine's runner populates this field from the dispatching +`WorkflowNode`'s `outputSchema` before invoking `task.execute`. + +`outputSchema` is declared as required (not optional) on `TaskContext`: +`TaskNode.outputSchema` is required by the IR contract +(`model/src/ir.ts`) and the static validator rejects task nodes that +omit it, so the runner can — and does — pass it unconditionally. + +## 3. Why this earns its place + +This is a near-zero-cost extension that exposes existing IR data to +the task implementer. 
It satisfies: + +- **P4 (boundary contract).** The IR-declared output schema _is_ half + of the task's boundary contract for this call (the other half — the + input — is already supplied as the `execute` argument). P4's + one-line test — _"Can I validate/test this part using only what its + boundary declares?"_ — is more directly satisfied when the task + itself can see its declared output shape, not merely have it + enforced from outside. +- **Decision 0003 alignment.** 0003 made the IR's output schema + authoritative. Making it visible to the task is the natural + consequence: if the IR is the source of truth, the task should be + able to consult that source. +- **Generality.** The change is not Copilot-specific. Any future + schema-aware task benefits without re-litigating: a structured- + response variant of `llm.generate`, a `json.transform` task that + reshapes input to the declared output, an MCP bridge that maps + the node's schema onto the upstream protocol, etc. + +## 4. Why this is NOT an IR change + +No IR field is added or removed. `outputSchema` already exists on +every task node (`ir-v1.md` §3.5). This decision changes only: + +- `workflow-model/src/taskDefinition.ts` — the `TaskContext` interface. +- `workflow-engine/src/runner.ts` — the runner populates the new + field when constructing the per-call `TaskContext`. + +`ir-v1.md` does not need editing. No validator rule changes. No +existing IR document semantics change. + +## 5. Alternatives considered + +### A. Pass the schema in via a side-channel (e.g., a per-runId map) + +Reject. Hides the contract from the task's documented interface; +implementers have to know the side-channel exists. The whole point of +`TaskContext` is to be the documented per-call contract handed to +tasks. + +### B. Have schema-aware tasks accept a `responseSchema` field on input + +Reject. 
Creates duplicate declarations of the same shape (the IR +node's `outputSchema` and the task's input `responseSchema`) which +must agree by convention but the engine can't enforce in a way that's +visible at one read site. P5 ("would a reader be surprised?") — yes, +because they'd have to know the redundancy is required. + +### C. Defer until the next schema-aware task earns the change + +Reject. The cost of the change is essentially zero (one field on a +context object, one population site in the runner). Doing it now +means decision 0010 (Copilot task family) lands cleanly and any +future schema-aware task gets the same affordance for free. Doing +it later means doing the migration of test fixtures and the runner +twice. + +## 6. Implementation notes + +- **No test-fixture cascade.** A scan of `engine/test/` and + `model/test/` confirms no test directly constructs a `TaskContext`; + all task execution flows through `WorkflowEngine.run`. The runner + populates the new field from the node it is dispatching, so + existing tests continue to work without per-fixture changes. +- **Schema is a JSON value, not an Ajv validator.** The runner does + NOT pre-compile a per-task `submit_response` validator or otherwise + cache schemas keyed by node — each task that wants to validate + against the schema brings its own validator (e.g. Ajv instance). + Keeping `TaskContext.outputSchema` as a plain `JSONSchema` mirrors + how `TaskDefinition.outputSchema` is typed today. + +## 7. Cross-references + +- [../../principles/design-principles.md](../../principles/design-principles.md) — P4. +- [0003-task-schema-source.md](0003-task-schema-source.md) — what made the IR's schemas the authoritative source this decision exposes. +- [0010-copilot-task-family.md](0010-copilot-task-family.md) — first consumer. +- [../ir-v1.md](../ir-v1.md) §3.5 (task node `outputSchema`), §5.2 (engine-side runtime output schema validation that this decision does **not** change). 
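To make the "tasks may also use it to shape their computation" point of decision 0011 concrete, here is a small hedged sketch of a task-side helper that turns the node's declared `outputSchema` into an instruction for a model. The `JSONSchema` type below is a minimal local stand-in, `SchemaAwareContext` is a trimmed subset of `TaskContext`, and `describeExpectedShape` is a hypothetical helper; none of them are the `workflow-model` package's actual definitions.

```typescript
// Hypothetical sketch: a minimal stand-in for the real JSONSchema type.
type JSONSchema = {
    type?: string;
    required?: string[];
    properties?: Record<string, JSONSchema>;
};

// Trimmed subset of TaskContext: only the fields this sketch reads.
type SchemaAwareContext = {
    nodeId: string;
    outputSchema: JSONSchema;
};

/** Builds a plain-text description of the shape the node expects back. */
function describeExpectedShape(ctx: SchemaAwareContext): string {
    const properties = ctx.outputSchema.properties ?? {};
    const required = ctx.outputSchema.required ?? [];
    const lines = Object.keys(properties).map((name) => {
        const schema = properties[name];
        const type = schema?.type ?? "any";
        const marker = required.indexOf(name) >= 0 ? "required" : "optional";
        return `- ${name}: ${type} (${marker})`;
    });
    return [`Node ${ctx.nodeId} must return a JSON object with:`, ...lines].join("\n");
}
```

The engine would still validate the task's return value against the same schema after execution; a helper like this only steers the computation toward a value that passes.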
diff --git a/ts/examples/workflow/cli/src/cli.ts b/ts/examples/workflow/cli/src/cli.ts index 7eabd793b..c673fd254 100644 --- a/ts/examples/workflow/cli/src/cli.ts +++ b/ts/examples/workflow/cli/src/cli.ts @@ -188,6 +188,14 @@ async function cmdRun( console.error( `${prefix} Workflow failed${location}: ${result.error?.message ?? "unknown error"}`, ); + + // Log any structured context attached to the error. + if (result.error?.data !== undefined) { + console.error( + `${prefix} error data: ${JSON.stringify(result.error.data, null, 2)}`, + ); + } + process.exit(1); } } diff --git a/ts/examples/workflow/engine/package.json b/ts/examples/workflow/engine/package.json index 430d2fa1e..fb4e4f5fe 100644 --- a/ts/examples/workflow/engine/package.json +++ b/ts/examples/workflow/engine/package.json @@ -28,6 +28,7 @@ "tsc": "tsc -b" }, "dependencies": { + "@github/copilot-sdk": "^0.3.0", "aiclient": "workspace:*", "ajv": "^8.17.1", "debug": "^4.3.4", diff --git a/ts/examples/workflow/engine/src/builtinTaskSchemas.ts b/ts/examples/workflow/engine/src/builtinTaskSchemas.ts index c63287624..33887ff7f 100644 --- a/ts/examples/workflow/engine/src/builtinTaskSchemas.ts +++ b/ts/examples/workflow/engine/src/builtinTaskSchemas.ts @@ -235,6 +235,7 @@ export const BUILTIN_TASK_SCHEMAS: readonly BuiltinTaskSchema[] = [ outputSchema: { $typeParam: "N" }, }, { + // Not generic: output is always integer, regardless of input subtype. name: "math.floor", inputSchema: { type: "object", @@ -244,6 +245,7 @@ export const BUILTIN_TASK_SCHEMAS: readonly BuiltinTaskSchema[] = [ outputSchema: { type: "integer" }, }, { + // Not generic: output is always integer, regardless of input subtype. name: "math.round", inputSchema: { type: "object", @@ -253,6 +255,7 @@ export const BUILTIN_TASK_SCHEMAS: readonly BuiltinTaskSchema[] = [ outputSchema: { type: "integer" }, }, { + // Not generic: output is always integer, regardless of input subtype. 
name: "math.ceil", inputSchema: { type: "object", diff --git a/ts/examples/workflow/engine/src/builtinTasks.ts b/ts/examples/workflow/engine/src/builtinTasks.ts index a56796720..ea06ec706 100644 --- a/ts/examples/workflow/engine/src/builtinTasks.ts +++ b/ts/examples/workflow/engine/src/builtinTasks.ts @@ -24,7 +24,9 @@ import { } from "workflow-model"; import { isGenericBuiltinSchema } from "./builtinTaskSchemas.js"; import { openai } from "aiclient"; +import type { CustomAgentConfig } from "@github/copilot-sdk"; import { BUILTIN_TASK_SCHEMAS } from "./builtinTaskSchemas.js"; +import { invokeCopilotAgent } from "./copilotClientHost.js"; const SCHEMA_BY_NAME = new Map( BUILTIN_TASK_SCHEMAS.map((s) => [s.name, s] as const), @@ -341,6 +343,118 @@ export const llmGenerateJson: GenericTaskDefinition< }, }; +/** + * This task runs a Copilot agent turn against a fresh session, with the agent's + * response shaped by the IR node's declared `outputSchema`. + * + * Key contracts: + * - Registered output schema: `{type: "object"}`. The actual per-call + * output shape is whatever the IR node declares. + * - Authentication: env vars (`COPILOT_GITHUB_TOKEN` / `GH_TOKEN` / + * `GITHUB_TOKEN`) or the logged-in `copilot` CLI user. No IR knob. + * - Permission posture: tools are restricted to the set listed in + * `allowedTools`; permission requests for those tools are auto-approved + * (`approveAll`) in v1. A capability-based security model is the + * longer-term follow-up (decision 0010 §7). 
+ */ +export const copilotInvoke: TaskDefinition< + { + prompt: string; + model?: string; + systemMessage?: string; + customAgents?: CustomAgentConfig[]; + allowedTools?: string[]; + attachments?: Array<{ path: string }>; + timeoutMs?: number; + reasoningEffort?: "low" | "medium" | "high" | "xhigh"; + repairBudget?: number; + }, + unknown +> = { + name: "copilot.invoke", + sideEffects: true, + inputSchema: { + type: "object", + required: ["prompt"], + properties: { + prompt: { type: "string" }, + model: { type: "string" }, + systemMessage: { type: "string" }, + customAgents: { type: "array" }, + allowedTools: { type: "array", items: { type: "string" } }, + attachments: { + type: "array", + items: { + type: "object", + required: ["path"], + properties: { path: { type: "string" } }, + }, + }, + timeoutMs: { type: "integer" }, + reasoningEffort: { + type: "string", + enum: ["low", "medium", "high", "xhigh"], + }, + repairBudget: { + type: "integer", + minimum: 1, + maximum: 10, + description: "Schema-repair attempts; default 3.", + }, + }, + }, + // `copilot.invoke`'s actual per-call output shape is whatever the + // dispatching node declares (`{}` is JSONSchema for `any`). + outputSchema: {}, + async execute(input, ctx) { + // Validate any attachment paths against the same allowed roots + // file.read / file.write enforce. + if (input.attachments) { + for (const a of input.attachments) { + try { + a.path = validateFilePath(a.path); + } catch (err) { + return { + kind: "fail", + error: { + message: `copilot.invoke attachment ${a.path} rejected: ${err instanceof Error ? err.message : String(err)}`, + }, + }; + } + } + } + + const result = await invokeCopilotAgent({ + prompt: input.prompt, + outputSchema: ctx.outputSchema, + ...(input.model !== undefined ? { model: input.model } : {}), + ...(input.systemMessage !== undefined + ? { systemMessageAppend: input.systemMessage } + : {}), + ...(input.customAgents !== undefined + ? 
{ customAgents: input.customAgents } + : {}), + ...(input.allowedTools !== undefined + ? { availableTools: input.allowedTools } + : {}), + ...(input.attachments !== undefined + ? { attachments: input.attachments } + : {}), + ...(input.timeoutMs !== undefined + ? { timeoutMs: input.timeoutMs } + : {}), + ...(input.reasoningEffort !== undefined + ? { reasoningEffort: input.reasoningEffort } + : {}), + ...(input.repairBudget !== undefined + ? { repairBudget: input.repairBudget } + : {}), + signal: ctx.signal, + }); + + return result; + }, +}; + // ---- Utility tasks ---- export const textTemplate: TaskDefinition< @@ -698,6 +812,7 @@ export const mathMultiply: TaskDefinition< }, }; +// Not generic: integer / integer can yield non-integer (1 / 2 = 0.5). export const mathDivide: TaskDefinition< { left: number; right: number }, number @@ -728,6 +843,7 @@ export const mathNegate: TaskDefinition<{ value: number }, number> = { }, }; +// Not generic: output is always integer, regardless of input subtype. export const mathFloor: TaskDefinition<{ value: number }, number> = { ...taskSchema("math.floor"), sideEffects: false, @@ -736,6 +852,7 @@ export const mathFloor: TaskDefinition<{ value: number }, number> = { }, }; +// Not generic: output is always integer, regardless of input subtype. export const mathRound: TaskDefinition<{ value: number }, number> = { ...taskSchema("math.round"), sideEffects: false, @@ -744,6 +861,7 @@ export const mathRound: TaskDefinition<{ value: number }, number> = { }, }; +// Not generic: output is always integer, regardless of input subtype. 
export const mathCeil: TaskDefinition<{ value: number }, number> = { ...taskSchema("math.ceil"), sideEffects: false, @@ -827,6 +945,7 @@ export const allBuiltinTasks: TaskDefinition[] = [ shellExec, llmGenerate, llmGenerateJson, + copilotInvoke, httpGet, fileRead, fileWrite, diff --git a/ts/examples/workflow/engine/src/copilotClientHost.ts b/ts/examples/workflow/engine/src/copilotClientHost.ts new file mode 100644 index 000000000..b72afc67b --- /dev/null +++ b/ts/examples/workflow/engine/src/copilotClientHost.ts @@ -0,0 +1,437 @@ +// Copyright (c) Microsoft Corporation. +// Licensed under the MIT License. + +/** + * Copilot SDK client host (decision 0010). + * + * Owns the `@github/copilot-sdk` `CopilotClient` lifecycle and provides + * the schema-guided turn driver that `copilot.invoke` builds on. + * + * Design notes: + * - SDK *types* are imported statically via `import type` so we + * never duplicate the SDK's surface to avoid drift. `import type` + * is erased at emit and does not trigger module loading. + * - SDK *runtime* values (`defineTool`, `approveAll`, `CopilotClient`) + * are loaded via dynamic `import("@github/copilot-sdk")` so + * consumers who never invoke any `copilot.*` task don't pay the + * Copilot CLI runtime install cost. + * - The SDK client is a lazy module-singleton: started on first call, + * disposed on engine shutdown. + * - `invokeCopilotAgent(...)` implements the §4 turn loop from + * decision 0010: register `submit_response` whose parameters are + * the node's declared `outputSchema`, run the agent in an + * ephemeral session, capture the validated arguments, and repair on + * failure within a bounded budget. + * - For tests, `setCopilotClientFactory` swaps in a mock client. 
+ */ + +import AjvModule from "ajv"; +import Debug from "debug"; +import type { JSONSchema } from "workflow-model"; +import type { + CopilotClient, + CopilotSession, + CustomAgentConfig, + MessageOptions, + SessionConfig, +} from "@github/copilot-sdk"; + +const debug = Debug("typeagent:workflow:copilot"); + +/** Maximum length, in characters, that we log via debug(). */ +const DEBUG_TRUNCATE_LEN = 800; + +function truncateForDebug(s: string): string { + if (s.length <= DEBUG_TRUNCATE_LEN) return s; + return `${s.slice(0, DEBUG_TRUNCATE_LEN)}…[truncated ${s.length - DEBUG_TRUNCATE_LEN} chars]`; +} + +/** + * Extract assistant text for debug logging. + * + * Today the SDK type for `sendAndWait` is `AssistantMessageEvent | + * undefined`, and `AssistantMessageEvent.data.content` is a string. + * We still keep a narrow runtime guard because this path is diagnostics + * only: if the SDK/event payload drifts at runtime, we prefer to log a + * coarse marker instead of throwing from debug plumbing. + */ +function extractAssistantText(reply: unknown): string | undefined { + if (reply === undefined || reply === null) return undefined; + const data = (reply as { data?: unknown }).data; + if (typeof data !== "object" || data === null) { + return "[assistant message: unexpected data payload]"; + } + const content = (data as { content?: unknown }).content; + if (typeof content === "string") return content; + if (content === undefined || content === null) return undefined; + return "[assistant message: non-string content]"; +} + +const AjvConstructor = (AjvModule as any).default ?? AjvModule; + +// ---- Public types: structural views over the SDK types ---- +// +// These are derived from the SDK classes via `Pick` so the SDK is the +// single source of truth for method signatures. The narrow surface +// (only the members we actually call) keeps the test mock surface +// small while still failing the build if the SDK changes a signature +// out from under us. 
 + +export type MinimalCopilotSession = Pick< + CopilotSession, + "sessionId" | "sendAndWait" | "disconnect" +>; + +export interface MinimalCopilotClient { + start: CopilotClient["start"]; + stop: CopilotClient["stop"]; + createSession(config: SessionConfig): Promise<MinimalCopilotSession>; +} + +/** + * Factory returning a started client. The default implementation + * dynamically imports `@github/copilot-sdk`. Tests inject a mock via + * `setCopilotClientFactory`. + */ +export type CopilotClientFactory = () => Promise<MinimalCopilotClient>; + +// ---- Lazy singleton + factory swap ---- + +let factory: CopilotClientFactory = defaultFactory; +let clientPromise: Promise<MinimalCopilotClient> | undefined; + +async function defaultFactory(): Promise<MinimalCopilotClient> { + // Dynamic import keeps the Copilot CLI runtime bundle out of the + // workflow-engine critical path until any copilot.* task actually runs. + const sdk = (await import("@github/copilot-sdk")) as any; + const client = new sdk.CopilotClient(); + await client.start(); + return client as MinimalCopilotClient; +} + +/** + * Swap the SDK client factory. Intended for tests; production code + * should leave the default in place. + */ +export function setCopilotClientFactory(fn: CopilotClientFactory): void { + factory = fn; + clientPromise = undefined; +} + +/** Reset the singleton (tests). */ +export function resetCopilotClientFactory(): void { + factory = defaultFactory; + clientPromise = undefined; +} + +/** Lazily start the singleton client. */ +export async function getCopilotClient(): Promise<MinimalCopilotClient> { + if (!clientPromise) { + clientPromise = factory(); + } + return clientPromise; +} + +/** + * Stop the lazy singleton SDK client, if started. Idempotent. 
 + */ +export async function shutdownCopilotHost(): Promise<void> { + if (!clientPromise) return; + const client = await clientPromise.catch(() => undefined); + clientPromise = undefined; + if (!client) return; + try { + await client.stop(); + } catch (err) { + debug("Error stopping Copilot client: %O", err); + } +} + +// ---- Schema-guided turn driver (decision 0010 §4) ---- + +const ajv = new AjvConstructor({ strict: false }); + +/** Result of an ephemeral copilot agent invocation (decision 0010 §4). */ +export type InvokeCopilotAgentResult = + | { kind: "ok"; output: unknown } + | { + kind: "fail"; + error: { message: string; data?: Record<string, unknown> }; + }; + +export interface InvokeCopilotAgentOptions { + /** User-turn prompt. */ + prompt: string; + /** The node's IR-declared outputSchema. */ + outputSchema: JSONSchema; + /** Optional model name. */ + model?: string; + /** Author-supplied addition to the SDK system prompt (mode "append"). */ + systemMessageAppend?: string; + /** Custom sub-agent definitions. */ + customAgents?: CustomAgentConfig[]; + /** Allow-list of Copilot CLI runtime built-in tool names. */ + availableTools?: string[]; + /** File attachments (already path-validated by caller). */ + attachments?: Array<{ path: string }>; + /** + * Hard cap on session run time. Forwarded as the `timeout` second + * positional argument to `CopilotSession.sendAndWait`. + */ + timeoutMs?: number; + /** + * For models that support reasoning effort. Typed via the SDK so a + * change in supported levels is a compile error here, not a silent + * runtime mismatch. The IR-declared input schema for `copilot.invoke` + * narrows this to a fixed enum (decision 0010 §5). + */ + reasoningEffort?: SessionConfig["reasoningEffort"]; + /** Schema-repair attempts (default 3, range 1-10). */ + repairBudget?: number; + /** Engine cooperative-cancellation signal. */ + signal: AbortSignal; +} + +/** System-prompt scaffolding for the submit_response convention. 
 */ +function buildSystemMessageContent( + submitParamsSchema: Record<string, unknown>, + authorAppend?: string, +): string { + const schemaText = JSON.stringify(submitParamsSchema, null, 2); + const base = [ + "You are an AI agent driven by an automated workflow engine. Only your `submit_response` tool call is read; assistant text is ignored by the engine but allowed for reasoning.", + "", + "Call `submit_response` exactly once when done. Its `arguments` MUST match this JSON Schema:", + "", + "```json", + schemaText, + "```", + "", + "If rejected, the next user message contains the validator errors — fix the arguments and call `submit_response` again. Repair attempts are bounded.", + "", + "You may call other available tools before submitting.", + ].join("\n"); + return authorAppend ? `${base}\n\n${authorAppend}` : base; +} + +/** + * Implementation of the `copilot.invoke` builtin task: runs one or + * more agent turns against a fresh session, using a `submit_response` + * custom tool whose parameters JSON Schema is the IR node's + * `outputSchema`. Repairs (re-prompts) on validation failure or + * no-call-on-idle, up to `repairBudget` total attempts. + * + * The session is created and disposed inside this call (ephemeral). + */ +export async function invokeCopilotAgent( + options: InvokeCopilotAgentOptions, +): Promise<InvokeCopilotAgentResult> { + options.signal.throwIfAborted(); + + // Validate budget bounds. + const budget = options.repairBudget ?? 3; + if (!Number.isInteger(budget) || budget < 1 || budget > 10) { + return { + kind: "fail", + error: { + message: `repairBudget must be an integer in [1, 10]; got ${budget}`, + }, + }; + } + + // LLM tool-call parameters MUST be a JSON Schema object (not a bare string, etc.). + // Adapt scalar schemas like `{type: "string"}` by wrapping them in + // `{type:"object", properties: {value: <userSchema>}, required:["value"]}`. + // The captured value is unwrapped to the bare scalar before returning. 
 + const userSchema = options.outputSchema as Record<string, unknown>; + const userType = userSchema.type; + const isObjectShape = + userType === undefined || + userType === "object" || + (Array.isArray(userType) && (userType as unknown[]).includes("object")); + const submitSchema: Record<string, unknown> = isObjectShape + ? userSchema + : { + type: "object", + properties: { value: userSchema }, + required: ["value"], + additionalProperties: false, + }; + + // Compile the validator for the (possibly wrapped) submit_response + // parameters schema. + let validate; + try { + validate = ajv.compile(submitSchema); + } catch (e) { + const m = e instanceof Error ? e.message : String(e); + return { + kind: "fail", + error: { message: `Invalid outputSchema: ${m}` }, + }; + } + + // Mutable capture cell shared with the submit_response handler. `value` + // holds the unwrapped node-output value (an object when the node's + // outputSchema is object-shaped, or the bare scalar otherwise). + const captured: { + value?: unknown; + hasValue?: boolean; + lastErrors?: string; + } = {}; + + const sdk = (await import("@github/copilot-sdk")) as any; + const client = await getCopilotClient(); + + // Build the synthetic submit_response tool. The SDK accepts a raw JSON + // Schema object as `parameters`. We also Ajv-validate inside the handler so + // we never accept malformed args even if SDK-side validation is permissive. + const submitTool = sdk.defineTool("submit_response", { + description: + "Submit your final answer. Call this exactly once when you have your final answer.", + parameters: submitSchema, + skipPermission: true, + handler: async (args: unknown) => { + if ( + typeof args !== "object" || + args === null || + Array.isArray(args) || + !validate(args) + ) { + const errs = validate?.errors + ? ajv.errorsText(validate.errors) + : "arguments must be a JSON object matching the schema"; + captured.lastErrors = errs; + return `\`submit_response\` rejected: ${errs}. 
 Please call \`submit_response\` again with corrected arguments.`; + } + captured.value = isObjectShape + ? (args as Record<string, unknown>) + : (args as Record<string, unknown>).value; + captured.hasValue = true; + delete captured.lastErrors; + return "Response recorded."; + }, + }); + + const sessionConfig: SessionConfig = { + onPermissionRequest: sdk.approveAll, + tools: [submitTool], + systemMessage: { + mode: "append", + content: buildSystemMessageContent( + submitSchema, + options.systemMessageAppend, + ), + }, + }; + if (options.model !== undefined) sessionConfig.model = options.model; + if (options.reasoningEffort !== undefined) + sessionConfig.reasoningEffort = options.reasoningEffort; + if (options.customAgents !== undefined) + sessionConfig.customAgents = options.customAgents; + if (options.availableTools !== undefined) { + // Always add our synthetic `submit_response` tool to the user's + // allow-list so the model can complete the turn contract. + const merged = new Set(options.availableTools); + merged.add("submit_response"); + sessionConfig.availableTools = [...merged]; + } + + let session: MinimalCopilotSession | undefined; + const onAbort = () => { + // Best-effort: disconnect immediately on cancellation. 
+ session?.disconnect().catch(() => undefined); + }; + options.signal.addEventListener("abort", onAbort, { once: true }); + + try { + session = await client.createSession(sessionConfig); + + const attachments: MessageOptions["attachments"] | undefined = + options.attachments?.map((a) => ({ + type: "file" as const, + path: a.path, + })); + + let attempt = 0; + let prompt = options.prompt; + let lastAssistantText: string | undefined; + while (attempt < budget) { + attempt++; + options.signal.throwIfAborted(); + debug( + "copilot.invoke attempt %d/%d (sessionId=%s) prompt=%s", + attempt, + budget, + session.sessionId, + truncateForDebug(prompt), + ); + + const sendOpts: MessageOptions = { prompt }; + if (attachments) sendOpts.attachments = attachments; + // `timeout` is the second positional arg to sendAndWait, NOT a + // field on MessageOptions. + const reply = await session.sendAndWait( + sendOpts, + options.timeoutMs, + ); + lastAssistantText = extractAssistantText(reply); + if (lastAssistantText !== undefined) { + debug( + "copilot.invoke attempt %d assistant text: %s", + attempt, + truncateForDebug(lastAssistantText), + ); + } + + if (captured.hasValue) { + return { kind: "ok", output: captured.value }; + } + + // No successful capture this turn. Build a repair prompt for the + // next attempt (if budget allows). + const reason = captured.lastErrors + ? `Your previous \`submit_response\` call was rejected: ${captured.lastErrors}.` + : `You did not call \`submit_response\`. You MUST call \`submit_response\` with arguments matching the required schema.`; + debug( + "copilot.invoke attempt %d rejected: %s", + attempt, + captured.lastErrors ?? 
"no submit_response call", + ); + prompt = `${reason} Please call \`submit_response\` now with corrected arguments matching the required schema.`; + } + + return { + kind: "fail", + error: { + message: `copilot.invoke exhausted repair budget (${budget}) without a valid submit_response call.`, + data: { + attempts: attempt, + lastErrors: captured.lastErrors ?? null, + lastAssistantText: lastAssistantText ?? null, + }, + }, + }; + } catch (err) { + if (options.signal.aborted || (err as Error)?.name === "AbortError") { + return { + kind: "fail", + error: { message: "copilot.invoke aborted" }, + }; + } + const m = err instanceof Error ? err.message : String(err); + return { + kind: "fail", + error: { message: `copilot.invoke error: ${m}` }, + }; + } finally { + options.signal.removeEventListener("abort", onAbort); + if (session) { + try { + await session.disconnect(); + } catch (e) { + debug("session.disconnect() failed: %O", e); + } + } + } +} diff --git a/ts/examples/workflow/engine/src/events.ts b/ts/examples/workflow/engine/src/events.ts index dececa844..c003d8e00 100644 --- a/ts/examples/workflow/engine/src/events.ts +++ b/ts/examples/workflow/engine/src/events.ts @@ -64,7 +64,11 @@ export type WorkflowEvent = | { type: "runFailed"; runId: string; - error: { message: string; data?: unknown }; + error: { + message: string; + nodeId?: string | undefined; + data?: unknown; + }; timestamp: number; } | { diff --git a/ts/examples/workflow/engine/src/index.ts b/ts/examples/workflow/engine/src/index.ts index 5d27e1322..d0a60f9b7 100644 --- a/ts/examples/workflow/engine/src/index.ts +++ b/ts/examples/workflow/engine/src/index.ts @@ -16,6 +16,7 @@ export { boolToLabel, shellExec, llmGenerate, + copilotInvoke, httpGet, fileRead, fileWrite, @@ -42,3 +43,14 @@ export { standardLibraryTasks, allBuiltinTasks, } from "./builtinTasks.js"; + +// TODO: The @github/copilot-sdk dependency hints that copilotInvoke may be +// a good first candidate for an external task. 
+export { + setCopilotClientFactory, + resetCopilotClientFactory, + shutdownCopilotHost, + type CopilotClientFactory, + type MinimalCopilotClient, + type MinimalCopilotSession, +} from "./copilotClientHost.js"; diff --git a/ts/examples/workflow/engine/src/runner.ts b/ts/examples/workflow/engine/src/runner.ts index 1b484632a..4c14b8ce7 100644 --- a/ts/examples/workflow/engine/src/runner.ts +++ b/ts/examples/workflow/engine/src/runner.ts @@ -399,7 +399,15 @@ export interface RunResult { runId: string; success: boolean; output?: unknown; - error?: { message: string; nodeId?: string | undefined }; + error?: { + message: string; + nodeId?: string | undefined; + /** + * Structured error context attached by the failing task. Opaque to the + * engine; callers may log or serialize it for diagnostics. + */ + data?: unknown; + }; } // ---- Engine ---- @@ -626,17 +634,31 @@ export class WorkflowEngine { return { runId, success: true, output }; } catch (err) { - const message = err instanceof Error ? err.message : String(err); - const nodeId = err instanceof TaskFailure ? err.nodeId : undefined; + const isTaskFailure = err instanceof TaskFailure; + const errorPayload: { + message: string; + nodeId?: string | undefined; + data?: unknown; + } = { + message: err instanceof Error ? err.message : String(err), + ...(isTaskFailure ? { nodeId: err.nodeId } : {}), + ...(isTaskFailure && err.taskError.data !== undefined + ? { data: err.taskError.data } + : {}), + }; this.emit({ type: "runFailed", runId, - error: { message }, + error: errorPayload, timestamp: Date.now(), }); - return { runId, success: false, error: { message, nodeId } }; + return { + runId, + success: false, + error: errorPayload, + }; } } @@ -959,9 +981,8 @@ export class WorkflowEngine { scopePath: [...scopePath], signal: taskSignal, ...(constraints ? { constraints } : {}), - ...(node.outputSchema - ? { outputSchema: node.outputSchema } - : {}), + // The dispatching node's declared output schema. 
 + outputSchema: node.outputSchema, }; let result: TaskResult; diff --git a/ts/examples/workflow/engine/test/copilotInvoke.spec.ts b/ts/examples/workflow/engine/test/copilotInvoke.spec.ts new file mode 100644 index 000000000..db4c708ac --- /dev/null +++ b/ts/examples/workflow/engine/test/copilotInvoke.spec.ts @@ -0,0 +1,659 @@ +// Copyright (c) Microsoft Corporation. +// Licensed under the MIT License. + +/** + * Unit tests for copilot.invoke (decision 0010 §5). + * + * These tests use a mock CopilotClient injected via + * setCopilotClientFactory; the real @github/copilot-sdk is never + * loaded. Per repo policy, integration tests against the live SDK + * are out of scope (test:live). + */ + +import { + TaskRegistry, + WorkflowEngine, + copilotInvoke, + setCopilotClientFactory, + resetCopilotClientFactory, + type MinimalCopilotClient, + type MinimalCopilotSession, +} from "../src/index.js"; +import { TaskPolicy, WorkflowIR, ConcreteTaskDefinition } from "workflow-model"; +import type { MessageOptions, SessionConfig, Tool } from "@github/copilot-sdk"; + +// Allow-all policy for tests. +const allowAllPolicy: TaskPolicy = new Proxy({} as TaskPolicy, { + get: () => "allow" as const, +}); + +// ---- Mock client ---- + +type SubmitResponseHandler = (args: unknown) => Promise<unknown>; + +function makeMockClient(script: AgentScript[]): { + client: MinimalCopilotClient; + sessions: MockSession[]; + stopCount: { value: number }; +} { + let scriptIdx = 0; + const sessions: MockSession[] = []; + const stopCount = { value: 0 }; + + const client: MinimalCopilotClient = { + async start() {}, + async stop() { + stopCount.value++; + return []; + }, + async createSession(config: SessionConfig) { + const submitTool = (config.tools ?? 
 []).find( + (t: Tool) => t.name === "submit_response", + ); + if (!submitTool) { + throw new Error( + "mock createSession: expected a submit_response tool to be registered", + ); + } + // SDK ToolHandler is `(args, invocation)`; our submit_response + // handler ignores `invocation`, so it's safe to call with one arg. + const handler = submitTool.handler as SubmitResponseHandler; + const session = new MockSession(config, handler, () => { + if (scriptIdx >= script.length) { + throw new Error( + `mock client ran out of scripted turns (asked for #${scriptIdx + 1}, have ${script.length})`, + ); + } + return script[scriptIdx++]!; + }); + sessions.push(session); + return session; + }, + }; + + return { client, sessions, stopCount }; +} + +interface AgentScript { + /** + * Function that, given the prompt and the submit_response tool + * handler, simulates the agent's actions for one sendAndWait + * call. Returns when the "session" goes idle. May call the + * tool handler 0 or more times. + */ + onSend: ( + prompt: string, + callSubmit: SubmitResponseHandler, + ) => Promise<void>; +} + +class MockSession implements MinimalCopilotSession { + public sessionId: string; + public sentPrompts: string[] = []; + public sentTimeouts: Array<number | undefined> = []; + public disconnected = false; + constructor( + public config: SessionConfig, + private submitHandler: SubmitResponseHandler, + private nextScript: () => AgentScript, + ) { + this.sessionId = `mock-${Math.random().toString(36).slice(2, 8)}`; + } + async sendAndWait(opts: MessageOptions, timeout?: number) { + this.sentPrompts.push(opts.prompt); + this.sentTimeouts.push(timeout); + const script = this.nextScript(); + await script.onSend(opts.prompt, this.submitHandler); + return undefined; + } + async disconnect() { + this.disconnected = true; + } +} + +// ---- Tests ---- + +describe("copilot.invoke (decision 0010)", () => { + afterEach(() => { + resetCopilotClientFactory(); + }); + + function makeEngine() { + const reg = new TaskRegistry(); + 
reg.register(copilotInvoke); + return new WorkflowEngine(reg); + } + + function makeIR(opts: { + outputSchema: Record; + inputs: Record; + }): WorkflowIR { + return { + kind: "workflow", + version: "1", + entry: "copilotTest", + workflows: { + copilotTest: { + inputSchema: { type: "object" }, + outputSchema: opts.outputSchema, + entry: "step", + nodes: { + step: { + kind: "task", + task: "copilot.invoke", + inputSchema: ( + copilotInvoke as ConcreteTaskDefinition + ).inputSchema, + outputSchema: opts.outputSchema, + inputs: opts.inputs as any, + bind: "result", + }, + }, + output: { $from: "scope", name: "result" } as any, + }, + }, + }; + } + + it("happy path: agent calls submit_response with valid args", async () => { + const { client } = makeMockClient([ + { + async onSend(_prompt, callSubmit) { + await callSubmit({ summary: "hi", count: 3 }); + }, + }, + ]); + setCopilotClientFactory(async () => client); + + const eng = makeEngine(); + const ir = makeIR({ + outputSchema: { + type: "object", + required: ["summary", "count"], + properties: { + summary: { type: "string" }, + count: { type: "integer" }, + }, + }, + inputs: { prompt: "do the thing" }, + }); + + const result = await eng.run(ir, { policy: allowAllPolicy }); + expect(result.success).toBe(true); + expect(result.output).toEqual({ summary: "hi", count: 3 }); + }); + + it("repair loop: invalid args first, valid on second turn", async () => { + const { client, sessions } = makeMockClient([ + { + async onSend(_p, callSubmit) { + // Wrong shape — missing required `count`. 
+ await callSubmit({ summary: "hi" }); + }, + }, + { + async onSend(_p, callSubmit) { + await callSubmit({ summary: "hi", count: 7 }); + }, + }, + ]); + setCopilotClientFactory(async () => client); + + const eng = makeEngine(); + const ir = makeIR({ + outputSchema: { + type: "object", + required: ["summary", "count"], + properties: { + summary: { type: "string" }, + count: { type: "integer" }, + }, + }, + inputs: { prompt: "do the thing" }, + }); + + const result = await eng.run(ir, { policy: allowAllPolicy }); + expect(result.success).toBe(true); + expect(result.output).toEqual({ summary: "hi", count: 7 }); + // Two sendAndWait turns were used. + expect(sessions[0]!.sentPrompts.length).toBe(2); + // Second prompt is a repair nudge. + expect(sessions[0]!.sentPrompts[1]).toContain("submit_response"); + }); + + it("repair loop: idle without calling submit_response is repaired", async () => { + const { client, sessions } = makeMockClient([ + { + // Agent says nothing useful and goes idle. + async onSend() {}, + }, + { + async onSend(_p, callSubmit) { + await callSubmit({ summary: "answer", count: 1 }); + }, + }, + ]); + setCopilotClientFactory(async () => client); + + const eng = makeEngine(); + const ir = makeIR({ + outputSchema: { + type: "object", + required: ["summary", "count"], + properties: { + summary: { type: "string" }, + count: { type: "integer" }, + }, + }, + inputs: { prompt: "..." }, + }); + + const result = await eng.run(ir, { policy: allowAllPolicy }); + expect(result.success).toBe(true); + expect(result.output).toEqual({ summary: "answer", count: 1 }); + expect(sessions[0]!.sentPrompts[1]).toContain( + "did not call `submit_response`", + ); + }); + + it("budget exhaustion fails with diagnostic data", async () => { + // 3 attempts, all invalid; default budget is 3. 
+ const { client } = makeMockClient([ + { + async onSend(_p, callSubmit) { + await callSubmit({ wrong: "shape" }); + }, + }, + { + async onSend(_p, callSubmit) { + await callSubmit({ wrong: "shape" }); + }, + }, + { + async onSend(_p, callSubmit) { + await callSubmit({ wrong: "shape" }); + }, + }, + ]); + setCopilotClientFactory(async () => client); + + const eng = makeEngine(); + const ir = makeIR({ + outputSchema: { + type: "object", + required: ["summary", "count"], + properties: { + summary: { type: "string" }, + count: { type: "integer" }, + }, + }, + inputs: { prompt: "..." }, + }); + + const result = await eng.run(ir, { policy: allowAllPolicy }); + expect(result.success).toBe(false); + expect(result.error?.message).toMatch(/exhausted repair budget/); + // error.data must surface the per-attempt diagnostics so the + // CLI can print them. See copilotClientHost.ts (the failure + // payload) and runner.ts RunResult (which now propagates `data`). + const data = result.error?.data as + | { + attempts?: number; + lastErrors?: string | null; + lastAssistantText?: string | null; + } + | undefined; + expect(data).toBeDefined(); + expect(data?.attempts).toBe(3); + expect(typeof data?.lastErrors).toBe("string"); + expect(data?.lastErrors).toMatch(/required|summary|count/); + }); + + it("respects custom repairBudget input", async () => { + const { client, sessions } = makeMockClient([ + { + async onSend(_p, callSubmit) { + await callSubmit({ wrong: true }); + }, + }, + ]); + setCopilotClientFactory(async () => client); + + const eng = makeEngine(); + const ir = makeIR({ + outputSchema: { + type: "object", + required: ["summary"], + properties: { summary: { type: "string" } }, + }, + inputs: { prompt: "...", repairBudget: 1 }, + }); + + const result = await eng.run(ir, { policy: allowAllPolicy }); + expect(result.success).toBe(false); + expect(sessions[0]!.sentPrompts.length).toBe(1); + }); + + it("string outputSchema: wraps submit_response and unwraps the bare value", 
async () => { + let observedParams: unknown; + const { client } = makeMockClient([ + { + async onSend(_prompt, callSubmit) { + // Free-text mode: node declared `outputSchema: { type: "string" }`, + // so submit_response is wrapped as `{ value: }`. + await callSubmit({ value: "hello world" }); + }, + }, + ]); + // Spy on the synthetic tool's parameters via createSession. + const wrappedClient: MinimalCopilotClient = { + ...client, + async createSession(config) { + const submitTool = (config.tools ?? []).find( + (t: Tool) => t.name === "submit_response", + ); + observedParams = ( + submitTool as { parameters?: unknown } | undefined + )?.parameters; + return client.createSession(config); + }, + }; + setCopilotClientFactory(async () => wrappedClient); + + const eng = makeEngine(); + const ir = makeIR({ + outputSchema: { type: "string" }, + inputs: { prompt: "give me a string" }, + }); + + const result = await eng.run(ir, { policy: allowAllPolicy }); + expect(result.success).toBe(true); + // Output is unwrapped to the bare string, not the `{value: ...}` + // envelope. + expect(result.output).toBe("hello world"); + // submit_response's params were wrapped so the LLM tool-call + // constraint (object-typed params) is satisfied. + expect(observedParams).toEqual({ + type: "object", + properties: { value: { type: "string" } }, + required: ["value"], + additionalProperties: false, + }); + }); + + it("validates repairBudget bounds", async () => { + const { client } = makeMockClient([]); + setCopilotClientFactory(async () => client); + + const eng = makeEngine(); + const ir = makeIR({ + outputSchema: { + type: "object", + properties: { x: { type: "string" } }, + }, + inputs: { prompt: "...", repairBudget: 99 }, + }); + + const result = await eng.run(ir, { policy: allowAllPolicy }); + // The IR validates repairBudget (max 10) at IR validation time + // because the input schema declares minimum/maximum. 
+ expect(result.success).toBe(false); + }); + + it("passes model, customAgents, availableTools through to SDK", async () => { + const { client, sessions } = makeMockClient([ + { + async onSend(_p, callSubmit) { + await callSubmit({ ok: true }); + }, + }, + ]); + setCopilotClientFactory(async () => client); + + const eng = makeEngine(); + const ir = makeIR({ + outputSchema: { + type: "object", + required: ["ok"], + properties: { ok: { type: "boolean" } }, + }, + inputs: { + prompt: "...", + model: "gpt-5", + customAgents: [ + { name: "researcher", description: "x", prompt: "y" }, + ], + allowedTools: ["view", "grep"], + reasoningEffort: "high", + systemMessage: "extra rules go here", + }, + }); + + const result = await eng.run(ir, { policy: allowAllPolicy }); + expect(result.success).toBe(true); + const cfg = sessions[0]!.config; + expect(cfg.model).toBe("gpt-5"); + expect(cfg.customAgents).toEqual([ + { name: "researcher", description: "x", prompt: "y" }, + ]); + // The host always merges `submit_response` into the SDK + // `availableTools` allow-list so the synthetic termination + // tool stays exposed alongside whatever built-ins the IR + // permits. See copilotClientHost.ts. + expect(cfg.availableTools).toEqual( + expect.arrayContaining(["view", "grep", "submit_response"]), + ); + expect(cfg.availableTools).toHaveLength(3); + expect(cfg.reasoningEffort).toBe("high"); + // The SDK system-message scaffolding uses mode "append" and + // includes the schema text plus the author's addendum. 
+ expect(cfg.systemMessage?.mode).toBe("append"); + expect(cfg.systemMessage?.content).toContain("submit_response"); + expect(cfg.systemMessage?.content).toContain("extra rules go here"); + }); + + it("keeps submit_response available even when allowedTools is empty", async () => { + // Regression: an `allowedTools: []` IR input (deny all CLI + // built-ins) was previously forwarded as `availableTools: []`, + // which the SDK reads as "no tools at all" — including the + // synthetic `submit_response` — making it impossible for the + // model to terminate the turn-loop. The host must always merge + // `submit_response` into the allow-list. + const { client, sessions } = makeMockClient([ + { + async onSend(_p, callSubmit) { + await callSubmit({ ok: true }); + }, + }, + ]); + setCopilotClientFactory(async () => client); + + const eng = makeEngine(); + const ir = makeIR({ + outputSchema: { + type: "object", + required: ["ok"], + properties: { ok: { type: "boolean" } }, + }, + inputs: { + prompt: "...", + allowedTools: [], + }, + }); + + const result = await eng.run(ir, { policy: allowAllPolicy }); + expect(result.success).toBe(true); + expect(sessions[0]!.config.availableTools).toEqual(["submit_response"]); + }); + + it("disposes the session after a successful call", async () => { + const { client, sessions } = makeMockClient([ + { + async onSend(_p, callSubmit) { + await callSubmit({ x: "y" }); + }, + }, + ]); + setCopilotClientFactory(async () => client); + + const eng = makeEngine(); + const ir = makeIR({ + outputSchema: { + type: "object", + required: ["x"], + properties: { x: { type: "string" } }, + }, + inputs: { prompt: "..." 
}, + }); + + await eng.run(ir, { policy: allowAllPolicy }); + expect(sessions[0]!.disconnected).toBe(true); + }); + + it("disposes the session even on failure", async () => { + const { client, sessions } = makeMockClient([ + { + async onSend() { + throw new Error("boom"); + }, + }, + ]); + setCopilotClientFactory(async () => client); + + const eng = makeEngine(); + const ir = makeIR({ + outputSchema: { + type: "object", + required: ["x"], + properties: { x: { type: "string" } }, + }, + inputs: { prompt: "..." }, + }); + + const result = await eng.run(ir, { policy: allowAllPolicy }); + expect(result.success).toBe(false); + expect(sessions[0]!.disconnected).toBe(true); + }); + + it("rejects attachment paths outside the allowed roots", async () => { + let factoryCalled = false; + setCopilotClientFactory(async () => { + factoryCalled = true; + throw new Error("factory should not be called"); + }); + + const eng = makeEngine(); + const ir = makeIR({ + outputSchema: { + type: "object", + required: ["x"], + properties: { x: { type: "string" } }, + }, + inputs: { + prompt: "...", + attachments: [{ path: "/etc/passwd" }], + }, + }); + + const result = await eng.run(ir, { policy: allowAllPolicy }); + expect(result.success).toBe(false); + expect(result.error?.message).toMatch(/attachment.*rejected/); + expect(factoryCalled).toBe(false); + }); + + it("respects ctx.signal cancellation", async () => { + const ctrl = new AbortController(); + + // Script that "hangs" until aborted, then resolves. + const { client, sessions } = makeMockClient([ + { + async onSend() { + return new Promise((resolve) => { + ctrl.signal.addEventListener("abort", () => resolve(), { + once: true, + }); + }); + }, + }, + ]); + setCopilotClientFactory(async () => client); + + const eng = makeEngine(); + const ir = makeIR({ + outputSchema: { + type: "object", + required: ["x"], + properties: { x: { type: "string" } }, + }, + inputs: { prompt: "..." 
}, + }); + + const promise = eng.run(ir, { + policy: allowAllPolicy, + signal: ctrl.signal, + }); + // Abort mid-flight. + setTimeout(() => ctrl.abort(), 30); + + const result = await promise; + expect(result.success).toBe(false); + // Either via the engine's "Run cancelled" or copilot's abort path. + expect(result.error?.message.toLowerCase()).toMatch(/cancel|abort/); + // The session should still have been disconnected. + expect(sessions[0]!.disconnected).toBe(true); + }); + + it("forwards timeoutMs as sendAndWait's second positional arg", async () => { + // The SDK's sendAndWait signature is `(options, timeout?)` — not + // a `timeout` field on `options`. This guards against drifting + // back to the buggy options-bag form (which the SDK silently + // ignores). + const { client, sessions } = makeMockClient([ + { + async onSend(_p, callSubmit) { + await callSubmit({ x: "ok" }); + }, + }, + ]); + setCopilotClientFactory(async () => client); + + const eng = makeEngine(); + const ir = makeIR({ + outputSchema: { + type: "object", + required: ["x"], + properties: { x: { type: "string" } }, + }, + inputs: { prompt: "...", timeoutMs: 12345 }, + }); + + const result = await eng.run(ir, { policy: allowAllPolicy }); + expect(result.success).toBe(true); + expect(sessions[0]!.sentTimeouts).toEqual([12345]); + }); + + it("omits timeout when timeoutMs is not provided", async () => { + const { client, sessions } = makeMockClient([ + { + async onSend(_p, callSubmit) { + await callSubmit({ x: "ok" }); + }, + }, + ]); + setCopilotClientFactory(async () => client); + + const eng = makeEngine(); + const ir = makeIR({ + outputSchema: { + type: "object", + required: ["x"], + properties: { x: { type: "string" } }, + }, + inputs: { prompt: "..." 
}, + }); + + const result = await eng.run(ir, { policy: allowAllPolicy }); + expect(result.success).toBe(true); + expect(sessions[0]!.sentTimeouts).toEqual([undefined]); + }); +}); diff --git a/ts/examples/workflow/engine/test/engine.spec.ts b/ts/examples/workflow/engine/test/engine.spec.ts index 0e99b904a..5ff3b6a3a 100644 --- a/ts/examples/workflow/engine/test/engine.spec.ts +++ b/ts/examples/workflow/engine/test/engine.spec.ts @@ -4904,7 +4904,8 @@ describe("WorkflowEngine (IR v1)", () => { nodeId: "test", scopePath: [], signal: new AbortController().signal, - }, + outputSchema: { type: "object" }, + } as any, ); expect(result.kind).toBe("fail"); if (result.kind === "fail") { diff --git a/ts/examples/workflow/model/src/taskDefinition.ts b/ts/examples/workflow/model/src/taskDefinition.ts index 96f7ecbb9..a54f38d5d 100644 --- a/ts/examples/workflow/model/src/taskDefinition.ts +++ b/ts/examples/workflow/model/src/taskDefinition.ts @@ -35,8 +35,29 @@ export interface TaskContext { */ constraints?: TaskConstraints; - /** The node's declared output schema, if any. */ - outputSchema?: JSONSchema; + /** + * The dispatching node's declared output schema. Tasks may use it + * to shape their computation (e.g. schema-guided LLM responses, per + * copilot.invoke). + * + * Always present: TaskNode.outputSchema is required by the IR contract + * (`model/src/ir.ts`) and the static validator rejects task nodes that + * omit it, so the runner can — and does — pass it unconditionally. + * + * The schema is a JSON Schema 7 value. A typical schema-guided task + * dispatches on its shape: + * - `{ type: "object", properties: { ... } }` — produce a structured + * JSON object matching the declared properties. + * - `{ type: "string" }` — produce free text; the returned value is a + * plain string. + * - `{}` (the top schema) — produce anything; the task is free to + * return any JSON value. 
+ * + * NOTE: The engine always validates the task's return value against the + * output schema after execution. Tasks normally do not need to do this, + * unless the task uses the results internally. + */ + outputSchema: JSONSchema; } /** diff --git a/ts/examples/workflow/workflows/ir/d10-conventional-commit.json b/ts/examples/workflow/workflows/ir/d10-conventional-commit.json new file mode 100644 index 000000000..9f888522e --- /dev/null +++ b/ts/examples/workflow/workflows/ir/d10-conventional-commit.json @@ -0,0 +1,988 @@ +{ + "kind": "workflow", + "version": "1", + "description": "Generate a Conventional Commits message from staged git changes. Plumbing (file enumeration, per-file diff, templating, joining, looping) is fully deterministic; copilot.invoke is used twice with schema-guided structured output (decisions 0010/0011): once per file to categorize the change, then once at the end to synthesize the final message.", + "constants": { + "categorizePromptTemplate": { + "schema": { + "type": "string" + }, + "value": "You are categorizing one file change for a Conventional Commits message.\n\nFile: {{file}}\n\nGit diff (staged):\n{{diff}}\n\nCall the `submit_response` tool with:\n- `type`: one of feat, fix, refactor, perf, docs, test, style, chore, build, ci. Pick the type that best describes the dominant change in this file.\n- `scope`: a short lower-case scope (1-2 words, e.g. a package or module name). Use \"\" if no obvious scope.\n- `summary`: a single concise imperative-mood phrase (≤ 80 chars) describing what changed in this file. No trailing period." + }, + "synthesizePromptTemplate": { + "schema": { + "type": "string" + }, + "value": "You are composing a single Conventional Commits message that covers all of the following per-file changes.\n\n{{bullets}}\n\nCall the `submit_response` tool with:\n- `type`: the dominant Conventional Commits type across the files (feat, fix, refactor, perf, docs, test, style, chore, build, ci). 
When mixed, prefer feat > fix > refactor > perf > test > docs > build > ci > style > chore.\n- `scope`: a short lower-case scope shared by the changes, or \"\" if there is no single coherent scope.\n- `subject`: an imperative-mood subject line ≤ 72 chars, no trailing period, summarizing the overall intent of the commit (NOT a list of files).\n- `body`: a 1-3 paragraph body explaining what changed and why, not how. Use plain prose, no markdown headings, no bullet lists." + }, + "subjectWithScopeTemplate": { + "schema": { + "type": "string" + }, + "value": "{{type}}({{scope}}): {{subject}}" + }, + "subjectNoScopeTemplate": { + "schema": { + "type": "string" + }, + "value": "{{type}}: {{subject}}" + }, + "finalMessageTemplate": { + "schema": { + "type": "string" + }, + "value": "{{header}}\n\n{{body}}\n\n# Files\n{{bullets}}\n" + } + }, + "entry": "d10-conventional-commit", + "workflows": { + "d10-conventional-commit": { + "inputSchema": { + "type": "object", + "required": ["repoPath"], + "properties": { + "repoPath": { + "type": "string", + "description": "Absolute path to the git repo with staged changes." + } + } + }, + "outputSchema": { + "type": "object", + "required": [ + "message", + "type", + "scope", + "subject", + "body", + "fileBullets" + ], + "properties": { + "message": { + "type": "string", + "description": "Final conventional commit message ready to pass to git commit -m." 
+ }, + "type": { + "type": "string" + }, + "scope": { + "type": "string" + }, + "subject": { + "type": "string" + }, + "body": { + "type": "string" + }, + "fileBullets": { + "type": "string" + } + } + }, + "entry": "getFileList", + "nodes": { + "getFileList": { + "kind": "task", + "task": "shell.exec", + "inputSchema": { + "type": "object", + "required": ["command"], + "properties": { + "command": { + "type": "string" + }, + "args": { + "type": "array", + "items": { + "type": "string" + } + }, + "cwd": { + "type": "string" + } + } + }, + "outputSchema": { + "type": "object", + "required": ["stdout", "stderr", "exitCode"], + "properties": { + "stdout": { + "type": "string" + }, + "stderr": { + "type": "string" + }, + "exitCode": { + "type": "integer" + } + } + }, + "inputs": { + "command": "git", + "args": ["diff", "--staged", "--name-only"], + "cwd": { + "$from": "input", + "name": "repoPath" + } + }, + "next": "splitFiles", + "bind": "fileListResult" + }, + "splitFiles": { + "kind": "task", + "task": "string.split", + "inputSchema": { + "type": "object", + "required": ["text", "delimiter"], + "properties": { + "text": { + "type": "string" + }, + "delimiter": { + "type": "string" + } + } + }, + "outputSchema": { + "type": "array", + "items": { + "type": "string" + } + }, + "inputs": { + "text": { + "$from": "scope", + "name": "fileListResult", + "path": ["stdout"] + }, + "delimiter": "\n" + }, + "next": "fileLoop", + "bind": "files" + }, + "fileLoop": { + "kind": "loop", + "inputs": { + "files": { + "$from": "scope", + "name": "files" + }, + "repoPath": { + "$from": "input", + "name": "repoPath" + }, + "categorizePromptTemplate": { + "$from": "constant", + "name": "categorizePromptTemplate" + } + }, + "state": { + "i": { + "schema": { + "type": "integer" + }, + "initial": 0 + }, + "bullets": { + "schema": { + "type": "array", + "items": { + "type": "string" + } + }, + "initial": [] + } + }, + "body": { + "entry": "pickFile", + "nodes": { + "pickFile": { + "kind": 
"task", + "task": "list.elementAt", + "inputSchema": { + "type": "object", + "required": ["list", "index"], + "properties": { + "list": { + "type": "array" + }, + "index": { + "type": "integer" + } + } + }, + "outputSchema": { + "type": "string" + }, + "inputs": { + "list": { + "$from": "input", + "name": "files" + }, + "index": { + "$from": "state", + "name": "i" + } + }, + "next": "getFileDiff", + "bind": "picked" + }, + "getFileDiff": { + "kind": "task", + "task": "shell.exec", + "inputSchema": { + "type": "object", + "required": ["command"], + "properties": { + "command": { + "type": "string" + }, + "args": { + "type": "array", + "items": { + "type": "string" + } + }, + "cwd": { + "type": "string" + } + } + }, + "outputSchema": { + "type": "object", + "required": ["stdout", "stderr", "exitCode"], + "properties": { + "stdout": { + "type": "string" + }, + "stderr": { + "type": "string" + }, + "exitCode": { + "type": "integer" + } + } + }, + "inputs": { + "command": "git", + "args": [ + "diff", + "--staged", + "--", + { + "$from": "scope", + "name": "picked" + } + ], + "cwd": { + "$from": "input", + "name": "repoPath" + } + }, + "next": "buildCategorizePrompt", + "bind": "fileDiff" + }, + "buildCategorizePrompt": { + "kind": "task", + "task": "text.template", + "inputSchema": { + "type": "object", + "required": ["template", "vars"], + "properties": { + "template": { + "type": "string" + }, + "vars": { + "type": "object" + } + } + }, + "outputSchema": { + "type": "string" + }, + "inputs": { + "template": { + "$from": "input", + "name": "categorizePromptTemplate" + }, + "vars": { + "file": { + "$from": "scope", + "name": "picked" + }, + "diff": { + "$from": "scope", + "name": "fileDiff", + "path": ["stdout"] + } + } + }, + "next": "categorize", + "bind": "categorizePrompt" + }, + "categorize": { + "kind": "task", + "task": "copilot.invoke", + "inputSchema": { + "type": "object", + "required": ["prompt"], + "properties": { + "prompt": { + "type": "string" + }, + 
"model": { + "type": "string" + }, + "allowedTools": { + "type": "array", + "items": { + "type": "string" + } + } + } + }, + "outputSchema": { + "type": "object", + "required": ["type", "scope", "summary"], + "properties": { + "type": { + "type": "string", + "enum": [ + "feat", + "fix", + "refactor", + "perf", + "docs", + "test", + "style", + "chore", + "build", + "ci" + ], + "description": "Conventional Commits type for this file." + }, + "scope": { + "type": "string", + "description": "Short lower-case scope, or empty string." + }, + "summary": { + "type": "string", + "description": "Imperative-mood phrase ≤ 80 chars describing the file change." + } + } + }, + "inputs": { + "prompt": { + "$from": "scope", + "name": "categorizePrompt" + }, + "model": "claude-sonnet-4.6", + "allowedTools": [] + }, + "next": "formatBullet", + "bind": "categorization" + }, + "formatBullet": { + "kind": "task", + "task": "text.template", + "inputSchema": { + "type": "object", + "required": ["template", "vars"], + "properties": { + "template": { + "type": "string" + }, + "vars": { + "type": "object" + } + } + }, + "outputSchema": { + "type": "string" + }, + "inputs": { + "template": "- {{type}} ({{scope}}) {{file}}: {{summary}}", + "vars": { + "type": { + "$from": "scope", + "name": "categorization", + "path": ["type"] + }, + "scope": { + "$from": "scope", + "name": "categorization", + "path": ["scope"] + }, + "file": { + "$from": "scope", + "name": "picked" + }, + "summary": { + "$from": "scope", + "name": "categorization", + "path": ["summary"] + } + } + }, + "next": "appendBullet", + "bind": "bullet" + }, + "appendBullet": { + "kind": "task", + "task": "list.append", + "inputSchema": { + "type": "object", + "required": ["list", "item"], + "properties": { + "list": { + "type": "array" + }, + "item": {} + } + }, + "outputSchema": { + "type": "array", + "items": { + "type": "string" + } + }, + "inputs": { + "list": { + "$from": "state", + "name": "bullets" + }, + "item": { + "$from": 
"scope", + "name": "bullet" + } + }, + "next": "stepIndex", + "bind": "appended" + }, + "stepIndex": { + "kind": "task", + "task": "math.add", + "inputSchema": { + "type": "object", + "required": ["left", "right"], + "properties": { + "left": { + "type": "number" + }, + "right": { + "type": "number" + } + } + }, + "outputSchema": { + "type": "number" + }, + "inputs": { + "left": { + "$from": "state", + "name": "i" + }, + "right": 1 + }, + "next": "computeLength", + "bind": "stepped" + }, + "computeLength": { + "kind": "task", + "task": "list.length", + "inputSchema": { + "type": "object", + "required": ["list"], + "properties": { + "list": { + "type": "array" + } + } + }, + "outputSchema": { + "type": "integer" + }, + "inputs": { + "list": { + "$from": "input", + "name": "files" + } + }, + "next": "compareIndex", + "bind": "fileCount" + }, + "compareIndex": { + "kind": "task", + "task": "compare.lessThan", + "inputSchema": { + "type": "object", + "required": ["left", "right"], + "properties": { + "left": { + "type": "number" + }, + "right": { + "type": "number" + } + } + }, + "outputSchema": { + "type": "boolean" + }, + "inputs": { + "left": { + "$from": "scope", + "name": "stepped" + }, + "right": { + "$from": "scope", + "name": "fileCount" + } + }, + "bind": "hasMore" + } + }, + "inputSchema": { + "type": "object", + "required": ["files", "repoPath", "categorizePromptTemplate"], + "properties": { + "files": { + "type": "array", + "items": { + "type": "string" + } + }, + "repoPath": { + "type": "string" + }, + "categorizePromptTemplate": { + "type": "string" + } + } + }, + "output": { + "$from": "scope", + "name": "appended" + }, + "outputSchema": { + "type": "array", + "items": { + "type": "string" + } + } + }, + "continueWhen": { + "$from": "scope", + "name": "hasMore" + }, + "iterateState": { + "i": { + "$from": "scope", + "name": "stepped" + }, + "bullets": { + "$from": "scope", + "name": "appended" + } + }, + "maxIterations": 200, + "next": "joinBullets", + 
"bind": "fileBullets" + }, + "joinBullets": { + "kind": "task", + "task": "string.join", + "inputSchema": { + "type": "object", + "required": ["list", "delimiter"], + "properties": { + "list": { + "type": "array", + "items": { + "type": "string" + } + }, + "delimiter": { + "type": "string" + } + } + }, + "outputSchema": { + "type": "string" + }, + "inputs": { + "list": { + "$from": "scope", + "name": "fileBullets" + }, + "delimiter": "\n" + }, + "next": "buildSynthesisPrompt", + "bind": "bulletsText" + }, + "buildSynthesisPrompt": { + "kind": "task", + "task": "text.template", + "inputSchema": { + "type": "object", + "required": ["template", "vars"], + "properties": { + "template": { + "type": "string" + }, + "vars": { + "type": "object" + } + } + }, + "outputSchema": { + "type": "string" + }, + "inputs": { + "template": { + "$from": "constant", + "name": "synthesizePromptTemplate" + }, + "vars": { + "bullets": { + "$from": "scope", + "name": "bulletsText" + } + } + }, + "next": "synthesize", + "bind": "synthesisPrompt" + }, + "synthesize": { + "kind": "task", + "task": "copilot.invoke", + "inputSchema": { + "type": "object", + "required": ["prompt"], + "properties": { + "prompt": { + "type": "string" + }, + "model": { + "type": "string" + }, + "allowedTools": { + "type": "array", + "items": { + "type": "string" + } + } + } + }, + "outputSchema": { + "type": "object", + "required": ["type", "scope", "subject", "body"], + "properties": { + "type": { + "type": "string", + "enum": [ + "feat", + "fix", + "refactor", + "perf", + "docs", + "test", + "style", + "chore", + "build", + "ci" + ], + "description": "Dominant Conventional Commits type across the staged files." + }, + "scope": { + "type": "string", + "description": "Shared scope, or empty string." + }, + "subject": { + "type": "string", + "description": "Imperative-mood subject line, ≤ 72 chars, no trailing period." 
+ }, + "body": { + "type": "string", + "description": "1-3 paragraph plain-prose body explaining what changed and why." + } + } + }, + "inputs": { + "prompt": { + "$from": "scope", + "name": "synthesisPrompt" + }, + "model": "claude-sonnet-4.6", + "allowedTools": [] + }, + "next": "buildSubject", + "bind": "synthesis" + }, + "buildSubject": { + "kind": "branch", + "selector": { + "$from": "scope", + "name": "synthesis", + "path": ["scope"] + }, + "selectorSchema": { + "type": "string" + }, + "cases": { + "": { + "inputs": { + "type": { + "$from": "scope", + "name": "synthesis", + "path": ["type"] + }, + "subject": { + "$from": "scope", + "name": "synthesis", + "path": ["subject"] + }, + "tmpl": { + "$from": "constant", + "name": "subjectNoScopeTemplate" + } + }, + "scope": { + "inputSchema": { + "type": "object", + "required": ["type", "subject", "tmpl"], + "properties": { + "tmpl": { + "type": "string" + }, + "type": { + "type": "string" + }, + "subject": { + "type": "string" + } + } + }, + "entry": "build", + "nodes": { + "build": { + "kind": "task", + "task": "text.template", + "inputSchema": { + "type": "object", + "required": ["template", "vars"], + "properties": { + "template": { + "type": "string" + }, + "vars": { + "type": "object" + } + } + }, + "outputSchema": { + "type": "string" + }, + "inputs": { + "template": { + "$from": "input", + "name": "tmpl" + }, + "vars": { + "type": { + "$from": "input", + "name": "type" + }, + "subject": { + "$from": "input", + "name": "subject" + } + } + }, + "bind": "header" + } + }, + "output": { + "$from": "scope", + "name": "header" + }, + "outputSchema": { + "type": "string" + } + } + } + }, + "default": { + "inputs": { + "type": { + "$from": "scope", + "name": "synthesis", + "path": ["type"] + }, + "subject": { + "$from": "scope", + "name": "synthesis", + "path": ["subject"] + }, + "tmpl": { + "$from": "constant", + "name": "subjectWithScopeTemplate" + }, + "scope": { + "$from": "scope", + "name": "synthesis", + 
"path": ["scope"] + } + }, + "scope": { + "inputSchema": { + "type": "object", + "required": ["type", "scope", "subject", "tmpl"], + "properties": { + "tmpl": { + "type": "string" + }, + "type": { + "type": "string" + }, + "subject": { + "type": "string" + }, + "scope": { + "type": "string" + } + } + }, + "entry": "build", + "nodes": { + "build": { + "kind": "task", + "task": "text.template", + "inputSchema": { + "type": "object", + "required": ["template", "vars"], + "properties": { + "template": { + "type": "string" + }, + "vars": { + "type": "object" + } + } + }, + "outputSchema": { + "type": "string" + }, + "inputs": { + "template": { + "$from": "input", + "name": "tmpl" + }, + "vars": { + "type": { + "$from": "input", + "name": "type" + }, + "subject": { + "$from": "input", + "name": "subject" + }, + "scope": { + "$from": "input", + "name": "scope" + } + } + }, + "bind": "header" + } + }, + "output": { + "$from": "scope", + "name": "header" + }, + "outputSchema": { + "type": "string" + } + } + }, + "outputSchema": { + "type": "string" + }, + "bind": "header", + "next": "formatFinal" + }, + "formatFinal": { + "kind": "task", + "task": "text.template", + "inputSchema": { + "type": "object", + "required": ["template", "vars"], + "properties": { + "template": { + "type": "string" + }, + "vars": { + "type": "object" + } + } + }, + "outputSchema": { + "type": "string" + }, + "inputs": { + "template": { + "$from": "constant", + "name": "finalMessageTemplate" + }, + "vars": { + "header": { + "$from": "scope", + "name": "header" + }, + "body": { + "$from": "scope", + "name": "synthesis", + "path": ["body"] + }, + "bullets": { + "$from": "scope", + "name": "bulletsText" + } + } + }, + "bind": "finalMessage" + } + }, + "output": { + "message": { + "$from": "scope", + "name": "finalMessage" + }, + "type": { + "$from": "scope", + "name": "synthesis", + "path": ["type"] + }, + "scope": { + "$from": "scope", + "name": "synthesis", + "path": ["scope"] + }, + "subject": { + 
"$from": "scope", + "name": "synthesis", + "path": ["subject"] + }, + "body": { + "$from": "scope", + "name": "synthesis", + "path": ["body"] + }, + "fileBullets": { + "$from": "scope", + "name": "bulletsText" + } + } + } + } +} diff --git a/ts/pnpm-lock.yaml b/ts/pnpm-lock.yaml index 57c5fa9db..a0d107dba 100644 --- a/ts/pnpm-lock.yaml +++ b/ts/pnpm-lock.yaml @@ -1059,6 +1059,9 @@ importers: examples/workflow/engine: dependencies: + '@github/copilot-sdk': + specifier: ^0.3.0 + version: 0.3.0 aiclient: specifier: workspace:* version: link:../../../packages/aiclient @@ -7486,6 +7489,10 @@ packages: resolution: {integrity: sha512-fCEpD9W9xqcaCAJmatyNQ1PkET9P9liK2P4Vk0raDFoMXcvpIdqewa5JQeKtWCBUsN/HCz7ExkkFP8peQuo+DA==} engines: {node: '>=20.0.0'} + '@github/copilot-sdk@0.3.0': + resolution: {integrity: sha512-SUo35k56pzzgYgwmDPHcu7kZxPrzXbH66IWXaEf6pmb94DlA709F82HrrDeja087TL4djJ9OuvRFWWOKCosAsg==} + engines: {node: '>=20.0.0'} + '@github/copilot-win32-arm64@1.0.54': resolution: {integrity: sha512-muOX8qrJSi56BWQejkH0TgXpZYRO8Y9k1qIfMuRojZyLyATn1P4lIKb67ZqDCXJLkcPfVJ5eJYsSAeGwU3Qpww==} cpu: [arm64] @@ -18817,6 +18824,12 @@ snapshots: vscode-jsonrpc: 8.2.1 zod: 4.3.6 + '@github/copilot-sdk@0.3.0': + dependencies: + '@github/copilot': 1.0.54 + vscode-jsonrpc: 8.2.1 + zod: 4.3.6 + '@github/copilot-win32-arm64@1.0.54': optional: true