Context
We've been using spec-kit on a real subsystem — an LLM agent tool dispatcher with authz, rate-limit, approval flow, idempotency, and audit requirements. The /specify → /clarify → /plan → /tasks flow worked well for user-story-driven features. During code review, we found about 14 categories of non-functional concerns that didn't have a clear home in spec-kit's artifacts. They ended up in a separate single-file engineering design document.
Before considering PRs, we'd like to understand maintainers' position on scope so we know whether to propose upstream additions or build externally.
What spec-kit covers well (verified by reading the templates)
User stories with Given-When-Then acceptance — spec.md
[NEEDS CLARIFICATION] flagging — /clarify flow
Tech-stack selection with rationale — plan.md Technical Context
Constitution Check gate — plan.md
Library / dependency research — Phase 0 research.md
Domain model — Phase 1 data-model.md
Interface signatures — Phase 1 contracts/
Phased tasks with [P] parallelism — tasks.md
This is genuinely a lot, and we relied on all of it.
What we couldn't place
Below are concrete content types from our real design document that did not fit any spec-kit template. Each has a one-line description of the failure mode if it's missing.
1. Error-code contract — a table of code × trigger × counts-toward-rate-limit × caller-visible message. Without it, each handler invents its own error structure and clients can't uniformly handle failures.
2. State machine — typically a stateDiagram-v2 with legal transitions and an explicit illegal-transition policy. Approval flows with TTL-based expiry are the common case.
3. Cross-cutting execution order — the precise order of authz, schema validation, rate-limit, approval gate, planning, and timeout-wrapped execution. Order is a contract; getting it wrong leaks "this resource exists" via schema errors before authz denies, or charges rate-limit budget against denied users.
4. Authorization invariants (pseudocode + invariants) — tenants AND roles, no implicit super-admin bypass, enumerated deny_reason for forensics. Plain-English "the feature must check authz" routinely results in tenants OR roles in practice.
5. Failure-mode scheduling rules — timeout, approval persistence + TTL, idempotency-key derivation, per-run hard cap (calls / tokens / duration), cancel propagation across transport layers.
6. Trust labels and prompt-injection mitigation — trusted / partially_trusted / untrusted labeling for content flowing into LLM context, plus the sanitization pipeline (truncation → control-char escape → boundary tagging → system-prompt hardening).
7. Network egress policy — application-layer allowlist for outbound HTTP from tools / handlers.
8. Observability schema — OTel span hierarchy with required attributes, metric names with label sets, alert thresholds. We treat span names and metric names as part of the interface contract.
9. Audit event schema — JSON event with required fields (traceId, runId, callId, principal, result.code, result.durationMs) and retention policy.
10. Framework adapter pattern — isolating third-party framework types behind an adapter so major-version upgrades don't propagate through business code.
11. Alternatives considered — options rejected, with reasons. Code review repeatedly asks "why not X?"; writing it down once saves cycles.
12. Rollout / canary / rollback — milestones, feature flags, rollback paths with RTO targets.
13. Risk register — known risks with mitigations, owners, and trigger conditions.
14. Anti-pattern citations — pointers to specific lines in reference codebases that motivated each principle (e.g., "rejected because reference framework Foo (Bar.java:NN-NN) scattered cross-cutting concerns, which led to a tenant-leak incident").
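To make the "cross-cutting execution order" and "authorization invariants" items concrete, here is a minimal Python sketch. It is not the dispatcher's actual code: `Principal`, `ToolCall`, `DenyReason`, `authorize`, `dispatch`, and the result codes are hypothetical names chosen for illustration, and the approval gate, planning, and timeout stages are left out. It shows authz running before schema validation and before rate-limit accounting, with the tenant check AND the role check both required.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class DenyReason(Enum):
    # Hypothetical values; a real list would come from the design document.
    TENANT_MISMATCH = "tenant_mismatch"
    ROLE_MISSING = "role_missing"


@dataclass(frozen=True)
class Principal:
    tenants: frozenset
    roles: frozenset


@dataclass(frozen=True)
class ToolCall:
    tenant: str
    required_role: str
    args: dict


@dataclass
class Result:
    code: str
    deny_reason: Optional[DenyReason] = None
    value: object = None


def authorize(p: Principal, call: ToolCall) -> Optional[DenyReason]:
    """Return None when allowed. Tenant AND role must both match; no bypass role."""
    if call.tenant not in p.tenants:
        return DenyReason.TENANT_MISMATCH
    if call.required_role not in p.roles:
        return DenyReason.ROLE_MISSING
    return None


def dispatch(
    p: Principal,
    call: ToolCall,
    validate: Callable[[dict], bool],
    consume_budget: Callable[[Principal], bool],
    execute: Callable[[dict], object],
) -> Result:
    # 1. Authz first, so a denied caller never sees schema errors.
    reason = authorize(p, call)
    if reason is not None:
        return Result("DENIED", deny_reason=reason)
    # 2. Schema validation.
    if not validate(call.args):
        return Result("INVALID_ARGS")
    # 3. Rate limit: in this sketch only authorized, well-formed calls are charged.
    if not consume_budget(p):
        return Result("RATE_LIMITED")
    # 4. Approval gate, planning, and timeout wrapping are omitted here.
    return Result("OK", value=execute(call.args))
```

Whether a schema-invalid call should count toward the rate limit is a policy decision that belongs in the error-code table; the sketch picks one option only to keep the ordering visible.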
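The "state machine" item can be sketched the same way. The states, the `LEGAL` transition set, and the `Approval` class below are assumptions for illustration, not taken from our design document. The sketch shows an approval flow where TTL expiry overrides the requested transition and any transition outside the legal set raises instead of being silently ignored.

```python
from enum import Enum


class State(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"
    EXPIRED = "expired"


# Legal transitions; anything not listed is illegal by policy.
LEGAL = {
    (State.PENDING, State.APPROVED),
    (State.PENDING, State.REJECTED),
    (State.PENDING, State.EXPIRED),
}


class IllegalTransition(Exception):
    pass


class Approval:
    def __init__(self, created_at: float, ttl_seconds: float):
        self.state = State.PENDING
        self.deadline = created_at + ttl_seconds

    def transition(self, target: State, now: float) -> State:
        # A pending approval past its deadline can only become EXPIRED.
        if self.state is State.PENDING and now >= self.deadline:
            target = State.EXPIRED
        if (self.state, target) not in LEGAL:
            raise IllegalTransition(f"{self.state.value} -> {target.value}")
        self.state = target
        return self.state
```

Passing `now` explicitly rather than reading the clock inside `transition` keeps the expiry rule deterministic under test.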
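For the "trust labels and prompt-injection mitigation" item, the first three pipeline stages (truncation, control-char escape, boundary tagging) might look roughly like this. The `<content trust="...">` tag format, the function names, and the default limit are hypothetical; system-prompt hardening happens on the prompt side and is not shown.

```python
from enum import Enum


class Trust(Enum):
    TRUSTED = "trusted"
    PARTIALLY_TRUSTED = "partially_trusted"
    UNTRUSTED = "untrusted"


def truncate(text: str, limit: int) -> str:
    return text[:limit]


def escape_control_chars(text: str) -> str:
    # Keep newline and tab; render other C0 control characters and DEL visibly.
    return "".join(
        ch if ch in "\n\t" or (ord(ch) >= 32 and ord(ch) != 127) else f"\\x{ord(ch):02x}"
        for ch in text
    )


def tag_boundary(text: str, trust: Trust) -> str:
    # Neutralize the closing tag so content cannot end its own boundary early.
    safe = text.replace("</content>", "<\\/content>")
    return f'<content trust="{trust.value}">{safe}</content>'


def sanitize(text: str, trust: Trust, limit: int = 4000) -> str:
    # Stage order follows the pipeline: truncate, escape, then tag.
    return tag_boundary(escape_control_chars(truncate(text, limit)), trust)
```

Because truncation runs before escaping, the escaped output can be longer than `limit`; a design that needs a hard cap on what reaches the model would have to state that explicitly.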
Where these landed in practice
We used contracts/ for (1) and (8). Everything else went into a single sibling file we called engineering-design.md that sits between plan.md and tasks.md. Tasks in tasks.md reference section anchors in that file. This works, but it's outside spec-kit's standard layout.
Question
Are any of these in scope for upstream spec-kit? We see three possible answers:
In scope — happy to split into small, focused PRs:
Easiest first PRs (low controversy): new templates for alternatives.md, risks.md, an "Alternatives Considered" section in plan.md.
Medium: state-machine template, error-codes template (could land under contracts/).
Partially in scope — we'd appreciate guidance on which subset to PR and which to leave external.
Out of scope — we'll publish complementary templates externally; happy to coordinate so users have a clear "use spec-kit for X, see kit-name for Y" story rather than fragmented advice.
What we've prepared
We've published a complementary repo with templates and a worked example for the categories above, anonymized to the LLM tool dispatcher domain:
Template: templates/engineering-design.md
Worked example: examples/llm-tool-dispatcher/design.md
Integration playbook: playbooks/spec-kit-integration.md (positions our document between plan.md and tasks.md)
Why-this-exists doc: docs/why.md (the 14 categories enumerated above with failure modes)
The repo is positioned as a complement (Apache 2.0, same license as spec-kit, "we complement, we do not fork"). If maintainers signal that some of the categories are in-scope upstream, we'll migrate those over to PRs and link from our repo so users find the canonical location.
Why we're asking now
We'd rather contribute small focused PRs that you'd accept than maintain a parallel project forever. But before opening 6 PRs, we want to know which ones land in your "yes," "no," or "maybe" buckets.
Thanks for spec-kit — the underlying flow is exactly what we needed for everything that fit, and we're trying to extend the same discipline to what didn't fit.