Skip to content

Commit d151e43

Browse files
committed
kitchenspeak: ADR 0001 — recommend Agda as proof-assistant target
Adds kitchenspeak/decisions/0001-proof-assistant.adoc and links it from README.adoc. Evaluates Agda, Lean 4, Coq/Rocq, and Isabelle/HOL against seven drivers (pedagogical clarity, match to the KitchenSpeak type zoo, teaching materials, tooling friction, postulate ergonomics, transferable skill, industrial credibility). Recommendation: Agda for v1.0 term deliverables, with Lean 4 as the explicit successor target if students continue after the term. Isabelle treated respectfully as the instructor's original instinct and correctly the tool for industrial verification, but not for teaching dependent type theory. Reversibility is high: nothing in SPEC.adoc or grammar.ebnf commits to a specific proof assistant; only the lowering does. https://claude.ai/code/session_0118688cwQ4a7YZQVvaUewPd
1 parent 0a8f306 commit d151e43

2 files changed

Lines changed: 379 additions & 0 deletions

File tree

kitchenspeak/README.adoc

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -53,6 +53,10 @@ work.
5353
| Worked example. The minimum KitchenSpeak program that exercises all
5454
three of Linear (the egg), Tropical (the water temperature), and Echo
5555
(the visual witness). Heavily commented as a teaching artefact.
56+
57+
| `decisions/`
58+
| Architecture Decision Records (ADRs). One per material decision on
59+
how KitchenSpeak lowers, tools, or targets other systems.
5660
|===
5761

5862
== The seven types
Lines changed: 375 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,375 @@
1+
// SPDX-License-Identifier: PMPL-1.0-or-later
2+
// Copyright (c) 2026 Jonathan D.A. Jewell <j.d.a.jewell@open.ac.uk>
3+
= ADR 0001 — Proof Assistant Choice for KitchenSpeak
4+
:toc:
5+
:toclevels: 2
6+
:icons: font
7+
8+
== Status
9+
10+
*Proposed.* Awaiting instructor sign-off, to be ratified with the class.
11+
12+
== Context
13+
14+
KitchenSpeak's v1.0 specification (`SPEC.adoc`) is explicit that the DSL
15+
"must compile to Cook (Gallina) for formal proof" while also requiring
16+
"Zero 'Yellow' unproven goals" in the core library. The first phrase is
17+
Coq/Rocq terminology; the second is Agda terminology. The spec is
18+
therefore *implicitly undecided* on which proof assistant is the target.
19+
20+
Before the class begins discharging proof obligations against Dough,
21+
Emulsion, Sear, and the Poached Egg examples, a choice must be made.
22+
This ADR records the alternatives, the decision drivers, and the
23+
recommendation.
24+
25+
Historical note: the instructor's original instinct was *Isabelle/HOL*.
26+
That instinct is taken seriously in the analysis below.
27+
28+
== Decision drivers
29+
30+
Ranked from most to least important for *this* context — an
31+
undergraduate-ish cohort where advanced type theory was, until this
32+
term, widely perceived as irrelevant. Pedagogy dominates.
33+
34+
. *Pedagogical clarity.* Does the construct-to-proof mapping feel like
35+
programming or like a separate skill? Can students see the Curry-Howard
36+
correspondence concretely?
37+
. *Match to the KitchenSpeak type zoo.* Linear, refinement, session /
38+
choreographic, postulated oracles, units of measure, termination by
39+
measure, total failure handlers. Which system encodes these most
40+
naturally?
41+
. *Teaching materials specifically aligned with this curriculum.*
42+
Textbook availability, freeness, coverage of the types in §1.
43+
. *Tooling friction for the student.* Install-to-first-green-goal time.
44+
IDE integration. Platform support.
45+
. *Postulate / axiom ergonomics.* Echo-types are oracles. How cleanly
46+
does the system support "I assume this; proceed."?
47+
. *Transferable skill after the term.* If a student continues into
48+
formal methods work — research, industry, or graduate study — which
49+
system keeps earning dividends?
50+
. *Industrial credibility.* Least important for *this* course, but worth
51+
naming because KitchenSpeak has an external audience (SE teacher,
52+
hypothetical Miele pitch).
53+
54+
== Considered options
55+
56+
=== Option A — Agda
57+
58+
*What it is.* Dependently-typed functional language and proof assistant
59+
in the Martin-Löf tradition. Proofs are programs; the term-mode style
60+
dominates.
61+
62+
*Against the drivers.*
63+
64+
* *Pedagogical clarity.* Highest of the four. Every proof is a
65+
term-construction exercise. The Curry-Howard correspondence is not
66+
taught on top of Agda — it *is* Agda.
67+
* *Type zoo match.* Native refinement via dependent types; native
68+
`postulate` keyword for oracles (echo-types); native termination
69+
checker with visible "yellow" unsolved goals (already the class's
70+
vocabulary, per `SPEC.adoc` §6). Linearity is *not* native: it must be
71+
encoded, either manually (linear monads / typed-channel encodings) or
72+
via a Quantitative Type Theory (QTT) extension. Units of measure are
73+
a straightforward dependent-type exercise. Session types: there is a
74+
solid body of Agda work (Thiemann, Gay & Vasconcelos encodings).
75+
* *Teaching materials.* *Programming Language Foundations in Agda*
76+
(PLFA, Wadler) is free online and covers dependent types, lambda
77+
calculi, inference, and linearity. It is the single best-aligned
78+
textbook available for any of the four options.
79+
* *Tooling friction.* Emacs-centric by culture, though VS Code support
80+
via `agda-mode` is improving. Unicode-heavy (a feature for experts,
81+
a barrier for beginners). Installation is straightforward via Nix or
82+
a package manager.
83+
* *Postulate ergonomics.* First-class keyword. Cleaner than any
84+
alternative.
85+
* *Transferable skill.* Agda is the language of working type theorists
86+
and a subset of dependent-type-aware PL researchers. Narrower than
87+
Lean or Coq in the broader formal-methods job market.
88+
* *Industrial credibility.* Modest — Agda is more academic than
89+
industrial.
90+
91+
=== Option B — Lean 4
92+
93+
*What it is.* Modern dependently-typed proof assistant and programming
94+
language. Tactic-mode and term-mode both first-class. Native
95+
compilation to machine code. Mathlib is the world's largest machine-
96+
checked mathematics library.
97+
98+
*Against the drivers.*
99+
100+
* *Pedagogical clarity.* High. Tactic-mode *looks like* how mathematical
101+
proofs are written, which reduces the "I have to learn a new style of
102+
writing" barrier. Term-mode is available when needed. The mental
103+
model is less purely Curry-Howard than Agda's, which is a feature
104+
pedagogically: students see both views.
105+
* *Type zoo match.* Refinement via subtype / dependent types: strong.
106+
Postulates exist (`axiom` keyword) but feel less idiomatic than
107+
Agda's. Termination: native, with `decreasing_by` hints. Linearity:
108+
*experimental support* in Lean 4; not yet mainstream. Mathlib does
109+
not rely on linearity. Session types: community work exists but is
110+
not canonical. Units of measure: straightforward, but no
111+
canonical library (unlike F#).
112+
* *Teaching materials.* *Theorem Proving in Lean 4* and *Mathematics in
113+
Lean* are free, excellent, and current. *Logical Verification* (VU
114+
Amsterdam) is a full undergraduate course. Coverage is stronger on
115+
pure mathematics than on PL-specific content (linearity, session
116+
types, refinement for programs).
117+
* *Tooling friction.* Lowest of the four. Native VS Code support, fast
118+
feedback, cross-platform installers, `elan` version manager.
119+
* *Postulate ergonomics.* Usable but less clean than Agda.
120+
* *Transferable skill.* Highest. Lean 4 is currently the most rapidly
121+
growing proof assistant, especially in mathematics, and is seeing
122+
industrial adoption for formal methods.
123+
* *Industrial credibility.* High and rising.
124+
125+
*Caveat.* Lean 3 → Lean 4 was a hard break within recent memory.
126+
Material older than about 2022 is generally Lean 3 and needs
127+
translation. This is a real hazard for mining internet resources.
128+
129+
=== Option C — Coq / Rocq
130+
131+
*What it is.* The classical dependently-typed proof assistant. Tactic-
132+
based (Ltac / Ltac2). Inductive types. Extraction to OCaml, Haskell,
133+
Scheme. Recently renamed from Coq to Rocq (2024–2025).
134+
135+
*Against the drivers.*
136+
137+
* *Pedagogical clarity.* Moderate. The tactic language adds a *second*
138+
skill students must learn alongside the type theory. Term-mode exists
139+
(`refine`, `exact`) but is not the primary idiom. Students can make
140+
progress without deeply understanding the underlying type theory,
141+
which is sometimes a pedagogical bug rather than a feature.
142+
* *Type zoo match.* Strong for refinement, postulates (`Axiom`
143+
keyword), termination (structural + `Program Fixpoint` +
144+
`Function`). Linearity: no native support, but there are libraries
145+
(Iris, LinearCoq, and a body of literature). Session types: mature
146+
ecosystem (e.g. Metacoq-adjacent work, and session-type embeddings).
147+
Units of measure: encodable.
148+
* *Teaching materials.* *Software Foundations* (Pierce et al.) is the
149+
gold standard for PL teaching in Coq, free, and covers linear types
150+
in a later volume. *Certified Programming with Dependent Types* (CPDT,
151+
Chlipala) is advanced.
152+
* *Tooling friction.* Moderate. `vscoq` is adequate, CoqIDE is
153+
functional, and the tactic language has good feedback. Installation
154+
via opam is standard but is one more thing.
155+
* *Postulate ergonomics.* Good (`Axiom`), though slightly less idiomatic
156+
than Agda's `postulate`.
157+
* *Transferable skill.* High. Widely used in PL research, formal
158+
verification industry (seL4 precursors, CompCert, hacspec). Declining
159+
relative to Lean in the mathematics community.
160+
* *Industrial credibility.* High.
161+
162+
=== Option D — Isabelle/HOL
163+
164+
*What it is.* Higher-order logic proof assistant, classical rather than
165+
constructive. Excellent proof automation (Sledgehammer), the Archive of
166+
Formal Proofs, and a long industrial track record (seL4, Amazon's
167+
s2n-tls, Unified Modeling Language semantics).
168+
169+
*Against the drivers.*
170+
171+
* *Pedagogical clarity.* Different in kind from the other three.
172+
Proofs are written in Isar, a structured proof language that reads
173+
more like a mathematics textbook. This is excellent for learning
174+
*proof writing*, but the mapping from KitchenSpeak constructs to
175+
HOL obligations is less direct than in a dependently-typed system —
176+
there is no Curry-Howard programming-as-proof story to lean on.
177+
* *Type zoo match.* Here Isabelle's classical logic starts to work
178+
against us. HOL does not natively support linear / substructural
179+
reasoning. *Separation logic* is available (and is how seL4-class
180+
work is done), but teaching separation logic on top of HOL adds a
181+
full additional layer the class does not need this term. Session
182+
types: not a native concept, though encodings exist. Postulates:
183+
axioms are well-supported. Units of measure: encodable. Refinement:
184+
Isabelle has excellent *refinement calculus* support, but in a
185+
different (program-refinement) sense than KitchenSpeak's type-level
186+
refinement.
187+
* *Teaching materials.* *Concrete Semantics* (Nipkow & Klein) is
188+
excellent but targeted at semantics of imperative languages rather
189+
than dependent / linear type theory. *Programming and Proving in
190+
Isabelle/HOL* is a good introduction.
191+
* *Tooling friction.* Isabelle/jEdit is a mature bespoke IDE.
192+
Installation is one download. Sledgehammer often lets students
193+
finish proofs they could not finish by hand, which is *either* a
194+
godsend (morale, pedagogical encouragement) *or* a crutch
195+
(students never learn what the proof actually looks like).
196+
* *Postulate ergonomics.* Good (`axiomatization`).
197+
* *Transferable skill.* High in industrial formal methods (avionics,
198+
microkernels, cryptography). Narrower in PL-theory research and
199+
narrower still in the modern type-theory community.
200+
* *Industrial credibility.* Highest of the four. This is the system
201+
that *actually ships* real verified software in hostile industries.
202+
203+
*Why the instructor's original instinct toward Isabelle deserves
204+
respect.* If the deliverable were a verified kitchen-hardware controller
205+
to be shown to a regulator, Isabelle would be the right tool. The
206+
deliverable is not that. The deliverable is *students who have felt
207+
type theory*.
208+
209+
== Comparison against drivers
210+
211+
[cols="3,1,1,1,1"]
212+
|===
213+
| Driver | Agda | Lean 4 | Coq/Rocq | Isabelle
214+
215+
| Pedagogical clarity (programs = proofs)
216+
| High
217+
| High
218+
| Moderate
219+
| Different paradigm (Isar)
220+
221+
| Type zoo match (linear, refinement, session, postulates, units)
222+
| Strong
223+
| Good, some experimental
224+
| Strong
225+
| Weaker without separation logic
226+
227+
| Teaching materials aligned with this curriculum
228+
| PLFA — best-in-class
229+
| TPiL + Maths-in-Lean
230+
| Software Foundations
231+
| Concrete Semantics (adjacent)
232+
233+
| Tooling friction
234+
| Moderate (Emacs-ish)
235+
| Lowest (VS Code-native)
236+
| Moderate (vscoq)
237+
| Low (Isabelle/jEdit)
238+
239+
| Postulate ergonomics
240+
| Best (native keyword)
241+
| Good (`axiom`)
242+
| Good (`Axiom`)
243+
| Good (`axiomatization`)
244+
245+
| Transferable skill
246+
| Narrow
247+
| Broadest and growing
248+
| Broad (declining share)
249+
| Broad within industrial FM
250+
251+
| Industrial credibility
252+
| Low
253+
| Rising
254+
| High
255+
| Highest
256+
|===
257+
258+
== Decision
259+
260+
*Recommendation: Option A — Agda — for KitchenSpeak's v1.0 term
261+
deliverables, with Option B — Lean 4 — reserved as the explicit
262+
successor target.*
263+
264+
The rationale in one paragraph: the class's primary deliverable is
265+
understanding, not shipping. Agda's Curry-Howard immediacy makes each
266+
load-bearing KitchenSpeak construct (`max_duration`, `on_fail`,
267+
`proving @w`, linear consumption, units) map onto a term-level proof
268+
obligation the student can see and manipulate directly. Wadler's PLFA
269+
covers the exact territory KitchenSpeak sits in (linear types, session
270+
types, lambda calculi, inference) and is free. The `postulate` keyword
271+
is the cleanest fit for echo-oracles. Termination checking is built in
272+
and speaks the vocabulary the spec already uses ("yellow" goals).
273+
Linearity is the single weak spot, addressed by either a taught
274+
encoding (which teaches the students what linearity *is*, not just that
275+
it exists) or by reaching for a QTT-enabled Agda build.
276+
277+
Lean 4 is a close second and a better *next* investment. Students who
278+
continue into formal methods after this term should be encouraged to
279+
port their KitchenSpeak proofs to Lean 4 as a second-semester or
280+
summer project. The lift is non-trivial but mostly mechanical once the
281+
shape of the proof is already understood in Agda.
282+
283+
Isabelle is correctly the instructor's instinct for industrial
284+
verification work, and should be the chosen tool if the class's
285+
trajectory shifts toward kitchen-hardware regulatory compliance. It is
286+
the wrong tool for teaching type theory itself, because HOL lacks the
287+
dependent-type machinery the KitchenSpeak type zoo wants.
288+
289+
Coq/Rocq is a reasonable third choice but offers no clear advantage
290+
over Agda for this curriculum while adding tactic-language overhead.
291+
Choose it only if the instructor personally has a strong Coq/Rocq
292+
preference and correspondingly faster lecture-prep time.
293+
294+
== Consequences
295+
296+
=== For teaching materials
297+
298+
* Primary textbook: *Programming Language Foundations in Agda* (PLFA).
299+
* Primary reference: Ulf Norell's *Dependently Typed Programming in
300+
Agda* tutorial.
301+
* Supplementary: Conor McBride's lecture notes on linear types and
302+
quantitative type theory, for the linearity chapter.
303+
304+
=== For tooling setup
305+
306+
Students will need:
307+
308+
* A working Agda installation (Nix or distro package).
309+
* `agda-mode` for either Emacs or VS Code. Recommend VS Code for this
310+
cohort unless an Emacs user is already present.
311+
* The Agda standard library.
312+
* A Unicode-capable terminal and font.
313+
314+
Expected install-to-first-green-goal time: 30–60 minutes, with the
315+
instructor present for the first session.
316+
317+
=== For the KitchenSpeak lowering
318+
319+
* Each `sync_block` lowers to an Agda function returning
320+
`{success @w} ⊎ {aborted, recovered, warmed}`, with the `proving @w`
321+
existential witness supplied by a `postulate`.
322+
* Each `step`'s `max_duration` lowers to a well-founded-recursion
323+
measure on elapsed wall-clock time.
324+
* Each `Linear` resource lowers to a value in a linearity-tracking
325+
monad (taught encoding for v1.0; swap for QTT-Agda later).
326+
* Each `Tropical` refinement lowers to a `Σ`-type pairing a
327+
dimensioned quantity with a proof that its trajectory stays within
328+
the safe envelope.
329+
* Each `Echo` witness lowers to a `postulate`-ed proposition indexed
330+
by a time-stamped classifier output.
331+
* Each `Ceremony` context lowers to a reader-monad-style parameter on
332+
the `orchestrate` body.
333+
334+
`SPEC.adoc` §4 is left untouched — the "Cook (Gallina)" phrase is the
335+
class's artefact. The `COMMENTARY.adoc` document already notes that the
336+
companion analysis uses Agda; this ADR promotes that note to a decision.
337+
338+
=== For the eventual Lean 4 migration
339+
340+
* Preserve the Agda proof *shape* (which introduction rules, which
341+
elimination rules, which witnesses) even if tactics would have
342+
finished the proof faster in Lean.
343+
* Record each `postulate` as a candidate `axiom` in the Lean port.
344+
* The tensor-product shape of Dyadic bind maps cleanly to a Lean 4
345+
structure with two linear fields, should Lean 4's experimental
346+
linearity support mature.
347+
348+
== Reversibility
349+
350+
*High.* Nothing in `SPEC.adoc`, `grammar.ebnf`, or any of the `.ks`
351+
worked examples commits to a specific proof assistant. The spec is
352+
proof-assistant-agnostic; the lowering is what this ADR fixes.
353+
354+
A future ADR can swap the target. The cost is re-proving the core
355+
library (Dough, Emulsion, Sear, Poached Egg), which is the students'
356+
pedagogical work anyway — not a sunk-cost problem.
357+
358+
== Open questions
359+
360+
. *Linearity encoding.* Teach the class to encode linearity by hand
361+
(monadic or channel-based), or reach for a QTT-Agda fork? Hand-
362+
encoding is more pedagogical; QTT is less friction. Proposal: hand-
363+
encode in weeks 3–5, mention QTT as extension reading.
364+
. *Installation strategy.* Provide a Nix flake for the class, or let
365+
students use distro packages? Nix is reproducible but adds a
366+
dependency. Proposal: Nix flake for the instructor, distro packages
367+
documented for students, fallback to a web-based Agda if anyone is
368+
blocked.
369+
. *Dual-proving Poached Egg.* After Agda is working, optionally port
370+
Poached Egg to Lean 4 as a single end-of-term exercise so the class
371+
sees the same proof in two systems. This is high-value pedagogy if
372+
time allows.
373+
. *What to tell the SE teacher.* "We chose Agda because the class is
374+
learning type theory, and Isabelle, while industrially stronger, is
375+
not a teaching tool for type theory." Direct.

0 commit comments

Comments
 (0)