|
| 1 | +// SPDX-License-Identifier: PMPL-1.0-or-later |
| 2 | +// Copyright (c) 2026 Jonathan D.A. Jewell <j.d.a.jewell@open.ac.uk> |
| 3 | += ADR 0001 — Proof Assistant Choice for KitchenSpeak |
| 4 | +:toc: |
| 5 | +:toclevels: 2 |
| 6 | +:icons: font |
| 7 | + |
| 8 | +== Status |
| 9 | + |
| 10 | +*Proposed.* Awaiting instructor sign-off, to be ratified with the class. |
| 11 | + |
| 12 | +== Context |
| 13 | + |
| 14 | +KitchenSpeak's v1.0 specification (`SPEC.adoc`) is explicit that the DSL |
| 15 | +"must compile to Cook (Gallina) for formal proof" while also requiring |
| 16 | +"Zero 'Yellow' unproven goals" in the core library. The first phrase is |
| 17 | +Coq/Rocq terminology; the second is Agda terminology. The spec is |
| 18 | +therefore *implicitly undecided* on which proof assistant is the target. |
| 19 | + |
| 20 | +Before the class begins discharging proof obligations against Dough, |
| 21 | +Emulsion, Sear, and the Poached Egg examples, a choice must be made. |
| 22 | +This ADR records the alternatives, the decision drivers, and the |
| 23 | +recommendation. |
| 24 | + |
| 25 | +Historical note: the instructor's original instinct was *Isabelle/HOL*. |
| 26 | +That instinct is taken seriously in the analysis below. |
| 27 | + |
| 28 | +== Decision drivers |
| 29 | + |
| 30 | +Ranked from most to least important for *this* context — an |
| 31 | +undergraduate-ish cohort where advanced type theory was, until this |
| 32 | +term, widely perceived as irrelevant. Pedagogy dominates. |
| 33 | + |
| 34 | +. *Pedagogical clarity.* Does the construct-to-proof mapping feel like |
| 35 | + programming or like a separate skill? Can students see the Curry-Howard |
| 36 | + correspondence concretely? |
| 37 | +. *Match to the KitchenSpeak type zoo.* Linear, refinement, session / |
| 38 | + choreographic, postulated oracles, units of measure, termination by |
| 39 | + measure, total failure handlers. Which system encodes these most |
| 40 | + naturally? |
| 41 | +. *Teaching materials specifically aligned with this curriculum.* |
| 42 | + Textbook availability, freeness, coverage of the types in §1. |
| 43 | +. *Tooling friction for the student.* Install-to-first-green-goal time. |
| 44 | + IDE integration. Platform support. |
| 45 | +. *Postulate / axiom ergonomics.* Echo-types are oracles. How cleanly |
| 46 | + does the system support "I assume this; proceed."? |
| 47 | +. *Transferable skill after the term.* If a student continues into |
| 48 | + formal methods work — research, industry, or graduate study — which |
| 49 | + system keeps earning dividends? |
| 50 | +. *Industrial credibility.* Least important for *this* course, but worth |
| 51 | + naming because KitchenSpeak has an external audience (SE teacher, |
| 52 | + hypothetical Miele pitch). |
| 53 | + |
| 54 | +== Considered options |
| 55 | + |
| 56 | +=== Option A — Agda |
| 57 | + |
| 58 | +*What it is.* Dependently-typed functional language and proof assistant |
| 59 | +in the Martin-Löf tradition. Proofs are programs; the term-mode style |
| 60 | +dominates. |
| 61 | + |
| 62 | +*Against the drivers.* |
| 63 | + |
| 64 | +* *Pedagogical clarity.* Highest of the four. Every proof is a |
| 65 | + term-construction exercise. The Curry-Howard correspondence is not |
| 66 | + taught on top of Agda — it *is* Agda. |
| 67 | +* *Type zoo match.* Native refinement via dependent types; native |
| 68 | + `postulate` keyword for oracles (echo-types); native termination |
| 69 | + checker with visible "yellow" unsolved goals (already the class's |
| 70 | + vocabulary, per `SPEC.adoc` §6). Linearity is *not* native: it must be |
| 71 | + encoded, either manually (linear monads / typed-channel encodings) or |
| 72 | + via a Quantitative Type Theory (QTT) extension. Units of measure are |
| 73 | + a straightforward dependent-type exercise. Session types: there is a |
| 74 | + solid body of Agda work (Thiemann, Gay & Vasconcelos encodings). |
| 75 | +* *Teaching materials.* *Programming Language Foundations in Agda* |
| 76 | + (PLFA, Wadler) is free online and covers dependent types, lambda |
| 77 | + calculi, inference, and linearity. It is the single best-aligned |
| 78 | + textbook available for any of the four options. |
| 79 | +* *Tooling friction.* Emacs-centric by culture, though VS Code support |
| 80 | + via `agda-mode` is improving. Unicode-heavy (a feature for experts, |
| 81 | + a barrier for beginners). Installation is straightforward via Nix or |
| 82 | + a package manager. |
| 83 | +* *Postulate ergonomics.* First-class keyword. Cleaner than any |
| 84 | + alternative. |
| 85 | +* *Transferable skill.* Agda is the language of working type theorists |
| 86 | + and a subset of dependent-type-aware PL researchers. Narrower than |
| 87 | + Lean or Coq in the broader formal-methods job market. |
| 88 | +* *Industrial credibility.* Modest — Agda is more academic than |
| 89 | + industrial. |
| 90 | + |
| 91 | +=== Option B — Lean 4 |
| 92 | + |
| 93 | +*What it is.* Modern dependently-typed proof assistant and programming |
| 94 | +language. Tactic-mode and term-mode both first-class. Native |
| 95 | +compilation to machine code. Mathlib is the world's largest machine- |
| 96 | +checked mathematics library. |
| 97 | + |
| 98 | +*Against the drivers.* |
| 99 | + |
| 100 | +* *Pedagogical clarity.* High. Tactic-mode *looks like* how mathematical |
| 101 | + proofs are written, which reduces the "I have to learn a new style of |
| 102 | + writing" barrier. Term-mode is available when needed. The mental |
| 103 | + model is less purely Curry-Howard than Agda's, which is a feature |
| 104 | + pedagogically: students see both views. |
| 105 | +* *Type zoo match.* Refinement via subtype / dependent types: strong. |
| 106 | + Postulates exist (`axiom` keyword) but feel less idiomatic than |
| 107 | + Agda's. Termination: native, with `decreasing_by` hints. Linearity: |
| 108 | + *experimental support* in Lean 4; not yet mainstream. Mathlib does |
| 109 | + not rely on linearity. Session types: community work exists but is |
| 110 | + not canonical. Units of measure: straightforward, but no |
| 111 | + canonical library (unlike F#). |
| 112 | +* *Teaching materials.* *Theorem Proving in Lean 4* and *Mathematics in |
| 113 | + Lean* are free, excellent, and current. *Logical Verification* (VU |
| 114 | + Amsterdam) is a full undergraduate course. Coverage is stronger on |
| 115 | + pure mathematics than on PL-specific content (linearity, session |
| 116 | + types, refinement for programs). |
| 117 | +* *Tooling friction.* Lowest of the four. Native VS Code support, fast |
| 118 | + feedback, cross-platform installers, `elan` version manager. |
| 119 | +* *Postulate ergonomics.* Usable but less clean than Agda. |
| 120 | +* *Transferable skill.* Highest. Lean 4 is currently the most rapidly |
| 121 | + growing proof assistant, especially in mathematics, and is seeing |
| 122 | + industrial adoption for formal methods. |
| 123 | +* *Industrial credibility.* High and rising. |
| 124 | + |
| 125 | +*Caveat.* Lean 3 → Lean 4 was a hard break within recent memory. |
| 126 | +Material older than about 2022 is generally Lean 3 and needs |
| 127 | +translation. This is a real hazard for mining internet resources. |
| 128 | + |
| 129 | +=== Option C — Coq / Rocq |
| 130 | + |
| 131 | +*What it is.* The classical dependently-typed proof assistant. Tactic- |
| 132 | +based (Ltac / Ltac2). Inductive types. Extraction to OCaml, Haskell, |
| 133 | +Scheme. Recently renamed from Coq to Rocq (2024–2025). |
| 134 | + |
| 135 | +*Against the drivers.* |
| 136 | + |
| 137 | +* *Pedagogical clarity.* Moderate. The tactic language adds a *second* |
| 138 | + skill students must learn alongside the type theory. Term-mode exists |
| 139 | + (`refine`, `exact`) but is not the primary idiom. Students can make |
| 140 | + progress without deeply understanding the underlying type theory, |
| 141 | + which is sometimes a pedagogical bug rather than a feature. |
| 142 | +* *Type zoo match.* Strong for refinement, postulates (`Axiom` |
| 143 | + keyword), termination (structural + `Program Fixpoint` + |
| 144 | + `Function`). Linearity: no native support, but there are libraries |
| 145 | + (Iris, LinearCoq, and a body of literature). Session types: mature |
| 146 | + ecosystem (e.g. Metacoq-adjacent work, and session-type embeddings). |
| 147 | + Units of measure: encodable. |
| 148 | +* *Teaching materials.* *Software Foundations* (Pierce et al.) is the |
| 149 | + gold standard for PL teaching in Coq, free, and covers linear types |
| 150 | + in a later volume. *Certified Programming with Dependent Types* (CPDT, |
| 151 | + Chlipala) is advanced. |
| 152 | +* *Tooling friction.* Moderate. `vscoq` is adequate, CoqIDE is |
| 153 | + functional, and the tactic language has good feedback. Installation |
| 154 | + via opam is standard but is one more thing. |
| 155 | +* *Postulate ergonomics.* Good (`Axiom`), though slightly less idiomatic |
| 156 | + than Agda's `postulate`. |
| 157 | +* *Transferable skill.* High. Widely used in PL research, formal |
| 158 | + verification industry (seL4 precursors, CompCert, hacspec). Declining |
| 159 | + relative to Lean in the mathematics community. |
| 160 | +* *Industrial credibility.* High. |
| 161 | + |
| 162 | +=== Option D — Isabelle/HOL |
| 163 | + |
| 164 | +*What it is.* Higher-order logic proof assistant, classical rather than |
| 165 | +constructive. Excellent proof automation (Sledgehammer), the Archive of |
| 166 | +Formal Proofs, and a long industrial track record (seL4, Amazon's |
| 167 | +s2n-tls, Unified Modeling Language semantics). |
| 168 | + |
| 169 | +*Against the drivers.* |
| 170 | + |
| 171 | +* *Pedagogical clarity.* Different in kind from the other three. |
| 172 | + Proofs are written in Isar, a structured proof language that reads |
| 173 | + more like a mathematics textbook. This is excellent for learning |
| 174 | + *proof writing*, but the mapping from KitchenSpeak constructs to |
| 175 | + HOL obligations is less direct than in a dependently-typed system — |
| 176 | + there is no Curry-Howard programming-as-proof story to lean on. |
| 177 | +* *Type zoo match.* Here Isabelle's classical logic starts to work |
| 178 | + against us. HOL does not natively support linear / substructural |
| 179 | + reasoning. *Separation logic* is available (and is how seL4-class |
| 180 | + work is done), but teaching separation logic on top of HOL adds a |
| 181 | + full additional layer the class does not need this term. Session |
| 182 | + types: not a native concept, though encodings exist. Postulates: |
| 183 | + axioms are well-supported. Units of measure: encodable. Refinement: |
| 184 | + Isabelle has excellent *refinement calculus* support, but in a |
| 185 | + different (program-refinement) sense than KitchenSpeak's type-level |
| 186 | + refinement. |
| 187 | +* *Teaching materials.* *Concrete Semantics* (Nipkow & Klein) is |
| 188 | + excellent but targeted at semantics of imperative languages rather |
| 189 | + than dependent / linear type theory. *Programming and Proving in |
| 190 | + Isabelle/HOL* is a good introduction. |
| 191 | +* *Tooling friction.* Isabelle/jEdit is a mature bespoke IDE. |
| 192 | + Installation is one download. Sledgehammer often lets students |
| 193 | + finish proofs they could not finish by hand, which is *either* a |
| 194 | + godsend (morale, pedagogical encouragement) *or* a crutch |
| 195 | + (students never learn what the proof actually looks like). |
| 196 | +* *Postulate ergonomics.* Good (`axiomatization`). |
| 197 | +* *Transferable skill.* High in industrial formal methods (avionics, |
| 198 | + microkernels, cryptography). Narrower in PL-theory research and |
| 199 | + narrower still in the modern type-theory community. |
| 200 | +* *Industrial credibility.* Highest of the four. This is the system |
| 201 | + that *actually ships* real verified software in hostile industries. |
| 202 | + |
| 203 | +*Why the instructor's original instinct toward Isabelle deserves |
| 204 | +respect.* If the deliverable were a verified kitchen-hardware controller |
| 205 | +to be shown to a regulator, Isabelle would be the right tool. The |
| 206 | +deliverable is not that. The deliverable is *students who have felt |
| 207 | +type theory*. |
| 208 | + |
| 209 | +== Comparison against drivers |
| 210 | + |
| 211 | +[cols="3,1,1,1,1"] |
| 212 | +|=== |
| 213 | +| Driver | Agda | Lean 4 | Coq/Rocq | Isabelle |
| 214 | + |
| 215 | +| Pedagogical clarity (programs = proofs) |
| 216 | +| High |
| 217 | +| High |
| 218 | +| Moderate |
| 219 | +| Different paradigm (Isar) |
| 220 | + |
| 221 | +| Type zoo match (linear, refinement, session, postulates, units) |
| 222 | +| Strong |
| 223 | +| Good, some experimental |
| 224 | +| Strong |
| 225 | +| Weaker without separation logic |
| 226 | + |
| 227 | +| Teaching materials aligned with this curriculum |
| 228 | +| PLFA — best-in-class |
| 229 | +| TPiL + Maths-in-Lean |
| 230 | +| Software Foundations |
| 231 | +| Concrete Semantics (adjacent) |
| 232 | + |
| 233 | +| Tooling friction |
| 234 | +| Moderate (Emacs-ish) |
| 235 | +| Lowest (VS Code-native) |
| 236 | +| Moderate (vscoq) |
| 237 | +| Low (Isabelle/jEdit) |
| 238 | + |
| 239 | +| Postulate ergonomics |
| 240 | +| Best (native keyword) |
| 241 | +| Good (`axiom`) |
| 242 | +| Good (`Axiom`) |
| 243 | +| Good (`axiomatization`) |
| 244 | + |
| 245 | +| Transferable skill |
| 246 | +| Narrow |
| 247 | +| Broadest and growing |
| 248 | +| Broad (declining share) |
| 249 | +| Broad within industrial FM |
| 250 | + |
| 251 | +| Industrial credibility |
| 252 | +| Low |
| 253 | +| Rising |
| 254 | +| High |
| 255 | +| Highest |
| 256 | +|=== |
| 257 | + |
| 258 | +== Decision |
| 259 | + |
| 260 | +*Recommendation: Option A — Agda — for KitchenSpeak's v1.0 term |
| 261 | +deliverables, with Option B — Lean 4 — reserved as the explicit |
| 262 | +successor target.* |
| 263 | + |
| 264 | +The rationale in one paragraph: the class's primary deliverable is |
| 265 | +understanding, not shipping. Agda's Curry-Howard immediacy makes each |
| 266 | +load-bearing KitchenSpeak construct (`max_duration`, `on_fail`, |
| 267 | +`proving @w`, linear consumption, units) map onto a term-level proof |
| 268 | +obligation the student can see and manipulate directly. Wadler's PLFA |
| 269 | +covers the exact territory KitchenSpeak sits in (linear types, session |
| 270 | +types, lambda calculi, inference) and is free. The `postulate` keyword |
| 271 | +is the cleanest fit for echo-oracles. Termination checking is built in |
| 272 | +and speaks the vocabulary the spec already uses ("yellow" goals). |
| 273 | +Linearity is the single weak spot, addressed by either a taught |
| 274 | +encoding (which teaches the students what linearity *is*, not just that |
| 275 | +it exists) or by reaching for a QTT-enabled Agda build. |
| 276 | + |
| 277 | +Lean 4 is a close second and a better *next* investment. Students who |
| 278 | +continue into formal methods after this term should be encouraged to |
| 279 | +port their KitchenSpeak proofs to Lean 4 as a second-semester or |
| 280 | +summer project. The lift is non-trivial but mostly mechanical once the |
| 281 | +shape of the proof is already understood in Agda. |
| 282 | + |
| 283 | +Isabelle is correctly the instructor's instinct for industrial |
| 284 | +verification work, and should be the chosen tool if the class's |
| 285 | +trajectory shifts toward kitchen-hardware regulatory compliance. It is |
| 286 | +the wrong tool for teaching type theory itself, because HOL lacks the |
| 287 | +dependent-type machinery the KitchenSpeak type zoo wants. |
| 288 | + |
| 289 | +Coq/Rocq is a reasonable third choice but offers no clear advantage |
| 290 | +over Agda for this curriculum while adding tactic-language overhead. |
| 291 | +Choose it only if the instructor personally has a strong Coq/Rocq |
| 292 | +preference and correspondingly faster lecture-prep time. |
| 293 | + |
| 294 | +== Consequences |
| 295 | + |
| 296 | +=== For teaching materials |
| 297 | + |
| 298 | +* Primary textbook: *Programming Language Foundations in Agda* (PLFA). |
| 299 | +* Primary reference: Ulf Norell's *Dependently Typed Programming in |
| 300 | + Agda* tutorial. |
| 301 | +* Supplementary: Conor McBride's lecture notes on linear types and |
| 302 | + quantitative type theory, for the linearity chapter. |
| 303 | + |
| 304 | +=== For tooling setup |
| 305 | + |
| 306 | +Students will need: |
| 307 | + |
| 308 | +* A working Agda installation (Nix or distro package). |
| 309 | +* `agda-mode` for either Emacs or VS Code. Recommend VS Code for this |
| 310 | + cohort unless an Emacs user is already present. |
| 311 | +* The Agda standard library. |
| 312 | +* A Unicode-capable terminal and font. |
| 313 | + |
| 314 | +Expected install-to-first-green-goal time: 30–60 minutes, with the |
| 315 | +instructor present for the first session. |
| 316 | + |
| 317 | +=== For the KitchenSpeak lowering |
| 318 | + |
| 319 | +* Each `sync_block` lowers to an Agda function returning |
| 320 | + `{success @w} ⊎ {aborted, recovered, warmed}`, with the `proving @w` |
| 321 | + existential witness supplied by a `postulate`. |
| 322 | +* Each `step`'s `max_duration` lowers to a well-founded-recursion |
| 323 | + measure on elapsed wall-clock time. |
| 324 | +* Each `Linear` resource lowers to a value in a linearity-tracking |
| 325 | + monad (taught encoding for v1.0; swap for QTT-Agda later). |
| 326 | +* Each `Tropical` refinement lowers to a `Σ`-type pairing a |
| 327 | + dimensioned quantity with a proof that its trajectory stays within |
| 328 | + the safe envelope. |
| 329 | +* Each `Echo` witness lowers to a `postulate`-ed proposition indexed |
| 330 | + by a time-stamped classifier output. |
| 331 | +* Each `Ceremony` context lowers to a reader-monad-style parameter on |
| 332 | + the `orchestrate` body. |
| 333 | + |
| 334 | +`SPEC.adoc` §4 is left untouched — the "Cook (Gallina)" phrase is the |
| 335 | +class's artefact. The `COMMENTARY.adoc` document already notes that the |
| 336 | +companion analysis uses Agda; this ADR promotes that note to a decision. |
| 337 | + |
| 338 | +=== For the eventual Lean 4 migration |
| 339 | + |
| 340 | +* Preserve the Agda proof *shape* (which introduction rules, which |
| 341 | + elimination rules, which witnesses) even if tactics would have |
| 342 | + finished the proof faster in Lean. |
| 343 | +* Record each `postulate` as a candidate `axiom` in the Lean port. |
| 344 | +* The tensor-product shape of Dyadic bind maps cleanly to a Lean 4 |
| 345 | + structure with two linear fields, should Lean 4's experimental |
| 346 | + linearity support mature. |
| 347 | + |
| 348 | +== Reversibility |
| 349 | + |
| 350 | +*High.* Nothing in `SPEC.adoc`, `grammar.ebnf`, or any of the `.ks` |
| 351 | +worked examples commits to a specific proof assistant. The spec is |
| 352 | +proof-assistant-agnostic; the lowering is what this ADR fixes. |
| 353 | + |
| 354 | +A future ADR can swap the target. The cost is re-proving the core |
| 355 | +library (Dough, Emulsion, Sear, Poached Egg), which is the students' |
| 356 | +pedagogical work anyway — not a sunk-cost problem. |
| 357 | + |
| 358 | +== Open questions |
| 359 | + |
| 360 | +. *Linearity encoding.* Teach the class to encode linearity by hand |
| 361 | + (monadic or channel-based), or reach for a QTT-Agda fork? Hand- |
| 362 | + encoding is more pedagogical; QTT is less friction. Proposal: hand- |
| 363 | + encode in weeks 3–5, mention QTT as extension reading. |
| 364 | +. *Installation strategy.* Provide a Nix flake for the class, or let |
| 365 | + students use distro packages? Nix is reproducible but adds a |
| 366 | + dependency. Proposal: Nix flake for the instructor, distro packages |
| 367 | + documented for students, fallback to a web-based Agda if anyone is |
| 368 | + blocked. |
| 369 | +. *Dual-proving Poached Egg.* After Agda is working, optionally port |
| 370 | + Poached Egg to Lean 4 as a single end-of-term exercise so the class |
| 371 | + sees the same proof in two systems. This is high-value pedagogy if |
| 372 | + time allows. |
| 373 | +. *What to tell the SE teacher.* "We chose Agda because the class is |
| 374 | + learning type theory, and Isabelle, while industrially stronger, is |
| 375 | + not a teaching tool for type theory." Direct. |
0 commit comments