# Proposal: Rewrite HIR Lowering in TML (Phase 15a)

## Why

HIR (High-level Intermediate Representation) is the bridge between the parsed AST and the MIR
builder. It takes the raw syntax tree and a fully resolved TypeEnv from the type checker
(phase14d), and it produces a typed, desugared tree. In that tree every expression node carries its
concrete type, all syntactic sugar has been eliminated, and every generic function has been
monomorphized into concrete specializations. Without HIR, the MIR builder would have to
re-implement type resolution, desugaring, and monomorphization itself. The point of the pipeline
split is that HIR does this work once, cleanly, so that MIR can focus on control flow and SSA
construction.

This phase ports ~15,207 LOC of C++ (11 source files plus 4 serialization files) to ~9,900 lines
of TML, completing the first major pass of the ERA 1 compiler port.

## What Changes

Port the following C++ files to TML:

- `compiler/src/hir/hir_builder.cpp` (1,511 LOC): main HIR builder, entry point
- `compiler/src/hir/hir_builder_expr.cpp` (1,485 LOC): expression lowering
- `compiler/src/hir/hir_builder_stmt.cpp` (433 LOC): statement lowering
- `compiler/src/hir/hir_builder_pattern.cpp` (362 LOC): pattern lowering
- `compiler/src/hir/hir_pass.cpp` (728 LOC): HIR optimization passes
- `compiler/src/hir/hir_pass_inline.cpp` (1,292 LOC): HIR inlining pass
- `compiler/src/hir/hir_printer.cpp` (619 LOC): HIR pretty-printer
- `compiler/src/hir/hir_expr.cpp` (225 LOC), `hir_pattern.cpp` (142 LOC),
  `hir_module.cpp` (90 LOC), `hir_stmt.cpp` (76 LOC): HIR node type definitions
- `compiler/src/hir/serializer/` (4 files, ~3,561 LOC): binary serialization for the pipeline bridge

New TML modules produced:

- `compiler-tml/src/hir/mod.tml`: module root
- `compiler-tml/src/hir/expr.tml`: `HirExpr` enum (~40 variants, each carrying its resolved type)
- `compiler-tml/src/hir/stmt.tml`: `HirStmt` enum
- `compiler-tml/src/hir/pattern.tml`: `HirPattern` enum
- `compiler-tml/src/hir/module.tml`: `HirModule` struct
- `compiler-tml/src/hir/builder.tml`: `HirBuilder` struct and `lower_module` entry point
- `compiler-tml/src/hir/lower_expr.tml`: expression lowering (largest module)
- `compiler-tml/src/hir/monomorph.tml`: generic instantiation engine
- `compiler-tml/src/hir/printer.tml`: HIR pretty-printer

## Key Decisions

**HirExpr as an enum with resolved types on every node.** Every `HirExpr` variant carries its
concrete `Type` as a field. This is the central design decision of the HIR. Type resolution
happens once, here. Every downstream pass (THIR, MIR, codegen) can then read a node's type directly
from the node without consulting the TypeEnv. This matches the C++ implementation, where `HirExpr`
nodes carry a `resolved_type` field populated during lowering.
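
The following Python sketch illustrates the invariant. Python is used only for brevity; the real definitions are TML enums in `expr.tml`. The variant names `Lit`, `Var`, and `Call` come from the architecture listing. The `Type` representation, the field names, and the `result_size` helper are hypothetical. The point is that a downstream question such as "how many bytes does this call return?" is answered by reading `expr.ty`, with no TypeEnv in scope.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Union


# Hypothetical, simplified stand-in for the resolved Type produced by the checker.
@dataclass(frozen=True)
class Type:
    name: str  # e.g. "I32", "Bool"


# Each HirExpr variant carries its concrete type in a `ty` field.
@dataclass(frozen=True)
class Lit:
    value: object
    ty: Type


@dataclass(frozen=True)
class Var:
    name: str
    ty: Type


@dataclass(frozen=True)
class Call:
    callee: str                # already the mangled, concrete name, e.g. "foo$I32$"
    args: tuple[HirExpr, ...]
    ty: Type                   # return type of this particular call


HirExpr = Union[Lit, Var, Call]

# Hypothetical layout table that a codegen-style pass might consult.
SIZES = {"I32": 4, "I64": 8, "Bool": 1}


def result_size(expr: HirExpr) -> int:
    # Answered from the node alone; no TypeEnv lookup is needed.
    return SIZES[expr.ty.name]
```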

**Monomorphization as a separate pass, not interleaved.** When expression lowering encounters a
call to `foo[I32](x)`, it does not immediately generate the specialized `foo$I32$` function.
Instead, it records the instantiation in a queue and lowers the call using the mangled name as a
placeholder. A separate monomorphization pass then drains the queue and generates each
specialization exactly once. Recursive generics are handled by checking, before an instantiation
is added, whether it has already been requested. This matches the C++ design and avoids infinite
recursion on recursive generic types like `List[List[T]]`.
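
A worklist sketch of the queue in Python follows. The names `MonomorphQueue`, `drain_queue`, and `mangle_name` are taken from the architecture listing for `monomorph.tml`, but their signatures here are hypothetical. The mangling scheme is extrapolated from the single `foo$I32$` example. The `generate` callback stands in for lowering one specialized body, which may itself request further instantiations.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass, field
from typing import Callable


def mangle_name(func: str, type_args: tuple[str, ...]) -> str:
    # Extrapolated from `foo[I32]` -> `foo$I32$`; the real scheme may differ.
    return func + "$" + "$".join(type_args) + "$"


@dataclass
class MonomorphQueue:
    pending: deque = field(default_factory=deque)  # instantiations not yet generated
    seen: set = field(default_factory=set)         # every mangled name ever requested

    def request(self, func: str, type_args: tuple[str, ...]) -> str:
        """Called by expression lowering; returns the placeholder callee name."""
        name = mangle_name(func, type_args)
        if name not in self.seen:  # check before adding: each instantiation queued once
            self.seen.add(name)
            self.pending.append((func, type_args))
        return name


def drain_queue(
    queue: MonomorphQueue,
    generate: Callable[[str, tuple[str, ...]], None],
) -> list[str]:
    """Generate every queued specialization, including ones discovered while draining."""
    produced = []
    while queue.pending:
        func, type_args = queue.pending.popleft()
        generate(func, type_args)  # may call queue.request(...) and grow the worklist
        produced.append(mangle_name(func, type_args))
    return produced
```

Take mutually recursive `f[T]` and `g[T]`. The seen-set stops the cycle after `f$I32$` and `g$I32$` have each been generated once. A seen-set alone guarantees termination only when the set of reachable instantiations is finite. A body that calls itself at an ever-growing type (for example `f[T]` calling `f[List[T]]`) keeps producing new names.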

**Closure capture analysis generates closure struct types.** When lowering encounters a closure
`do(x) expr`, capture analysis walks the body and identifies all free variables. For each closure,
a synthetic struct type is generated (e.g., `__Closure_42`) with one field per captured variable.
Each field is stored according to the capture mode (ref, value, or move) that the variable's use
in the body requires. The closure body is lowered as a separate `HirFunc` whose first parameter is
the closure struct. This matches the approach in the C++ `hir_builder.cpp` and produces the struct
layout that MIR codegen expects.
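
A Python sketch of capture analysis over a toy closure body follows. The proposal says only that the mode is determined by how each variable is used. The rules below are an assumption for illustration: a consumed variable is captured by move, a read of a copyable variable by value, and any other read by ref, with the strongest requirement winning. The mini-AST classes and the alphabetical field order are likewise hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


# Hypothetical mini-AST for a closure body.
@dataclass(frozen=True)
class Use:        # variable is read
    name: str


@dataclass(frozen=True)
class Consume:    # variable is passed somewhere that takes ownership
    name: str


@dataclass(frozen=True)
class Seq:
    items: tuple


# Assumed ordering: a stronger requirement overrides a weaker one.
RANK = {"value": 0, "ref": 1, "move": 2}


def collect(node, params: set[str], copy_vars: set[str], modes: dict[str, str]) -> None:
    if isinstance(node, Seq):
        for item in node.items:
            collect(item, params, copy_vars, modes)
        return
    if node.name in params:
        return  # bound by the closure itself, so not a free variable
    if isinstance(node, Consume):
        mode = "move"
    elif node.name in copy_vars:
        mode = "value"
    else:
        mode = "ref"
    old = modes.get(node.name)
    if old is None or RANK[mode] > RANK[old]:
        modes[node.name] = mode


def closure_struct(closure_id: int, body, params, copy_vars):
    """Return the synthetic struct name and its sorted (field, capture mode) pairs."""
    modes: dict[str, str] = {}
    collect(body, set(params), set(copy_vars), modes)
    return f"__Closure_{closure_id}", sorted(modes.items())
```

The returned pairs are what would become the fields of `__Closure_42`, and the closure's lowered `HirFunc` would take that struct as its first parameter. The mode decides whether a field holds a reference or an owned value, so getting it wrong changes the struct layout.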

**Desugaring is exhaustive and irreversible.** After HIR lowering, `for`, `while`, `var`, and
`if let` syntax no longer exists in the output. Every `for x in iter` loop is rewritten into
explicit iterator-protocol calls (`iter.next()` inside a loop). Every `var` declaration becomes
`let mut`. Every `if let Just(x) = e` becomes an explicit `when` expression. Downstream passes
never need to handle these sugar forms.
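
A Python sketch of the `for` and `var` rewrites as a tree-to-tree function follows. The proposal fixes only the overall shape: `iter.next()` inside a loop, and `var` becoming `let mut`. The node classes and the fresh-name scheme (`__iter_0`, ...) are illustrative. So is the assumption that `next()` returns a `Maybe` with `Just`/`Nothing` variants.

```python
from __future__ import annotations

from dataclasses import dataclass
from itertools import count


# Hypothetical surface-syntax nodes (input).
@dataclass(frozen=True)
class For:              # for <var> in <iterable> <body>
    var: str
    iterable: object
    body: object


@dataclass(frozen=True)
class VarDecl:          # var <name> = <init>
    name: str
    init: object


# Hypothetical HIR nodes (output): there is no For and no VarDecl here.
@dataclass(frozen=True)
class LetMut:
    name: str
    init: object


@dataclass(frozen=True)
class MethodCall:
    receiver: str
    method: str


@dataclass(frozen=True)
class When:
    scrutinee: object
    arms: tuple         # ((pattern, body), ...)


@dataclass(frozen=True)
class Loop:
    body: object


@dataclass(frozen=True)
class Break:
    pass


@dataclass(frozen=True)
class Block:
    stmts: tuple


def desugar(node, fresh):
    if isinstance(node, VarDecl):
        return LetMut(node.name, desugar(node.init, fresh))
    if isinstance(node, For):
        it = fresh()  # hidden binding that holds the iterator
        return Block((
            LetMut(it, desugar(node.iterable, fresh)),
            Loop(When(MethodCall(it, "next"), (
                (("Just", node.var), desugar(node.body, fresh)),
                (("Nothing",), Break()),
            ))),
        ))
    return node  # leaves (names, literals) pass through unchanged


def make_fresh():
    counter = count()
    return lambda: f"__iter_{next(counter)}"
```

The output classes include no `For` or `VarDecl`. A later pass that matches on HIR nodes therefore has no case to write for them, which is what "exhaustive and irreversible" buys in practice.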

## Architecture

```
compiler-tml/src/hir/
  mod.tml         -- module root, re-exports HirModule, HirExpr, HirBuilder
  expr.tml        -- HirExpr enum: Lit, Var, Field, Index, Call, When, Loop, Closure, ...
  stmt.tml        -- HirStmt enum: Let, Expr, Return, ...
  pattern.tml     -- HirPattern enum: Wildcard, Bind, Struct, Enum, Tuple, ...
  module.tml      -- HirModule: List[HirFunc] + List[HirTypeDef] + List[HirImpl]
  builder.tml     -- HirBuilder + lower_module() entry point
  lower_expr.tml  -- lower_expr(), lower_call(), lower_closure(), lower_when(), ...
  monomorph.tml   -- MonomorphQueue, drain_queue(), mangle_name()
  printer.tml     -- print_module(), print_func(), print_expr() for debug output
```

## Pipeline Integration

After phase15a completes, the pipeline is:

```
AST + TypeEnv (from phase14d)
    |  phase15a: type resolution, desugaring, capture analysis, monomorphization
    v
HirModule  →  THIR lowerer (Phase 15b)
```

The `HirModule` output is a complete, typed, desugared representation of the source. Every
expression carries its concrete type. No generic functions remain, only their specializations.
No syntactic sugar remains. This is what THIR consumes.
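
These guarantees can be checked mechanically on every `HirModule` before it is handed to THIR. The Python sketch below does this under two assumptions, neither of which is the real HIR layout. First, it uses a hypothetical type encoding: `("param", name)` for a generic parameter and `("named", name, args)` for a concrete type. Second, it assumes a flattened list of expression types per function. `HirFunc` and `HirModule` reuse names from the architecture listing, but their fields here are hypothetical. The "no sugar" guarantee is not checked: if `HirExpr` has no sugar variants, the type system already enforces it.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class HirFunc:
    name: str
    type_params: list[str]   # should be empty after monomorphization
    expr_types: list[tuple]  # resolved type of every expression in the body


@dataclass
class HirModule:
    funcs: list[HirFunc] = field(default_factory=list)


def is_concrete(ty: tuple) -> bool:
    # ("param", "T") is an unresolved generic; ("named", name, args) is concrete
    # only if all of its type arguments are concrete too.
    if ty[0] == "param":
        return False
    return all(is_concrete(arg) for arg in ty[2])


def check_module(module: HirModule) -> list[str]:
    """Return one message per violated invariant; an empty list means THIR-ready."""
    errors = []
    for func in module.funcs:
        if func.type_params:
            errors.append(f"{func.name}: generic function survived monomorphization")
        for ty in func.expr_types:
            if not is_concrete(ty):
                errors.append(f"{func.name}: expression has non-concrete type {ty}")
    return errors
```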

## Success Criteria

Differential HIR comparison: run the TML HIR builder on all 1,700+ test files and all stdlib
modules. Serialize each resulting `HirModule` with the binary serializer from phase 5, then compare
it field by field against the C++ HIR builder's output for the same input. Zero diffs are required
before phase 15b begins.
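
A Python sketch of the comparison step follows. It assumes both serialized modules have already been decoded into nested dicts and lists by a reader for the phase 5 binary format, which is not shown. `field_diffs` and `differential_gate` are hypothetical helpers. Each diff is reported as the path of the differing field rather than a byte offset in the serialized output, which points directly at the node that lowered differently.

```python
from __future__ import annotations

from typing import Any, Iterator


def field_diffs(cpp: Any, tml: Any, path: str = "module") -> Iterator[str]:
    """Yield a description of every field where the two decoded HIR trees differ."""
    if isinstance(cpp, dict) and isinstance(tml, dict):
        for key in sorted(cpp.keys() | tml.keys()):
            if key not in cpp or key not in tml:
                yield f"{path}.{key}: present on only one side"
            else:
                yield from field_diffs(cpp[key], tml[key], f"{path}.{key}")
    elif isinstance(cpp, list) and isinstance(tml, list):
        if len(cpp) != len(tml):
            yield f"{path}: length {len(cpp)} != {len(tml)}"
        for i, (a, b) in enumerate(zip(cpp, tml)):  # compare the common prefix
            yield from field_diffs(a, b, f"{path}[{i}]")
    elif cpp != tml:
        yield f"{path}: {cpp!r} != {tml!r}"


def differential_gate(results: dict[str, tuple[Any, Any]]) -> dict[str, list[str]]:
    """Map each input file to its diffs; phase 15b may start only if this is empty."""
    report = {}
    for source, (cpp_tree, tml_tree) in results.items():
        diffs = list(field_diffs(cpp_tree, tml_tree))
        if diffs:
            report[source] = diffs
    return report
```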

## Risk Assessment

High. Monomorphization has well-known edge cases: recursive generics (`List[List[T]]`),
mutually recursive generic functions, generic types appearing only in associated type positions,
and trait object erasure (where monomorphization does not apply). The C++ implementation handles
these via a worklist algorithm with cycle detection; the TML port must replicate this exactly.

Closure capture analysis is also subtle: the capture mode (ref vs value vs move) affects the
generated closure struct layout and must match what the MIR builder expects. A mismatch here
produces runtime use-after-free bugs that are difficult to trace.

Plan: implement data types and builder core first, test against simple non-generic modules,
then add monomorphization and test with generic stdlib modules, then add closure support last.

## Dependencies

- **Requires**: phase14d complete (full TypeEnv with behavior dispatch and coercion annotations)
- **Blocks**: phase15b (THIR lowerer consumes HirModule)
- **Blocks**: phase15c (MIR builder HIR→MIR path consumes HirModule directly)