Commit 64141c0

Andre Ferreira and claude committed
chore(tasks): ERA 2+3+4 complete — 14 tasks, 295 items, FULL INDEPENDENCE PLANNED
ERA 2 — Custom Native Backend (eliminate LLVM):
- phase18a: MachIR lowering (20), phase18b: x86_64 encoder (22), phase18c: PE/COFF emission (20)
- phase19a: Linear scan register allocator (22)
- phase20a: Production x86_64 (22), phase20b: AArch64 backend (21)
- phase21a: Debug info PDB+DWARF (24) → LLVM ELIMINATED

ERA 3 — Custom Linker (eliminate LLD):
- phase22a: PE/COFF linker (22), phase22b: ELF linker (20)
- phase22c: Mach-O linker (18), phase22d: Incremental linker <10ms (18) → LLD ELIMINATED

ERA 4 — C/C++ Frontend (complete toolchain):
- phase23a: C preprocessor (20), phase23b: C17 frontend (24)
- phase23c: C++ subset frontend (22) → FULL INDEPENDENCE

All 14 tasks include real proposals (40-155 lines each).
Grand total: 39 independence tasks, 839 checklist items across 4 eras.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent e9d2c41 commit 64141c0

43 files changed

Lines changed: 1936 additions & 4 deletions

.rulebook/tasks/TASKS-INDEX.md

Lines changed: 48 additions & 4 deletions
@@ -1,7 +1,7 @@
11
# TML Project — Task Index
22

33
**Last updated**: 2026-04-05
4-
**Active tasks**: 46 | **Archived**: 5+
4+
**Active tasks**: 60 | **Archived**: 5+
55

66
---
77

@@ -196,6 +196,47 @@ Wire everything together, port tooling, execute three-stage bootstrap verificati
196196

197197
**Order**: 17a → 17b → 17c (sequential). 17c = **TML COMPILES ITSELF**
198198

199+
## Phase 18–21 — Custom Native Backend (ERA 2)
200+
201+
Replace LLVM with custom code generator. Binary drops from 140MB to ~15MB.
202+
203+
| ID | Task | Status | Priority | Progress |
204+
|----|------|--------|----------|----------|
205+
| 18a | [MachIR Lowering](phase18a_debug-backend-machir/) | Planned | P1 | 0/20 |
206+
| 18b | [x86_64 Encoder](phase18b_x86-encoder/) | Planned | P1 | 0/22 |
207+
| 18c | [PE/COFF Object Emission](phase18c_pe-object-emission/) | Planned | P1 | 0/20 |
208+
| 19a | [Register Allocator](phase19a_register-allocator/) | Planned | P1 | 0/22 |
209+
| 20a | [Production x86_64 Backend](phase20a_production-backend-x86/) | Planned | P1 | 0/22 |
210+
| 20b | [AArch64 Backend](phase20b_aarch64-backend/) | Planned | P1 | 0/21 |
211+
| 21a | [Debug Info (PDB+DWARF)](phase21a_debug-info-pdb-dwarf/) | Planned | P1 | 0/24 |
212+
213+
**Order**: 18a → 18b+18c → 19a → 20a+20b → 21a. Completion = **LLVM ELIMINATED**
214+
215+
## Phase 22 — Custom Linker (ERA 3)
216+
217+
Replace LLD with tml-link. Target: sub-10ms incremental linking.
218+
219+
| ID | Task | Status | Priority | Progress |
220+
|----|------|--------|----------|----------|
221+
| 22a | [PE/COFF Linker (Windows)](phase22a_pe-coff-linker/) | Planned | P2 | 0/22 |
222+
| 22b | [ELF Linker (Linux)](phase22b_elf-linker/) | Planned | P2 | 0/20 |
223+
| 22c | [Mach-O Linker (macOS)](phase22c_macho-linker/) | Planned | P2 | 0/18 |
224+
| 22d | [Incremental Linker](phase22d_incremental-linker/) | Planned | P2 | 0/18 |
225+
226+
**Order**: 22a → 22b → 22c → 22d. Completion = **LLD ELIMINATED**
227+
228+
## Phase 23 — C/C++ Frontend (ERA 4)
229+
230+
TML compiles C and C++ code directly. Complete toolchain independence.
231+
232+
| ID | Task | Status | Priority | Progress |
233+
|----|------|--------|----------|----------|
234+
| 23a | [C Preprocessor](phase23a_c-preprocessor/) | Planned | P2 | 0/20 |
235+
| 23b | [C17 Frontend](phase23b_c-frontend/) | Planned | P2 | 0/24 |
236+
| 23c | [C++ Subset Frontend](phase23c_cpp-subset-frontend/) | Planned | P2 | 0/22 |
237+
238+
**Order**: 23a → 23b → 23c. Completion = **FULL TOOLCHAIN INDEPENDENCE**
239+
199240
## Research
200241

201242
| ID | Task | Status | Priority | Progress |
@@ -224,7 +265,10 @@ Wire everything together, port tooling, execute three-stage bootstrap verificati
224265
```
225266
Active now: Phase 1 (language), Phase 4 (tooling), Phase 8 (DB), Phase 10 (HTTP/build)
226267
Next: Phase 12 (self-hosting foundation) — can start immediately
227-
Then: Phase 13 (TML frontend) — after Phase 12 complete
228-
Future: Phase 14+ (type checker, IR pipeline, codegen, bootstrap)
229-
Long-term: Custom backend, custom linker, C/C++ frontend (see independence plan)
268+
Then: Phase 13-17 (ERA 1: TML compiles itself) — 25 tasks, 544 items
269+
Then: Phase 18-21 (ERA 2: custom backend, eliminate LLVM) — 7 tasks, 151 items
270+
Then: Phase 22 (ERA 3: custom linker, eliminate LLD) — 4 tasks, 78 items
271+
Finally: Phase 23 (ERA 4: C/C++ frontend, full independence) — 3 tasks, 66 items
272+
273+
TOTAL INDEPENDENCE PLAN: 39 tasks, 839 items across 4 eras
230274
```
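The per-era counts in the roadmap block above can be cross-checked with a few lines of arithmetic. This is only a sanity-check sketch in Python; the dictionary keys are labels chosen here for illustration, while the numbers are copied from the roadmap lines.

```python
# (tasks, checklist items) per era, as listed in the roadmap block.
eras = {
    "ERA 1 (Phase 13-17)": (25, 544),
    "ERA 2 (Phase 18-21)": (7, 151),
    "ERA 3 (Phase 22)": (4, 78),
    "ERA 4 (Phase 23)": (3, 66),
}

total_tasks = sum(tasks for tasks, _ in eras.values())
total_items = sum(items for _, items in eras.values())

# Tasks and items added by this commit (ERA 2 through ERA 4 only).
new_tasks = total_tasks - eras["ERA 1 (Phase 13-17)"][0]
new_items = total_items - eras["ERA 1 (Phase 13-17)"][1]

print(total_tasks, total_items, new_tasks, new_items)
```

The sums agree with the commit title (14 tasks, 295 items) and with the stated grand total (39 tasks, 839 items).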
Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
1+
{
2+
"status": "pending",
3+
"createdAt": "2026-04-06T01:36:58.343Z",
4+
"updatedAt": "2026-04-06T01:36:58.343Z"
5+
}
Lines changed: 40 additions & 0 deletions
@@ -0,0 +1,40 @@
1+
# Proposal: phase18a — MIR → MachIR Lowering
2+
3+
## Why
4+
5+
The TML compiler currently depends entirely on LLVM for code generation. LLVM is a ~500MB binary dependency with complex build requirements, slow compilation of the compiler itself, and an API surface that couples TML tightly to LLVM internals. ERA 2 eliminates this dependency by building a native backend in pure TML. Phase 18a establishes the foundation: a machine-level intermediate representation (MachIR) that sits between the existing MIR and raw bytes.
6+
7+
MachIR is the architectural separation point that makes the rest of ERA 2 possible. Phases 18b (encoding) and 19a (register allocation) operate entirely on MachIR — they never touch MIR. This means the register allocator can be swapped from stack-only (Phase 18, MVP) to linear scan (Phase 19) without changing any lowering logic.
8+
9+
## What Changes
10+
11+
- New TML module `compiler/native/machir.tml` — MachIR data types (VirtualReg, MachInst, MachBlock, MachFunc)
12+
- New TML module `compiler/native/mir_lower.tml` — MIR → MachIR lowering pass
13+
- New TML module `compiler/native/stack_alloc.tml` — stack-only register allocator (every VirtualReg → stack slot)
14+
- New TML module `compiler/native/frame.tml` — stack frame layout, prologue/epilogue emission
15+
- MachIR is NOT emitted to disk; it is an in-memory structure consumed by phase 18b encoder
16+
17+
## Design Decisions
18+
19+
**Unlimited virtual registers**: VirtualReg is a U64 counter. The lowering phase never reuses registers, so every value keeps a unique name and lowering itself needs no liveness or interference analysis. Phi nodes are still removed during lowering (see below). The allocator (phase 19) handles physical register assignment.
20+
21+
**Stack-only allocation as Phase 18 MVP**: Every VirtualReg gets its own 8-byte stack slot. This is correct but slow. The tradeoff is acceptable for phase 18 because correctness is the only goal. Linear scan in phase 19 replaces this path without changing MachIR.
22+
23+
**Phi node destruction via parallel copies**: MIR phi nodes are lowered to parallel-copy sequences inserted at block predecessors. This matches the standard SSA destruction approach and avoids the swap problem, where copies that exchange values through a cycle overwrite a source before it is read.
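A parallel copy still has to be turned into ordinary sequential moves at some point. The sketch below shows one common way to do that in Python (the proposal's own code would be TML). The function name `sequentialize`, the dictionary representation, and the `fresh_temp` callback are assumptions made for illustration, not part of the proposal.

```python
def sequentialize(copies, fresh_temp):
    """Order a parallel copy {dst: src} into a list of (dst, src) moves.

    All sources are read "at once" in a parallel copy, so a destination may
    only be written when no pending copy still needs its old value.
    """
    pending = {d: s for d, s in copies.items() if d != s}  # drop no-op copies
    moves = []
    while pending:
        # Destinations that no pending copy reads are safe to overwrite.
        ready = [d for d in pending if d not in pending.values()]
        if ready:
            for d in ready:
                moves.append((d, pending.pop(d)))
            continue
        # Only cycles remain (for example a <-> b). Save one destination's
        # old value in a temporary and redirect its readers to that temp.
        d = next(iter(pending))
        tmp = fresh_temp()
        moves.append((tmp, d))
        for k, s in pending.items():
            if s == d:
                pending[k] = tmp
    return moves
```

With unlimited virtual registers, `fresh_temp` can simply hand out the next unused VirtualReg, so breaking a cycle costs one extra move.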
24+
25+
## Impact
26+
27+
- Affected specs: docs/specs/native-backend.md (new)
28+
- Affected code: compiler/src/backend/ (new native/ subdirectory), compiler/src/cli/ (--backend=native flag stub)
29+
- Breaking change: NO — LLVM backend remains default, native backend is opt-in via --backend=native
30+
- User benefit: First step toward eliminating the 500MB LLVM dependency; faster compiler builds
31+
32+
## Risk
33+
34+
LOW. MachIR is a pure data structure transformation. It does not touch the parser, type checker, or existing codegen. If MachIR lowering produces wrong output, the only symptom is incorrect machine code — the existing LLVM path is unaffected. Tests verify MachIR structure directly without executing the output.
35+
36+
## Reference
37+
38+
- chibicc: AST → assembly code generation (codegen.c)
39+
- qbe: SSA → machine code lowering in amd64/isel.c
40+
- TCC: tccgen.c register and stack slot management
Lines changed: 31 additions & 0 deletions
@@ -0,0 +1,31 @@
1+
## Status: 0/20 items complete
2+
3+
## Phase 1: MachIR Data Types
4+
- [ ] 1.1 Define `VirtualReg` type (wraps U64 ID, never reused, unlimited count)
5+
- [ ] 1.2 Define `MachInst` enum (Mov, Add, Sub, Imul, Idiv, Cmp, Jcc, Call, Ret, Push, Pop, Lea, Spill, Reload)
6+
- [ ] 1.3 Define `MachBlock` (ID, list of MachInst, successor block IDs)
7+
- [ ] 1.4 Define `MachFunc` (name, list of MachBlock, virtual reg count, stack frame size)
8+
9+
## Phase 2: MIR → MachIR Lowering
10+
- [ ] 2.1 Lower MIR arithmetic (BinOp Add/Sub/Mul/Div) → MachInst with fresh VirtualRegs
11+
- [ ] 2.2 Lower MIR memory ops (Load, Store, Alloca) → MachInst with stack slot references
12+
- [ ] 2.3 Lower MIR control flow (Goto, Branch, Switch) → MachBlock edges + Jcc/JMP
13+
- [ ] 2.4 Lower MIR function calls (CallInst) → MachInst::Call + arg/return VirtualReg assignments
14+
- [ ] 2.5 Lower MIR comparisons (Eq, Ne, Lt, Le, Gt, Ge) → CMP + Jcc sequence
15+
- [ ] 2.6 Lower MIR phi nodes → parallel-copy sequences at block predecessors
16+
17+
## Phase 3: Stack-Only Register Allocation
18+
- [ ] 3.1 Assign each VirtualReg a unique stack slot (8-byte aligned, no sharing)
19+
- [ ] 3.2 Insert Spill after each instruction that defines a VirtualReg
20+
- [ ] 3.3 Insert Reload before each instruction that uses a VirtualReg
21+
- [ ] 3.4 Verify every VirtualReg reference is replaced by a stack slot offset
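The stack-only allocation steps above can be sketched as follows. This is a simplified Python illustration, not the TML implementation: the `(op, defs, uses)` tuple format and the function name `allocate_stack_only` are assumptions. Note that the store to the slot goes after the defining instruction, and the load goes before each use.

```python
def allocate_stack_only(insts):
    """insts: list of (op, defs, uses), where defs/uses hold virtual reg IDs.

    Returns (rewritten instruction list, {vreg: offset below RBP}).
    """
    slots = {}

    def slot_of(vreg):
        # First sight of a vreg reserves the next 8-byte slot; no sharing.
        return slots.setdefault(vreg, 8 * (len(slots) + 1))

    out = []
    for op, defs, uses in insts:
        for v in uses:
            out.append(("Reload", slot_of(v)))   # load [RBP - offset] before the use
        out.append((op,))
        for v in defs:
            out.append(("Spill", slot_of(v)))    # store to [RBP - offset] after the def
    return out, slots


program = [("Mov", [0], []), ("Mov", [1], []), ("Add", [2], [0, 1])]
rewritten, slots = allocate_stack_only(program)
for inst in rewritten:
    print(inst)
```

Because every virtual register gets its own slot, the result is correct without any liveness analysis, at the cost of a memory access around every instruction.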
22+
23+
## Phase 4: Stack Frame Layout
24+
- [ ] 4.1 Compute total frame size: count stack slots × 8 bytes, align to 16 bytes
25+
- [ ] 4.2 Emit function prologue (PUSH RBP, MOV RBP RSP, SUB RSP frame_size)
26+
- [ ] 4.3 Emit function epilogue (ADD RSP frame_size, POP RBP, RET)
27+
- [ ] 4.4 Encode [RBP - offset] addressing for all stack slot references
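The frame-size rule and the prologue/epilogue sequences listed above translate into only a few bytes of machine code. The following Python sketch shows the arithmetic and the encodings, always using the imm32 forms of `SUB` and `ADD` for simplicity. The function names are hypothetical.

```python
import struct

def frame_size(slot_count):
    """Slots are 8 bytes each; round the total up to a multiple of 16."""
    return (slot_count * 8 + 15) & ~15

def prologue(size):
    return (
        b"\x55"                # PUSH RBP
        + b"\x48\x89\xE5"      # MOV RBP, RSP
        + b"\x48\x81\xEC"      # SUB RSP, imm32
        + struct.pack("<i", size)
    )

def epilogue(size):
    return (
        b"\x48\x81\xC4"        # ADD RSP, imm32
        + struct.pack("<i", size)
        + b"\x5D"              # POP RBP
        + b"\xC3"              # RET
    )

print(frame_size(3), prologue(frame_size(3)).hex(), epilogue(frame_size(3)).hex())
```

Three slots need 24 bytes, which rounds up to 32. Since `PUSH RBP` restores 16-byte alignment after the return address is pushed, subtracting a multiple of 16 keeps RSP aligned at call sites.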
28+
29+
## Phase 5: Testing
30+
- [ ] 5.1 Lower 5 MIR programs (factorial, fib, hello world, struct return, loop) — verify MachIR structure matches expected block/inst count
31+
- [ ] 5.2 Verify prologue/epilogue generated correctly for each test function — frame size divisible by 16, all VirtualRegs assigned slots
Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
1+
{
2+
"status": "pending",
3+
"createdAt": "2026-04-06T01:36:59.299Z",
4+
"updatedAt": "2026-04-06T01:36:59.299Z"
5+
}
Lines changed: 42 additions & 0 deletions
@@ -0,0 +1,42 @@
1+
# Proposal: phase18b — x86_64 Instruction Encoding
2+
3+
## Why
4+
5+
Phase 18a produces MachIR — an in-memory list of abstract machine instructions with virtual registers. Phase 18b converts that list to raw bytes. This is the lowest layer of the native backend: given a MachInst and physical register assignments (or stack slots from phase18a's stack-only allocator), produce the correct sequence of bytes that the x86_64 CPU will execute.
6+
7+
x86_64 instruction encoding is notoriously complex: variable-length instructions (1–15 bytes), REX prefixes for 64-bit operands, ModRM and SIB bytes for memory addressing, RIP-relative addressing for position-independent code. Getting these right is a prerequisite for every subsequent phase. Phase 18c (COFF emission) and Phase 19 (register allocator) both depend on this encoder being correct.
8+
9+
## What Changes
10+
11+
- New TML module `compiler/native/x86_encode.tml` — encoding functions for each instruction class
12+
- New TML module `compiler/native/x86_emit.tml` — `emit_func(MachFunc) -> Buffer` top-level emitter with two-pass branch patching
13+
- Helper types: `ModRM`, `SIB`, `REX`, `PhysReg` enum, `MemOperand` (base + displacement)
14+
- No changes to existing LLVM backend or MIR
15+
16+
## Design Decisions
17+
18+
**Core subset only (Phase 18)**: MOV, ADD, SUB, IMUL, IDIV, NEG, NOT, AND, OR, XOR, SHL, SHR, SAR, CMP, TEST, Jcc, JMP, CALL, RET, PUSH, POP, LEA. SSE/AVX deferred to Phase 20a. This subset is sufficient to compile any TML program that uses only integers and pointers.
19+
20+
**Stack-slot operands only (Phase 18)**: The encoder in phase 18 works with the output of the stack-only allocator from phase 18a. Every operand is either a physical register (RSP, RBP, RAX for IDIV convention) or a [RBP-offset] memory reference. Phase 19 replaces the allocator; the encoder does not change.
21+
22+
**Two-pass branch patching**: Forward references require knowing the target block's byte offset before it is emitted. Pass 1 emits all instructions using placeholder displacements. Pass 2 patches rel8 and rel32 fields once all block offsets are known. rel8 vs rel32 selection: use rel8 if the displacement fits in a signed byte (-128 to 127), otherwise rel32. Switching a branch to the longer form shifts every later offset, so the affected code must be re-emitted (rare for small functions).
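A minimal Python sketch of the two-pass idea, restricted to `JMP rel32` so that no instruction ever changes size. The block representation and the name `emit_with_patching` are assumptions for illustration; rel8 selection is deliberately left out.

```python
import struct

def emit_with_patching(blocks):
    """blocks: list of (block_id, items); an item is raw bytes or ("jmp", target_id)."""
    buf = bytearray()
    offsets = {}   # block_id -> byte offset of the block start
    fixups = []    # (offset of the rel32 field, target block_id)

    # Pass 1: emit everything, leaving zeroed rel32 placeholders.
    for block_id, items in blocks:
        offsets[block_id] = len(buf)
        for item in items:
            if isinstance(item, bytes):
                buf += item
            else:
                _, target = item
                buf += b"\xE9"                 # JMP rel32 opcode
                fixups.append((len(buf), target))
                buf += b"\x00\x00\x00\x00"     # placeholder displacement

    # Pass 2: displacements are relative to the end of the jump instruction.
    for pos, target in fixups:
        struct.pack_into("<i", buf, pos, offsets[target] - (pos + 4))
    return bytes(buf)


code = emit_with_patching([
    ("entry", [("jmp", "exit")]),
    ("mid", [b"\x90"]),          # NOP
    ("exit", [b"\xC3"]),         # RET
])
print(code.hex())
```

The same fixup list handles backward jumps, which simply produce a negative displacement.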
23+
24+
**RIP-relative addressing deferred**: Global variable references would normally use RIP-relative addressing. For Phase 18, all globals are instead accessed via absolute addresses loaded as imm64. RIP-relative addressing for globals is added in Phase 20a.
25+
26+
## Impact
27+
28+
- Affected specs: docs/specs/native-backend.md (encoding reference tables)
29+
- Affected code: compiler/native/ (new), no changes to existing paths
30+
- Breaking change: NO — native backend is separate, LLVM path unaffected
31+
- User benefit: Native backend can emit working x86_64 machine code for integer programs
32+
33+
## Risk
34+
35+
MEDIUM. x86_64 encoding has many edge cases: REX.W is required for most 64-bit ops (PUSH, POP, and near CALL/JMP default to 64-bit operands without it), RSP/RBP have special ModRM encodings (RSP as a base needs a SIB byte; RBP as a base always needs a displacement), some opcodes encode the register in the low 3 bits of the opcode byte. Errors produce incorrect bytes that crash at runtime with no error message. The reference test (task 6.1) against known-correct bytes is the primary correctness guard.
36+
37+
## Reference
38+
39+
- Intel SDM Vol 2 (instruction set reference) — encoding fields defined per instruction
40+
- chibicc codegen.c — minimal x86_64 code generator; it emits assembly text rather than encoded bytes, so it is a reference for instruction selection, not for encoding
41+
- TCC tccasm.c — complete encoder including all ModRM/SIB cases
42+
- AMD64 ABI Vol 1 §3.2 — calling convention that drives register usage
Lines changed: 35 additions & 0 deletions
@@ -0,0 +1,35 @@
1+
## Status: 0/22 items complete
2+
3+
## Phase 1: Encoding Infrastructure
4+
- [ ] 1.1 Implement `ModRM` byte builder (mod[2], reg[3], rm[3] fields, addressing mode enum)
5+
- [ ] 1.2 Implement `SIB` byte builder (scale[2], index[3], base[3] fields)
6+
- [ ] 1.3 Implement `REX` prefix builder (REX.W for 64-bit ops, REX.R/X/B for register extension to R8-R15)
7+
- [ ] 1.4 Implement immediate encoding helpers (imm8, imm16, imm32, imm64 → little-endian bytes appended to Buffer)
8+
- [ ] 1.5 Define physical register enum (RAX=0, RCX=1, RDX=2, RBX=3, RSP=4, RBP=5, RSI=6, RDI=7, R8-R15)
9+
10+
## Phase 2: Data Movement
11+
- [ ] 2.1 Encode `MOV r64, r64` (REX.W 0x89 ModRM) and `MOV r64, imm64` (REX.W 0xB8+rd imm64)
12+
- [ ] 2.2 Encode `MOV r64, [RBP-disp]` and `MOV [RBP-disp], r64` with disp8 and disp32 forms
13+
- [ ] 2.3 Encode `LEA r64, [RBP-disp]` (REX.W 0x8D ModRM displacement) for stack address loads
14+
- [ ] 2.4 Encode `PUSH r64` (0x50+rd, REX.B prefix for R8-R15) and `POP r64` (0x58+rd)
15+
16+
## Phase 3: Arithmetic
17+
- [ ] 3.1 Encode `ADD r64, r64`, `ADD r64, imm32`, `SUB r64, r64`, `SUB r64, imm32`
18+
- [ ] 3.2 Encode `IMUL r64, r64` (REX.W 0x0F 0xAF ModRM) and `IDIV r64` (REX.W 0xF7 /7 — dividend in RDX:RAX)
19+
- [ ] 3.3 Encode `NEG r64`, `NOT r64`, `AND r64, r64`, `OR r64, r64`, `XOR r64, r64`
20+
- [ ] 3.4 Encode `SHL r64, CL`, `SHL r64, imm8`, `SHR r64, CL`, `SAR r64, CL` (shift group: D3 for shift by CL, C1 for imm8, D1 for shift by 1)
21+
22+
## Phase 4: Comparison and Branches
23+
- [ ] 4.1 Encode `CMP r64, r64`, `CMP r64, imm32`, `TEST r64, r64`
24+
- [ ] 4.2 Encode all Jcc rel8 (short) and rel32 (near) forms: JE/JNE/JL/JLE/JG/JGE/JB/JBE/JA/JAE
25+
- [ ] 4.3 Encode `JMP rel8`, `JMP rel32`, `JMP r64` and `CALL rel32`, `CALL r64`
26+
- [ ] 4.4 Encode `RET` (0xC3) and `RET imm16` (0xC2 + imm16 for callee-cleanup conventions)
27+
28+
## Phase 5: MachIR → Bytes Emission
29+
- [ ] 5.1 Implement `emit_func(MachFunc) -> Buffer` — iterate MachBlocks, emit each MachInst to a growing Buffer
30+
- [ ] 5.2 Implement two-pass branch patching: pass 1 records block start offsets, pass 2 patches all Jcc/JMP displacements
31+
- [ ] 5.3 Handle forward references: use 32-bit displacement placeholders (0x00000000), back-patch after all blocks emitted
32+
33+
## Phase 6: Testing
34+
- [ ] 6.1 Encode 20 known instruction sequences, compare output bytes byte-for-byte against nasm/objdump reference
35+
- [ ] 6.2 End-to-end: lower factorial MIR → MachIR (phase18a) → x86 bytes → write to executable memory page → call via FFI → verify return value
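To make the REX and ModRM items above concrete, here is a small Python sketch that encodes two of the listed forms: `MOV r64, r64` and a load from a `[RBP+disp]` stack slot. Helper names are hypothetical, and REX.X is omitted because no SIB index register is used here.

```python
import struct

RAX, RCX, RDX, RBX, RSP, RBP, RSI, RDI = range(8)
R8 = 8

def rex(w, reg, rm):
    """REX prefix: 0100WRXB. R and B carry bit 3 of the reg and rm fields."""
    return 0x40 | (w << 3) | ((reg >> 3) << 2) | (rm >> 3)

def modrm(mod, reg, rm):
    """ModRM byte: mod[2] reg[3] rm[3]; only the low 3 bits of each register."""
    return (mod << 6) | ((reg & 7) << 3) | (rm & 7)

def mov_rr(dst, src):
    # MOV r/m64, r64 is REX.W 89 /r: destination in rm, source in reg.
    return bytes([rex(1, src, dst), 0x89, modrm(0b11, src, dst)])

def mov_load_rbp(dst, disp):
    # MOV r64, [RBP+disp] is REX.W 8B /r; RBP as base always needs a displacement.
    if -128 <= disp <= 127:
        return bytes([rex(1, dst, RBP), 0x8B, modrm(0b01, dst, RBP)]) + struct.pack("<b", disp)
    return bytes([rex(1, dst, RBP), 0x8B, modrm(0b10, dst, RBP)]) + struct.pack("<i", disp)


print(mov_rr(RAX, RBX).hex())          # mov rax, rbx
print(mov_rr(R8, RAX).hex())           # mov r8, rax
print(mov_load_rbp(RAX, -8).hex())     # mov rax, [rbp-8]
print(mov_load_rbp(RCX, -256).hex())   # mov rcx, [rbp-256]
```

Byte strings like these are the kind of output that task 6.1 would compare against an assembler's reference encoding.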
Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
1+
{
2+
"status": "pending",
3+
"createdAt": "2026-04-06T01:36:59.743Z",
4+
"updatedAt": "2026-04-06T01:36:59.743Z"
5+
}
Lines changed: 42 additions & 0 deletions
@@ -0,0 +1,42 @@
1+
# Proposal: phase18c — PE/COFF Object File Emission
2+
3+
## Why
4+
5+
Phases 18a and 18b produce x86_64 machine code bytes in memory. Phase 18c wraps those bytes in a PE/COFF object file so the existing LLD linker (already embedded in TML) can link them into an executable. This completes the Phase 18 MVP: a working end-to-end native backend path for Windows.
6+
7+
PE/COFF is well-documented (Microsoft PE/COFF specification, version 11.0) and the format is relatively simple for object files (as opposed to executable images). Object files do not need an Optional Header, a PE signature, or an import directory. They need only: a COFF file header, section headers, raw section data (.text, .data, .rdata), a symbol table, relocations, and a string table.
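As an illustration of how small the object-file header is, the 20-byte COFF file header can be packed in a single call. This Python sketch assumes the field layout of `IMAGE_FILE_HEADER`; the function name and its parameters are hypothetical.

```python
import struct

IMAGE_FILE_MACHINE_AMD64 = 0x8664

def coff_file_header(num_sections, symtab_offset, num_symbols, timestamp=0):
    """Pack IMAGE_FILE_HEADER for an x86_64 object file (little-endian)."""
    return struct.pack(
        "<HHIIIHH",
        IMAGE_FILE_MACHINE_AMD64,  # Machine
        num_sections,              # NumberOfSections
        timestamp,                 # TimeDateStamp
        symtab_offset,             # PointerToSymbolTable (file offset)
        num_symbols,               # NumberOfSymbols
        0,                         # SizeOfOptionalHeader: 0 for object files
        0,                         # Characteristics
    )

header = coff_file_header(num_sections=1, symtab_offset=0x100, num_symbols=2)
print(len(header), header.hex())
```

The section headers follow this header directly, since object files carry no optional header.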
8+
9+
## What Changes
10+
11+
- New TML module `compiler/native/coff_emit.tml` — COFF file header, section header, symbol, relocation structures and writer
12+
- New TML module `compiler/native/obj_writer.tml` — top-level `write_obj(MachModule) -> Buffer` that orchestrates all COFF components
13+
- CLI: `--backend=native` flag (stubbed in 18a) now drives the full pipeline on Windows
14+
- No changes to the LLVM backend
15+
16+
## Design Decisions
17+
18+
**Use LLD for linking (Phase 18)**: The existing LLD linker in `compiler/src/backend/lld_linker.cpp` accepts standard COFF .obj files. Phase 18c targets LLD compatibility. A custom linker (ERA 3) replaces LLD later. This means phase 18c gets a working end-to-end system immediately, without waiting for linker work.
19+
20+
**.pdata section (Windows SEH)**: Windows requires `.pdata` with RUNTIME_FUNCTION entries for any function that modifies RSP (i.e., every non-leaf function). Without .pdata, stack unwinding fails and C++ exceptions / debuggers cannot walk the stack. Phase 18c emits minimal RUNTIME_FUNCTION entries pointing to a trivial unwind code.
21+
22+
**Relocation types**: Two types cover Phase 18's needs. IMAGE_REL_AMD64_REL32 (0x0004) patches CALL rel32 instructions that reference external symbols. IMAGE_REL_AMD64_ADDR64 (0x0001) patches 64-bit absolute addresses for global data. Both are standard and supported by LLD and MSVC link.exe.
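Each COFF relocation is a 10-byte record. A hedged Python sketch of writing one, using the two type values named above; the function name and argument names are illustrative only.

```python
import struct

IMAGE_REL_AMD64_ADDR64 = 0x0001  # 64-bit absolute address
IMAGE_REL_AMD64_REL32 = 0x0004   # 32-bit displacement relative to the next instruction

def coff_relocation(offset_in_section, symbol_index, rel_type):
    """Pack IMAGE_RELOCATION: VirtualAddress, SymbolTableIndex, Type."""
    return struct.pack("<IIH", offset_in_section, symbol_index, rel_type)

# A CALL rel32 at section offset 0 has its displacement field at offset 1.
record = coff_relocation(1, 7, IMAGE_REL_AMD64_REL32)
print(len(record), record.hex())
```

The offset points at the field to patch, not at the start of the instruction, which is why the example uses 1 rather than 0.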
23+
24+
**String table for long names**: The COFF symbol Name field is 8 bytes. Names longer than 8 bytes go in the string table instead: the first 4 bytes of the Name field are zero, and the next 4 bytes hold an offset into the string table, which follows the symbol table. All TML function names (which include module paths) will likely exceed 8 bytes.
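The short-name versus long-name rule can be sketched in a few lines of Python. The function name and the convention of keeping the string bytes in a separate `bytearray` are assumptions for illustration. Offsets start at 4 because the string table begins with its own 4-byte size field.

```python
import struct

def encode_symbol_name(name, strings):
    """Return the 8-byte Name field; long names are appended to `strings`.

    `strings` holds the string-table body without its 4-byte size prefix.
    """
    raw = name.encode("utf-8")
    if len(raw) <= 8:
        return raw.ljust(8, b"\0")        # short name: padded with NULs in place
    offset = 4 + len(strings)             # offset counts the size prefix
    strings += raw + b"\0"                # long name: NUL-terminated in the table
    return struct.pack("<II", 0, offset)  # four zero bytes, then the offset

strings = bytearray()
print(encode_symbol_name("main", strings))
print(encode_symbol_name("app::math::factorial", strings).hex())

# The final table is the size prefix (which includes itself) plus the strings.
string_table = struct.pack("<I", 4 + len(strings)) + bytes(strings)
```

A name of exactly 8 bytes fits in the field with no terminator, so only names of 9 bytes or more reach the string table.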
25+
26+
## Impact
27+
28+
- Affected specs: docs/specs/native-backend.md (object file layout section)
29+
- Affected code: compiler/native/coff_emit.tml (new), compiler/native/obj_writer.tml (new), compiler/src/cli/commands/build.cpp (--backend=native routing)
30+
- Breaking change: NO — native backend is opt-in, LLVM path unchanged
31+
- User benefit: `tml build --backend=native` produces working executables on Windows without LLVM
32+
33+
## Risk
34+
35+
MEDIUM. The COFF format has strict byte-level layout requirements. Off-by-one errors in section offsets or symbol table pointers cause LLD to reject the object with cryptic errors. The integration test (task 6.1) is the primary correctness signal. Testing against both LLD and MSVC link.exe (task 6.3) provides confidence in format correctness.
36+
37+
## Reference
38+
39+
- Microsoft PE/COFF Specification v11.0 — authoritative byte-level format definition
40+
- LLVM lib/MC/WinCOFFObjectWriter.cpp — reference implementation
41+
- chibicc codegen.c — emits textual assembly and relies on an external assembler, so it is a code generation reference only, not a COFF writer
42+
- Windows SDK winnt.h — IMAGE_SECTION_HEADER, IMAGE_SYMBOL, IMAGE_RELOCATION definitions