Zig's comptime feature executes code at compile time, generating optimized runtime code with zero overhead. Agave uses this extensively for lookup tables, feature detection, and type-specialized dispatch.
comptime means "computed at compile time". The compiler evaluates the expression during compilation, and the result is baked into the binary.
flowchart LR
classDef setup fill:#dbeafe,stroke:#3b82f6,color:#1e3a5f
classDef sync fill:#dcfce7,stroke:#22c55e,color:#14532d
classDef migration fill:#fef9c3,stroke:#eab308,color:#713f12
classDef success fill:#bbf7d0,stroke:#16a34a,color:#14532d
classDef danger fill:#fee2e2,stroke:#ef4444,color:#7f1d1d
classDef optional fill:#f3e8ff,stroke:#9333ea,color:#581c87
Source["Source Code\n(comptime expression)"]:::setup
Compiler["Zig Compiler\n(compile time)"]:::sync
Value["Constant Value\nbaked into binary"]:::migration
Binary["Executable Binary\n(.rodata section)"]:::success
Runtime["Runtime\n(user runs program)"]:::setup
Result["Instant result\n(no computation)"]:::success
Source --> Compiler
Compiler --> Value
Value --> Binary
Runtime --> Binary
Binary --> Result
subgraph CompilePhase["Compile Phase (your machine, once)"]
Source
Compiler
Value
end
subgraph RunPhase["Run Phase (user's machine, many times)"]
Runtime
Binary
Result
end
const table_size = 256; // Regular constant
const doubled = comptime table_size * 2; // Computed at compile time (512)
// The binary contains the value 512, not the multiplication

When to use comptime:
- Building lookup tables
- Feature detection based on target platform
- Type-level computations
- Format string validation
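To make the compile-time versus run-time distinction concrete, the sketch below evaluates the same function both ways. The helper `fib` and its input are illustrative assumptions, not code from Agave.

```zig
const std = @import("std");

/// Hypothetical helper for illustration: iterative Fibonacci.
fn fib(n: u32) u64 {
    var a: u64 = 0;
    var b: u64 = 1;
    var i: u32 = 0;
    while (i < n) : (i += 1) {
        const next = a + b;
        a = b;
        b = next;
    }
    return a;
}

// Container-level initializers are evaluated by the compiler,
// so the binary stores the result rather than the loop.
const fib_20 = fib(20);

comptime {
    // Checked during compilation; a wrong value would stop the build.
    std.debug.assert(fib_20 == 6765);
}

pub fn main() void {
    // Inside a function, the comptime keyword forces compile-time evaluation.
    const forced = comptime fib(20);
    std.debug.print("{d} {d}\n", .{ fib_20, forced });
}
```

The same `fib` remains callable with run-time arguments; only the call sites decide when it runs.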
Pre-computing values at compile time eliminates runtime arithmetic.
flowchart TD
classDef setup fill:#dbeafe,stroke:#3b82f6,color:#1e3a5f
classDef sync fill:#dcfce7,stroke:#22c55e,color:#14532d
classDef migration fill:#fef9c3,stroke:#eab308,color:#713f12
classDef success fill:#bbf7d0,stroke:#16a34a,color:#14532d
classDef danger fill:#fee2e2,stroke:#ef4444,color:#7f1d1d
classDef optional fill:#f3e8ff,stroke:#9333ea,color:#581c87
NaiveInput["8-bit FP8 value\n(e.g. 0xA7)"]:::setup
NaiveOps["Runtime: extract bits,\nbranch, pow(), multiply\n~30 instructions"]:::danger
NaiveOut["f32 result"]:::migration
ComptimeLoop["Compiler: loop 0..256\nfp8e4m3Compute(i)"]:::sync
LUT["[256]f32 table\nin .rodata\n(1 KB)"]:::migration
FastInput["8-bit FP8 value\n(e.g. 0xA7)"]:::setup
LUTLookup["Runtime: array[val]\n1 instruction"]:::sync
FastOut["f32 result"]:::success
NaiveInput --> NaiveOps
NaiveOps --> NaiveOut
ComptimeLoop --> LUT
LUT --> LUTLookup
FastInput --> LUTLookup
LUTLookup --> FastOut
subgraph Naive["Naive (runtime per call)"]
NaiveInput
NaiveOps
NaiveOut
end
subgraph LUTPath["LUT (comptime table, runtime lookup)"]
ComptimeLoop
LUT
FastInput
LUTLookup
FastOut
end
Naive approach (runtime conversion):
pub fn fp8e4m3ToF32(val: u8) f32 {
    // Extract sign, exponent, mantissa from 8-bit value
    const sign = (val >> 7) & 1;
    const exp = (val >> 3) & 0xF;
    const mant = val & 0x7;
    // Compute float value
    const bias = 7;
    // Explicit f32: a comptime_float cannot depend on a runtime condition
    const sign_mult: f32 = if (sign == 1) -1.0 else 1.0;
    if (exp == 0) {
        // Subnormal
        return sign_mult * (@as(f32, @floatFromInt(mant)) / 8.0) * std.math.pow(f32, 2.0, 1 - bias);
    } else {
        // Normal
        const frac = 1.0 + (@as(f32, @floatFromInt(mant)) / 8.0);
        return sign_mult * frac * std.math.pow(f32, 2.0, @as(f32, @floatFromInt(exp)) - bias);
    }
}

Cost per call: ~20-30 instructions (bit shifts, branches, floating-point arithmetic, pow() call).
Optimized approach (comptime lookup table):
// Build 256-entry lookup table at compile time
const fp8e4m3_lut: [256]f32 = blk: {
var table: [256]f32 = undefined;
for (0..256) |i| {
table[i] = fp8e4m3Compute(@intCast(i)); // Computed once at compile time
}
break :blk table;
};
// Runtime dequantization is a single array lookup
pub inline fn fp8e4m3ToF32(val: u8) f32 {
return fp8e4m3_lut[val];
}

Cost per call: 1 instruction (load from .rodata section).
Speedup: 20-30× faster for the dequantization itself. In a full GEMV, this saves ~5-10% total time.
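The table-building code calls fp8e4m3Compute without showing its body. The following is a minimal self-contained sketch of how the pieces could fit together. The bodies of `fp8e4m3Compute` and the `pow2` helper are assumptions written for this example (NaN encodings are not special-cased), and the raised branch quota is needed only because of the loops this particular sketch uses.

```zig
const std = @import("std");

/// Assumed helper: exact power of two by repeated multiplication.
fn pow2(e: i32) f32 {
    var r: f32 = 1.0;
    var k: i32 = 0;
    if (e >= 0) {
        while (k < e) : (k += 1) r *= 2.0;
    } else {
        while (k > e) : (k -= 1) r *= 0.5;
    }
    return r;
}

/// Assumed body for fp8e4m3Compute: 1 sign bit, 4 exponent bits, 3 mantissa bits.
/// The NaN encodings (0x7F, 0xFF) are not handled in this sketch.
fn fp8e4m3Compute(val: u8) f32 {
    const sign: f32 = if ((val >> 7) == 1) -1.0 else 1.0;
    const exp: i32 = (val >> 3) & 0xF;
    const mant: f32 = @floatFromInt(val & 0x7);
    const bias: i32 = 7;
    if (exp == 0) return sign * (mant / 8.0) * pow2(1 - bias); // subnormal
    return sign * (1.0 + mant / 8.0) * pow2(exp - bias); // normal
}

const fp8e4m3_lut: [256]f32 = blk: {
    // The outer loop plus the pow2 loops exceed the default limit of
    // 1000 backward branches for one compile-time evaluation.
    @setEvalBranchQuota(10_000);
    var table: [256]f32 = undefined;
    for (0..256) |i| {
        table[i] = fp8e4m3Compute(@intCast(i));
    }
    break :blk table;
};

pub inline fn fp8e4m3ToF32(val: u8) f32 {
    return fp8e4m3_lut[val];
}

pub fn main() void {
    std.debug.print("{d}\n", .{fp8e4m3ToF32(0x38)});
}
```

Because the table is filled from the same function that a runtime path would call, the two approaches cannot drift apart.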
const table = blk: {
var result: [N]T = undefined;
// ... compute result ...
break :blk result; // Return from comptime block
};

Key points:
- blk: is a labeled block
- break :blk value returns from the block
- The entire block runs at compile time
- result becomes a compile-time constant
IQ4_NL uses a fixed dequantization table (not computed, but verified at comptime):
pub const iq4nl_table: [16]i8 = .{
-127, -104, -83, -65, -49, -35, -22, -10,
1, 13, 25, 38, 53, 69, 89, 113,
};
// Illustrative usage (not a real API function — callers use iq4nl_table directly):
// const val = @as(f32, @floatFromInt(iq4nl_table[nibble])) * scale;

Why a table? IQ4_NL uses non-linear quantization: the step sizes aren't uniform. Small values have fine steps, large values have coarse steps. This gives better accuracy than linear Q4.
comptime verification:
comptime {
std.debug.assert(iq4nl_table.len == 16); // 4-bit = 16 values
for (iq4nl_table, 0..) |v, i| {
if (i > 0) {
std.debug.assert(v > iq4nl_table[i - 1]); // Strictly increasing
}
}
}

This runs at compile time. If the table is malformed, compilation fails.
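As a usage illustration, a packed byte holding two 4-bit indices could be expanded as in the sketch below. The helper name `dequantPair` and the low-nibble-first ordering are assumptions made for this example, not Agave's actual layout.

```zig
const std = @import("std");

pub const iq4nl_table: [16]i8 = .{
    -127, -104, -83, -65, -49, -35, -22, -10,
    1,    13,   25,  38,  53,  69,  89,  113,
};

/// Hypothetical helper: expands one packed byte into two scaled floats.
/// Assumes the low nibble comes first.
fn dequantPair(byte: u8, scale: f32) [2]f32 {
    const lo: usize = byte & 0x0F;
    const hi: usize = byte >> 4;
    return .{
        @as(f32, @floatFromInt(iq4nl_table[lo])) * scale,
        @as(f32, @floatFromInt(iq4nl_table[hi])) * scale,
    };
}

pub fn main() void {
    const pair = dequantPair(0xF0, 0.5);
    std.debug.print("{d} {d}\n", .{ pair[0], pair[1] });
}
```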
Zig's builtin module provides platform information at comptime.
flowchart LR
classDef setup fill:#dbeafe,stroke:#3b82f6,color:#1e3a5f
classDef sync fill:#dcfce7,stroke:#22c55e,color:#14532d
classDef migration fill:#fef9c3,stroke:#eab308,color:#713f12
classDef success fill:#bbf7d0,stroke:#16a34a,color:#14532d
classDef danger fill:#fee2e2,stroke:#ef4444,color:#7f1d1d
classDef optional fill:#f3e8ff,stroke:#9333ea,color:#581c87
BuildCmd["zig build\n-Dtarget=aarch64-macos"]:::setup
Builtin["builtin.os.tag\nbuiltin.cpu.arch\nbuild_options.*"]:::migration
MetalBranch["MetalBackend\ncompiled in"]:::sync
VulkanBranch["VulkanBackend\ncompiled in"]:::sync
CPUBranch["CpuBackend\ncompiled in"]:::sync
Binary["macOS Binary\n(Metal only,\nLinux code absent)"]:::success
LinuxBin["Linux Binary\n(Vulkan only,\nMetal code absent)"]:::success
CPUBin["Other Binary\n(CPU fallback)"]:::success
BuildCmd --> Builtin
Builtin --> MacOS{{"os == .macos?"}}
MacOS -- yes --> MetalBranch
MacOS -- no --> Linux{{"os == .linux?"}}
Linux -- yes --> VulkanBranch
Linux -- no --> CPUBranch
MetalBranch --> Binary
VulkanBranch --> LinuxBin
CPUBranch --> CPUBin
subgraph CompileTime["Compile Time: dead code eliminated"]
MacOS
Linux
MetalBranch
VulkanBranch
CPUBranch
end
const builtin = @import("builtin");
pub fn initBackend() !Backend {
if (comptime builtin.os.tag == .macos) {
return Backend{ .metal = try MetalBackend.init() };
} else if (comptime builtin.os.tag == .linux) {
return Backend{ .vulkan = try VulkanBackend.init() };
} else {
return Backend{ .cpu = try CpuBackend.init() };
}
}

Dead code elimination: The compiler generates only the code for the target platform. If compiling for macOS, the Linux and CPU branches are completely removed from the binary.
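The same mechanism can be tried in isolation. In the sketch below, `BackendKind`, `default_backend`, and `selfCheck` are hypothetical names. The unselected branch contains a @compileError, which shows that a branch excluded by a comptime-known condition is never analyzed.

```zig
const std = @import("std");
const builtin = @import("builtin");

const BackendKind = enum { metal, vulkan, cpu };

// builtin.os.tag is comptime-known, so this switch is resolved by the compiler.
const default_backend: BackendKind = switch (builtin.os.tag) {
    .macos => .metal,
    .linux => .vulkan,
    else => .cpu,
};

fn selfCheck() bool {
    if (comptime builtin.os.tag == .freestanding) {
        // Only analyzed when targeting freestanding; hosted builds never see it.
        @compileError("this sketch assumes a hosted operating system");
    }
    return true;
}

pub fn main() void {
    std.debug.print("default backend: {s}\n", .{@tagName(default_backend)});
}
```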
// featureSetHas takes the feature tag directly; the arch guard keeps the x86 query off other targets
const has_avx2 = builtin.cpu.arch == .x86_64 and
    @import("std").Target.x86.featureSetHas(builtin.cpu.features, .avx2);
pub fn gemv(...) void {
    if (comptime has_avx2) {
        gemvAVX2(...); // 256-bit SIMD
    } else {
        gemvSSE2(...); // 128-bit SIMD fallback
    }
}

Benefit: No runtime CPU detection overhead. The compiler knows at build time which CPU features are available (based on the -mcpu flag or target triple).
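A runnable variant of this idea is sketched below. Instead of two hand-written kernels, it picks a vector width from the detected feature set. The names `lanes` and `dot` are assumptions for illustration, and the kernel is deliberately simple.

```zig
const std = @import("std");
const builtin = @import("builtin");

// Comptime-known: true only when building for x86_64 with AVX2 enabled.
const has_avx2 = builtin.cpu.arch == .x86_64 and
    std.Target.x86.featureSetHas(builtin.cpu.features, .avx2);

// Assumed policy: 8 lanes (256-bit) with AVX2, otherwise 4 lanes (128-bit).
const lanes = if (has_avx2) 8 else 4;

fn dot(a: []const f32, b: []const f32) f32 {
    std.debug.assert(a.len == b.len);
    const V = @Vector(lanes, f32);
    var acc: V = @splat(0.0);
    var i: usize = 0;
    // Vectorized main loop over full groups of `lanes` elements.
    while (i + lanes <= a.len) : (i += lanes) {
        const va: V = a[i..][0..lanes].*;
        const vb: V = b[i..][0..lanes].*;
        acc += va * vb;
    }
    var sum: f32 = @reduce(.Add, acc);
    // Scalar tail for the remaining elements.
    while (i < a.len) : (i += 1) sum += a[i] * b[i];
    return sum;
}

pub fn main() void {
    const a = [_]f32{ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };
    const b = [_]f32{ 1, 1, 1, 1, 1, 1, 1, 1, 1, 1 };
    std.debug.print("lanes={d} dot={d}\n", .{ lanes, dot(&a, &b) });
}
```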
// build.zig
const backend_options = b.addOptions();
backend_options.addOption(bool, "enable_metal", true);
backend_options.addOption(bool, "enable_cuda", false);
// backend.zig
const build_options = @import("build_options");
pub const MetalBackend = if (build_options.enable_metal)
@import("metal.zig").MetalBackend
else
    NullBackend;

Effect: If enable_metal=false, the Metal backend is not compiled at all: @import("metal.zig") is never evaluated, which reduces binary size and compile time.
Shader source code can be embedded directly into the binary at compile time.
flowchart LR
classDef setup fill:#dbeafe,stroke:#3b82f6,color:#1e3a5f
classDef sync fill:#dcfce7,stroke:#22c55e,color:#14532d
classDef migration fill:#fef9c3,stroke:#eab308,color:#713f12
classDef success fill:#bbf7d0,stroke:#16a34a,color:#14532d
classDef danger fill:#fee2e2,stroke:#ef4444,color:#7f1d1d
classDef optional fill:#f3e8ff,stroke:#9333ea,color:#581c87
MSL1["common.metal\n(MSL source)"]:::setup
MSL2["elementwise.metal\n(MSL source)"]:::setup
MSL3["gemv.metal\n(MSL source)"]:::setup
MSLN["... (5 more .metal files)"]:::setup
SPV["gemv.spv\n(SPIR-V binary)"]:::setup
EF["@embedFile\n(compile step)"]:::sync
EF2["@embedFile\n(compile step)"]:::sync
Concat["++ concatenation\n(zero-cost, compile time)"]:::sync
ROData[".rodata section\nin binary\n([]const u8 pointer)"]:::migration
ROData2[".rodata section\nin binary\n([]const u8 pointer)"]:::migration
Init["MetalBackend.init()\nnewLibraryWithSource(src)\n(driver compiles to GPU bytecode)"]:::success
Init2["VulkanBackend.init()\ncreateShaderModule(code)\n(SPIR-V loaded directly)"]:::success
MSL1 --> EF
MSL2 --> EF
MSL3 --> EF
MSLN --> EF
SPV --> EF2
EF --> Concat
Concat --> ROData
EF2 --> ROData2
ROData --> Init
ROData2 --> Init2
subgraph SourceFiles["Source Files (on disk, compile time only)"]
MSL1
MSL2
MSL3
MSLN
SPV
end
subgraph CompileStep["Zig Compiler"]
EF
EF2
Concat
end
subgraph Binary["Agave Binary (.rodata — no external files needed)"]
ROData
ROData2
end
subgraph Runtime["Runtime (zero file I/O)"]
Init
Init2
end
// Concatenate all MSL files at compile time
const msl_source = @embedFile("kernels/metal/common.metal") ++
@embedFile("kernels/metal/elementwise.metal") ++
@embedFile("kernels/metal/norm.metal") ++
@embedFile("kernels/metal/rope.metal") ++
@embedFile("kernels/metal/gemv.metal") ++
@embedFile("kernels/metal/gemm.metal") ++
@embedFile("kernels/metal/sdpa.metal") ++
@embedFile("kernels/metal/deltanet.metal");
pub fn init(allocator: Allocator) !MetalBackend {
// Compile MSL source at runtime (driver compiles to GPU bytecode)
const library = device.newLibraryWithSource(msl_source, null, &err);
// ...
}

Benefits:
- Single binary: No need to ship separate .metal files
- No file I/O: No std.fs.cwd().openFile() at runtime
- Compile-time concatenation: Multiple files merged into one string at zero cost
Alternative (runtime file loading):
// BAD: Runtime file I/O
const file = try std.fs.cwd().openFile("shaders/gemv.metal", .{});
defer file.close();
const source = try file.readToEndAlloc(allocator, 1024 * 1024);
defer allocator.free(source);

Problems:
- Requires shipping shader files alongside binary
- File path resolution (where is the binary run from?)
- Runtime allocation + I/O
- Error handling (file not found, permission denied)
@embedFile eliminates all of these.
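The concatenation step can be tried without any shader files by letting string literals stand in for the @embedFile results, since both are comptime-known byte arrays. The contents below are placeholders, not Agave's real shaders.

```zig
const std = @import("std");

// Stand-ins for @embedFile(...) results (assumed contents).
const common_src = "#include <metal_stdlib>\n";
const gemv_src = "kernel void gemv() {}\n";

// ++ on comptime-known strings yields a new constant;
// nothing is concatenated at run time.
const msl_source = common_src ++ gemv_src;

comptime {
    // The combined length is itself a compile-time constant.
    std.debug.assert(msl_source.len == common_src.len + gemv_src.len);
}

pub fn main() void {
    std.debug.print("{d} bytes of shader source\n", .{msl_source.len});
}
```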
Vulkan uses pre-compiled SPIR-V bytecode:
// Copy into a 4-byte-aligned constant: Vulkan consumes SPIR-V as u32 words, and @embedFile data is only byte-aligned
const gemv_spirv align(@alignOf(u32)) = @embedFile("kernels/vulkan/gemv.spv").*;
pub fn init() !VulkanBackend {
    const shader_module = vk.createShaderModule(device, .{
        .code_size = gemv_spirv.len,
        .code = @ptrCast(&gemv_spirv),
    });
    // ...
}

SPIR-V is binary data. @embedFile works with any file type, not just text.
Generate different code for each type at compile time.
pub fn dequantize(comptime T: type, quant: []const u8, output: []f32) void {
switch (T) {
Q4_0 => dequantizeQ4_0(quant, output),
Q8_0 => dequantizeQ8_0(quant, output),
BF16 => dequantizeBF16(quant, output),
else => @compileError("Unsupported quantization type"),
}
}
// Usage:
dequantize(Q4_0, quant_data, f32_output); // Compiles to direct call to dequantizeQ4_0

No runtime dispatch: the switch is resolved at compile time, and only the relevant function is called.
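A smaller self-contained version of the same pattern is sketched here. The block layouts and the helpers `elemsPerByte` and `blockElems` are assumptions for illustration. An unsupported type would stop the build with the message assembled in the else branch.

```zig
const std = @import("std");

// Hypothetical block layouts for illustration.
const Q4_0 = extern struct { scale: f16, quants: [16]u8 };
const Q8_0 = extern struct { scale: f16, quants: [32]i8 };

fn elemsPerByte(comptime T: type) usize {
    return switch (T) {
        Q4_0 => 2, // two 4-bit values per byte
        Q8_0 => 1, // one 8-bit value per byte
        else => @compileError("unsupported quantization type: " ++ @typeName(T)),
    };
}

/// Number of weights stored in one block of type T.
fn blockElems(comptime T: type) usize {
    const payload_bytes = @sizeOf(T) - @sizeOf(f16);
    return payload_bytes * elemsPerByte(T);
}

pub fn main() void {
    std.debug.print("{d} {d}\n", .{ blockElems(Q4_0), blockElems(Q8_0) });
}
```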
flowchart TD
classDef setup fill:#dbeafe,stroke:#3b82f6,color:#1e3a5f
classDef sync fill:#dcfce7,stroke:#22c55e,color:#14532d
classDef migration fill:#fef9c3,stroke:#eab308,color:#713f12
classDef success fill:#bbf7d0,stroke:#16a34a,color:#14532d
classDef danger fill:#fee2e2,stroke:#ef4444,color:#7f1d1d
classDef optional fill:#f3e8ff,stroke:#9333ea,color:#581c87
Generic["dequantize(comptime T: type, ...)\ngeneric call site"]:::setup
Q4["T == Q4_0\n→ dequantizeQ4_0()\nmonomorphized copy"]:::sync
Q8["T == Q8_0\n→ dequantizeQ8_0()\nmonomorphized copy"]:::sync
BF["T == BF16\n→ dequantizeBF16()\nmonomorphized copy"]:::sync
ERR["T == other\n→ @compileError()\nhalts compilation"]:::danger
BQ4["dequantizeQ4_0\n(direct call, inlined)"]:::success
BQ8["dequantizeQ8_0\n(direct call, inlined)"]:::success
BBF["dequantizeBF16\n(direct call, inlined)"]:::success
subgraph CompileTime["Compiler — resolved at compile time (T is known)"]
direction LR
SW{"switch T"}
Q4
Q8
BF
ERR
SW --> Q4 & Q8 & BF & ERR
end
subgraph Binary["Binary — only called variant present"]
BQ4
BQ8
BBF
end
Generic --> SW
Q4 --> BQ4
Q8 --> BQ8
BF --> BBF
pub const Backend = union(enum) {
cpu: *CpuBackend,
metal: *MetalBackend,
// ...
pub fn gemv(self: Backend, ...) void {
switch (self) {
inline else => |be| be.gemv(...), // Expands to separate case per variant
}
}
};

What inline else does:
// Expands to:
switch (self) {
.cpu => |be| be.gemv(...),
.metal => |be| be.gemv(...),
.vulkan => |be| be.gemv(...),
.cuda => |be| be.gemv(...),
.rocm => |be| be.gemv(...),
.webgpu => |be| be.gemv(...),
}

Benefit: The compiler sees every call and can inline it. No function pointer indirection.
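A compact, runnable form of this dispatch might look like the sketch below. The two backend structs, their fields, and the `parallelism` method are placeholders invented for the example.

```zig
const std = @import("std");

// Placeholder backends; fields and method are assumptions for this sketch.
const CpuBackend = struct {
    threads: u32 = 8,
    pub fn parallelism(self: *const CpuBackend) u32 {
        return self.threads;
    }
};

const MetalBackend = struct {
    simdgroups: u32 = 32,
    pub fn parallelism(self: *const MetalBackend) u32 {
        return self.simdgroups;
    }
};

const Backend = union(enum) {
    cpu: *const CpuBackend,
    metal: *const MetalBackend,

    pub fn parallelism(self: Backend) u32 {
        return switch (self) {
            // One prong is generated per variant, each with a
            // statically known payload type.
            inline else => |be| be.parallelism(),
        };
    }
};

pub fn main() void {
    const cpu = CpuBackend{};
    const backend = Backend{ .cpu = &cpu };
    std.debug.print("{d}\n", .{backend.parallelism()});
}
```

Adding a variant to the union requires no change to `parallelism`, as long as the new payload type provides the same method.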
flowchart TD
classDef setup fill:#dbeafe,stroke:#3b82f6,color:#1e3a5f
classDef sync fill:#dcfce7,stroke:#22c55e,color:#14532d
classDef migration fill:#fef9c3,stroke:#eab308,color:#713f12
classDef success fill:#bbf7d0,stroke:#16a34a,color:#14532d
classDef danger fill:#fee2e2,stroke:#ef4444,color:#7f1d1d
classDef optional fill:#f3e8ff,stroke:#9333ea,color:#581c87
Call["backend.gemv(args)\n(call site in model code)"]:::setup
IE_Tag["read union tag\n(cheap branch)"]:::migration
IE_CPU["tag == .cpu\nCpuBackend.gemv(args)\n(inlined by compiler)"]:::sync
IE_Metal["tag == .metal\nMetalBackend.gemv(args)\n(inlined by compiler)"]:::sync
IE_Vulkan["tag == .vulkan\nVulkanBackend.gemv(args)\n(inlined by compiler)"]:::sync
VT_Ptr["load vtable pointer\nfrom object header"]:::danger
VT_Offset["add method offset\n(e.g. +8 bytes for gemv)"]:::danger
VT_Load["load function pointer\nfrom vtable memory"]:::danger
VT_Call["indirect call\nvia register\n(branch predictor miss risk)"]:::danger
Res1["direct kernel code\n(zero indirection)"]:::success
Res2["kernel code\n(1 indirect branch)"]:::migration
subgraph InlineElse["inline else dispatch (Zig)"]
direction TB
IE_Tag
IE_CPU
IE_Metal
IE_Vulkan
IE_Tag --> IE_CPU & IE_Metal & IE_Vulkan
end
subgraph VTable["vtable dispatch (C++ / runtime)"]
direction TB
VT_Ptr
VT_Offset
VT_Load
VT_Call
VT_Ptr --> VT_Offset --> VT_Load --> VT_Call
end
Call --> IE_Tag
Call --> VT_Ptr
IE_CPU --> Res1
IE_Metal --> Res1
IE_Vulkan --> Res1
VT_Call --> Res2
Compile-time format string checking prevents runtime errors.
// GOOD: Format string validated at compile time
std.log.info("Temperature: {d}, Tokens: {d}", .{temp, n_tokens});
// BAD: Wrong number of arguments, rejected at compile time
std.log.info("Temperature: {d}, Tokens: {d}", .{temp});
// compile error from std.fmt: too few arguments for the format string
// BAD: Wrong type specifier, rejected at compile time
std.log.info("Temperature: {d}", .{"0.5"});
// compile error from std.fmt: {d} cannot format a string (exact wording varies by Zig version)

C comparison:
printf("Temperature: %d, Tokens: %d\n", temp); // Undefined behavior: runtime crash or garbage

Zig catches this at compile time.
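The same checking applies to every std.fmt entry point, not just logging. A minimal sketch with std.fmt.bufPrint follows; the helper name `describe` and its parameters are illustrative.

```zig
const std = @import("std");

/// Hypothetical helper: formats two counters into a caller-provided buffer.
fn describe(buf: []u8, n_tokens: u32, n_layers: u32) ![]u8 {
    // The format string is a comptime parameter, so placeholders and the
    // argument tuple are checked against each other during compilation.
    return std.fmt.bufPrint(buf, "Tokens: {d}, Layers: {d}", .{ n_tokens, n_layers });
}

pub fn main() !void {
    var buf: [64]u8 = undefined;
    const line = try describe(&buf, 128, 32);
    std.debug.print("{s}\n", .{line});
}
```

Removing one element from the argument tuple turns this into a build failure rather than a runtime surprise.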
Validate assumptions at compile time.
flowchart TD
classDef setup fill:#dbeafe,stroke:#3b82f6,color:#1e3a5f
classDef sync fill:#dcfce7,stroke:#22c55e,color:#14532d
classDef migration fill:#fef9c3,stroke:#eab308,color:#713f12
classDef success fill:#bbf7d0,stroke:#16a34a,color:#14532d
classDef danger fill:#fee2e2,stroke:#ef4444,color:#7f1d1d
classDef optional fill:#f3e8ff,stroke:#9333ea,color:#581c87
CA_Eval["evaluate condition\nat compile time"]:::sync
CA_Silent["(nothing emitted)\nbinary produced normally"]:::success
CA_Fail["compile error\n'assertion failed'\nbuild stops immediately\nno binary produced"]:::danger
RA_Eval["evaluate condition\nat runtime"]:::sync
RA_Silent["execution continues"]:::success
RA_Fail["@panic / illegal instruction\nprocess crashes\n(only in Debug/ReleaseSafe)"]:::danger
note1["user never sees bad binary"]:::success
note2["may ship silently in ReleaseFast"]:::optional
subgraph ComptimeAssert["comptime { std.debug.assert(cond) }"]
direction TB
CA_Eval
CA_Pass{"condition\ntrue?"}
CA_Silent
CA_Fail
CA_Eval --> CA_Pass
CA_Pass -- yes --> CA_Silent
CA_Pass -- no --> CA_Fail
end
subgraph RuntimeAssert["std.debug.assert(cond) at runtime"]
direction TB
RA_Eval
RA_Pass{"condition\ntrue?"}
RA_Silent
RA_Fail
RA_Eval --> RA_Pass
RA_Pass -- yes --> RA_Silent
RA_Pass -- no --> RA_Fail
end
CA_Fail -. "catches bug before\nshipping any binary" .-> note1
RA_Fail -. "caught only if\ntest covers that path" .-> note2
const quant_block_elems = 32;
const Q4_0_Block = extern struct {
scale: f16,
quants: [16]u8, // 16 bytes = 32 nibbles
};
comptime {
std.debug.assert(@sizeOf(Q4_0_Block) == 18); // 2 + 16 = 18 bytes
std.debug.assert(16 * 2 == quant_block_elems); // 16 bytes × 2 nibbles/byte
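}

Effect: If you change quants to [20]u8, @sizeOf becomes 22 and compilation fails at the first assertion. Note that [15]u8 would slip through: the struct is padded to the 2-byte alignment of f16, so its size is still 18 bytes.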
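One way to make such a check harder to bypass is to derive the field length and the assertion from a single named constant and to report a descriptive message with @compileError. In this sketch `quant_bytes` is an assumed name that does not appear in the original code.

```zig
const std = @import("std");

const quant_block_elems = 32;
// Assumed constant: two 4-bit values are packed per byte.
const quant_bytes = quant_block_elems / 2;

const Q4_0_Block = extern struct {
    scale: f16,
    quants: [quant_bytes]u8,
};

comptime {
    // Fails the build if padding or a changed field alters the layout.
    if (@sizeOf(Q4_0_Block) != @sizeOf(f16) + quant_bytes) {
        @compileError("Q4_0_Block has unexpected padding or size");
    }
}

pub fn main() void {
    std.debug.print("{d} bytes per block\n", .{@sizeOf(Q4_0_Block)});
}
```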
comptime {
std.debug.assert(@alignOf(KVCache) == 64); // Must be cache-line aligned
}
comptime {
std.debug.assert(@sizeOf(f32) == 4);
std.debug.assert(@sizeOf(bf16) == 2);
std.debug.assert(@sizeOf(V8) == 32); // 8 × f32
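}

Why? These checks pin down the layout assumptions the kernels depend on. Zig defines f32 as 32 bits on every target, so the first assertion mainly documents intent. The sizes of bf16 and V8 depend on how those types are declared and on the target, and a mismatch there fails at compile time instead of producing silent data corruption at runtime.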
// MXFP4 uses E2M1 format (2-bit exponent, 1-bit mantissa)
// 4-bit nibble → 16 possible values stored as a literal constant table
pub fn mxfp4Lookup(nibble: u8) f32 {
const table: [16]f32 = .{
0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0,
0.0, -0.5, -1.0, -1.5, -2.0, -3.0, -4.0, -6.0,
};
return table[nibble & 0xF];
}
// For the scaled variant (nibble value × block scale), see nvfp4Dequant.
// The mantissa term for E2M1 is 0.5 * mant (not 1.0 * mant):
// mant=0 → 0.0 addend, mant=1 → 0.5 addend, giving 1.0 and 1.5 for normal values.

Single-level lookup: nibble → base value via a literal table (no module-level symbol). For NVFP4 scaled dequantization, nvfp4Dequant combines mxfp4Lookup with a block scale.
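The literal table can be cross-checked against the E2M1 bit layout at compile time. The sketch below assumes the usual field order (sign in bit 3, exponent in bits 2-1, mantissa in bit 0) and an exponent bias of 1. `e2m1Compute` and `mxfp4_table` are names chosen for this example.

```zig
const std = @import("std");

const mxfp4_table: [16]f32 = .{
    0.0, 0.5,  1.0,  1.5,  2.0,  3.0,  4.0,  6.0,
    0.0, -0.5, -1.0, -1.5, -2.0, -3.0, -4.0, -6.0,
};

/// Assumed decoder: sign (1 bit), exponent (2 bits, bias 1), mantissa (1 bit).
fn e2m1Compute(nibble: u8) f32 {
    const sign: f32 = if (((nibble >> 3) & 1) == 1) -1.0 else 1.0;
    const exp: u8 = (nibble >> 1) & 0x3;
    const mant: f32 = @floatFromInt(nibble & 1);
    // Subnormal: no implicit leading 1, so the value is 0 or 0.5.
    if (exp == 0) return sign * 0.5 * mant;
    // Normal: (1 + 0.5 * mant) * 2^(exp - 1).
    const scale: f32 = switch (exp) {
        1 => 1.0,
        2 => 2.0,
        else => 4.0,
    };
    return sign * scale * (1.0 + 0.5 * mant);
}

comptime {
    // Any disagreement between the literals and the formula stops the build.
    for (mxfp4_table, 0..) |v, i| {
        if (v != e2m1Compute(@intCast(i))) @compileError("MXFP4 table mismatch");
    }
}

pub fn main() void {
    std.debug.print("{d}\n", .{e2m1Compute(0x7)});
}
```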
Block byte sizes are defined as named module-level constants in backend.zig:
pub const q4_0_block_bytes: usize = 18; // 2-byte scale + 16 bytes of nibbles
pub const q8_0_block_bytes: usize = 34; // 2-byte scale + 32 bytes of i8 values
pub const q4_k_block_bytes: usize = 144;
pub const q6_k_block_bytes: usize = 210;
// ...

Usage: reference the constant directly by name:
const bytes_per_block = backend.q4_0_block_bytes; // 18
const num_blocks = (total_bytes + backend.q4_0_block_bytes - 1) / backend.q4_0_block_bytes;

Benefit: Named constants are self-documenting, always available at comptime, and require no function call overhead.
FP8 dequantization (measured on Apple M4):
| Method | Cycles/call | Speedup |
|---|---|---|
| Runtime computation | ~30 cycles | 1× |
| Comptime LUT | ~1 cycle | 30× |
Binary size impact:
| Feature | Binary size increase |
|---|---|
| FP8 E4M3 LUT (256 × 4 bytes) | +1 KB |
| MXFP4 LUT (16 × 4 bytes) | +64 bytes |
| IQ4_NL LUT (16 × 1 byte) | +16 bytes |
| Embedded Metal shaders (~50 KB source) | +50 KB |
Trade-off: Small binary size increase for significant runtime speedup.
const use_simd = comptime builtin.cpu.arch == .x86_64 or builtin.cpu.arch == .aarch64;
pub fn dotProduct(a: []const f32, b: []const f32) f32 {
if (comptime use_simd) {
return dotProductSIMD(a, b);
} else {
return dotProductScalar(a, b);
}
}

pub fn RingBuffer(comptime T: type, comptime size: usize) type {
    return struct {
        data: [size]T = undefined, // default lets callers write .{}
        head: usize = 0,
        pub fn push(self: *@This(), item: T) void {
            self.data[self.head] = item;
            self.head = (self.head + 1) % size;
        }
    };
}
// Usage:
var conv_state: RingBuffer(f32, 4) = .{}; // 4-element f32 ring buffer

Each instantiation (RingBuffer(f32, 4), RingBuffer(u32, 8)) generates separate specialized code.
const kernel_name = "gemv_" ++ dtype_name; // Comptime string concat
pub fn loadKernel(comptime dtype: DType) !Pipeline {
const name = comptime kernelName(dtype); // e.g., "gemv_q4_0"
return library.newFunctionWithName(name);
}
fn kernelName(comptime dtype: DType) []const u8 {
return "gemv_" ++ @tagName(dtype); // "gemv_" + "q4_0" → "gemv_q4_0"
}

BAD: Using comptime for values that are already constants
const temperature = comptime 0.7; // Pointless: a literal is already a compile-time constant
GOOD: Just use const
const temperature: f32 = 0.7;

BAD: Large nested loops at comptime slow down compilation
const huge_table = comptime blk: {
var table: [1000000]f32 = undefined;
for (0..1000000) |i| {
table[i] = expensiveComputation(i); // Runs at compile time!
}
break :blk table;
};

Effect: Compilation takes minutes instead of seconds.
Better: Use codegen (separate script generates the table, output checked into repo) or load from file at runtime.
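WRONG: This doesn't work
var comptime_counter: usize = 0; // A container-level var is runtime state, not comptime state
pub fn getNextId() usize {
    comptime {
        comptime_counter += 1; // Error: a runtime global cannot be mutated at compile time
        return comptime_counter;
    }
}

comptime cannot hold mutable global state. A comptime var is local to the scope that declares it, so it cannot serve as a counter shared across calls.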
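Mutation is still available locally while the compiler evaluates a single scope. The sketch below sums the lookup-table sizes quoted in this chapter and counts flags with a comptime var. `lut_bytes`, `total_lut_bytes`, and `countTrue` are illustrative names.

```zig
const std = @import("std");

// FP8 E4M3, MXFP4, and IQ4_NL table sizes in bytes.
const lut_bytes = [_]usize{ 256 * 4, 16 * 4, 16 * 1 };

const total_lut_bytes = blk: {
    // Mutable, but only while the compiler evaluates this block.
    var sum: usize = 0;
    for (lut_bytes) |n| sum += n;
    break :blk sum;
};

/// Hypothetical helper: counts true entries using a comptime-local variable.
fn countTrue(comptime flags: []const bool) usize {
    comptime var n: usize = 0;
    inline for (flags) |f| {
        if (f) n += 1;
    }
    return n;
}

pub fn main() void {
    const enabled = countTrue(&[_]bool{ true, false, true });
    std.debug.print("{d} bytes, {d} enabled\n", .{ total_lut_bytes, enabled });
}
```

Each result is a constant by the time the program runs; no state survives from one evaluation to the next.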
- Use comptime for lookup tables when the table is small (<10 KB) and frequently accessed
- Use comptime for feature detection to eliminate dead code
- Use @embedFile for resources that ship with the binary
- Use comptime assertions to validate invariants
- Don't use comptime for runtime configuration; use const or runtime parameters instead
In the code: src/ops/quant.zig (fp8e4m3_lut, iq4nl_table), src/backend/metal.zig (@embedFile for MSL shaders), src/backend/backend.zig (inline else dispatch), build.zig (build_options)
Related: Zig Language Reference — comptime, Chapter 9: CPU SIMD Optimization (uses comptime LUTs)