Commit d4b8368

feat(optimization): add comprehensive performance optimization strategy and JIT compiler design documentation

1 parent 7dcdca9

6 files changed: 317 additions, 0 deletions

# ProXPL Performance Optimization Strategy (phased)

Prepared: 2025-12-12

Overview

- Goal: transform the ProXPL interpreter into a high-performance runtime with production-grade tooling.
- Approach: phased milestones, from low-risk interpreter optimizations → efficient runtime → JIT integration → ecosystem.

Milestones

1. Hotspot Analysis & Microbenchmarks (2 weeks)
   - Build a benchmark harness targeting arithmetic, calls, string ops, and allocation.
   - Collect per-op timings, branch-misprediction and cache-miss rates.
2. Bytecode & Dispatch Redesign (3 weeks)
   - Move to register-friendly bytecode where appropriate.
   - Use compact operand encodings and operand folding for common cases.
   - Implement computed-goto dispatch for supported compilers and a hot-path inline dispatch for Windows/MSVC.
3. Inline Caching & Type Specialization (4 weeks)
   - Add monomorphic/polymorphic inline caches for property access and call targets.
   - Specialize arithmetic ops by observed operand types and emit fast paths.
4. Memory Management & GC (4-6 weeks)
   - Implement a generational, moving GC (semi-space or copying nursery) plus an incremental major collector.
   - Add write barriers and object-header tagging for fast type checks.
5. JIT Compiler (LLVM-backed) (8-12 weeks)
   - Lower hot traces or functions to a compact IR, apply type specialization, and emit native code via LLVM ORC JIT.
6. Tooling & Ecosystem (ongoing)
   - LSP, VSCode extension, package-manager improvements, CI/CD, docs, examples.

Examples and small designs are in `src/vm/vm_core_opt.c` and `deliverables/03_jit/JIT_DESIGN.md`.

deliverables/03_jit/JIT_DESIGN.md

# JIT Compiler Design (LLVM-based) - Overview

Target: an LLVM-backed JIT that compiles hot functions/traces to native code and integrates with the interpreter.

High-level flow

- Frontend bytecode → high-level IR (typed SSA) → optimizations (constant folding, type specialization) → LLVM IR → native code.

IR translation notes

- Map ProXPL bytecode ops to typed IR ops (i32/float64/ptr), with explicit boxing/unboxing inserted only at boundaries.
- Represent closures as an environment-pointer/function-pointer pair; lower closures to structs.

Sample lowering (arithmetic + call)

- Bytecode: `PUSH_CONST a; PUSH_CONST b; ADD; CALL print, argc=1`
- IR: `tmp0 = load_const(a); tmp1 = load_const(b); tmp2 = add_f64(tmp0, tmp1); call print(tmp2)`
Memory management

- Generated code will use runtime calling conventions to allocate and root values.
- Use stack maps and LLVM GC metadata, or explicit root registration for conservative root scanning.

Integration plan

- Keep the interpreter as a fallback; implement OSR and tiering: interpreter → JIT for hot functions.
- Use a call-boundary ABI that lets compiled code call interpreter functions and vice versa.

See `deliverables/04_jit_examples/` for prototype code and build instructions (to be added).

src/vm/vm_core_opt.c

```c
/*
 * vm_core_opt.c
 *
 * Optimized VM core loop prototype for ProXPL.
 * - computed-goto dispatch (gcc/clang), switch fallback elsewhere (e.g. MSVC)
 * - inline-caching skeleton for call targets / property lookup
 * - fast numeric path for binary arithmetic (type-specialized)
 *
 * This file is a drop-in experimental core; it is intentionally self-contained
 * and documents the optimizations to be integrated into the main VM.
 */

#include "../../include/bytecode.h"
#include <stdio.h>
#include <string.h>

/* Inline cache entry for call targets / property access */
typedef struct {
    uint32_t site_pc;        /* bytecode pc of the call/property site */
    const char *target_name; /* cached name (for simple engines) */
    void *native_ptr;        /* direct native function pointer if resolved */
    uint32_t hit_count;
} ICEntry;

/* Micro-optimization: likely/unlikely macros for branch prediction */
#if defined(__GNUC__)
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)
#else
#define likely(x)   (x)
#define unlikely(x) (x)
#endif

/* Optimized dispatch; similar to vm_dispatch but with fast paths and IC hooks */
int vm_execute_optimized(const Chunk *chunk) {
    size_t ip = 0;
    Value stack[2048];
    int sp = 0;

    /* small direct-mapped inline cache for demonstration */
    ICEntry icache[64];
    memset(icache, 0, sizeof(icache));

#if defined(__GNUC__) || defined(__clang__)
    /* Fill the handler table once, before entering the loop (an earlier
     * draft re-filled it on every dispatched instruction). Unhandled
     * opcodes go to do_UNKNOWN instead of jumping through a NULL entry. */
    static void *dispatch_table[256];
    for (int i = 0; i < 256; ++i) dispatch_table[i] = &&do_UNKNOWN;
    dispatch_table[OP_NOP]        = &&do_NOP;
    dispatch_table[OP_PUSH_CONST] = &&do_PUSH_CONST;
    dispatch_table[OP_CALL]       = &&do_CALL;
    dispatch_table[OP_ADD]        = &&do_ADD_FAST;
    dispatch_table[OP_HALT]       = &&do_HALT;
#define DISPATCH() goto *dispatch_table[op]
#else
#define DISPATCH() goto dispatch_switch
#endif

    for (;;) {
        if (ip >= chunk->code_len) return 0;
        uint8_t op = chunk->code[ip++];

#if defined(__GNUC__) || defined(__clang__)
        DISPATCH();

    do_NOP: { continue; }

    do_PUSH_CONST: {
        size_t read = 0;
        uint64_t idx = read_uleb128_from(chunk->code + ip, chunk->code_len - ip, &read);
        ip += read;
        stack[sp++] = consttable_get(&chunk->constants, (size_t)idx);
        continue;
    }

    /* Fast numeric add - type-specialized path */
    do_ADD_FAST: {
        if (unlikely(sp < 2)) return -1;
        Value b = stack[--sp];
        Value a = stack[--sp];
        if (likely(a.type == VAL_NUMBER && b.type == VAL_NUMBER)) {
            Value r; r.type = VAL_NUMBER; r.as.number = a.as.number + b.as.number;
            stack[sp++] = r; continue;
        }
        /* fallback to generic add handler - simple example */
        fprintf(stderr, "slow path: non-number add\n");
        return -1;
    }

    do_CALL: {
        /* inline-cache lookup (very simple, direct-mapped on pc) */
        size_t saved_ip = ip;
        size_t read = 0;
        uint64_t idx = read_uleb128_from(chunk->code + ip, chunk->code_len - ip, &read);
        ip += read;
        if (ip >= chunk->code_len) return -1;
        uint8_t argc = chunk->code[ip++];

        Value callee = consttable_get(&chunk->constants, (size_t)idx);
        /* probe icache for this site */
        uint32_t site = (uint32_t)(saved_ip & 63);
        ICEntry *ent = &icache[site];
        if (ent->site_pc == (uint32_t)saved_ip && ent->native_ptr) {
            ent->hit_count++;
            /* call native_ptr directly (demo only) */
            typedef Value (*native_fn)(Value *, int);
            native_fn fn = (native_fn)ent->native_ptr;
            Value ret = fn(&stack[sp - argc], (int)argc);
            sp -= argc; stack[sp++] = ret; continue;
        }

        /* resolve and fill the cache (demo: only 'print') */
        if (callee.type == VAL_STRING && strcmp(callee.as.string.chars, "print") == 0) {
            ent->site_pc = (uint32_t)saved_ip;
            ent->target_name = callee.as.string.chars;
            ent->native_ptr = NULL; /* optionally assign a function pointer */
            /* fallback: implement print directly */
            for (int i = (int)argc - 1; i >= 0; --i) {
                Value arg = stack[--sp];
                if (arg.type == VAL_STRING) fputs(arg.as.string.chars, stdout);
                else if (arg.type == VAL_NUMBER) printf("%g", arg.as.number);
                else if (arg.type == VAL_BOOL) fputs(arg.as.boolean ? "true" : "false", stdout);
                else fputs("<obj>", stdout);
                if (i) putchar(' ');
            }
            putchar('\n');
            Value r = make_null_const(); stack[sp++] = r; continue;
        }

        fprintf(stderr, "unsupported call target\n"); return -1;
    }

    do_HALT: { return 0; }

    do_UNKNOWN: {
        fprintf(stderr, "unhandled opcode %u\n", op); return -1;
    }
#else
    dispatch_switch:
        switch (op) {
        case OP_NOP: break;
        case OP_PUSH_CONST: {
            size_t read = 0;
            uint64_t idx = read_uleb128_from(chunk->code + ip, chunk->code_len - ip, &read);
            ip += read;
            stack[sp++] = consttable_get(&chunk->constants, (size_t)idx);
            break;
        }
        case OP_ADD: {
            if (sp < 2) return -1;
            Value b = stack[--sp]; Value a = stack[--sp];
            if (a.type == VAL_NUMBER && b.type == VAL_NUMBER) {
                Value r; r.type = VAL_NUMBER; r.as.number = a.as.number + b.as.number;
                stack[sp++] = r;
            } else {
                fprintf(stderr, "slow path: non-number add\n"); return -1;
            }
            break;
        }
        case OP_CALL: {
            size_t read = 0;
            uint64_t idx = read_uleb128_from(chunk->code + ip, chunk->code_len - ip, &read);
            ip += read;
            if (ip >= chunk->code_len) return -1;
            uint8_t argc = chunk->code[ip++];
            Value callee = consttable_get(&chunk->constants, (size_t)idx);
            if (callee.type == VAL_STRING && strcmp(callee.as.string.chars, "print") == 0) {
                for (int i = (int)argc - 1; i >= 0; --i) {
                    Value arg = stack[--sp];
                    if (arg.type == VAL_STRING) fputs(arg.as.string.chars, stdout);
                    else if (arg.type == VAL_NUMBER) printf("%g", arg.as.number);
                    else fputs("<obj>", stdout);
                    if (i) putchar(' ');
                }
                putchar('\n');
                Value r = make_null_const(); stack[sp++] = r;
                break;
            }
            fprintf(stderr, "unsupported call target\n"); return -1;
        }
        case OP_HALT: return 0;
        default: fprintf(stderr, "unhandled opcode %u\n", op); return -1;
        }
#endif
    }
}
```

tools/bench/bench_simple.c

```c
/* bench_simple.c
 * A tiny microbenchmark that constructs a numeric-add-heavy Chunk and
 * executes it repeatedly to measure interpreter throughput.
 * This file expects the project to expose Chunk/bytecode helpers and
 * a vm_run_chunk_simple() entry in `src/vm/vm_dispatch.c`.
 */

#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include "../../include/bytecode.h"

extern int vm_run_chunk_simple(const Chunk *chunk);

#if defined(_WIN32)
#include <windows.h>
static double now_seconds(void) {
    static LARGE_INTEGER freq;
    LARGE_INTEGER cnt;
    if (freq.QuadPart == 0) QueryPerformanceFrequency(&freq);
    QueryPerformanceCounter(&cnt);
    return (double)cnt.QuadPart / (double)freq.QuadPart;
}
#else
static double now_seconds(void) {
    struct timespec t;
    clock_gettime(CLOCK_MONOTONIC, &t);
    return t.tv_sec + t.tv_nsec * 1e-9;
}
#endif

int main(int argc, char **argv) {
    (void)argc; (void)argv;
    /* Build a simple chunk: push 100 constants, then fold them with 99 adds */
    Chunk c; initChunk(&c);
    for (int i = 0; i < 100; ++i) {
        emit_constant(&c, NUMBER_VAL((double)i));
    }
    for (int i = 0; i < 99; ++i) emit_opcode(&c, OP_ADD);
    emit_opcode(&c, OP_HALT);

    int runs = 500;
    double t0 = now_seconds();
    for (int i = 0; i < runs; ++i) {
        int r = vm_run_chunk_simple(&c);
        if (r != 0) { fprintf(stderr, "vm returned %d\n", r); break; }
    }
    double t1 = now_seconds();
    printf("runs=%d total=%.6fs avg=%.6fms\n", runs, t1 - t0, (t1 - t0) / runs * 1000.0);

    freeChunk(&c);
    return 0;
}
```

tools/bench/build_msvc.ps1

```powershell
#!/usr/bin/env pwsh
<#
build_msvc.ps1
Build bench_simple.exe using MSVC `cl.exe`.

Usage: run this inside a "Developer Command Prompt for VS" or after running
`VsDevCmd.bat` so `cl.exe` and the toolchain are on PATH.
#>

if (-not (Get-Command cl.exe -ErrorAction SilentlyContinue)) {
    Write-Error "cl.exe not found in PATH. Run this script from 'Developer Command Prompt for VS' or run VsDevCmd.bat first."
    exit 1
}

$root = (Resolve-Path "$PSScriptRoot\..\..").Path
$include = Join-Path $root "include"
$src = Join-Path $root "src"
$bench = Join-Path $PSScriptRoot "bench_simple.c"
$files = @(
    '"' + $bench + '"',
    '"' + (Join-Path $src "vm\vm_dispatch.c") + '"',
    '"' + (Join-Path $src "vm\bytecode.c") + '"'
)

$incs = "/I""$include"" /I""$src"""
$clopts = "/nologo /O2 /W3 /std:c11"
$outfile = "bench_simple.exe"

$cmd = "cl.exe $clopts $incs /Fe:$outfile " + ($files -join ' ')
Write-Host "Running: $cmd"
Invoke-Expression $cmd
```

tools/lsp/README.md

# ProXPL LSP - Starter Notes

This folder will contain the language server (LSP) implementation, supporting:
- syntax highlighting
- auto-completion
- diagnostics
- go-to-definition
- signature help

Planned approach

- Use Node.js + TypeScript with `vscode-languageserver` for rapid development.
- Alternatively, provide a lightweight Python `pygls` server for contributors who prefer Python.

Next steps

1. Add grammar-based parser bindings (re-use the `src/parser` AST).
2. Implement basic diagnostics by running the existing parser and type checker.
3. Implement completion/signature help by walking the AST and symbol tables.
