Commit b6f9309

feat: Introduce benchmarking plan, performance modernization design, and a register-based VM prototype.
1 parent 94bd37f commit b6f9309

3 files changed

Lines changed: 293 additions & 0 deletions

benchmarks/PLAN.md

Lines changed: 65 additions & 0 deletions
@@ -0,0 +1,65 @@
# ProXPL Benchmarking Plan

## 1. Objectives
- Measure baseline performance of the current interpreter.
- Track improvements with the new Register VM and JIT.
- Compare against CPython 3.11, Lua 5.4, and Node.js.

## 2. Benchmark Suite Strategy

### 2.1 Microbenchmarks (CPU Core)
Focus on specific VM optimizations (dispatch, arithmetic, calls).

| Benchmark | Description | Target Speedup (vs Current) |
|-----------|-------------|----------------------------|
| `fib.prox` | Recursive Fibonacci (Call overhead) | 3x |
| `loop_sum.prox` | Tight loop addition (1M iters) | 5x |
| `array_access.prox` | Read/Write array elements | 2x |
| `dict_get.prox` | Dictionary lookups (String keys) | 2x |

### 2.2 Macrobenchmarks (Real Workload)
| Benchmark | Description | Target Speedup |
|-----------|-------------|----------------|
| `json_bench.prox` | Parse/Serialize simulated JSON | 3x |
| `http_sim.prox` | Simulated request routing/handling | 2x |
| `nbody.prox` | Physics simulation (Float math) | 10x (with JIT) |

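The targets in the tables above are ratios of execution times. The plan does not spell out the formula, so the following is a minimal sketch assuming speedup means baseline time divided by new time; `speedup` and `meets_target` are hypothetical helper names, not part of the repository.

```c
#include <assert.h>
#include <stdbool.h>

/* Speedup of a candidate run relative to a baseline run.
   Assumption: speedup = baseline_time / candidate_time, both in seconds. */
double speedup(double baseline_s, double candidate_s) {
    return baseline_s / candidate_s;
}

/* True when the measured speedup reaches the target from the table. */
bool meets_target(double baseline_s, double candidate_s, double target) {
    return speedup(baseline_s, candidate_s) >= target;
}
```

With made-up timings, a benchmark that drops from 3.0 s to 1.0 s has a speedup of 3.0, which would satisfy a 3x target but not a 5x one.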
## 3. Tools & Methodology

We will use `hyperfine` to measure execution time, with warmup runs and statistics over repeated runs.

### Pre-requisites
- `hyperfine` (install via `cargo install hyperfine` or `apt-get install hyperfine`)
- `python3` (CPython 3.11+)
- `lua` (Lua 5.4)
- `node` (Node.js 20+)

### Execution Commands

Run the following command from the repository root:

```bash
# Example: Running the Fibonacci Benchmark
# Create the results directory first if it does not exist yet
mkdir -p benchmarks/results
hyperfine --warmup 3 \
    "bin/proxpl run benchmarks/micro/fib.prox" \
    "python3 benchmarks/reference/fib.py" \
    "lua benchmarks/reference/fib.lua" \
    --export-markdown benchmarks/results/fib_results.md
```

## 4. Directory Structure

```
benchmarks/
├── micro/
│   ├── fib.prox
│   ├── loop_sum.prox
│   └── ...
├── macro/
│   └── nbody.prox
├── reference/       <-- Equivalents in Py/Lua/Node
│   ├── fib.py
│   ├── fib.lua
│   └── ...
└── run_all.sh
```

docs/design.md

Lines changed: 89 additions & 0 deletions
@@ -0,0 +1,89 @@
# ProXPL Performance Modernization Design Doc

## 1. System Architecture (Target State)

```text
   +------------------+
   |   Source Code    | (.prox)
   +--------+---------+
            |
   +--------v---------+
   |   Lexer/Parser   | (C / Modernized)
   +--------+---------+
            |
   +--------v---------+
   |   AST Builder    |
   +--------+---------+
            |
   +--------v---------+
   | Bytecode Compiler| (AST -> REG-ISA)
   |  (Reg Allocator) |
   +--------+---------+
            |
   +--------v---------+      +-------------------+
   | Bytecode Module  |<-----+  Native Runtime   |
   |  (Instructions)  |      | (C/Rust Strings,  |
   +--------+---------+      |  Arrays, Dicts)   |
            |                +-------------------+
+-----------v-----------+
|      Register VM      | <--- Profiling Events
|  (Interpreter Loop)   |
+-----------+-----------+
            |
       [Hot Path?]
            |
+-----------v-----------+
|   Baseline JIT (C)    | (Template/Copying JIT)
|  (Machine Code Gen)   |
+-----------+-----------+
            |
       [Very Hot?]
            |
+-----------v-----------+     +------------------+
|   Optimizer (LLVM)    |     |  Inline Caches   |
| (Type Specialization) |<----+ (Polymorphic IC) |
+-----------------------+     +------------------+
```

## 2. Design Doc Outline & Modules

### 2.1 Register-Based VM ISA
**Motivation**: Reduce dispatch overhead (a register VM executes fewer instructions than a stack VM for the same program) and improve cache locality.
**Structure**:
- `Instruction`: 32-bit word.
  - `Opcode`: 8 bits.
  - `A` (Dest): 8 bits.
  - `B` (Src1): 8 bits.
  - `C` (Src2/Imm): 8 bits.

**Core Instructions**:
- `MOV R_dest, R_src`
- `ADD R_dest, R_src1, R_src2`
- `LOADK R_dest, K_idx`
- `CALL R_dest, R_func, NumArgs`
- `RET R_src`

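To make the field layout concrete, the sketch below packs the four 8-bit fields into a 32-bit word and renders an instruction as text. It assumes the opcode sits in the low byte, followed by A, B, and C, which matches the prototype in `src/protos/vm_register.c`. The opcode numbers and the `encode` and `disassemble` helpers are hypothetical, since this document does not assign encodings.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

typedef uint32_t Instruction;

/* Hypothetical opcode numbering, for illustration only. */
enum { OP_MOV = 1, OP_ADD = 2, OP_LOADK = 3 };

/* Pack four 8-bit fields into one word: opcode in bits 0-7, A in 8-15,
   B in 16-23, C in 24-31. */
Instruction encode(uint8_t op, uint8_t a, uint8_t b, uint8_t c) {
    return (Instruction)op | ((Instruction)a << 8) |
           ((Instruction)b << 16) | ((Instruction)c << 24);
}

/* Write a textual form of `ins` into `buf` (at most `len` bytes). */
void disassemble(Instruction ins, char *buf, size_t len) {
    unsigned op = ins & 0xFF;
    unsigned a = (ins >> 8) & 0xFF;
    unsigned b = (ins >> 16) & 0xFF;
    unsigned c = (ins >> 24) & 0xFF;
    switch (op) {
    case OP_MOV:   snprintf(buf, len, "MOV R%u, R%u", a, b); break;
    case OP_ADD:   snprintf(buf, len, "ADD R%u, R%u, R%u", a, b, c); break;
    case OP_LOADK: snprintf(buf, len, "LOADK R%u, K%u", a, b); break;
    default:       snprintf(buf, len, "UNKNOWN 0x%02X", op); break;
    }
}
```

Calling `disassemble(encode(OP_ADD, 2, 0, 1), buf, sizeof buf)` fills `buf` with `ADD R2, R0, R1`. One consequence of 8-bit operand fields is that a single instruction can address at most 256 registers or 256 constants.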
### 2.2 Baseline JIT (Template JIT)
**Strategy**:
- Pre-compile machine code snippets for each opcode (templates).
- **Benefit**: Fast to implement; expected 2-5x speedup over the interpreter.

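The copying step of a template JIT can be illustrated on its own: for each bytecode opcode, append that opcode's pre-built snippet to an output buffer. The sketch below is an illustration under heavy assumptions, not the planned implementation. The snippet bytes are placeholder x86 `nop` bytes (0x90), the `Template` table and `emit_templates` are hypothetical names, and operand patching and executable-memory allocation are left out entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One pre-built machine-code snippet per opcode. */
typedef struct {
    const uint8_t *bytes;
    size_t len;
} Template;

/* Placeholder snippets: real templates would hold the machine code that
   implements each opcode. 0x90 is the x86 NOP instruction. */
static const uint8_t T_LOADK[] = { 0x90, 0x90 };
static const uint8_t T_ADD[]   = { 0x90, 0x90, 0x90 };

static const Template TEMPLATES[] = {
    { T_LOADK, sizeof T_LOADK },  /* hypothetical opcode 0 */
    { T_ADD,   sizeof T_ADD   },  /* hypothetical opcode 1 */
};

/* Copy the template for each opcode into `out`, back to back.
   Returns bytes written, or 0 if an opcode is unknown or `out` is too small. */
size_t emit_templates(const uint8_t *ops, size_t n, uint8_t *out, size_t cap) {
    size_t used = 0;
    for (size_t i = 0; i < n; i++) {
        if (ops[i] >= sizeof TEMPLATES / sizeof TEMPLATES[0]) return 0;
        const Template *t = &TEMPLATES[ops[i]];
        if (t->len > cap - used) return 0;
        memcpy(out + used, t->bytes, t->len);
        used += t->len;
    }
    return used;
}
```

Because code generation is little more than a table lookup and a `memcpy` per instruction, this style of JIT is simple to build, which is the benefit claimed above.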
### 2.3 Optimizing JIT (LLVM / DynASM)
**Strategy**:
- Triggered for hot loops (>10k executions).
- **Type Specialization**: Generate code specialized for the observed types, protected by guard checks.

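One way to detect hot loops is a per-loop counter that is incremented on every loop back-edge and compared against the threshold. The following is a minimal sketch, assuming the 10k figure above is a plain execution-count threshold; `LoopProfile`, `HOT_LOOP_THRESHOLD`, and `loop_tick` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define HOT_LOOP_THRESHOLD 10000u  /* ">10k executions" from the design */

typedef struct {
    uint32_t count;     /* times the loop back-edge has executed */
    bool     compiled;  /* set once the loop was handed to the optimizer */
} LoopProfile;

/* Call on every loop back-edge. Returns true exactly once: the first time
   the counter exceeds the threshold, when compilation should be triggered. */
bool loop_tick(LoopProfile *p) {
    if (p->compiled) return false;
    if (++p->count > HOT_LOOP_THRESHOLD) {
        p->compiled = true;
        return true;
    }
    return false;
}
```

The `compiled` flag keeps the check cheap after the trigger fires and prevents the same loop from being submitted for optimization twice.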
### 2.4 Data Model & Memory Layout
- **Value**: NaN-boxing (64-bit).
- **GC**: Generational Mark-and-Sweep.

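NaN-boxing stores every value in 64 bits: real doubles keep their IEEE 754 bit pattern, and all other values are encoded inside the unused payload of a quiet NaN. The design does not fix a bit layout, so the sketch below assumes one common scheme (quiet-NaN mask `0x7ffc000000000000` with small tags for nil and booleans). All names and tag values are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t Value;

/* Quiet-NaN mask: all exponent bits plus the two top mantissa bits.
   Any Value with all of these bits set is treated as a non-number. */
#define QNAN      ((uint64_t)0x7ffc000000000000)
#define TAG_NIL   1
#define TAG_FALSE 2
#define TAG_TRUE  3

#define NIL_VAL   ((Value)(QNAN | TAG_NIL))
#define FALSE_VAL ((Value)(QNAN | TAG_FALSE))
#define TRUE_VAL  ((Value)(QNAN | TAG_TRUE))

/* Doubles are stored as their raw IEEE 754 bits (memcpy avoids aliasing issues). */
Value number_to_value(double d) {
    Value v;
    memcpy(&v, &d, sizeof v);
    return v;
}

double value_to_number(Value v) {
    double d;
    memcpy(&d, &v, sizeof d);
    return d;
}

bool is_number(Value v) { return (v & QNAN) != QNAN; }
bool is_nil(Value v)    { return v == NIL_VAL; }
bool is_bool(Value v)   { return (v | 1) == TRUE_VAL; }  /* FALSE|1 == TRUE */
Value bool_to_value(bool b) { return b ? TRUE_VAL : FALSE_VAL; }
bool value_to_bool(Value v) { return v == TRUE_VAL; }
```

A full runtime would also reserve a tag range for heap pointers (strings, arrays, dicts); that part is omitted here.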
## 3. Risks & Tradeoffs
1. **Complexity**: LLVM is heavy. **Mitigation**: Start with Template JIT.
2. **GC Pauses**: Generational GC adds complexity, and collections can still pause execution. **Mitigation**: Incremental marking.

## 4. Benchmark Plan

**Microbenchmarks**:
1. `arith_loop.prox`: Tight loop summing integers.
2. `call_depth.prox`: Recursive Fibonacci.
3. `str_cat.prox`: String concatenation.

src/protos/vm_register.c

Lines changed: 139 additions & 0 deletions
@@ -0,0 +1,139 @@
/*
  Prototype Register-Based VM for ProXPL
  --------------------------------------
  Structure: 32-bit instructions (Opcode:8, A:8, B:8, C:8)
*/

#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>

// --- Instruction Format ---
// | Opcode (8) | A (8) | B (8) | C (8) |
// Opcode occupies the low byte (bits 0-7); A, B, C follow at bits 8, 16, 24.
// A: Destination Register
// B: Source Register 1
// C: Source Register 2 / Immediate

typedef uint32_t Instruction;

#define OP_MASK 0xFF
#define REG_MASK 0xFF

#define GET_OP(i) ((i) & OP_MASK)
#define GET_A(i) (((i) >> 8) & REG_MASK)
#define GET_B(i) (((i) >> 16) & REG_MASK)
#define GET_C(i) (((i) >> 24) & REG_MASK)

// Cast each field to Instruction before shifting so that (c) >= 128
// is never shifted into the sign bit of a signed int.
#define MK_INS(op, a, b, c) \
    ((Instruction)(op) | ((Instruction)(a) << 8) | \
     ((Instruction)(b) << 16) | ((Instruction)(c) << 24))

// --- Opcodes ---
enum OpCode {
    OP_HALT = 0,
    OP_LOADK, // R[A] = Consts[B]
    OP_MOV,   // R[A] = R[B]
    OP_ADD,   // R[A] = R[B] + R[C]
    OP_SUB,   // R[A] = R[B] - R[C]
    OP_PRINT  // print R[A]
};

// --- VM State ---
#define MAX_REGS 256
#define MAX_CONSTS 256

typedef struct {
    double numbers[MAX_CONSTS];
} ConstTable;

typedef struct {
    Instruction* code;
    size_t count;
    ConstTable* consts;
} ProtoChunk;

typedef struct {
    double registers[MAX_REGS]; // Simplified Value type for prototype
    Instruction* ip;
} RegisterVM;

// --- Interpreter Loop ---
void run_register_vm(RegisterVM* vm, ProtoChunk* chunk) {
    vm->ip = chunk->code;
    Instruction* end = chunk->code + chunk->count;

    printf("Starting Register VM execution...\n");

    for (;;) {
        // Stop instead of reading past the chunk if OP_HALT is missing
        if (vm->ip >= end) {
            printf("Reached end of code without HALT.\n");
            return;
        }

        Instruction ins = *vm->ip++;
        uint8_t op = GET_OP(ins);

        // Computed goto would go here in production
        switch (op) {
            case OP_HALT:
                printf("HALT encountered.\n");
                return;

            case OP_LOADK: {
                uint8_t target = GET_A(ins);
                uint8_t k_idx = GET_B(ins);
                vm->registers[target] = chunk->consts->numbers[k_idx];
                // printf("LOADK R[%d] = %f\n", target, vm->registers[target]);
                break;
            }

            case OP_MOV: {
                uint8_t dest = GET_A(ins);
                uint8_t src = GET_B(ins);
                vm->registers[dest] = vm->registers[src];
                break;
            }

            case OP_ADD: {
                uint8_t dest = GET_A(ins);
                uint8_t src1 = GET_B(ins);
                uint8_t src2 = GET_C(ins);
                // Type checking would happen here in full VM
                vm->registers[dest] = vm->registers[src1] + vm->registers[src2];
                // printf("ADD R[%d] = %f + %f = %f\n", dest, vm->registers[src1], vm->registers[src2], vm->registers[dest]);
                break;
            }

            // OP_SUB is declared in the enum; handle it like OP_ADD
            case OP_SUB: {
                uint8_t dest = GET_A(ins);
                uint8_t src1 = GET_B(ins);
                uint8_t src2 = GET_C(ins);
                vm->registers[dest] = vm->registers[src1] - vm->registers[src2];
                break;
            }

            case OP_PRINT: {
                uint8_t src = GET_A(ins);
                printf("OUT: %f\n", vm->registers[src]);
                break;
            }

            default:
                printf("Unknown Opcode: %d\n", op);
                return;
        }
    }
}

// --- Test Driver ---
int main(void) {
    // Defines a simple program:
    // val1 = 10.5
    // val2 = 20.5
    // result = val1 + val2
    // print result  (prints "OUT: 31.000000")

    Instruction code[] = {
        MK_INS(OP_LOADK, 0, 0, 0), // R0 = Const[0] (10.5)
        MK_INS(OP_LOADK, 1, 1, 0), // R1 = Const[1] (20.5)
        MK_INS(OP_ADD, 2, 0, 1),   // R2 = R0 + R1
        MK_INS(OP_PRINT, 2, 0, 0), // PRINT R2
        MK_INS(OP_HALT, 0, 0, 0)
    };

    // Remaining constants are zero-initialized
    ConstTable constants = { .numbers = { 10.5, 20.5 } };

    ProtoChunk chunk = {
        .code = code,
        .count = sizeof code / sizeof code[0], // derived, not hard-coded
        .consts = &constants
    };
    RegisterVM vm = {0};

    run_register_vm(&vm, &chunk);

    return 0;
}
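The interpreter loop's comment mentions computed goto as the production dispatch mechanism. A minimal sketch of that style follows, using the labels-as-values extension available in GCC and Clang (it is not standard C). It reuses the prototype's opcode numbering and field layout; `run_goto`, the `INS` macro, and the convention of returning `R[A]` at `OP_HALT` are hypothetical choices made to keep the example small.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t Instruction;

enum { OP_HALT = 0, OP_LOADK, OP_MOV, OP_ADD, OP_SUB, OP_PRINT, OP_COUNT };

#define INS(op, a, b, c) \
    ((Instruction)(op) | ((Instruction)(a) << 8) | \
     ((Instruction)(b) << 16) | ((Instruction)(c) << 24))

/* Runs `code` until OP_HALT and returns R[A] of the HALT instruction.
   Assumes well-formed code: every opcode is < OP_COUNT. */
double run_goto(const Instruction *code, const double *consts) {
    /* One label address per opcode, indexed by opcode number. */
    static void *dispatch[OP_COUNT] = {
        &&do_halt, &&do_loadk, &&do_mov, &&do_add, &&do_sub, &&do_print
    };
    double r[256] = {0};
    const Instruction *ip = code;
    Instruction ins;

/* Fetch the next instruction and jump straight to its handler. */
#define DISPATCH() do { ins = *ip++; goto *dispatch[ins & 0xFF]; } while (0)
#define A ((ins >> 8) & 0xFF)
#define B ((ins >> 16) & 0xFF)
#define C ((ins >> 24) & 0xFF)

    DISPATCH();
do_loadk: r[A] = consts[B];    DISPATCH();
do_mov:   r[A] = r[B];         DISPATCH();
do_add:   r[A] = r[B] + r[C];  DISPATCH();
do_sub:   r[A] = r[B] - r[C];  DISPATCH();
do_print: /* output omitted in this sketch */ DISPATCH();
do_halt:  return r[A];

#undef DISPATCH
#undef A
#undef B
#undef C
}
```

Each handler ends with its own indirect jump instead of returning to a shared `switch`, which gives the branch predictor one jump site per opcode. Whether that yields a measurable gain here would need to be confirmed with the benchmarks.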
