Commit 27184e2

Merge pull request #244 from ryanbreen/feat/full-gpu-compositing
feat: Full GPU compositing pipeline with VirGL
2 parents b48a0db + fb99046 commit 27184e2

26 files changed

Lines changed: 6485 additions & 725 deletions

.claude/memory/MEMORY.md

Lines changed: 144 additions & 0 deletions
@@ -0,0 +1,144 @@
# Breenix Project Memory

## Core Debugging Principle: Parallels Is Hardware

**NEVER treat Parallels as something strange or as something that will violate a spec in unexpected ways.** Parallels is hardware as far as we're concerned. It implements the VirtIO GPU spec correctly: Linux proves this by achieving 1000+ FPS with VirGL on the same platform. When VirGL commands don't produce expected results, the problem is in OUR protocol usage, not in Parallels. Blaming the hardware is giving up on hard problem solving.

## 🚨 SMOKING GUN: Raw VirGL Rendering WORKS on Parallels 🚨

**PROVEN March 2026:** A hand-crafted VirGL CLEAR command (our bytes, NOT Mesa) produced a
visible BLUE screen on the Parallels display. Screenshot saved at `~/Downloads/linux-probe-virgl-test.png`.

**What the test did (`/tmp/gbm_virgl_test.c` on linux-probe VM):**
1. Used GBM/EGL to create the resource and set up the VirGL context (Mesa handles plumbing)
2. Injected a **raw VirGL CLEAR command** (hand-crafted, identical encoding to Breenix)
3. The BLUE clear overwrote Mesa's GREEN clear: the M3 Max GPU executed OUR command

**What this proves:**
- Our VirGL CLEAR command encoding is CORRECT
- The Apple M3 Max GPU executes raw VirGL commands through Parallels
- Hardware-accelerated GL rendering is fully achievable on Parallels

**Why virgl_raw_test.c (standalone, no Mesa) shows BLACK:**
- Mesa's RESOURCE_CREATE returns `size=3145728, stride=4096` (shared guest/host backing)
- Our raw RESOURCE_CREATE with IDENTICAL params returns `size=0, stride=0` (no guest backing)
- The difference: Mesa/GBM does capability negotiation (GETPARAM, GET_CAPS) before creating
  resources, which enables shared-memory resource mode
- Without shared backing, the host renders to its own GPU memory but the display can't see it

**The fix for Breenix:** Replicate Mesa's resource creation flow, specifically the shared-memory
backing that allows both VirGL rendering (host-side) and display scanout (guest-side) to access
the same memory. Our VirGL command encoding is already correct.

**NEVER AGAIN waste time questioning whether VirGL works on Parallels. It does. Period.**

## Bounce Demo Performance (March 2026)

### Batch Flush Optimization: 12 FPS → 76 FPS (6.3x speedup!)
- **Before:** 13 individual syscalls + 13 DSB barriers + 16ms sleep per frame = 12 FPS
- **After:** 1 batch flush syscall (op=8) + 1 DSB barrier + 1ms sleep = 76 FPS (13ms/frame)
- Batch flush (op=8) in `kernel/src/syscall/graphics.rs` copies multiple dirty rects in one syscall
- `libs/libbreenix/src/graphics.rs`: FlushRect struct + fb_flush_rects() wrapper
- `userspace/programs/src/bounce.rs`: Uses batch flush, sleep reduced 16ms→1ms
- Init auto-launches bounce: `userspace/programs/src/init.rs`
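
A minimal sketch of the batching idea, not the actual Breenix API. The `FlushRect` field names, the `FrameBatch` collector, and the `submit` callback that stands in for the op=8 syscall are all assumptions made for illustration. What it shows is the shape of the optimization: dirty rects accumulate during the frame and cross the syscall boundary once.

```rust
/// Hypothetical dirty-rectangle record; the real `FlushRect` layout in
/// libbreenix may differ.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct FlushRect {
    pub x: u32,
    pub y: u32,
    pub w: u32,
    pub h: u32,
}

/// Collects the dirty rects of one frame so they can be flushed together.
#[derive(Default)]
pub struct FrameBatch {
    rects: Vec<FlushRect>,
}

impl FrameBatch {
    /// Records a dirty rect, ignoring rects with zero area.
    pub fn mark_dirty(&mut self, rect: FlushRect) {
        if rect.w > 0 && rect.h > 0 {
            self.rects.push(rect);
        }
    }

    /// Hands every pending rect to `submit` in a single call (the stand-in
    /// for the batch flush syscall) and returns the number of submissions
    /// made: 0 when nothing was dirty, otherwise 1.
    pub fn flush<F: FnMut(&[FlushRect])>(&mut self, mut submit: F) -> usize {
        if self.rects.is_empty() {
            return 0;
        }
        submit(&self.rects);
        self.rects.clear();
        1
    }
}
```

With 13 rects marked dirty, `flush` invokes the callback once with a 13-element slice, which corresponds to the "13 syscalls become 1" change described above.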

### BSS/Stack Overlap Bug → Heap Allocation Fix (RESOLVED)
- **Root cause:** PCI_3D_FRAMEBUFFER in BSS at phys ~0x41AEC000 overlapped Parallels boot stack at 0x42000000
- DMA reads from BSS backing returned corrupted/zero data → display showed BLACK
- **Fix:** Replaced BSS static with heap-allocated, page-aligned (4096) backing via `alloc::alloc::alloc_zeroed`
- Heap allocation lands at ~0x502E2000, well clear of stack region
- **Result:** BLUE screen displayed successfully via TRANSFER_TO_HOST_3D + SET_SCANOUT + RESOURCE_FLUSH
- PCI_FRAMEBUFFER (2D) is still BSS but never used for DMA on GOP path
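
The heap-allocated backing can be sketched roughly as follows. This is a hosted illustration that uses `std::alloc`; in the kernel the same calls live under `alloc::alloc`. The `PageAlignedBuf` type and its methods are hypothetical names, and the sketch only deals with virtual addresses (translation to a physical address for DMA is out of scope here).

```rust
use std::alloc::{alloc_zeroed, dealloc, Layout};

const PAGE_SIZE: usize = 4096;

/// Owns a zeroed, page-aligned heap buffer (hypothetical helper type).
pub struct PageAlignedBuf {
    ptr: *mut u8,
    layout: Layout,
}

impl PageAlignedBuf {
    /// Returns None if `size` is zero, the layout is invalid, or the
    /// allocator reports failure.
    pub fn new(size: usize) -> Option<Self> {
        if size == 0 {
            // alloc_zeroed must not be called with a zero-sized layout.
            return None;
        }
        let layout = Layout::from_size_align(size, PAGE_SIZE).ok()?;
        // SAFETY: the layout has a non-zero size (checked above).
        let ptr = unsafe { alloc_zeroed(layout) };
        if ptr.is_null() {
            None
        } else {
            Some(Self { ptr, layout })
        }
    }

    /// Virtual address of the first byte.
    pub fn addr(&self) -> usize {
        self.ptr as usize
    }

    pub fn as_slice(&self) -> &[u8] {
        // SAFETY: ptr is valid for layout.size() bytes and zero-initialized.
        unsafe { std::slice::from_raw_parts(self.ptr, self.layout.size()) }
    }
}

impl Drop for PageAlignedBuf {
    fn drop(&mut self) {
        // SAFETY: ptr was allocated with exactly this layout.
        unsafe { dealloc(self.ptr, self.layout) };
    }
}
```

Because the buffer comes from the allocator rather than a linker-placed static, its address no longer depends on where BSS happens to sit relative to the boot stack.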

## VirGL Display on Parallels: IN PROGRESS (March 2026)

### CRITICAL: VirGL Encoding Must Match Mesa Exactly
**Methodology that works:** Write a test program on Linux, validate it produces visible output,
then port the exact same bytes to Breenix. Use LD_PRELOAD ioctl interception to capture
Mesa's actual VirGL command bytes and compare against hand-crafted commands.
**Tool:** `scripts/parallels/virgl_intercept.c` (LD_PRELOAD .so that hex-dumps EXECBUFFER payloads).

### Known VirGL Encoding Bugs (FOUND March 2026)
1. **Blend colormask shift is WRONG:** We used `0xF << 28` but correct is `0xF << 27`
   - `virgl.rs` line 237: `self.push(0xF << 28)` → should be `self.push(0xF << 27)` = `0x78000000`
   - Mesa sends `0x78000000`, we sent `0xF0000000`, a 1-bit shift error
   - This causes the GPU to NOT write color channels properly → BLACK screen
   - See `VIRGL_OBJ_BLEND_S2_RT_COLORMASK(x) = ((x) & 0xf) << 27` in virgl_hw.h
2. **Rasterizer flags differ from Mesa:** Our `0x20004002` vs Mesa's `0x60008082`
   - Mesa adds POINT_QUAD_RAST(bit7), FRONT_CCW(bit15), BOTTOM_EDGE_RULE(bit30)
   - Our scissor enable (bit14) is fine but Mesa doesn't use it for simple clears
3. **Blend S0 missing dither:** Mesa sends `0x00000004` (dither=bit2), we send `0x00000000`
4. **Missing commands Mesa sends:** SET_TWEAKS, SET_POLYGON_STIPPLE, SET_BLEND_COLOR,
   SET_MIN_SAMPLES (may not be critical but should be added for compatibility)
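
The bit arithmetic behind bugs 1 to 3 can be checked with a few constants. This is a sketch built only from the values quoted in the list; the Rust constant and function names are made up for illustration and are not the identifiers used in `virgl.rs`.

```rust
/// Mirrors `VIRGL_OBJ_BLEND_S2_RT_COLORMASK(x) = ((x) & 0xf) << 27`.
pub const fn blend_s2_rt_colormask(mask: u32) -> u32 {
    (mask & 0xF) << 27
}

// Rasterizer S0 bit positions as listed in the notes.
pub const RS_S0_POINT_QUAD_RAST: u32 = 1 << 7;
pub const RS_S0_SCISSOR: u32 = 1 << 14;
pub const RS_S0_FRONT_CCW: u32 = 1 << 15;
pub const RS_S0_BOTTOM_EDGE_RULE: u32 = 1 << 30;

// Blend S0: dither enable is bit 2.
pub const BLEND_S0_DITHER: u32 = 1 << 2;

// Rasterizer words quoted in the notes.
pub const OUR_RS_S0: u32 = 0x2000_4002;
pub const MESA_RS_S0: u32 = 0x6000_8082;

/// Bits that are set in exactly one of the two words.
pub const fn differing_bits(a: u32, b: u32) -> u32 {
    a ^ b
}

/// Bits that the reference sets and we do not.
pub const fn missing_bits(ours: u32, reference: u32) -> u32 {
    reference & !ours
}
```

`blend_s2_rt_colormask(0xF)` evaluates to `0x78000000`, the value Mesa sends, while a shift by 28 yields `0xF0000000`. XOR-ing the two rasterizer words isolates bits 7, 14, 15 and 30, matching the per-bit breakdown in item 2.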

### What Works
- Heap-allocated backing fixes DMA (BSS at 0x41AEC000 overlapped stack)
- RESOURCE_CREATE_3D with B8G8R8X8_UNORM + BIND_SCANOUT succeeds
- SUBMIT_3D (VirGL clear/draw) returns OK with fence completion
- XRGB8888 (B8G8R8X8_UNORM) is REQUIRED: ARGB8888 causes EINVAL
- **gl_display.c (EGL/Mesa) renders at 120+ FPS on Linux probe VM**, which proves VirGL works

### Linux Probe VM Findings
- VirGL rendering works at 120+ FPS via `virgl (Apple M3 Max (Compat))`
- gl_display.c (EGL/Mesa) shows bouncing balls: the WORKING reference
- **Raw VirGL CLEAR on Mesa's context → BLUE screen**: our encoding is correct
- virgl_raw_test.c (standalone, no Mesa) shows BLACK: a resource creation issue, NOT encoding
- The standalone test creates host-managed resources (size=0, stride=0, MAP fails)
- Mesa creates shared-memory resources (size=3145728, stride=4096, MAP works)
- Mesa does NO TRANSFER_TO_HOST for VirGL resources: rendering + display share memory
- Linux DRM SetCrtc + PageFlip → host SET_SCANOUT + RESOURCE_FLUSH internally

## VirGL Infrastructure

- `gpu_pci.rs`: VirGL 3D pipeline (CTX_CREATE, SUBMIT_3D with 3-desc chain, etc.)
- `virgl.rs`: VirGL command encoder (clear, shaders, draw, surfaces, etc.)
- SUBMIT_3D requires 3-descriptor virtqueue chain (header, payload, response)
- VirtioGpuCmdSubmit must be `repr(C, packed)`: 28 bytes, NOT 32
- GET_CAPSET_INFO returns 0x1200 (unsupported) despite num_capsets=1
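
The 28-versus-32 difference is a consequence of Rust's layout rules, which the sketch below demonstrates. It assumes the submit struct is a 24-byte control header followed by a single `u32` size field; the struct and field names are illustrative and may not match the definitions in `gpu_pci.rs`.

```rust
use std::mem::size_of;

/// Control header: 4 + 4 + 8 + 4 + 4 = 24 bytes (field names illustrative).
#[repr(C)]
#[derive(Clone, Copy, Default)]
pub struct CtrlHdr {
    pub cmd_type: u32,
    pub flags: u32,
    pub fence_id: u64,
    pub ctx_id: u32,
    pub padding: u32,
}

/// Without `packed`, the u64 in the header gives the struct 8-byte
/// alignment, so the 28 bytes of fields are padded up to 32.
#[repr(C)]
#[derive(Clone, Copy, Default)]
pub struct SubmitPadded {
    pub hdr: CtrlHdr,
    pub size: u32,
}

/// With `packed`, alignment drops to 1 and no tail padding is added.
#[repr(C, packed)]
#[derive(Clone, Copy, Default)]
pub struct SubmitPacked {
    pub hdr: CtrlHdr,
    pub size: u32,
}

pub const PADDED_LEN: usize = size_of::<SubmitPadded>();
pub const PACKED_LEN: usize = size_of::<SubmitPacked>();
```

A compile-time check such as `const _: () = assert!(PACKED_LEN == 28);` next to the real struct would catch an accidental change of representation.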

## Linux Probe VM (Parallels)

- **OS:** Ubuntu 24.04.4 Server ARM64 (Linux 6.8.0-101-generic)
- **Name:** linux-probe, **IP:** 10.211.55.149
- **SSH:** `sshpass -p root ssh wrb@10.211.55.149`
- **Snapshot:** "baseline-with-devtools" (gcc, libdrm-dev, virgl_raw_test built)
- **DRM:** card1 (virtio_gpu), renderD128. 3D accel = highest
- **Programs:** virgl_raw_test, dumb_blue_test, gl_display, modetest all built and tested
- **Finding:** DRM SetCrtc works for GBM-created resources (shared backing), fails for raw RESOURCE_CREATE (host-only backing)

## 🚨🚨🚨 Parallels VM Testing: FRESH VM EVERY TIME 🚨🚨🚨

**ABSOLUTE RULE: NEVER reuse the same VM name twice. NEVER use `deploy-to-vm.sh --boot`.**
**NEVER use `prlctl stop breenix-dev --kill` then restart the same VM.**
Every single test MUST create a brand new VM with a unique name (timestamp-based).
Reusing a VM name leads to stale disk images, cached framebuffers, and WRONG test results.

**Use `scripts/parallels/quick-test.sh`** which:
1. Deletes all old `breenix-*` VMs
2. Creates a fresh VM with a timestamped or unique name
3. Attaches freshly built disk images
4. Starts the VM and takes a screenshot

**The dispatch to agents MUST include this instruction.** When dispatching a build+test
agent, always tell it to use quick-test.sh or create a fresh uniquely-named VM.
NEVER tell agents to use `deploy-to-vm.sh --boot` or to restart `breenix-dev`.

## VirGL Debugging Methodology (PROVEN March 2026)

**Always validate on Linux FIRST, then port to Breenix.** The workflow:
1. Write/modify test program on Linux probe VM
2. Run it, screenshot, confirm it produces expected visual output
3. If it works on Linux, port the exact same VirGL bytes to Breenix
4. If it doesn't work on Linux, fix the VirGL encoding until it does
5. Use `virgl_intercept.c` LD_PRELOAD to capture Mesa's reference bytes

**Never blame Parallels.** If Mesa works (it does, 120+ FPS), the problem is always
in our VirGL encoding. Intercept Mesa's bytes and match them exactly.
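
"Match them exactly" can be mechanized with a small dword comparison. The helper below is a hypothetical sketch, not an existing tool in the repository: it takes our command stream and a captured reference stream as `u32` slices and reports the first point of disagreement.

```rust
/// First position where two dword streams disagree (hypothetical helper).
#[derive(Debug, PartialEq, Eq)]
pub enum Mismatch {
    /// The streams differ at `index`.
    Word { index: usize, ours: u32, reference: u32 },
    /// The common prefix matches but the lengths differ.
    Length { ours: usize, reference: usize },
}

/// Returns None when both streams are identical.
pub fn first_mismatch(ours: &[u32], reference: &[u32]) -> Option<Mismatch> {
    for (index, (&a, &b)) in ours.iter().zip(reference.iter()).enumerate() {
        if a != b {
            return Some(Mismatch::Word { index, ours: a, reference: b });
        }
    }
    if ours.len() != reference.len() {
        return Some(Mismatch::Length {
            ours: ours.len(),
            reference: reference.len(),
        });
    }
    None
}
```

Printing the mismatching words in hex (`{:#010x}`) makes single-bit shift errors like the colormask one easy to spot.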

## Key Architecture Notes

- Page tables set in `parallels-loader/src/page_tables.rs`; the kernel inherits them
- MAIR: idx0=Device(0x00), idx1=Cacheable(0xFF), idx2=NC(0x44)
- GOP BAR0 region (0x10000000-0x10FFFFFF) uses NC_BLOCK, rest uses DEVICE_BLOCK
- `run.sh --parallels` handles full build+deploy+boot cycle
- Manual deploy: build-efi.sh + create_ext2_disk.sh + copy HDS files + prlctl start
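
For reference, the three MAIR attribute bytes pack into the register as one byte per index, with index n at bits [8n+7:8n]. The sketch below assumes the five unused slots are zero, which the notes do not state; the constant and function names are illustrative.

```rust
pub const MAIR_DEVICE: u8 = 0x00; // idx0: Device-nGnRnE
pub const MAIR_CACHEABLE: u8 = 0xFF; // idx1: Normal, write-back cacheable
pub const MAIR_NC: u8 = 0x44; // idx2: Normal, non-cacheable

/// Packs eight attribute bytes into a MAIR_EL1 value, index 0 in the low byte.
pub const fn mair_value(attrs: [u8; 8]) -> u64 {
    let mut value = 0u64;
    let mut i = 0;
    while i < 8 {
        value |= (attrs[i] as u64) << (8 * i);
        i += 1;
    }
    value
}

/// Extracts the attribute byte for a given index (0..=7).
pub const fn mair_attr(mair: u64, index: usize) -> u8 {
    ((mair >> (8 * index)) & 0xFF) as u8
}

/// Example value with the unused slots assumed to be zero.
pub const EXAMPLE_MAIR: u64 =
    mair_value([MAIR_DEVICE, MAIR_CACHEABLE, MAIR_NC, 0, 0, 0, 0, 0]);
```

Under that assumption the register value is `0x44FF00`, and a block descriptor selects one of the entries through its AttrIndx field.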

kernel/src/drivers/pci.rs

Lines changed: 71 additions & 0 deletions
@@ -59,6 +59,8 @@ pub const VIRTIO_GPU_DEVICE_ID_MODERN: u16 = 0x1050;
 
 /// PCI Capability ID for MSI
 pub const PCI_CAP_ID_MSI: u8 = 0x05;
+/// PCI Capability ID for MSI-X
+pub const PCI_CAP_ID_MSIX: u8 = 0x11;
 
 /// Intel vendor ID (for reference - common in QEMU)
 pub const INTEL_VENDOR_ID: u16 = 0x8086;
@@ -345,6 +347,75 @@ impl Device {
         pci_write_config_word(self.bus, self.device, self.function, cap_offset + 2, new_ctrl);
     }
 
+    /// Find the MSI-X capability in the PCI capability list.
+    ///
+    /// Returns the config space offset of the MSI-X capability, or None if not found.
+    pub fn find_msix_capability(&self) -> Option<u8> {
+        self.find_capability(PCI_CAP_ID_MSIX)
+    }
+
+    /// Read MSI-X table size from the capability.
+    /// Returns the number of MSI-X vectors (Table Size + 1).
+    pub fn msix_table_size(&self, cap_offset: u8) -> u16 {
+        let msg_ctrl = pci_read_config_word(self.bus, self.device, self.function, cap_offset + 2);
+        (msg_ctrl & 0x07FF) + 1 // Bits 10:0 = Table Size (N-1)
+    }
+
+    /// Read MSI-X Table BAR index and offset.
+    /// Returns (bar_index, offset_within_bar).
+    pub fn msix_table_location(&self, cap_offset: u8) -> (u8, u32) {
+        let table_offset_bir = pci_read_config_dword(self.bus, self.device, self.function, cap_offset + 4);
+        let bar_index = (table_offset_bir & 0x07) as u8;
+        let offset = table_offset_bir & !0x07;
+        (bar_index, offset)
+    }
+
+    /// Enable MSI-X (set Enable bit in Message Control, clear Function Mask).
+    pub fn enable_msix(&self, cap_offset: u8) {
+        let msg_ctrl = pci_read_config_word(self.bus, self.device, self.function, cap_offset + 2);
+        // Bit 15: MSI-X Enable, Bit 14: Function Mask (clear to unmask)
+        let new_ctrl = (msg_ctrl | (1 << 15)) & !(1 << 14);
+        pci_write_config_word(self.bus, self.device, self.function, cap_offset + 2, new_ctrl);
+    }
+
+    /// Disable MSI-X (clear Enable bit in Message Control).
+    pub fn disable_msix(&self, cap_offset: u8) {
+        let msg_ctrl = pci_read_config_word(self.bus, self.device, self.function, cap_offset + 2);
+        let new_ctrl = msg_ctrl & !(1 << 15);
+        pci_write_config_word(self.bus, self.device, self.function, cap_offset + 2, new_ctrl);
+    }
+
+    /// Configure a single MSI-X table entry.
+    ///
+    /// `cap_offset`: config space offset of the MSI-X capability
+    /// `vector_index`: which MSI-X vector to program (0-based)
+    /// `address`: MSI target address (e.g. GICv2m doorbell)
+    /// `data`: MSI data value (e.g. SPI number)
+    ///
+    /// The MSI-X table is memory-mapped in the BAR indicated by the capability.
+    /// Each entry is 16 bytes: addr_lo(4) + addr_hi(4) + data(4) + vector_ctrl(4).
+    pub fn configure_msix_entry(&self, cap_offset: u8, vector_index: u16, address: u64, data: u32) {
+        let (bar_index, table_offset) = self.msix_table_location(cap_offset);
+        if bar_index as usize >= 6 || !self.bars[bar_index as usize].is_valid() {
+            return;
+        }
+        let bar_base = self.bars[bar_index as usize].address;
+        const HHDM_BASE: u64 = 0xFFFF_0000_0000_0000;
+        let virt_base = if bar_base >= HHDM_BASE { bar_base } else { HHDM_BASE + bar_base };
+        let entry_addr = virt_base + table_offset as u64 + (vector_index as u64 * 16);
+
+        unsafe {
+            // Address low (offset 0)
+            core::ptr::write_volatile(entry_addr as *mut u32, address as u32);
+            // Address high (offset 4)
+            core::ptr::write_volatile((entry_addr + 4) as *mut u32, (address >> 32) as u32);
+            // Data (offset 8)
+            core::ptr::write_volatile((entry_addr + 8) as *mut u32, data);
+            // Vector Control (offset 12): 0 = unmasked
+            core::ptr::write_volatile((entry_addr + 12) as *mut u32, 0);
+        }
+    }
+
     /// Find any PCI capability by ID. Returns the config space offset, or None.
     pub fn find_capability(&self, cap_id: u8) -> Option<u8> {
         let status = pci_read_config_word(self.bus, self.device, self.function, 0x06);
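
The MSI-X field decoding in this hunk can be exercised without hardware by pulling the bit manipulation out into pure functions. The following is a sketch that restates the same masks and shifts; the free-function names are illustrative and not part of the commit.

```rust
/// Message Control bits 10:0 hold the Table Size encoded as N-1.
pub const fn msix_vector_count(msg_ctrl: u16) -> u16 {
    (msg_ctrl & 0x07FF) + 1
}

/// Splits the Table Offset/BIR dword into (BAR index, byte offset in BAR).
pub const fn msix_table_location(table_offset_bir: u32) -> (u8, u32) {
    ((table_offset_bir & 0x07) as u8, table_offset_bir & !0x07)
}

/// Sets MSI-X Enable (bit 15) and clears Function Mask (bit 14).
pub const fn msix_enable(msg_ctrl: u16) -> u16 {
    (msg_ctrl | (1 << 15)) & !(1 << 14)
}

/// Clears MSI-X Enable (bit 15), leaving other bits untouched.
pub const fn msix_disable(msg_ctrl: u16) -> u16 {
    msg_ctrl & !(1 << 15)
}

/// Byte offset of a table entry inside the BAR: entries are 16 bytes each.
pub const fn msix_entry_offset(table_offset: u32, vector_index: u16) -> u64 {
    table_offset as u64 + vector_index as u64 * 16
}
```

For example, a Table Offset/BIR dword of `0x2004` decodes to BAR 4 at offset `0x2000`, and vector 3 then sits at offset `0x2030` within that BAR.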
