CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Project Overview

ARM64 Type-1 bare-metal hypervisor written in Rust (no_std) with ARM64 assembly. Runs at EL2 (hypervisor exception level) and manages guest VMs at EL1. Targets QEMU virt machine. Boots Linux 6.12.12 to BusyBox shell with 4 vCPUs, virtio-blk storage, and virtio-net inter-VM networking. Supports multi-VM with per-VM Stage-2, VMID-tagged TLBs, two-level scheduling, and L2 virtual switch.

Includes FF-A v1.1 proxy with stub SPMC, page ownership validation via Stage-2 PTE SW bits, FF-A v1.1 descriptor parsing, SMC forwarding to EL3, and VM-to-VM memory sharing (MEM_RETRIEVE/RELINQUISH with dynamic Stage-2 page mapping). Android boot with PL031 RTC emulation, Binder IPC, binderfs, minimal init, 1GB guest RAM.

Dual boot modes: NS-EL2 hypervisor via make run-tfa-linux (BL33, tfa_boot feature) and S-EL2 SPMC via make run-spmc (BL32). TF-A boot chain: BL1→BL2→BL31(SPMD)→BL32(SPMC)→BL33 with manifest FDT parsing. SPMC boots SP1 (Hello) + SP2 (IRQ) at S-EL1 via ERET with per-SP Secure Stage-2, dispatches NWd→SP DIRECT_REQ/RESP messaging to multiple SPs.

End-to-end FF-A DIRECT_REQ: NS proxy → SPMD → SPMC → SP1/SP2 (SP modifies x4 += 0x1000 as proof). SP-to-SP DIRECT_REQ with CallStack cycle detection, recursive dispatch, and chain preemption. E2E memory sharing: NWd SHARE → SP RETRIEVE → SP write → SP RELINQUISH → NWd verify → NWd RECLAIM (SP-initiated FF-A calls via handle_sp_exit() loop). SP-to-SP MEM_SHARE: SP1 shares Secure DRAM page with SP2 via SPMC (MEM_SHARE→RETRIEVE→read/write→RELINQUISH→RECLAIM). MEM_DONATE: irrevocable ownership transfer (RECLAIM/RELINQUISH blocked). 20/20 BL33 integration tests pass. FFA_CONSOLE_LOG (SP debug logging to UART), SRI/NPI feature IDs (donated SGI INTIDs), MEM_FRAG_TX/RX (descriptor fragmentation).

NS interrupt preemption: IRQ during SP → FFA_INTERRUPT → NWd calls FFA_RUN → SPMC resumes SP (CNTHP timer, SP_IRQ_PREEMPTED flag, Preempted state). Secure virtual interrupt injection: per-SP INTID ownership, CNTHP poll timer at S-EL2, HCR_EL2.VI + HF_INTERRUPT_GET paravirt (Hafnium-compatible), cross-SP preemption.

SPMC manages NWd RXTX state (SPMD forwards RXTX_MAP/UNMAP/RX_RELEASE from NWd to SPMC per TF-A v2.12), NS proxy registers its own RXTX with SPMD, PARTITION_INFO_GET writes 24-byte FF-A v1.1 descriptors to NWd's RX buffer, Linux FF-A driver support (CONFIG_ARM_FFA_TRANSPORT, guest DTB arm,ffa node).

Build Commands

make              # Build hypervisor
make run          # Build + run in QEMU; runs 34 test suites automatically (exit: Ctrl+A then X)
make run-linux    # Build + boot Linux guest (--features linux_guest, 4 vCPUs on 1 pCPU, virtio-blk)
make run-linux-smp # Build + boot Linux guest (--features multi_pcpu, 4 vCPUs on 4 pCPUs)
make run-multi-vm # Build + boot 2 Linux VMs time-sliced (--features multi_vm)
make run-android  # Build + boot Android-configured kernel (PL031 RTC, Binder, minimal init, 1GB RAM)
make run-guest GUEST_ELF=/path/to/zephyr.elf  # Boot Zephyr guest (--features guest)
make run-sel2     # Boot TF-A with trivial BL32 at S-EL2 (requires build-tfa first)
make run-tfa-linux # Boot TF-A → hypervisor (BL33) → Linux (requires build-tfa-bl33 first)
make run-spmc     # Boot TF-A → our SPMC (BL32) at S-EL2 (requires build-tfa-spmc first)
make build-tfa-full # Build TF-A with real SPMC (BL32) + preloaded BL33 hypervisor
make run-tfa-linux-ffa # Boot TF-A → SPMC → hypervisor (BL33) → Linux (FF-A discovery)
make build-qemu   # Build QEMU 9.2.3 from source (one-time, Docker)
make build-tfa    # Build TF-A flash.bin with SPD=spmd (Docker)
make build-tfa-bl33 # Build TF-A flash.bin with PRELOADED_BL33_BASE=0x40200000
make build-spmc   # Build hypervisor as S-EL2 SPMC binary (--features sel2)
make build-sp-hello # Build SP Hello binary (S-EL1 Secure Partition)
make build-sp-irq  # Build SP IRQ binary (S-EL1, interrupt handling)
make build-sp-relay # Build SP Relay binary (S-EL1, SP-to-SP DIRECT_REQ relay)
make build-tfa-spmc # Build TF-A with real SPMC as BL32 + SP Hello + SP IRQ
make build-pkvm-kernel # Build AOSP android16-6.12 kernel for pKVM (Docker, ~15-30min first time)
make build-tfa-pkvm # Build TF-A flash-pkvm.bin (ARM_LINUX_KERNEL_AS_BL33, Linux as BL33)
make run-pkvm     # Boot pKVM (NS-EL2) + our SPMC (S-EL2) — AOSP kernel as BL33 (requires build-pkvm-kernel + build-tfa-pkvm)
make run-pkvm-ffa-test # Boot pKVM with FF-A test module (35/35 PASS)
make build-crosvm # Build crosvm VMM for aarch64 (Docker, ~5-10min first time)
make build-crosvm-initramfs # Build pKVM initramfs with crosvm + pVM kernel
make run-crosvm   # Boot pKVM (nVHE) + crosvm pVM (AVF validation, requires ARM64 host for KVM accel)
make debug        # Build + run with GDB server on port 1234
make clean        # Clean build artifacts
make check        # Check code without building
make clippy       # Run linter
make fmt          # Format code

Feature flags (Cargo features, selected via Makefile targets):

  • (default) — unit tests only, no guest boot
  • guest — Zephyr guest loading
  • linux_guest — Linux guest with DynamicIdentityMapper, GICR trap-and-emulate, virtio-blk, virtio-net
  • multi_pcpu — Multi-pCPU support (implies linux_guest): 1:1 vCPU-to-pCPU affinity, PSCI boot, TPIDR_EL2 context, SpinLock devices
  • multi_vm — Multi-VM support (implies linux_guest): 2 VMs time-sliced on 1 pCPU, per-VM Stage-2/VMID, per-VM DeviceManager
  • sel2 — S-EL2 SPMC mode: hypervisor as BL32 (SPMC role), separate boot_sel2.S entry, linker base 0x0e100000 (secure DRAM), manifest parsing, FFA_MSG_WAIT handshake, secondary CPU warm-boot via FFA_SECONDARY_EP_REGISTER, boots SP1 (Hello) + SP2 (IRQ) + SP3 (Relay)
  • tfa_boot — TF-A boot mode (implies linux_guest): sets SPMC_PRESENT=true at compile time, NS proxy registers RXTX with SPMD, forwards DIRECT_REQ, PARTITION_INFO_GET, and MEM_SHARE/LEND/RECLAIM (SP receivers) to real SPMC via 8-register SMC

Note: multi_pcpu and multi_vm are mutually exclusive — both imply linux_guest but use different scheduling models. sel2 is mutually exclusive with all others. tfa_boot is used with run-tfa-linux when a real SPMC is available at S-EL2.

Toolchain requirements: Rust nightly, aarch64-linux-gnu-gcc, aarch64-linux-gnu-ar, aarch64-linux-gnu-objcopy, qemu-system-aarch64

Architecture

Privilege Model

  • EL2: Hypervisor — exception handling, Stage-2 page tables, GIC virtual interface
  • EL1: Guest — Linux kernel or Zephyr RTOS
  • Stage-2 Translation: Identity mapping (GPA == HPA), 2MB blocks + 4KB pages

Core Abstractions

Type File Role
Vm src/vm.rs VM lifecycle, Stage-2 setup, run_smp() scheduler loop
Vcpu src/vcpu.rs State machine (Uninitialized→Ready→Running→Stopped), context save/restore
VcpuContext src/arch/aarch64/regs.rs Guest registers (x0-x30, SP, PC, SPSR, system regs)
VcpuArchState src/arch/aarch64/vcpu_arch_state.rs Per-vCPU GIC LRs, timer, EL1 sysregs, PAC keys
DeviceManager src/devices/mod.rs Enum-dispatch MMIO routing to emulated devices
Scheduler src/scheduler.rs Round-robin vCPU scheduler with block/unblock
ExitReason src/arch/aarch64/regs.rs VM exit causes: WfiWfe, HvcCall, SmcCall, DataAbort, etc.
FfaProxy src/ffa/proxy.rs FF-A v1.1 proxy: intercepts guest SMC, handles VERSION/ID_GET/FEATURES/RXTX/messaging/memory
Stage2Walker src/ffa/stage2_walker.rs Stage-2 page table walker from VTTBR_EL2: PTE SW bits, S2AP, map_page/unmap_page for cross-VM sharing
FfaDescriptors src/ffa/descriptors.rs FF-A v1.1 composite memory region descriptor parsing
SmcForward src/ffa/smc_forward.rs SMC forwarding to EL3 + SPMC probe
PlatformInfo src/dtb.rs Runtime DTB parsing: UART, GIC, RAM, CPU count discovery
VSwitch src/vswitch.rs L2 virtual switch with MAC learning, inter-VM frame forwarding
NetRxRing src/vswitch.rs Per-port SPSC ring buffer for async RX frame delivery
VirtualPl031 src/devices/pl031.rs PL031 RTC emulation: counter-based time, PrimeCell ID
SpMcManifest src/manifest.rs SPMC manifest parser: TOS_FW_CONFIG DTB (spmc_id, version)
SpmcHandler src/spmc_handler.rs S-EL2 SPMC event loop + FF-A dispatch, multi-SP DIRECT_REQ routing via dispatch_to_sp() + enter_guest() ERET, SP-initiated FF-A call loop in handle_sp_exit() (MEM_RETRIEVE_REQ/MEM_RELINQUISH/CONSOLE_LOG → handle locally → re-enter SP), SP→SP DIRECT_REQ routing (CallStack cycle detection, recursive dispatch_to_sp, chain preemption via handle_sp_exit sentinel pattern), NS interrupt preemption (SP_IRQ_PREEMPTED flag, CNTHP timer, FFA_INTERRUPT return), resume_preempted_sp() via FFA_RUN, secure vIRQ injection via inject_pending_virq() (HCR_EL2.VI), cross-SP preemption via dispatch_interrupt_to_sp(), NWd RXTX management, PARTITION_INFO_GET writes 24-byte descriptors to NWd RX buffer, SPMC-side memory sharing (MEM_SHARE/LEND/DONATE/RETRIEVE/RELINQUISH/RECLAIM with SpmcShareRecord storage, dynamic Secure Stage-2 mapping via Stage2Walker), SP-to-SP MEM_SHARE/LEND/DONATE/RECLAIM (SP-initiated sharing via handle_sp_exit), MSG_SEND2/MSG_WAIT indirect messaging (per-SP SpMailbox), CONSOLE_LOG (extracts packed characters to UART), SRI/NPI feature IDs
SpContext src/sp_context.rs Per-SP state machine (Reset→Idle→Running→Blocked→Preempted, incl. Blocked→Preempted for chain preemption), wraps VcpuContext, per-SP owned_intids[4] + pending_irq, global SpStore, for_each_sp()/find_sp_for_intid()/find_sp_with_pending_irq() iterators
SecureStage2Config src/secure_stage2.rs VSTTBR_EL2/VSTCR_EL2 config for SP isolation, build_sp_stage2() identity-maps SP code + UART
Sel2Mmu src/sel2_mmu.rs S-EL2 Stage-1 identity map: static L0/L1/L2 tables, NS=1 for NWd DRAM, Device for GIC/UART, Normal Secure for SPMC/SPs; install_sel2_stage1_secondary() for secondary CPU warm-boot

Exception Handling Flow

Guest @ EL1
  ↓ trap (Data Abort, HVC, SMC, WFI, MSR/MRS)
Exception Vector (arch/aarch64/exception.S) — save context
  ↓
handle_exception() (src/arch/aarch64/hypervisor/exception.rs)
  ├─ WFI → return false (exit to scheduler)
  ├─ HVC → handle_psci() (PSCI v1.0: CPU_ON, CPU_OFF, SYSTEM_RESET) or HF_INTERRUPT_GET (sel2: returns pending INTID)
  ├─ SMC → handle_smc() → PSCI or FF-A proxy or forward to EL3
  ├─ Data Abort → HPFAR_EL2 for IPA → decode instruction → MMIO dispatch
  ├─ MSR/MRS trap → handle ICC_SGI1R_EL1 (SGI emulation), sysreg emulation
  └─ IRQ → handle INTID 26 (preemption/poll timer), 27 (vtimer), 33 (UART RX); sel2: per-SP INTID routing via HCR_EL2.VI
  ↓ advance PC, restore context
ERET back to guest

SMP / Multi-vCPU

run_smp() calls run_one_iteration() in a loop. Each iteration runs one vCPU on a single physical CPU via cooperative + preemptive scheduling:

  1. Check per-VM pending_cpu_on → boot_secondary_vcpu() (PSCI CPU_ON)
  2. Wake vCPUs with pending SGIs/SPIs → scheduler.unblock()
  3. Pick next vCPU (round-robin) → set current_vcpu_id
  4. Drain UART RX ring → inject SPI 33
  5. Inject pending SGIs/SPIs into arch_state.ich_lr[]
  6. Arm CNTHP preemption timer (10ms, INTID 26) — only when 2+ vCPUs online
  7. vcpu.run() → save/restore arch state → enter_guest() → ERET
  8. Handle exit: terminal→remove, CPU_ON/preemption→yield, WFI→block, other→yield
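
The pick/block/unblock part of the loop above can be sketched as a small round-robin picker. This is a minimal illustration, not the project's Scheduler: the RoundRobin type, its field names, and the u8 bitmasks are assumptions made for the example, and it assumes between 1 and 8 vCPUs.

```rust
/// Hypothetical round-robin picker over up to 8 vCPUs (illustrative only).
pub struct RoundRobin {
    online: u8,  // bitmask of online vCPUs; vCPU 0 is online at boot
    blocked: u8, // bitmask of vCPUs blocked in WFI
    last: usize, // vCPU picked most recently
    n: usize,    // number of vCPUs, assumed 1..=8
}

impl RoundRobin {
    pub const fn new(n: usize) -> Self {
        // Start with `last` at the final slot so the first pick is vCPU 0.
        Self { online: 1, blocked: 0, last: n - 1, n }
    }

    pub fn set_online(&mut self, id: usize) {
        self.online |= 1 << id;
    }

    pub fn block(&mut self, id: usize) {
        self.blocked |= 1 << id;
    }

    pub fn unblock(&mut self, id: usize) {
        self.blocked &= !(1 << id);
    }

    /// Pick the next runnable vCPU after `last`, wrapping around.
    pub fn pick_next(&mut self) -> Option<usize> {
        for step in 1..=self.n {
            let id = (self.last + step) % self.n;
            let bit = 1u8 << id;
            if self.online & bit != 0 && self.blocked & bit == 0 {
                self.last = id;
                return Some(id);
            }
        }
        None
    }

    /// The preemption timer is only worth arming with 2+ vCPUs online.
    pub fn preemption_needed(&self) -> bool {
        self.online.count_ones() >= 2
    }
}
```

With only vCPU 0 online, pick_next() keeps returning 0 and preemption_needed() stays false, which mirrors step 6.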

Important: vcpu_online_mask must include vCPU 0 at boot — without it, preemption timer never activates.

SGI/IPI emulation: ICC_SGI1R_EL1 trapped via ICH_HCR_EL2.TALL1=1 → decoded (TargetList[15:0], Aff1[23:16], INTID[27:24]) → PENDING_SGIS[vcpu_id] atomics → injected before next entry.

Multi-pCPU (4 vCPUs on 4 Physical CPUs)

Feature: multi_pcpu (implies linux_guest). Target: make run-linux-smp.

Architecture: 1:1 vCPU-to-pCPU affinity. Each physical CPU runs one vCPU exclusively — no scheduler needed.

Secondary pCPU Boot: QEMU virt keeps secondary CPUs powered off. wake_secondary_pcpus() issues real PSCI CPU_ON SMC calls (smc #0, function_id=0xC4000003) to QEMU's EL3 firmware with secondary_entry as the entry point.

Per-CPU Context Pointer: TPIDR_EL2 (hardware-banked per physical CPU) replaces the global current_vcpu_context variable in exception.S. Set by enter_guest(), read by exception/IRQ handlers.

Physical GICR Programming: ensure_vtimer_enabled(cpu_id) programs physical GICR ISENABLER0 for SGIs 0-15 + PPI 27 (vtimer) before every guest entry. Guest GICR writes only update the shadow VirtualGicr state.

Cross-pCPU SPI Delivery: inject_spi() reads physical GICD_IROUTER directly (EL2 bypasses Stage-2) to avoid deadlock with the DEVICES SpinLock. If the target is a remote pCPU, sends physical SGI 0 via msr icc_sgi1r_el1 to wake it.

WFI Passthrough: TWI cleared in multi-pCPU mode — real WFI on physical CPU, woken by physical interrupts.

Multi-VM (2 Linux VMs Time-Sliced)

Feature: multi_vm (implies linux_guest). Target: make run-multi-vm.

Architecture: 2 VMs round-robin time-sliced on a single pCPU. Each VM has 4 vCPUs scheduled via the inner run_one_iteration() loop.

Per-VM Global State: VmGlobalState struct (indexed by CURRENT_VM_ID) replaces flat globals. Each VM has its own pending_sgis, pending_spis, vcpu_online_mask, current_vcpu_id, and preemption_exit.

Per-VM DeviceManager: DEVICES: [GlobalDeviceManager; MAX_VMS] array. Exception handler uses CURRENT_VM_ID to dispatch MMIO to the correct VM's devices.

VMID-Tagged Stage-2: Stage2Config::new_with_vmid() encodes VMID in VTTBR_EL2 bits [63:48] for TLB isolation. Vm::activate_stage2() writes VTTBR_EL2/VTCR_EL2 before guest entry.
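
As a rough sketch of the VMID encoding, assuming a 16-bit VMID (which on real hardware also requires VTCR_EL2.VS=1) and a table base that fits in bits [47:1]. The function names here are hypothetical and are not the project's Stage2Config API.

```rust
/// VTTBR_EL2.BADDR occupies bits [47:1]; bit 0 is CnP.
const VTTBR_BADDR_MASK: u64 = 0x0000_FFFF_FFFF_FFFE;

/// Hypothetical helper: combine a Stage-2 L0 table PA with a VMID in bits [63:48].
pub const fn vttbr_with_vmid(l0_table_pa: u64, vmid: u16) -> u64 {
    ((vmid as u64) << 48) | (l0_table_pa & VTTBR_BADDR_MASK)
}

/// Hypothetical helper: extract the VMID back out of a VTTBR_EL2 value.
pub const fn vmid_of(vttbr: u64) -> u16 {
    (vttbr >> 48) as u16
}
```

Two VMs whose tables live at different addresses then carry distinct VMIDs in the register value, so their TLB entries do not alias.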

Two-Level Scheduler: run_multi_vm() → outer VM round-robin → CURRENT_VM_ID.store() → activate_stage2() → run_one_iteration() → inner vCPU round-robin.

Memory Partitioning: VM 0 at 0x48000000 (256MB), VM 1 at 0x68000000 (256MB). Each VM gets separate kernel, DTB, initramfs, and virtio-blk disk image loaded by QEMU.

GIC Emulation

| Component | Address | Mode | Implementation |
| --- | --- | --- | --- |
| GICD | 0x08000000 | Trap + write-through | VirtualGicd shadow state + write-through to physical GICD |
| GICR 0-3 | 0x080A0000+ | Trap-and-emulate | VirtualGicr (Stage-2 unmapped, 4KB pages) |
| ICC regs | System regs | Virtual | ICH_HCR_EL2.En=1 redirects to ICV_* at EL1 |
| ICC_SGI1R | System reg | Trapped | TALL1=1, decoded for IPI emulation |

List Register injection: 4 LRs (ICH_LR0-3_EL2). HW=1 for vtimer (INTID 27) enables physical-virtual EOI linkage. EOImode=1 for proper priority drop / deactivation split.

Virtio-blk

VirtioMmioTransport<VirtioBlk>  @ 0x0a000000 (SPI 16 = INTID 48)
  ├─ MMIO registers (virtio-mmio spec)
  ├─ Virtqueue (descriptor table + available ring + used ring)
  └─ VirtioBlk backend (disk image at 0x58000000, loaded by QEMU)

Guest writes QueueNotify → process_request() → read/write disk image via copy_nonoverlapping (identity-mapped) → update used ring → inject_spi(48) → flush_pending_spis_to_hardware().

Virtio-net + VSwitch

VirtioMmioTransport<VirtioNet>  @ 0x0a000200 (SPI 17 = INTID 49)
  ├─ MMIO registers (virtio-mmio spec)
  ├─ 2 virtqueues: RX (queue 0) + TX (queue 1)
  └─ VirtioNet backend (device_id=1, MAC 52:54:00:00:00:{vm_id+1})

TX path: Guest writes QueueNotify → process_tx() → strip 12-byte virtio_net_hdr_v1 → vswitch_forward(src_port, frame) → VSwitch MAC learning + L2 forwarding → PORT_RX[dst].store(frame).

RX path: drain_net_rx(vm_id) in run loop → PORT_RX[vm_id].take() → inject_net_rx() → inject_rx(frame) → write 12-byte header (num_buffers=1) + frame into RX descriptor chain via copy_nonoverlapping → inject_spi(49).

VSwitch (src/vswitch.rs): L2 virtual switch with 16-entry MAC learning table. Broadcasts/multicasts flood all ports (excluding source). Unknown unicasts also flood. MAC entries are learned on TX (source MAC → source port).
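
The forwarding rules described here can be sketched as follows. This is an illustrative model rather than the code in src/vswitch.rs: the forward() signature, the port bitmask return value, and the choice to silently skip learning when the table is full are assumptions.

```rust
pub type Mac = [u8; 6];

/// Illustrative L2 switch with a 16-entry MAC learning table.
pub struct VSwitch {
    table: [Option<(Mac, usize)>; 16],
}

impl VSwitch {
    pub const fn new() -> Self {
        Self { table: [None; 16] }
    }

    fn lookup(&self, mac: &Mac) -> Option<usize> {
        self.table.iter().flatten().find(|(m, _)| m == mac).map(|&(_, port)| port)
    }

    /// Learn source MAC -> source port; update in place or take a free slot.
    fn learn(&mut self, mac: Mac, port: usize) {
        let slot = self
            .table
            .iter()
            .position(|e| matches!(e, Some((m, _)) if *m == mac))
            .or_else(|| self.table.iter().position(|e| e.is_none()));
        if let Some(i) = slot {
            self.table[i] = Some((mac, port));
        }
    }

    /// Returns a bitmask of destination ports for `frame` arriving on `src_port`.
    pub fn forward(&mut self, src_port: usize, frame: &[u8], num_ports: usize) -> u32 {
        if frame.len() < 14 || src_port >= num_ports || num_ports > 32 {
            return 0;
        }
        let mut dst = [0u8; 6];
        let mut src = [0u8; 6];
        dst.copy_from_slice(&frame[0..6]);
        src.copy_from_slice(&frame[6..12]);
        self.learn(src, src_port);

        let all = if num_ports == 32 { u32::MAX } else { (1u32 << num_ports) - 1 };
        let flood = all & !(1u32 << src_port); // every port except the source
        if dst[0] & 1 != 0 {
            return flood; // broadcast or multicast
        }
        match self.lookup(&dst) {
            Some(p) if p == src_port => 0, // never send a frame back to its source
            Some(p) => 1 << p,
            None => flood, // unknown unicast
        }
    }
}
```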

NetRxRing: SPSC ring buffer (9 slots, 8 usable + 1 sentinel) per VM port. Atomic head/tail with Acquire/Release ordering. Stores up to 1514-byte Ethernet frames.
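
A minimal sketch of the 9-slot SPSC scheme, assuming exactly one producer and one consumer. The SpscRing name and its generic payload are hypothetical; the real NetRxRing stores Ethernet frames rather than a generic Copy type.

```rust
use core::cell::UnsafeCell;
use core::sync::atomic::{AtomicUsize, Ordering};

const SLOTS: usize = 9; // 8 usable + 1 sentinel to tell "full" from "empty"

/// Illustrative single-producer/single-consumer ring.
pub struct SpscRing<T: Copy> {
    buf: UnsafeCell<[Option<T>; SLOTS]>,
    head: AtomicUsize, // next slot to read, written only by the consumer
    tail: AtomicUsize, // next slot to write, written only by the producer
}

// Safety: sound only under the single-producer/single-consumer assumption.
unsafe impl<T: Copy + Send> Sync for SpscRing<T> {}

impl<T: Copy> SpscRing<T> {
    pub fn new() -> Self {
        Self {
            buf: UnsafeCell::new([None; SLOTS]),
            head: AtomicUsize::new(0),
            tail: AtomicUsize::new(0),
        }
    }

    /// Producer side. Returns false when the ring is full.
    pub fn store(&self, item: T) -> bool {
        let tail = self.tail.load(Ordering::Relaxed);
        let next = (tail + 1) % SLOTS;
        if next == self.head.load(Ordering::Acquire) {
            return false;
        }
        // Safety: only the producer writes slot `tail`, and the consumer
        // will not read it until the Release store below publishes it.
        let slot = unsafe { &mut (*self.buf.get())[tail] };
        *slot = Some(item);
        self.tail.store(next, Ordering::Release);
        true
    }

    /// Consumer side. Returns None when the ring is empty.
    pub fn take(&self) -> Option<T> {
        let head = self.head.load(Ordering::Relaxed);
        if head == self.tail.load(Ordering::Acquire) {
            return None;
        }
        // Safety: slot `head` was published by the producer and is not
        // reused until the Release store below frees it.
        let slot = unsafe { &mut (*self.buf.get())[head] };
        let item = slot.take();
        self.head.store((head + 1) % SLOTS, Ordering::Release);
        item
    }
}
```

The sentinel slot is what limits capacity to 8: the ring is treated as full when advancing tail would make it equal to head.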

MMIO slot abstraction: platform::virtio_slot(n) returns (base_addr, intid) for slot n. Slot 0 = virtio-blk, slot 1 = virtio-net. Stride = 0x200.
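
The slot arithmetic follows directly from the addresses and INTIDs listed in the virtio-blk and virtio-net sections. In this sketch the constant names and the (u64, u32) return type are assumptions; only the virtio_slot name and the (base_addr, intid) shape come from the text.

```rust
pub const VIRTIO_MMIO_BASE: u64 = 0x0a00_0000; // slot 0 base (virtio-blk)
pub const VIRTIO_MMIO_STRIDE: u64 = 0x200;     // distance between slots
pub const VIRTIO_FIRST_INTID: u32 = 48;        // SPI 16 = INTID 48 for slot 0

/// Returns (base_addr, intid) for virtio-mmio slot `n`.
pub const fn virtio_slot(n: u32) -> (u64, u32) {
    (
        VIRTIO_MMIO_BASE + n as u64 * VIRTIO_MMIO_STRIDE,
        VIRTIO_FIRST_INTID + n,
    )
}
```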

Auto-IP: Initramfs /init reads MAC from sysfs, extracts last octet, assigns 10.0.0.{octet}/24 via ifconfig. VM 0 → 10.0.0.1, VM 1 → 10.0.0.2.

FF-A v1.1 Proxy (src/ffa/)

Implements the FF-A (Firmware Framework for Arm) v1.1 hypervisor proxy role (pKVM-compatible). Guest SMC calls trapped via HCR_EL2.TSC=1 (bit 19) are routed through handle_smc() → ffa::proxy::handle_ffa_call().

Supported calls: FFA_VERSION, FFA_ID_GET, FFA_SPM_ID_GET, FFA_FEATURES, FFA_RXTX_MAP/UNMAP, FFA_RX_RELEASE, FFA_PARTITION_INFO_GET, FFA_MSG_SEND_DIRECT_REQ, FFA_MSG_SEND2, FFA_MSG_WAIT, FFA_RUN, FFA_MEM_SHARE/LEND/RETRIEVE_REQ/RELINQUISH/RECLAIM, FFA_MEM_FRAG_TX/FRAG_RX, FFA_NOTIFICATION_BITMAP_CREATE/DESTROY/BIND/UNBIND/SET/GET/INFO_GET, FFA_CONSOLE_LOG_32/64. FFA_MEM_DONATE is blocked (returns NOT_SUPPORTED). FFA_FEATURES returns donated SGI INTIDs for SRI (INTID 9) and NPI (INTID 8) feature queries. VM-to-VM memory sharing: sender shares pages via MEM_SHARE, receiver maps them via MEM_RETRIEVE_REQ (dynamic Stage-2 page mapping), receiver unmaps via MEM_RELINQUISH, sender reclaims via MEM_RECLAIM. When tfa_boot + SP receiver (part_id >= 0x8000): MEM_SHARE/LEND/RECLAIM forwarded to real SPMC via SPMD, with dual record (local SW bits + SPMC handle via record_share_with_handle()). PARTITION_INFO_GET: when SPMC_PRESENT, forwards to SPMD and copies 24-byte descriptors from proxy RX to guest RX; otherwise uses 8-byte stub descriptors. Notification bitmaps support FFA_HOST_ID (0x0000) for pKVM host scheduler — endpoint_index() maps it to slot FFA_MAX_VMS + 2.

Stub SPMC (src/ffa/stub_spmc.rs): Simulates 2 Secure Partitions (SP1=0x8001, SP2=0x8002) for testing without a real Secure World. Direct messaging echoes x4-x7 back. Memory sharing tracks multi-range records with MemShareRecord (up to 4 ranges per share, ShareInfo/ShareInfoFull for reclaim/retrieve). mark_retrieved()/mark_relinquished() track retrieve state; MEM_RECLAIM blocked while retrieved.

RXTX Mailbox (src/ffa/mailbox.rs): Per-VM TX/RX buffer IPAs registered via FFA_RXTX_MAP. Used by PARTITION_INFO_GET to return SP descriptors. TX buffer used for FF-A v1.1 composite memory region descriptors.

Page Ownership (src/ffa/memory.rs): Stage-2 PTE software bits [56:55] track page state: Owned(0b00), SharedOwned(0b01), SharedBorrowed(0b10), Donated(0b11). Validated during MEM_SHARE/LEND (Owned required), transitioned to SharedOwned, restored on MEM_RECLAIM. S2AP bits [7:6] restrict access: SHARE→RO, LEND→NONE. Matches pKVM page ownership model.
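
The bit manipulation behind these transitions can be sketched on a raw 64-bit PTE value. The helper names below are hypothetical, and restoring read/write access on reclaim is an assumption (the text only says the state is restored).

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
#[repr(u64)]
pub enum PageState {
    Owned = 0b00,
    SharedOwned = 0b01,
    SharedBorrowed = 0b10,
    Donated = 0b11,
}

const SW_SHIFT: u32 = 55; // software bits [56:55]
const SW_MASK: u64 = 0b11 << SW_SHIFT;
const S2AP_SHIFT: u32 = 6; // S2AP bits [7:6]
const S2AP_MASK: u64 = 0b11 << S2AP_SHIFT;
const S2AP_NONE: u64 = 0b00;
const S2AP_RO: u64 = 0b01;
const S2AP_RW: u64 = 0b11;

pub fn page_state(pte: u64) -> PageState {
    match (pte & SW_MASK) >> SW_SHIFT {
        0b00 => PageState::Owned,
        0b01 => PageState::SharedOwned,
        0b10 => PageState::SharedBorrowed,
        _ => PageState::Donated,
    }
}

fn with_state(pte: u64, state: PageState) -> u64 {
    (pte & !SW_MASK) | ((state as u64) << SW_SHIFT)
}

fn with_s2ap(pte: u64, s2ap: u64) -> u64 {
    (pte & !S2AP_MASK) | (s2ap << S2AP_SHIFT)
}

/// MEM_SHARE (lend = false) or MEM_LEND (lend = true): requires Owned.
pub fn share_or_lend(pte: u64, lend: bool) -> Option<u64> {
    if page_state(pte) != PageState::Owned {
        return None;
    }
    let s2ap = if lend { S2AP_NONE } else { S2AP_RO };
    Some(with_s2ap(with_state(pte, PageState::SharedOwned), s2ap))
}

/// MEM_RECLAIM: back to Owned; read/write access is assumed here.
pub fn reclaim(pte: u64) -> Option<u64> {
    if page_state(pte) != PageState::SharedOwned {
        return None;
    }
    Some(with_s2ap(with_state(pte, PageState::Owned), S2AP_RW))
}
```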

Stage-2 Walker (src/ffa/stage2_walker.rs): Lightweight page table walker reconstructed from VTTBR_EL2 at SMC handling time. Reads/writes PTE SW bits and S2AP without owning page table memory. Used by MEM_SHARE/LEND/RECLAIM for ownership validation. map_page() creates 4KB page entries in a target VM's Stage-2 (allocates L2/L3 tables from heap), used by MEM_RETRIEVE_REQ for cross-VM sharing. unmap_page() zeroes L3 PTEs, used by MEM_RELINQUISH. PER_VM_VTTBR global stores each VM's L0 table PA for constructing walkers for non-active VMs. Gated by #[cfg(feature = "linux_guest")] — unit tests skip Stage-2 validation (stale VTTBR from earlier page table tests).

Descriptor Parsing (src/ffa/descriptors.rs): Parses FF-A v1.1 composite memory region descriptors (DEN0077A Table 5.19-5.25): FfaMemRegion(48B) → FfaMemAccessDesc(16B) → FfaCompositeMemRegion(16B) → FfaMemRegionAddrRange(16B). Uses core::ptr::read_unaligned for packed struct safety. build_retrieve_resp_descriptor() constructs response descriptors (reverse of parse_mem_region()). Falls back to register-based protocol (x3=IPA, x4=count, x5=receiver) when no mailbox is mapped.
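
As an illustration of the read_unaligned approach, the sketch below reads one 16-byte address range entry (address, page count, reserved) from an arbitrary byte offset. The AddrRange type and read_addr_range function are hypothetical stand-ins, and a little-endian host is assumed, as on AArch64.

```rust
/// Illustrative 16-byte address range entry: address, page count, reserved.
#[repr(C)]
#[derive(Clone, Copy, Debug)]
pub struct AddrRange {
    pub address: u64,
    pub page_count: u32,
    pub reserved: u32,
}

/// Read an AddrRange at `offset` without requiring any alignment.
/// Returns None if the entry would run past the end of `buf`.
pub fn read_addr_range(buf: &[u8], offset: usize) -> Option<AddrRange> {
    let end = offset.checked_add(core::mem::size_of::<AddrRange>())?;
    if end > buf.len() {
        return None;
    }
    // Safety: the range [offset, end) is inside `buf`, AddrRange is plain
    // integer data, and read_unaligned places no alignment requirement on
    // the source pointer.
    Some(unsafe { core::ptr::read_unaligned(buf.as_ptr().add(offset) as *const AddrRange) })
}
```

The bounds check matters as much as the unaligned read: descriptor offsets come from the guest and cannot be trusted.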

Fragmentation (FragmentState/FragRxState in proxy.rs): MEM_FRAG_TX handles sender-side descriptor fragments (large MEM_SHARE descriptors split across multiple calls). MEM_FRAG_RX handles receiver-side fragments (MEM_RETRIEVE_RESP too large for RX buffer, receiver calls FRAG_RX for subsequent chunks). Per-VM state tracks active handle, buffer, total/delivered lengths.

Console Log: FFA_CONSOLE_LOG_32/64 (FF-A v1.2) extracts packed characters from x2-x7 (8 bytes per register, little-endian, up to 48 chars) and writes to UART. Supported in both NS proxy and SPMC (including handle_sp_exit() for SP-initiated logging).
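
A small sketch of the unpacking step for the 64-bit variant, assuming the caller has already copied x2-x7 into an array and read the character count from the call arguments. The function name and signature are hypothetical.

```rust
/// Unpack up to 48 characters from six 64-bit registers (x2..x7),
/// 8 bytes per register in little-endian order.
/// Returns the number of bytes written to `out`.
pub fn unpack_console_chars(regs: &[u64; 6], count: usize, out: &mut [u8; 48]) -> usize {
    let n = count.min(48); // clamp an untrusted count to the register capacity
    for (i, slot) in out.iter_mut().enumerate().take(n) {
        *slot = regs[i / 8].to_le_bytes()[i % 8];
    }
    n
}
```

The unpacked bytes would then be written to the UART one at a time.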

SMC Forwarding (src/ffa/smc_forward.rs): forward_smc() uses inline smc #0 to forward calls to EL3 (HCR_EL2.TSC only traps EL1 SMC). probe_spmc() sends FFA_VERSION through EL3 to detect a real SPMC. ffa::proxy::init() called at boot (linux_guest only) to set SPMC_PRESENT flag. Unknown SMCs in handle_smc() catch-all are forwarded to EL3 instead of returning -1.

SMC routing: is_ffa_function(fid) checks for SMC32/64 function IDs in the 0x84/0xC4 range with low byte >= 0x60. PSCI functions (0x84000000-0x8400001F, 0xC4000000-0xC4000003) are handled separately.
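
One way to express this classification, as a sketch. It assumes "0x84/0xC4 range" means the upper 24 bits are exactly 0x840000 or 0xC40000; the is_psci_function helper is a hypothetical companion that simply encodes the two PSCI ranges quoted above.

```rust
/// FF-A function IDs: SMC32 (0x840000xx) or SMC64 (0xC40000xx) with low byte >= 0x60.
pub const fn is_ffa_function(fid: u32) -> bool {
    let prefix = fid & 0xFFFF_FF00;
    let low = fid & 0xFF;
    (prefix == 0x8400_0000 || prefix == 0xC400_0000) && low >= 0x60
}

/// PSCI function ID ranges as listed in the text (hypothetical helper).
pub const fn is_psci_function(fid: u32) -> bool {
    matches!(fid, 0x8400_0000..=0x8400_001F | 0xC400_0000..=0xC400_0003)
}
```

For example, FFA_VERSION (0x84000063) is classified as FF-A, while PSCI CPU_ON (0xC4000003) is not.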

UART (PL011) Emulation

Full trap-and-emulate (Stage-2 unmapped). TX: guest writes UARTDR → output_char() to physical UART. RX: physical IRQ (INTID 33) → UART_RX ring buffer → VirtualUart.push_rx() → inject SPI 33. Linux amba-pl011 probe requires PeriphID/PrimeCellID registers.

PL031 RTC Emulation (src/devices/pl031.rs)

Trap-and-emulate at 0x09010000 (SPI 2 = INTID 34). Counter-based time: RTCDR = load_value + (CNTVCT_EL0 / CNTFRQ_EL0) when enabled (RTCCR bit 0). Registers: RTCDR (0x000, read), RTCLR (0x008, write), RTCCR (0x00C, control), RTCIMSC/RTCRIS/RTCMIS/RTCICR (0x010-0x01C, stubs). PrimeCell ID registers (0xFE0-0xFFC) required for Linux amba bus probe. 4 unit tests in tests/test_pl031.rs.
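
The counter-based time computation can be sketched as a pure function. The function name is hypothetical, the counter values are passed in rather than read from system registers, and returning the bare load value while disabled (or when the frequency is zero) is an assumption.

```rust
/// RTCDR value in seconds: load_value plus elapsed counter seconds when enabled.
pub fn rtcdr(load_value: u32, enabled: bool, cntvct: u64, cntfrq: u64) -> u32 {
    if !enabled || cntfrq == 0 {
        return load_value;
    }
    // The RTC is a 32-bit seconds counter, so wrap on overflow.
    load_value.wrapping_add((cntvct / cntfrq) as u32)
}
```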

DTB Runtime Parsing (src/dtb.rs)

At boot, QEMU passes the host DTB address in x0. boot.S preserves it in callee-saved x20, then passes to rust_main(dtb_addr: usize). dtb::init() uses the fdt crate (v0.1.5, zero-copy, no-alloc) to discover platform hardware:

  • UART: arm,pl011 compatible → uart_base
  • GIC: arm,gic-v3 compatible → gicd_base, gicr_base, gicr_size
  • RAM: /memory node → ram_base, ram_size
  • CPUs: cpus node → num_cpus

Helpers: gicr_rd_base(cpu_id) = gicr_base + cpu_id * 0x20000, gicr_sgi_base(cpu_id) = gicr_rd_base + 0x10000.
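
For illustration, the same arithmetic with the GICR base passed explicitly. The real helpers take only cpu_id and read the base from the parsed platform info, so the extra parameter and the constant names are assumptions of this sketch.

```rust
pub const GICR_FRAME_STRIDE: u64 = 0x20000; // RD frame + SGI frame per CPU
pub const GICR_SGI_OFFSET: u64 = 0x10000;   // SGI frame follows the RD frame

pub const fn gicr_rd_base(gicr_base: u64, cpu_id: u64) -> u64 {
    gicr_base + cpu_id * GICR_FRAME_STRIDE
}

pub const fn gicr_sgi_base(gicr_base: u64, cpu_id: u64) -> u64 {
    gicr_rd_base(gicr_base, cpu_id) + GICR_SGI_OFFSET
}
```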

Falls back to QEMU virt defaults if DTB parse fails (e.g., QEMU passes addr=0 with -kernel). platform::num_cpus() reads DTB at runtime; MAX_SMP_CPUS = 8 is the compile-time array capacity.

Pre-DTB code (uart_puts in lib.rs, GICD/GICC statics in gic.rs) still uses hardcoded platform::UART_BASE/GICD_BASE because they run before DTB init or require const for Rust static.

Memory Layout

| Region | Address | Purpose |
| --- | --- | --- |
| SPMC code (sel2) | 0x0e100000 | S-EL2 linker base (secure DRAM, BL32) |
| SP1 (sp_hello) | 0x0e300000 | SP Hello package (1MB, partition 0x8001) |
| SP2 (sp_irq) | 0x0e400000 | SP IRQ package (1MB, partition 0x8002) |
| SP3 (sp_relay) | 0x0e500000 | SP Relay package (1MB, partition 0x8003) |
| Secure heap | 0x0e600000 | S-EL2 page table allocation |
| Hypervisor code (NS) | 0x40200000 | NS-EL2 linker base (avoids QEMU DTB at 0x40000000 in -bios mode) |
| Heap | 0x41000000 (16MB) | Page table allocation, BumpAllocator |
| DTB (VM 0) | 0x47000000 | Device tree blob |
| Kernel (VM 0) | 0x48000000 | Linux Image load address |
| Initramfs (VM 0) | 0x54000000 | BusyBox initramfs |
| Disk image (VM 0) | 0x58000000 | virtio-blk backing store |
| VM 0 RAM | 0x48000000-0x58000000 | 256MB (single-VM: 0x48000000-0x88000000 = 1GB) |
| DTB (VM 1) | 0x67000000 | Device tree blob (multi_vm only) |
| Kernel (VM 1) | 0x68000000 | Linux Image load address (multi_vm only) |
| VM 1 RAM | 0x68000000-0x78000000 | 256MB (multi_vm only) |
| Disk image (VM 1) | 0x78000000 | virtio-blk backing store (multi_vm only) |

Stage-2 mappers:

  • IdentityMapper (static, 2MB-only) — used by unit tests (make run)
  • DynamicIdentityMapper (heap-allocated, 2MB+4KB) — used by Linux guest (make run-linux), supports unmap_4kb_page() for GICR trap setup

Heap gap: Heap lies within guest's PA range but is left unmapped in Stage-2 to prevent guest corruption of page tables. Guest kernel never accesses this range (declared memory starts at 0x48000000).

Global State (src/global.rs)

| Global | Type | Purpose |
| --- | --- | --- |
| DEVICES | [GlobalDeviceManager; MAX_VMS] | Per-VM MMIO dispatch (UnsafeCell single-pCPU / SpinLock multi-pCPU) |
| VM_STATE | [VmGlobalState; MAX_VMS] | Per-VM state (see below) |
| CURRENT_VM_ID | AtomicUsize | Which VM is currently active |
| PENDING_CPU_ON_PER_VCPU | [PerVcpuCpuOnRequest; 8] | Per-vCPU PSCI CPU_ON (multi-pCPU mode only) |
| SHARED_VTTBR / SHARED_VTCR | AtomicU64 | Stage-2 config shared from primary to secondaries (multi-pCPU) |
| PER_VM_VTTBR | [AtomicU64; MAX_VMS] | Per-VM L0 table PA for cross-VM Stage-2 access (FF-A RETRIEVE) |
| UART_RX | UartRxRing | Lock-free ring buffer, IRQ handler → run loop |
| PORT_RX | [NetRxRing; MAX_PORTS] | Per-VM SPSC ring for virtio-net RX frames |
| VSWITCH | UnsafeCell<VSwitch> | L2 virtual switch with MAC learning table |

VmGlobalState contains per-VM: pending_sgis[MAX_VCPUS], pending_spis[MAX_VCPUS], terminal_exit[MAX_VCPUS], vcpu_online_mask, current_vcpu_id, pending_cpu_on, preemption_exit. Accessed via vm_state(vm_id) or current_vm_state().

SPMC globals (sel2 feature, SpinLock-protected for per-CPU SPMD concurrency): NWD_RXTX (NWd RXTX buffer state), SPMC_SHARES (memory share records), NOTIF_STATE (notification bitmaps), STAGE2_LOCK (serializes all map_page/unmap_page calls to prevent TOCTOU in page table walks). SpinLock required because pKVM's per-CPU SPMD breaks the single-event-loop serialization assumption. Global heap (src/mm/heap.rs) also uses SpinLock<Option<BumpAllocator>> for concurrent alloc_page() safety.

Device Manager Pattern

Enum-dispatch (no dynamic dispatch / trait objects):

pub enum Device {
    Uart(pl011::VirtualUart),
    Gicd(gic::VirtualGicd),
    Gicr(gic::VirtualGicr),
    VirtioBlk(virtio::mmio::VirtioMmioTransport<virtio::blk::VirtioBlk>),
    VirtioNet(virtio::mmio::VirtioMmioTransport<virtio::net::VirtioNet>),
    Pl031(pl031::VirtualPl031),
}

Array-based routing: devices: [Option<Device>; 8], scan for dev.contains(addr).
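
A self-contained sketch of the scan-and-dispatch pattern with two stub devices. The stub Uart and Rtc types, the fixed 4KB region size, and the handle_read/handle_write method names are assumptions for illustration; they are not the project's device types or DeviceManager API.

```rust
pub struct Uart { pub base: u64, pub last_tx: u8 }
pub struct Rtc { pub base: u64, pub counter: u32 }

/// Enum dispatch: one variant per emulated device, no trait objects.
pub enum Device {
    Uart(Uart),
    Rtc(Rtc),
}

const REGION_SIZE: u64 = 0x1000; // assumed 4KB MMIO window per device

impl Device {
    fn base(&self) -> u64 {
        match self {
            Device::Uart(d) => d.base,
            Device::Rtc(d) => d.base,
        }
    }

    pub fn contains(&self, addr: u64) -> bool {
        addr >= self.base() && addr - self.base() < REGION_SIZE
    }

    fn read(&mut self, _offset: u64) -> u64 {
        match self {
            Device::Uart(d) => d.last_tx as u64,
            Device::Rtc(d) => d.counter as u64,
        }
    }

    fn write(&mut self, _offset: u64, value: u64) {
        match self {
            Device::Uart(d) => d.last_tx = value as u8,
            Device::Rtc(d) => d.counter = value as u32,
        }
    }
}

pub struct DeviceManager {
    devices: [Option<Device>; 8],
}

impl DeviceManager {
    pub fn new() -> Self {
        Self { devices: core::array::from_fn(|_| None) }
    }

    /// Place the device in the first free slot; false if all 8 are taken.
    pub fn register(&mut self, dev: Device) -> bool {
        match self.devices.iter_mut().find(|s| s.is_none()) {
            Some(slot) => {
                *slot = Some(dev);
                true
            }
            None => false,
        }
    }

    /// Linear scan for the device whose window contains `addr`.
    fn find(&mut self, addr: u64) -> Option<&mut Device> {
        self.devices.iter_mut().flatten().find(|d| d.contains(addr))
    }

    pub fn handle_read(&mut self, addr: u64) -> Option<u64> {
        self.find(addr).map(|d| {
            let offset = addr - d.base();
            d.read(offset)
        })
    }

    pub fn handle_write(&mut self, addr: u64, value: u64) -> bool {
        match self.find(addr) {
            Some(d) => {
                let offset = addr - d.base();
                d.write(offset, value);
                true
            }
            None => false,
        }
    }
}
```

With at most 8 devices, a linear scan is cheap and keeps the dispatch free of heap allocation and vtables.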

Build System

  • build.rs: Cross-compiles boot assembly and exception.S via aarch64-linux-gnu-gcc, archives into libboot.a, links with --whole-archive. Feature-gated: sel2 selects boot_sel2.S + linker_sel2.ld, otherwise boot.S + linker.ld
  • Target: aarch64-unknown-none.json (custom spec: llvm-target: aarch64-unknown-none, panic-strategy: abort, disable-redzone: true)
  • Linker: arch/aarch64/linker.ld — base at 0x40200000 (NS-EL2, avoids QEMU DTB at 0x40000000 in -bios mode); arch/aarch64/linker_sel2.ld — base at 0x0e100000 (S-EL2, secure DRAM)

Tests

~457 assertions across 34 test suites run automatically on make run (no feature flags). Orchestrated sequentially in src/main.rs. Located in tests/:

Test Coverage Assertions
test_dtb DTB parsing, PlatformInfo defaults, GICR helpers 8
test_allocator Bump allocator page alloc/free 4
test_heap Global heap (Box, Vec) 4
test_dynamic_pagetable DynamicIdentityMapper 2MB mapping + 4KB unmap 6
test_multi_vcpu Multi-vCPU creation, VMPIDR 4
test_scheduler Round-robin scheduling, block/unblock 4
test_vm_scheduler VM-integrated scheduling lifecycle 5
test_mmio MMIO device registration + guest UART access 1
test_gicv3_virt List Register injection, ELRSR 6
test_complete_interrupt End-to-end IRQ injection flow 1
test_guest Basic hypercall (HVC #0) 1
test_guest_loader GuestConfig for Zephyr/Linux 3
test_simple_guest Simple guest boot + exit 1
test_decode MmioAccess::decode() ISS + instruction paths 9
test_gicd VirtualGicd shadow state (CTLR, ISENABLER, IROUTER) 8
test_gicr VirtualGicr per-vCPU state (TYPER, WAKER, ISENABLER0) 8
test_global PendingCpuOn atomics + UartRxRing SPSC buffer 6
test_guest_irq Per-VM PENDING_SGIS/PENDING_SPIS bitmask operations 5
test_device_routing DeviceManager registration, routing, accessors 6
test_vm_state_isolation Per-VM SGI/SPI/online_mask/vcpu_id independence 4
test_vmid_vttbr VMID 0/1 encoding in VTTBR_EL2 bits [63:48] 2
test_multi_vm_devices DEVICES[0]/DEVICES[1] registration + MMIO isolation 3
test_vm_activate Vm initial VTTBR/VTCR state 2
test_net_rx_ring NetRxRing SPSC: empty/store/take/fill/overflow/wraparound 8
test_vswitch VSwitch: flood/MAC learning/broadcast/no-self/capacity 6
test_virtio_net VirtioNet: device_id/features/queues/config/mac_for_vm 8
test_page_ownership Stage-2 PTE SW bits: read/write OWNED/SHARED_OWNED, unmapped IPA, 2MB block→4KB split 9
test_pl031 PL031 RTC: RTCDR readable, RTCLR write+readback, PeriphID/PrimeCellID, unknown offset 4
test_ffa FF-A proxy: VERSION/ID_GET/FEATURES/RXTX/messaging/MEM_SHARE/MEM_LEND/RECLAIM/descriptors/SMC forward/VM-to-VM RETRIEVE/RELINQUISH/SPM_ID_GET/RUN/notifications/MSG_SEND2/MSG_WAIT/FRAG_RX/CONSOLE_LOG/SP→SP routing 55
test_spmc_handler SPMC dispatch: VERSION/ID_GET/SPM_ID_GET/FEATURES/PARTITION_INFO/DIRECT_REQ echo (32+64)/framework msg/RXTX/RXTX_UNMAP frag cleanup/FFA_RUN Preempted path/multi-SP/find_sp_for_intid/global SP helpers/MEM_SHARE/LEND/RETRIEVE/RELINQUISH/RECLAIM/DONATE(lifecycle+RECLAIM denied+RELINQUISH denied+SP-to-SP)/multi-page/SP2-receiver/zero-page/range overflow/LEND negative tests/notifications/MSG_SEND2/MSG_WAIT/FRAG_RX/CONSOLE_LOG/SRI/NPI/cross-SP isolation/IPA validation/stress/SP→SP DIRECT_REQ relay/cycle detection/SP-to-SP MEM_SHARE lifecycle 182
test_sp_context SpContext: state machine (incl. all illegal transitions), CAS try_transition failure, VcpuContext fields, set/get args (x0-x7), owned_intids, pending_irq lifecycle + overflow 58
test_secure_stage2 SecureStage2Config: VSTTBR address, VSTCR T0SZ, new_from_vsttbr 4
test_log LogBuffer: empty state, write/read, overflow, log_info!, LogWriter, per-CPU isolation, accumulation 8
test_guest_interrupt Guest interrupt injection + exception vector (blocks) 1

Not wired into main.rs (exported but not called):

  • test_timer — timer interrupt detection (requires manual timer setup)

Critical Implementation Details

HPFAR_EL2 for MMIO (must-know)

When guest MMU is on, FAR_EL2 = guest VA, NOT IPA. Use HPFAR_EL2 for the IPA:

IPA = (hpfar & 0x0000_0FFF_FFFF_FFF0) << 8 | (far_el2 & 0xFFF)

Never Modify Guest SPSR_EL2

Guest controls its own PSTATE.I (interrupt mask). Overriding causes spinlock deadlocks.

CNTHP Timer Must Be Re-enabled

The guest can disable INTID 26 again via GICR writes, so ensure_cnthp_enabled() writes the physical GICR directly (EL2 bypasses Stage-2) before every vCPU entry.

ICC_SGI1R_EL1 Bit Fields

  • TargetList: bits [15:0] (NOT [23:16])
  • Aff1: bits [23:16] (NOT [27:24])
  • INTID: bits [27:24] (NOT [3:0])
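
A decode sketch using exactly these three fields. The SgiRequest type and decode_sgi1r name are hypothetical, and other fields of the register (Aff2, Aff3, IRM) are ignored here.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct SgiRequest {
    pub target_list: u16, // bits [15:0], one bit per target CPU within the Aff1 cluster
    pub aff1: u8,         // bits [23:16]
    pub intid: u8,        // bits [27:24], SGI number 0-15
}

pub const fn decode_sgi1r(value: u64) -> SgiRequest {
    SgiRequest {
        target_list: (value & 0xFFFF) as u16,
        aff1: ((value >> 16) & 0xFF) as u8,
        intid: ((value >> 24) & 0xF) as u8,
    }
}
```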

inject_spi() Must Not Acquire DEVICES Lock (multi-pCPU)

inject_spi() is called from signal_interrupt() inside the DEVICES SpinLock. Reading DEVICES.route_spi() would deadlock (non-reentrant). Instead, multi-pCPU mode reads physical GICD_IROUTER directly (EL2 bypasses Stage-2).

QEMU virt Secondary CPUs Are Powered Off

Secondary physical CPUs start powered off — they do NOT execute _start. Must use real PSCI CPU_ON SMC (smc #0, function_id=0xC4000003) to QEMU's EL3 firmware.

TPIDR_EL2 for Per-CPU Context (multi-pCPU)

exception.S uses mrs x0, tpidr_el2 instead of a global variable. Each physical CPU has its own hardware-banked TPIDR_EL2. Set by enter_guest() via msr tpidr_el2, x0.

Physical GICR Must Be Programmed for SGIs/PPIs

Guest GICR writes only update VirtualGicr shadow state. ensure_vtimer_enabled() programs physical GICR ISENABLER0 for SGIs 0-15 + PPI 27 before every guest entry.

HCR_EL2.TSC for SMC Trapping

HCR_TSC = 1 << 19 traps guest SMC instructions to EL2 as EC_SMC64 (0x17). Unlike HVC traps, the trapped SMC sets ELR_EL2 to the SMC instruction itself — exception handler must advance PC by 4. This enables the FF-A proxy to intercept guest FF-A SMC calls and route them through handle_smc().

CPTR_EL3.TFP Traps FP/SIMD at S-EL2 (must-know for TF-A builds)

TF-A's default CPTR_EL3.TFP=1 traps ALL FP/SIMD instructions from S-EL2 to EL3. Rust debug-mode read_volatile uses NEON SIMD internally (cnt v0.8b, v0.8b for popcount alignment check in is_aligned_to), causing silent hangs on any memory read. Fix: CTX_INCLUDE_FPREGS=1 in TF-A build (clears CPTR_EL3.TFP). Requires ENABLE_SVE_FOR_NS=0 and ENABLE_SME_FOR_NS=0 to avoid build conflicts.

S-EL2 SPMC Boot (sel2 feature)

Entry point: boot_sel2.S → rust_main_sel2(manifest_addr, hw_config_addr, core_id). SPMD passes x0=TOS_FW_CONFIG (manifest DTB at 0x0e002000), x1=HW_CONFIG, x4=core_id. Init: exception vectors → manifest parse → S-EL2 Stage-1 MMU (identity map with NS=1 for NWd DRAM) → GIC init (enables PPI 26+29 as Secure Group 1) → CNTHCTL_EL2 timer access → Secure Stage-2 → parse SPKG header (img_offset=0x4000) → clear SCTLR_EL1/VBAR_EL1 → ERET to SP1 → SP calls FFA_MSG_WAIT → detect SP2 at SP2_LOAD_ADDR (0x0e400000) → boot SP2 if present → register secondary EP (FFA_SECONDARY_EP_REGISTER) → FFA_MSG_WAIT → SPMC event loop. src/manifest.rs parses /attribute node (spmc_id, maj_ver, min_ver) per FF-A Core Manifest v1.0 (DEN0077A).

Secondary CPU warm-boot: When pKVM issues PSCI CPU_ON, TF-A's SPMD routes the secondary CPU through spmd_cpu_on_finish_handler() → ERET to our registered secondary_entry_sel2 in boot_sel2.S. The secondary path: set per-CPU stack (3 × 32KB in .bss.sel2_pcpu_stacks) → rust_main_sel2_secondary() → install VBAR → install S-EL2 Stage-1 MMU (reuse primary's page tables via install_sel2_stage1_secondary()) → FFA_MSG_WAIT → SPMD completes PSCI CPU_ON → NS-EL2 secondary boots.

S-EL2 Stage-1 MMU (src/sel2_mmu.rs)

S-EL2 runs with MMU off by default. All memory accesses target the Secure physical address space. NWd RXTX buffer PAs (e.g. 0x42a16000) are in Non-Secure DRAM — writing without Stage-1 translation hits the Secure alias, so pKVM reads zeros from the NS alias.

Fix: init_sel2_stage1() enables a minimal S-EL2 Stage-1 identity map. Static page tables (3 pages in .bss, no heap):

  • L1[1-2]: 1GB blocks at 0x40000000/0x80000000, NS=1, Normal WB, XN → NWd DRAM
  • L2[64-79]: 2MB Device blocks, NS=0, XN → GIC (0x08000000) + UART (0x09000000)
  • L2[112-127]: 2MB Normal blocks, NS=0 → SPMC code + SPs + secure heap (0x0E000000)

Registers: MAIR_EL2 (Attr0=Device, Attr1=Normal-WB), TCR_EL2 (T0SZ=16, 4KB, 48-bit PA), TTBR0_EL2, SCTLR_EL2.{M,C,I}=1. Independent of Secure Stage-2 (VSTTBR_EL2) used for SP isolation.
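The table indices quoted above follow directly from the 4KB-granule address split, and the block attributes map onto a handful of descriptor bits. The sketch below shows both under stated assumptions: the bit positions are the standard VMSAv8-64 Stage-1 ones, but the shareability setting and the omission of access-permission bits are choices of this example, not something confirmed by src/sel2_mmu.rs.

```rust
/// Index into the L1 table (1GB per entry, 4KB granule).
pub const fn l1_index(addr: u64) -> usize {
    ((addr >> 30) & 0x1FF) as usize
}

/// Index into an L2 table (2MB per entry, 4KB granule).
pub const fn l2_index(addr: u64) -> usize {
    ((addr >> 21) & 0x1FF) as usize
}

const DESC_BLOCK: u64 = 0b01; // valid block descriptor
const DESC_NS: u64 = 1 << 5; // output address is Non-secure
const DESC_SH_INNER: u64 = 0b11 << 8; // inner shareable (assumption)
const DESC_AF: u64 = 1 << 10; // access flag, avoids an access fault
const DESC_XN: u64 = 1 << 54; // execute-never

/// MAIR_EL2 attribute indices as listed in this section.
pub const ATTR_DEVICE: u64 = 0;
pub const ATTR_NORMAL_WB: u64 = 1;

/// Build a block descriptor for `pa` (must already be block-aligned).
pub fn block_desc(pa: u64, attr_idx: u64, ns: bool, xn: bool) -> u64 {
    let mut desc = pa | DESC_BLOCK | DESC_AF | (attr_idx << 2);
    if attr_idx == ATTR_NORMAL_WB {
        desc |= DESC_SH_INNER;
    }
    if ns {
        desc |= DESC_NS;
    }
    if xn {
        desc |= DESC_XN;
    }
    desc
}
```

With these helpers, 0x40000000 lands in L1[1], the GIC at 0x08000000 in L2[64], and the secure region at 0x0E000000 in L2[112], matching the ranges listed above.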

Secure Virtual Interrupt Injection (Phase D)

Hafnium-compatible HCR_EL2.VI mechanism for injecting virtual interrupts to SPs at S-EL1:

  1. Per-SP INTID ownership: SpContext.owned_intids[4] — SP2 owns INTID 29 (Secure Physical Timer PPI)
  2. CNTHP poll timer: Since CNTPS is inaccessible at S-EL1 (SCR_EL3.ST=0), CNTHP at S-EL2 polls for owned INTIDs
  3. IRQ routing (exception.rs): Case 1: owned by current SP → queue + HCR_EL2.VI → continue. Case 2: owned by another SP → queue + preempt current. Unowned → FFA_INTERRUPT
  4. HCR_EL2.VI injection: Setting VI causes hardware auto-vector to VBAR_EL1+0x280 on ERET
  5. HF_INTERRUPT_GET: SP calls HVC with x0=0xFF04 → SPMC returns pending INTID in x0, clears VI
  6. Cross-SP preemption: dispatch_interrupt_to_sp() — preempt SP1 → enter SP2 IRQ handler → SP2 returns → resume SP1
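The routing decision in step 3 can be expressed as a small pure function. This is a sketch only: SpOwnership and IrqRoute are hypothetical types, and representing owned_intids as [Option<u32>; 4] is an assumption, since the element type of SpContext.owned_intids is not given here.

```rust
/// Outcome of routing a physical IRQ that arrived while an SP was running.
#[derive(Debug, PartialEq, Eq)]
pub enum IrqRoute {
    /// Case 1: owned by the running SP, queue it and set HCR_EL2.VI.
    InjectCurrent,
    /// Case 2: owned by another SP (index given), queue it and preempt.
    PreemptFor(usize),
    /// Unowned: hand back to the Normal World via FFA_INTERRUPT.
    ForwardToNwd,
}

/// Hypothetical stand-in for the ownership part of SpContext.
pub struct SpOwnership {
    pub owned_intids: [Option<u32>; 4],
}

/// Decide where `intid` goes, given the index of the currently running SP.
pub fn route_irq(intid: u32, current_sp: usize, sps: &[SpOwnership]) -> IrqRoute {
    let owner = sps
        .iter()
        .position(|sp| sp.owned_intids.contains(&Some(intid)));
    match owner {
        Some(idx) if idx == current_sp => IrqRoute::InjectCurrent,
        Some(idx) => IrqRoute::PreemptFor(idx),
        None => IrqRoute::ForwardToNwd,
    }
}
```

Keeping the decision separate from the side effects (queueing, setting VI, context switching) makes the three cases easy to unit-test without hardware.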

SP2 (sp_irq) at tfa/sp_irq/: S-EL1 partition with VBAR_EL1 IRQ handler, handles both DIRECT_REQ_32 and DIRECT_REQ_64 (matching RESP variant via x15 flag), slow-path busy-loop until vIRQ, responds with captured INTID in x5. Loaded at 0x0e400000 by BL2.

SP Package Format (SPKG)

BL2 loads raw SP packages to load-address from tb_fw_config.dts. SPKG header (24 bytes LE): magic("SPKG"), version, pm_offset(0x1000), pm_size, img_offset(0x4000), img_size. SPMC must parse header and enter SP at load_addr + img_offset. The sp_manifest.dts UUID gets byte-swapped by sp_mk_generator.py (LE conversion); tb_fw_config.dts UUID must match the swapped form. Use fiptool info fip.bin to verify.
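A header parser along the following lines is one way to derive the SP entry point. It assumes six 4-byte little-endian fields in the order listed above, which is consistent with the stated 24-byte size but is not taken from the project's source; SpkgHeader, parse_spkg() and sp_entry() are illustrative names.

```rust
/// Size of the SPKG header described above.
pub const SPKG_HEADER_LEN: usize = 24;

/// Decoded header fields (field order is an assumption of this sketch).
#[derive(Debug, PartialEq, Eq)]
pub struct SpkgHeader {
    pub version: u32,
    pub pm_offset: u32,
    pub pm_size: u32,
    pub img_offset: u32,
    pub img_size: u32,
}

/// Parse the header at the start of `bytes`; None if too short or bad magic.
pub fn parse_spkg(bytes: &[u8]) -> Option<SpkgHeader> {
    if bytes.len() < SPKG_HEADER_LEN || &bytes[..4] != b"SPKG" {
        return None;
    }
    // Read one little-endian u32 at byte offset `i`.
    let word = |i: usize| u32::from_le_bytes([bytes[i], bytes[i + 1], bytes[i + 2], bytes[i + 3]]);
    Some(SpkgHeader {
        version: word(4),
        pm_offset: word(8),
        pm_size: word(12),
        img_offset: word(16),
        img_size: word(20),
    })
}

/// Entry address of the SP image: load_addr + img_offset, checked for overflow.
pub fn sp_entry(load_addr: u64, hdr: &SpkgHeader) -> Option<u64> {
    load_addr.checked_add(u64::from(hdr.img_offset))
}
```

With img_offset=0x4000 and the SP2 load address 0x0e400000 mentioned earlier, sp_entry() yields 0x0e404000.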

Diagnostic Fault Handler (exception.S)

fault_diag_print handles exceptions when TPIDR_EL2=0 (no vCPU context — host-level fault). Prints ESR_EL2, ELR_EL2, FAR_EL2, HPFAR_EL2 to UART. Used during S-EL2 boot to diagnose Data Aborts. Located at end of exception.S (outside vector table alignment constraints).

Platform Constants

Guest-specific addresses (heap, kernel load, virtio disk) are in src/platform.rs. Host hardware addresses (UART, GIC, RAM, CPU count) are discovered at runtime from DTB via src/dtb.rs — use platform::num_cpus() and dtb::platform_info() instead of hardcoded constants. MAX_SMP_CPUS = 8 is the compile-time array capacity; SMP_CPUS = 4 is the fallback default.
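The relationship between the discovered CPU count and the two constants might be handled as in the sketch below. The constant values come from this section, but effective_cpu_count() is a hypothetical helper and the clamping policy is an assumption; platform::num_cpus() may behave differently.

```rust
/// Compile-time array capacity, as stated above.
pub const MAX_SMP_CPUS: usize = 8;
/// Fallback used when the DTB gives no usable CPU count.
pub const SMP_CPUS: usize = 4;

/// Pick the CPU count to use: the DTB value clamped to the array capacity,
/// or the fallback default when the DTB value is missing or zero.
pub fn effective_cpu_count(dtb_cpus: Option<usize>) -> usize {
    match dtb_cpus {
        Some(n) if n >= 1 => n.min(MAX_SMP_CPUS),
        _ => SMP_CPUS,
    }
}
```

Clamping keeps every per-CPU array index below MAX_SMP_CPUS even if the DTB describes more cores than the build supports.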

Roadmap: NS-EL2 → S-EL2 SPMC → pKVM Integration

Target architecture (end state):

EL3:    TF-A BL31 + SPMD (SMC relay, world switch)
S-EL2:  Our hypervisor (SPMC role, BL32) → manages Secure Partitions
S-EL1:  Secure Partitions (bare-metal SPs)
NS-EL2: pKVM (Linux KVM protected mode) → manages Normal World VMs
NS-EL1: Linux/Android guest

Phase 3 (done): NS-EL2 complete: 2MB block split, FF-A notifications, indirect messaging

Phase 4 (done): QEMU secure=on + TF-A boot chain → Sprint 4.1-4.4 done (SPMC + SP Hello + 7/7 BL33 tests)

Sprint 5.1 (done): DIRECT_REQ end-to-end: tfa_boot feature, NS proxy → SPMD → SPMC → SP1 (x4 += 0x1000 proof)

Sprint 5.2 (done): RXTX + PARTITION_INFO_GET forwarding + Linux FF-A discovery, SPMC NWd RXTX management (SPMD forwards RXTX_MAP to SPMC), 8/8 BL33 tests pass

Phase C (done): NS interrupt preemption: IRQ during SP → FFA_INTERRUPT → FFA_RUN resume, CNTHP timer, SP_IRQ_PREEMPTED flag, Preempted state, SP Hello slow path, 9/9 BL33 tests pass

Phase D (done): Multi-SP + secure vIRQ injection: SP2 (sp_irq) at S-EL1, per-SP INTID ownership, HCR_EL2.VI + HF_INTERRUPT_GET paravirt, CNTHP poll timer, cross-SP preemption, 11/11 BL33 tests pass

Phase 4.5 (done): pKVM at NS-EL2 + our SPMC at S-EL2: make run-pkvm boots pKVM to BusyBox shell (Protected hVHE mode initialized successfully). Uses AOSP android16-6.12 kernel (make build-pkvm-kernel) with Google's pKVM FF-A proxy (kvm-arm.mode=protected). FF-A v1.1 discovery works in both nVHE and protected mode: ARM FF-A: Driver version 1.2, Firmware version 1.1 found. RXTX_MAP forwarded by SPMD, PARTITION_INFO_GET returns SP1+SP2 descriptors (x3=24 partition_sz). S-EL2 Stage-1 MMU maps NS DRAM with NS=1 bit so writes to pKVM's hyp RX buffer reach Non-Secure memory. Secondary CPU warm-boot: FFA_SECONDARY_EP_REGISTER (0xC4000087) + secondary_entry_sel2 in boot_sel2.S + per-CPU stacks (3 × 32KB) + rust_main_sel2_secondary() (VBAR → MMU → FFA_MSG_WAIT). SVE workaround: sve=off (ENABLE_SVE_FOR_NS=0 conflicts with CTX_INCLUDE_FPREGS=1). SRI/NPI feature IDs now return donated SGI INTIDs (eliminates pKVM -95 messages)

M4.6 Sprint S1 (done): SPMC-side memory sharing: MEM_SHARE/LEND/RETRIEVE/RELINQUISH/RECLAIM handlers in spmc_handler.rs with SpmcShareRecord storage, dynamic Secure Stage-2 mapping via Stage2Walker, register-based + descriptor-based protocols, 12 new unit test assertions (54 total), BL33 Test 13 (MEM_SHARE + RECLAIM)

M4.6 Sprint S2 (done): True E2E memory sharing: SP-initiated MEM_RETRIEVE/RELINQUISH via handle_sp_exit() loop in dispatch_to_sp()/resume_preempted_sp(), SP Hello memory test command (x3=0xABCD0001), BL33 Test 14 full lifecycle (NWd SHARE → SP RETRIEVE → SP write → SP RELINQUISH → NWd verify → NWd RECLAIM), 14/14 BL33 tests (incl. alternating SP1/SP2 DIRECT_REQ)

M4.6 Backlog (done): QW-1~4 (PSCI v1.0, is_valid_receiver), ME-4 SpinLock for SPMC globals, ME-2 MEM_SHARE forwarding to real SPMC, ME-1 BITMAP_CREATE FFA_HOST_ID fix, ME-5 MEM_FRAG_TX/RX fragmentation, ME-3 SPMC-side MSG_SEND2/MSG_WAIT indirect messaging (per-SP SpMailbox), CONSOLE_LOG (proxy + SPMC + handle_sp_exit), ME-7 SRI/NPI feature IDs (eliminates pKVM -95 EOPNOTSUPP). ~370 assertions / 33 test suites

Phase 4.6 (done): pKVM E2E validation: FfaMemRegion struct fix (wrong offsets: extra reserved_0, missing ep_mem_size), RETRIEVE_RESP x2=fragment_length (was handle), NWd vs SP RETRIEVE_REQ distinction (pKVM reclaim sends RETRIEVE_REQ to get descriptor; must NOT map pages or mark retrieved), SP2 DIRECT_REQ_64 support (Linux FF-A driver sends 64-bit variant when AARCH64_EXEC set in properties), SP2 MEM_SHARE E2E (BL33 Test 15). ffa_test.ko: 20/20 PASS (SP1 DIRECT_REQ 4 + MEM_SHARE 6, SP2 DIRECT_REQ 4 + MEM_SHARE 6). BL33: 16/16 PASS. make run-pkvm-ffa-test

Phase 4.5 AVF (partial): AVF validation: crosvm VMM in pKVM host (EL0) creates pVM via /dev/kvm. Protected hVHE mode works without SMMU (pKVM enabled without an IOMMU driver). KVM API validated: /dev/kvm, KVM_CREATE_VM, KVM_CREATE_VCPU all PASS (5/5). crosvm fails with "failed to create IRQ chip": QEMU TCG cannot create KVM_DEV_TYPE_ARM_VGIC_V3 device. SMMUv3 tested (iommu=smmuv3) but hangs at CPU3 GIC redistributor init (custom DTB lacks SMMU nodes). Embedded initramfs approach (nested kernel + crosvm at /nested/), virtio-console (console=hvc0) fixes ttyS0 probe failure. make build-crosvm (Docker cross-compile), make build-crosvm-initramfs, make run-crosvm (protected mode). Requires ARM64 hardware for full AVF validation.

Phase 4.7 (done): Security hardening: SPMC cross-SP isolation fix (RETRIEVE/RELINQUISH validate caller==receiver_id via dispatch_ffa_as_sp(), prevents SP1 mapping pages into SP2's Stage-2), IPA alignment + page count validation (4KB-aligned, max 65536 pages/range, overflow checks), fragment sender tracking (NwdFragmentState.sender_id), reset_nwd_frag_state() cleanup helper, stress tests (16-slot exhaustion, interleaved lifecycle, double RETRIEVE, RELINQUISH-without-RETRIEVE). Robustness hardening: range count overflow validation (reject > MAX_SHARE_RANGES instead of silent truncation), RXTX_UNMAP fragment state cleanup (NWD_FRAG + NWD_FRAG_RX), MEM_LEND negative tests + E2E lifecycle (BL33 Test 16). ~415 assertions / 34 test suites

Phase 5.1 (done): SP-to-SP DIRECT_REQ: CallStack cycle detection, recursive dispatch_to_sp, chain preemption (Blocked→Preempted), SP3 (sp_relay) at 0x0e500000, BL33 Tests 17-18 (relay chain + cycle detection). SP-to-SP MEM_SHARE: SP-initiated MEM_SHARE/LEND/RECLAIM in handle_sp_exit, SP1→SP2 Secure DRAM sharing (BL33 Test 19). SP-to-SP MEM_RECLAIM: SP1 persists handle in memory, reclaims after SP2 relinquishes (BL33 Test 20). MEM_DONATE: irrevocable ownership transfer (is_donate flag in SpmcShareRecord), RECLAIM/RELINQUISH blocked (DENIED), SP-to-SP DONATE via handle_sp_exit. 20/20 BL33 tests, ~457 assertions / 34 test suites

Phase 5.1 pKVM (done): pKVM SP-to-SP E2E verification: SP3 (sp_relay) added to pKVM flash (build-tfa-pkvm), SP3 DIRECT_REQ_64 support, ffa_test.ko extended with SP3 echo + relay + SP-to-SP MEM_SHARE + SP-to-SP RECLAIM (SP1→SP2 Secure DRAM sharing through real SPMD chain). ffa_test.ko: 35/35 PASS (SP1 10 + SP2 10 + SP3 6 + SP-to-SP share+reclaim 9). make run-pkvm-ffa-test

Phase 5: RME & CCA (Realm Manager)
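The Phase 4.7 range checks mentioned in the roadmap (4KB alignment, at most 65536 pages per range, overflow checks) could look roughly like the sketch below. The limits come from the roadmap text; validate_range(), RangeError and the rejection of zero-length ranges are assumptions of this example rather than the project's actual code.

```rust
/// Page size used for alignment checks.
pub const PAGE_SIZE: u64 = 4096;
/// Upper bound on pages in a single shared range, as stated in the roadmap.
pub const MAX_PAGES_PER_RANGE: u64 = 65_536;

/// Reasons a shared-memory range can be rejected (illustrative enum).
#[derive(Debug, PartialEq, Eq)]
pub enum RangeError {
    Unaligned,
    BadPageCount,
    Overflow,
}

/// Validate one (base, page_count) range and return its exclusive end address.
pub fn validate_range(base_ipa: u64, page_count: u64) -> Result<u64, RangeError> {
    if base_ipa % PAGE_SIZE != 0 {
        return Err(RangeError::Unaligned);
    }
    // Rejecting empty ranges is an assumption of this sketch.
    if page_count == 0 || page_count > MAX_PAGES_PER_RANGE {
        return Err(RangeError::BadPageCount);
    }
    let len = page_count
        .checked_mul(PAGE_SIZE)
        .ok_or(RangeError::Overflow)?;
    base_ipa.checked_add(len).ok_or(RangeError::Overflow)
}
```

Using checked arithmetic for both the length and the end address means a hostile descriptor cannot wrap the range around the top of the address space.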

See DEVELOPMENT_PLAN.md for full details.