# PCI MSI Interrupt-Driven Networking

## Problem

ARM64 network drivers (VirtIO net PCI on Parallels, e1000 on VMware) rely on
timer-based polling at 100Hz (every 10ms). This adds 5-10ms of latency per
network round-trip, which compounds across the DNS, TCP handshake, and HTTP
response phases. On x86, the e1000 driver has a proper IRQ 11 handler that
processes packets immediately via softirq.

## Goal

Replace timer-based polling with interrupt-driven packet processing on ARM64,
achieving sub-millisecond packet delivery latency.

---
| 17 | + |
| 18 | +## Phase 1: VirtIO Net PCI MSI on Parallels (Priority: Immediate) |
| 19 | + |
| 20 | +### Why This Is Easy |
| 21 | + |
| 22 | +All infrastructure already exists and is proven working: |
| 23 | +- **GIC driver** (`gic.rs`): `enable_spi()`, `disable_spi()`, |
| 24 | + `configure_spi_edge_triggered()`, `clear_spi_pending()` — all present |
| 25 | +- **PCI driver** (`pci.rs`): `find_msi_capability()`, `configure_msi()`, |
| 26 | + `disable_intx()` — all present |
| 27 | +- **GICv2m MSI** (`platform_config.rs`): `probe_gicv2m()`, |
| 28 | + `allocate_msi_spi()` — already used by xHCI and GPU PCI drivers on Parallels |
| 29 | +- **net_pci.rs** already has `handle_interrupt()` (line 552) that reads ISR |
| 30 | + and raises NetRx softirq — it's just never called from the interrupt path |
| 31 | + |
| 32 | +### Files to Modify |
| 33 | + |
| 34 | +#### 1. `kernel/src/drivers/virtio/net_pci.rs` |
| 35 | + |
| 36 | +Add MSI setup following the exact pattern from `xhci.rs:setup_xhci_msi()`: |
| 37 | + |
```rust
static NET_PCI_IRQ: AtomicU32 = AtomicU32::new(0);

pub fn get_irq() -> Option<u32> {
    let irq = NET_PCI_IRQ.load(Ordering::Relaxed);
    if irq != 0 { Some(irq) } else { None }
}

fn setup_net_pci_msi(pci_dev: &pci::Device) -> Option<u32> {
    // 1. Find MSI capability (cap ID 0x05)
    let cap_offset = pci_dev.find_msi_capability()?;
    // 2. Probe GICv2m (already probed by xHCI, returns cached value)
    let gicv2m_base = platform_config::gicv2m_base_phys()?;
    // 3. Allocate SPI from GICv2m pool
    let spi = platform_config::allocate_msi_spi()?;
    // 4. Program MSI: address = GICv2m doorbell, data = SPI number
    pci_dev.configure_msi(cap_offset, gicv2m_base + 0x40, spi);
    // 5. Disable INTx (MSI replaces it)
    pci_dev.disable_intx();
    // 6. Configure GIC: edge-triggered, enable SPI
    gic::configure_spi_edge_triggered(spi);
    gic::enable_spi(spi);
    Some(spi)
}
```
| 63 | + |
| 64 | +In `init()`, after device setup: call `setup_net_pci_msi()`, store result in |
| 65 | +`NET_PCI_IRQ`. |
| 66 | + |
| 67 | +Update `handle_interrupt()` with disable/clear/ack/enable SPI pattern (matching |
| 68 | +the xHCI and GPU handlers): |
| 69 | + |
```rust
pub fn handle_interrupt() {
    let irq = NET_PCI_IRQ.load(Ordering::Relaxed);
    if irq != 0 {
        gic::disable_spi(irq);
        gic::clear_spi_pending(irq);
    }
    // Read ISR status register (existing code — auto-acks on read for legacy VirtIO)
    // Raise NetRx softirq (existing code)
    if irq != 0 {
        gic::enable_spi(irq);
    }
}
```
| 84 | + |
| 85 | +#### 2. `kernel/src/arch_impl/aarch64/exception.rs` |
| 86 | + |
| 87 | +Add dispatch entry in the SPI match arm (32..=1019), alongside existing GPU |
| 88 | +PCI handler: |
| 89 | + |
| 90 | +```rust |
| 91 | +if let Some(net_pci_irq) = crate::drivers::virtio::net_pci::get_irq() { |
| 92 | + if irq_id == net_pci_irq { |
| 93 | + crate::drivers::virtio::net_pci::handle_interrupt(); |
| 94 | + } |
| 95 | +} |
| 96 | +``` |
| 97 | + |
| 98 | +#### 3. `kernel/src/arch_impl/aarch64/timer_interrupt.rs` |
| 99 | + |
| 100 | +Conditionalize polling — only poll when no MSI IRQ is configured: |
| 101 | + |
| 102 | +```rust |
| 103 | +if !crate::drivers::virtio::net_pci::get_irq().is_some() |
| 104 | + && (net_pci::is_initialized() || e1000::is_initialized()) |
| 105 | + && _count % 10 == 0 |
| 106 | +{ |
| 107 | + raise_softirq(SoftirqType::NetRx); |
| 108 | +} |
| 109 | +``` |
| 110 | + |
| 111 | +### Verification |
| 112 | + |
| 113 | +- DNS resolution should complete in <200ms (was 4-5 seconds) |
| 114 | +- HTTP fetch should complete in <2 seconds (was 10 seconds) |
| 115 | +- `cat /proc/interrupts` or trace counters should show NIC interrupts firing |
| 116 | + |
| 117 | +--- |
| 118 | + |
| 119 | +## Phase 2: E1000 MSI on VMware (Priority: Next) |
| 120 | + |
| 121 | +VMware Fusion uses GICv3 with ITS (Interrupt Translation Service), not GICv2m. |
| 122 | +This is a different MSI delivery mechanism. |
| 123 | + |
| 124 | +### Approach A: GICv3 ITS (Correct, Complex) |
| 125 | + |
| 126 | +The ITS provides MSI translation for GICv3 systems: |
| 127 | + |
| 128 | +1. **Discover ITS**: Parse ACPI MADT for ITS entry, or scan GIC redistributor |
| 129 | + space. ITS is typically at a well-known address (e.g., 0x0801_0000 on |
| 130 | + VMware virt). |

2. **Initialize ITS**:
   - Allocate the command queue (4KB aligned, mapped uncacheable)
   - Allocate the device table and collection table
   - Enable the ITS via GITS_CTLR

3. **Per-device setup**:
   - `MAPD` command: map device ID to an interrupt translation table
   - `MAPTI` command: map event ID to an LPI (physical interrupt)
   - `MAPC` command: map the collection to a target redistributor (CPU)
   - `INV` command: invalidate the cached translation

4. **MSI configuration**:
   - MSI address = `GITS_TRANSLATER` physical address
   - MSI data = device-specific event ID
   - Program via `pci_dev.configure_msi(cap, its_translater, event_id)`

5. **IRQ handling**: LPIs are delivered via GICv3 ICC_IAR1_EL1, same as SPIs.
   Dispatch by LPI number in exception.rs.

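ITS commands are 32-byte (four-`u64`) entries appended to the command queue. As an illustrative sketch of the encoding, here is a builder for the `MAPTI` command, with field positions taken from the GICv3 architecture spec (the helper name is hypothetical, not existing Breenix code):

```rust
/// Build a GICv3 ITS MAPTI command: map (DeviceID, EventID) -> physical LPI,
/// routed via collection `icid`. Field layout per the GICv3 architecture:
/// DW0 holds the command code (0x0A) and DeviceID, DW1 holds EventID and
/// pINTID, DW2 holds the collection ID. Illustrative, not existing code.
fn its_cmd_mapti(device_id: u32, event_id: u32, lpi: u32, icid: u16) -> [u64; 4] {
    [
        0x0A | ((device_id as u64) << 32),        // DW0: cmd[7:0], DeviceID[63:32]
        (event_id as u64) | ((lpi as u64) << 32), // DW1: EventID[31:0], pINTID[63:32]
        icid as u64,                              // DW2: ICID[15:0]
        0,                                        // DW3: reserved
    ]
}
```

Note that LPI INTIDs start at 8192, so the pINTID programmed here is an allocated LPI number, not a SPI.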
**Estimated effort**: 200-400 lines of new code for ITS initialization plus
per-device setup. The most complex part is the command queue protocol.

### Approach B: INTx via ACPI _PRT (Simpler, Limited)

Parse the ACPI DSDT for PCI interrupt routing:

1. **Parse ACPI _PRT**: The PCI Routing Table maps (slot, pin) -> GIC SPI.
   Breenix already has basic ACPI parsing for MADT/SPCR. Extend it to parse
   the DSDT for _PRT entries.

2. **Configure the SPI**: Once the SPI number is known from _PRT, configure it
   as level-triggered (INTx is level, not edge) and enable it in the GIC.

3. **Shared interrupt handling**: INTx lines may be shared between devices.
   The handler must check each device's ISR before claiming the interrupt.

**Estimated effort**: 100-200 lines for _PRT parsing plus a level-triggered
handler.

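The shared-line rule in step 3 can be sketched as follows (the per-device probe hooks are hypothetical; each reads its own device's ISR and returns true only if it found and handled a pending interrupt):

```rust
/// Hypothetical per-device probe: reads the device's interrupt status
/// register and, if an interrupt is pending, handles it and returns true.
type IntxProbe = fn() -> bool;

/// Dispatch a shared, level-triggered INTx SPI: offer the interrupt to every
/// registered device, since any subset of them may have asserted the line.
/// Returns how many devices claimed it (0 indicates a spurious interrupt).
fn dispatch_shared_intx(probes: &[IntxProbe]) -> usize {
    probes.iter().filter(|probe| probe()).count()
}
```

Because the line is level-triggered, it stays asserted until every pending device has been serviced, which is why no device may be skipped.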
### Approach C: VMware-Specific Probe (Pragmatic)

If VMware always maps the e1000's INTx to a known SPI (discoverable from the
device tree, or hardcoded for the vmware-aarch64 machine model), we could:

1. Read `interrupt_line` from PCI config space (currently 0xFF on ARM64)
2. Use VMware's DT to find the actual SPI mapping
3. Hardcode the mapping as a platform quirk if it's stable

**Estimated effort**: 20-50 lines, but fragile.

### Recommendation

Start with Approach B (_PRT parsing), since the ACPI infrastructure partially
exists. Defer ITS to Phase 3, when multiple PCI devices need independent MSI
vectors.

---

## Phase 3: Generic PCI Interrupt Framework (Priority: Future)

### Dynamic IRQ Dispatch Table

Replace the chain of `if let Some(irq)` checks in exception.rs with
registration-based dispatch:

```rust
static PCI_IRQ_HANDLERS: Mutex<[Option<(u32, fn())>; 16]> = Mutex::new([None; 16]);

pub fn register_pci_irq(spi: u32, handler: fn()) { /* ... */ }
```

This allows any PCI driver to register its own handler without modifying
exception.rs.

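A self-contained sketch of the registration and dispatch logic (std's `Mutex` stands in for whatever spinlock the kernel provides; the function bodies are illustrative, not existing code):

```rust
use std::sync::Mutex;

/// Fixed-size registration table; `None` marks a free slot.
static PCI_IRQ_HANDLERS: Mutex<[Option<(u32, fn())>; 16]> = Mutex::new([None; 16]);

/// Register a handler for a GIC SPI. Returns false if the table is full.
pub fn register_pci_irq(spi: u32, handler: fn()) -> bool {
    let mut table = PCI_IRQ_HANDLERS.lock().unwrap();
    for slot in table.iter_mut() {
        if slot.is_none() {
            *slot = Some((spi, handler));
            return true;
        }
    }
    false
}

/// Called from the SPI match arm of the exception handler: run the matching
/// handler, if any. Returns true if a driver claimed the SPI.
pub fn dispatch_pci_irq(spi: u32) -> bool {
    let table = PCI_IRQ_HANDLERS.lock().unwrap();
    for entry in table.iter() {
        if let Some((s, handler)) = entry {
            if *s == spi {
                handler();
                return true;
            }
        }
    }
    false
}
```

With this in place, exception.rs reduces to a single `dispatch_pci_irq(irq_id)` call for the whole SPI range.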
### Full ITS Support

For GICv3 platforms (VMware, newer QEMU configs, real hardware):
- ITS command queue management
- LPI configuration tables (PROPBASER, PENDBASER)
- Per-device interrupt translation
- Multi-CPU interrupt routing via collections

### QEMU Virt INTx Mapping

The QEMU virt machine maps PCI INTx to fixed SPIs:
- INTA -> SPI 3 (GIC INTID 35)
- INTB -> SPI 4 (GIC INTID 36)
- INTC -> SPI 5 (GIC INTID 37)
- INTD -> SPI 6 (GIC INTID 38)

With swizzling (pin is 1-based, INTA = 1): `actual_pin = (slot + pin - 1) % 4`

These are level-triggered and shared, requiring per-device ISR checks.
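The table and swizzle formula above combine into a one-line mapping (the function name is illustrative):

```rust
/// SPI for a given PCI slot and 1-based INTx pin (INTA = 1) on the QEMU virt
/// machine, after standard INTx swizzling. The GIC INTID is 32 + SPI.
fn qemu_virt_intx_spi(slot: u32, pin: u32) -> u32 {
    3 + (slot + pin - 1) % 4
}
```

For example, a device in slot 2 using INTA swizzles to SPI 5 (GIC INTID 37).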

---

## Architecture Reference

### Current Packet Receive Path (Polling)

```
Timer interrupt (1000Hz)
  -> every 10th tick: raise_softirq(NetRx)
    -> net_rx_softirq_handler()
      -> process_rx()
        -> net_pci::receive() / e1000::receive()
          -> process_packet()
            -> udp::enqueue_packet() / tcp::handle_segment()
              -> wake blocked thread
```

Latency: 0-10ms (mean 5ms) per packet.

### Target Packet Receive Path (MSI)

```
NIC MSI interrupt -> GIC SPI
  -> exception.rs handle_irq()
    -> net_pci::handle_interrupt()
      -> read ISR (auto-ack)
      -> raise_softirq(NetRx)
        -> net_rx_softirq_handler()
          -> process_rx()
            -> ... (same as above)
```

Latency: <100us per packet (GIC + softirq overhead).

### MSI Delivery on Parallels (GICv2m)

```
Device writes MSI data to the GICv2m doorbell address:
  addr = GICV2M_BASE + 0x40 (MSI_SETSPI_NS)
  data = allocated SPI number

GICv2m translates the write into a GIC SPI assertion.
The GIC delivers the SPI to the target CPU via ICC_IAR1_EL1.
```
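The (address, data) pair a driver programs into the device's MSI capability follows directly from the doorbell layout above (the helper name and the base address in the example are illustrative):

```rust
/// MSI (address, data) pair for GICv2m: the device posts `data` (the SPI
/// number) to the MSI_SETSPI_NS doorbell register at base + 0x40.
fn gicv2m_msi_pair(gicv2m_base: u64, spi: u32) -> (u64, u32) {
    (gicv2m_base + 0x40, spi)
}
```

This is exactly the pair passed to `pci_dev.configure_msi()` in the Phase 1 setup code.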