Commit 84f0b61
feat(cuda): multi-GPU runtime facade + 3-phase migration orchestrator (v1.1)
Per spec sections 3.1 and 4.1. Host-side orchestration only; GPU-side
kernels committed separately.
crates/ringkernel-cuda/src/multi_gpu/:
runtime.rs (1229 lines):
- MultiGpuRuntime: device pool + NvlinkTopology + MultiGpuRegistry
  + MigrationController
- GpuBackend trait abstracts CudaRuntime (mockable for tests)
- PlacementHint: Auto, Pinned, WithActor, NvlinkPreferred
- Peer access bookkeeping (defers cuCtxEnablePeerAccess to hardware phase)
- Cross-GPU send routing (uses topology to pick NVLink path)
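The sketch below shows one way the four placement hints might resolve to a GPU index. Several details are assumptions for illustration, not the crate's actual API:

- the payload types (`GpuId`, `ActorId`);
- the `PlacementState` struct;
- the tie-breaking rules (least-loaded for `Auto`; most NVLink peers, then least load, for `NvlinkPreferred`).

```rust
use std::cmp::Reverse;
use std::collections::HashMap;

type GpuId = usize; // hypothetical device index
type ActorId = u64; // hypothetical actor/kernel identifier

/// Mirrors the four variants named in the commit; payloads are assumed.
#[derive(Debug, Clone, Copy)]
enum PlacementHint {
    Auto,
    Pinned(GpuId),
    WithActor(ActorId),
    NvlinkPreferred,
}

/// Hypothetical snapshot of the state a resolver would need.
struct PlacementState {
    load: Vec<u32>,                     // active kernels per GPU
    locations: HashMap<ActorId, GpuId>, // current home of each actor
    nvlink_degree: Vec<u32>,            // NVLink peers per GPU (same length as `load`)
}

/// First GPU with the minimum load (min_by_key returns the first minimum).
fn least_loaded(load: &[u32]) -> Option<GpuId> {
    load.iter().enumerate().min_by_key(|&(_, l)| *l).map(|(i, _)| i)
}

fn resolve(hint: PlacementHint, s: &PlacementState) -> Option<GpuId> {
    match hint {
        PlacementHint::Auto => least_loaded(&s.load),
        // Reject pins to devices that are not in the pool.
        PlacementHint::Pinned(g) => (g < s.load.len()).then_some(g),
        // Co-locate with an existing actor, if it is known.
        PlacementHint::WithActor(a) => s.locations.get(&a).copied(),
        // Most NVLink peers wins; lower load breaks ties.
        PlacementHint::NvlinkPreferred => (0..s.load.len())
            .max_by_key(|&g| (s.nvlink_degree[g], Reverse(s.load[g]))),
    }
}
```

`Pinned` returns `None` for an out-of-range device rather than panicking. The caller can then surface a placement error.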
migration.rs (405 lines):
- MigrationPlan, MigrationReport, PhaseDurations
- RebalanceStrategy: LoadBalance, CommunicationAware, Explicit
- MultiGpuError with RingKernelError conversion
- 3-phase orchestration: quiesce → transfer → swap
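The quiesce → transfer → swap ordering can be sketched as a function that times each phase and stops at the first failure. That way the location swap never runs after a failed transfer. The `MigrationSteps` trait, the `Recorder` mock and the `PhaseDurations` field names are hypothetical. Resuming a quiesced kernel after a failure is omitted.

```rust
use std::time::{Duration, Instant};

/// Per-phase timings, named after the commit's PhaseDurations (fields assumed).
#[derive(Debug, Default)]
struct PhaseDurations {
    quiesce: Duration,
    transfer: Duration,
    swap: Duration,
}

/// Hypothetical hooks; a real implementation would drive the GPU backend.
trait MigrationSteps {
    fn quiesce(&mut self) -> Result<(), String>;
    fn transfer(&mut self) -> Result<(), String>;
    fn swap(&mut self) -> Result<(), String>;
}

/// Runs the phases strictly in order; `?` aborts at the first error.
fn run_migration<S: MigrationSteps>(steps: &mut S) -> Result<PhaseDurations, String> {
    let mut d = PhaseDurations::default();
    let t = Instant::now();
    steps.quiesce()?;
    d.quiesce = t.elapsed();
    let t = Instant::now();
    steps.transfer()?;
    d.transfer = t.elapsed();
    let t = Instant::now();
    steps.swap()?;
    d.swap = t.elapsed();
    Ok(d)
}

/// Mock that records call order, for hardware-free ordering tests.
struct Recorder {
    log: Vec<&'static str>,
    fail_transfer: bool,
}

impl MigrationSteps for Recorder {
    fn quiesce(&mut self) -> Result<(), String> {
        self.log.push("quiesce");
        Ok(())
    }
    fn transfer(&mut self) -> Result<(), String> {
        self.log.push("transfer");
        if self.fail_transfer {
            Err("checksum mismatch".to_string())
        } else {
            Ok(())
        }
    }
    fn swap(&mut self) -> Result<(), String> {
        self.log.push("swap");
        Ok(())
    }
}
```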
migration_controller.rs (353 lines):
- Atomic concurrency cap (default 1000)
- Global buffer budget (default 8GB) with AtomicU64 accounting
- TokenBucket rate limiter
- MigrationPermit RAII release pattern
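The sketch below combines the concurrency cap, the buffer budget and RAII release, using compare-and-swap loops on plain atomics. It works in three steps:

- `try_acquire` first reserves a slot.
- It then reserves bytes, giving the slot back if the budget would be exceeded.
- Dropping the permit returns both the slot and the bytes.

The names and signatures are assumptions, as is reading "8GB" as 8 GiB. The `TokenBucket` rate limiter is left out.

```rust
use std::sync::atomic::{AtomicU64, AtomicUsize, Ordering};
use std::sync::Arc;

/// Hypothetical controller capping in-flight migrations and reserved bytes.
struct MigrationController {
    max_concurrent: usize,
    budget_bytes: u64,
    in_flight: AtomicUsize,
    reserved_bytes: AtomicU64,
}

/// RAII permit: dropping it releases the slot and the reserved bytes.
struct MigrationPermit {
    ctrl: Arc<MigrationController>,
    bytes: u64,
}

impl MigrationController {
    fn new(max_concurrent: usize, budget_bytes: u64) -> Arc<Self> {
        Arc::new(MigrationController {
            max_concurrent,
            budget_bytes,
            in_flight: AtomicUsize::new(0),
            reserved_bytes: AtomicU64::new(0),
        })
    }

    /// Defaults from the commit message; "8GB" read as 8 GiB (assumption).
    fn with_defaults() -> Arc<Self> {
        Self::new(1000, 8 * 1024 * 1024 * 1024)
    }

    fn try_acquire(self: &Arc<Self>, bytes: u64) -> Option<MigrationPermit> {
        // Claim a concurrency slot; the CAS loop never overshoots the cap.
        let mut cur = self.in_flight.load(Ordering::Acquire);
        loop {
            if cur >= self.max_concurrent {
                return None;
            }
            match self.in_flight.compare_exchange_weak(cur, cur + 1, Ordering::AcqRel, Ordering::Acquire) {
                Ok(_) => break,
                Err(actual) => cur = actual,
            }
        }
        // Reserve buffer bytes; release the slot again if over budget.
        let mut used = self.reserved_bytes.load(Ordering::Acquire);
        loop {
            let next = match used.checked_add(bytes) {
                Some(n) if n <= self.budget_bytes => n,
                _ => {
                    self.in_flight.fetch_sub(1, Ordering::AcqRel);
                    return None;
                }
            };
            match self.reserved_bytes.compare_exchange_weak(used, next, Ordering::AcqRel, Ordering::Acquire) {
                Ok(_) => break,
                Err(actual) => used = actual,
            }
        }
        Some(MigrationPermit { ctrl: Arc::clone(self), bytes })
    }
}

impl Drop for MigrationPermit {
    fn drop(&mut self) {
        self.ctrl.reserved_bytes.fetch_sub(self.bytes, Ordering::AcqRel);
        self.ctrl.in_flight.fetch_sub(1, Ordering::AcqRel);
    }
}
```

Tying release to `Drop` means an early return or panic inside a migration still gives back its slot and bytes.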
staging.rs (383 lines):
- StagingBuffer: framed (u32 length + payload) push/drain
- Optional disk-spill overflow for bursts
- CRC-32/ISO-HDLC checksum (pre- and post-transfer)
- High-water tracking
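The sketch below covers two pieces:

- The framed staging format: a `u32` length followed by the payload.
- A bitwise CRC-32/ISO-HDLC: reflected polynomial `0xEDB88320`, with initial value and final XOR both `0xFFFFFFFF`.

The little-endian length encoding, the method names and measuring the high-water mark in bytes are assumptions. Disk spill is omitted.

```rust
use std::convert::{TryFrom, TryInto};

/// CRC-32/ISO-HDLC (the zlib/Ethernet CRC), bit-at-a-time for clarity.
fn crc32_iso_hdlc(data: &[u8]) -> u32 {
    let mut crc = 0xFFFF_FFFFu32;
    for &b in data {
        crc ^= b as u32;
        for _ in 0..8 {
            // mask is all ones when the low bit is set, else zero.
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0xEDB8_8320 & mask);
        }
    }
    !crc
}

/// Hypothetical in-memory staging buffer of length-prefixed frames.
struct StagingBuffer {
    bytes: Vec<u8>,
    high_water: usize, // largest buffered size seen, in bytes
}

impl StagingBuffer {
    fn new() -> Self {
        StagingBuffer { bytes: Vec::new(), high_water: 0 }
    }

    fn push(&mut self, payload: &[u8]) {
        let len = u32::try_from(payload.len()).expect("frame larger than u32::MAX");
        self.bytes.extend_from_slice(&len.to_le_bytes());
        self.bytes.extend_from_slice(payload);
        self.high_water = self.high_water.max(self.bytes.len());
    }

    /// Checksum of the framed bytes, computed before and after a transfer.
    fn checksum(&self) -> u32 {
        crc32_iso_hdlc(&self.bytes)
    }

    /// Splits the buffer back into payloads and empties it.
    fn drain(&mut self) -> Vec<Vec<u8>> {
        let mut frames = Vec::new();
        let mut pos = 0;
        while pos + 4 <= self.bytes.len() {
            let len = u32::from_le_bytes(self.bytes[pos..pos + 4].try_into().unwrap()) as usize;
            pos += 4;
            if pos + len > self.bytes.len() {
                break; // truncated frame; stop rather than panic
            }
            frames.push(self.bytes[pos..pos + len].to_vec());
            pos += len;
        }
        self.bytes.clear();
        frames
    }
}
```

Comparing `checksum()` against a CRC of the bytes received on the destination side is how a transfer mismatch would be detected. The standard check value for this CRC over `b"123456789"` is `0xCBF43926`.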
registry.rs (220 lines):
- Bidirectional KernelId↔GPU indexing
- Per-GPU load accounting
- Atomic location updates for migration
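Below is a minimal bidirectional registry. It assumes both indexes live behind a single `RwLock`, so readers see a migration's location update as one atomic step. Per-GPU load is derived here from the size of the reverse index. The type aliases and method names are hypothetical.

```rust
use std::collections::{HashMap, HashSet};
use std::sync::RwLock;

type KernelId = u64; // hypothetical identifier type
type GpuId = usize; // hypothetical device index

#[derive(Default)]
struct Indexes {
    by_kernel: HashMap<KernelId, GpuId>,
    by_gpu: HashMap<GpuId, HashSet<KernelId>>,
}

#[derive(Default)]
struct Registry {
    inner: RwLock<Indexes>,
}

impl Registry {
    /// Places or moves `k` onto `g`, updating both indexes under one write
    /// lock so readers never observe a half-applied move. Returns the old GPU.
    fn place(&self, k: KernelId, g: GpuId) -> Option<GpuId> {
        let mut ix = self.inner.write().unwrap();
        let prev = ix.by_kernel.insert(k, g);
        if let Some(old) = prev {
            if let Some(set) = ix.by_gpu.get_mut(&old) {
                set.remove(&k);
            }
        }
        ix.by_gpu.entry(g).or_default().insert(k);
        prev
    }

    fn location(&self, k: KernelId) -> Option<GpuId> {
        self.inner.read().unwrap().by_kernel.get(&k).copied()
    }

    /// Per-GPU load as the number of kernels currently placed there.
    fn load(&self, g: GpuId) -> usize {
        self.inner.read().unwrap().by_gpu.get(&g).map_or(0, |s| s.len())
    }
}
```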
Tests: 62 new tests + 16 topology + 6 migration_kernels = 84 multi_gpu
tests total (188 ringkernel-cuda --lib tests pass, with 24 ignored because
they require hardware). 3-phase ordering, rate limiting, budget
enforcement, checksum match/mismatch, disk spill, placement hint
resolution and rebalance source/target selection are all verified
without hardware.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent a1226a5
7 files changed, 2627 additions, 9 deletions