
Commit 84f0b61

mivertowski and claude committed
feat(cuda): multi-GPU runtime facade + 3-phase migration orchestrator (v1.1)
Per spec sections 3.1 and 4.1. Host-side orchestration only; GPU-side kernels committed separately.

crates/ringkernel-cuda/src/multi_gpu/:

runtime.rs (1229 lines):
- MultiGpuRuntime: device pool + NvlinkTopology + MultiGpuRegistry + MigrationController
- GpuBackend trait abstracts CudaRuntime (mockable for tests)
- PlacementHint: Auto, Pinned, WithActor, NvlinkPreferred
- Peer access bookkeeping (defers cuCtxEnablePeerAccess to hardware phase)
- Cross-GPU send routing (uses topology to pick NVLink path)

migration.rs (405 lines):
- MigrationPlan, MigrationReport, PhaseDurations
- RebalanceStrategy: LoadBalance, CommunicationAware, Explicit
- MultiGpuError with RingKernelError conversion
- 3-phase orchestration: quiesce → transfer → swap

migration_controller.rs (353 lines):
- Atomic concurrency cap (default 1000)
- Global buffer budget (default 8GB) with AtomicU64 accounting
- TokenBucket rate limiter
- MigrationPermit RAII release pattern

staging.rs (383 lines):
- StagingBuffer: framed (u32 length + payload) push/drain
- Optional disk-spill overflow for bursts
- CRC-32/ISO-HDLC checksum (pre- and post-transfer)
- High-water tracking

registry.rs (220 lines):
- Bidirectional KernelId↔GPU indexing
- Per-GPU load accounting
- Atomic location updates for migration

Tests: 62 new tests + 16 topology + 6 migration_kernels = 84 multi_gpu tests total (188 ringkernel-cuda --lib tests pass w/ 24 ignored for hardware).

All 3-phase ordering, rate limiting, budget enforcement, checksum match/mismatch, disk spill, placement hint resolution, rebalance source/target selection verified without hardware.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
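The 3-phase orchestration listed under migration.rs (quiesce → transfer → swap, timed into PhaseDurations) can be sketched as a function that runs three hooks strictly in order, times each one, and stops at the first failure. This is a minimal sketch under stated assumptions, not the crate's code: the `MigrationSteps` trait, `run_migration`, the `String` error type, and the `PhaseDurations` field names are all hypothetical stand-ins.

```rust
use std::time::{Duration, Instant};

/// Hypothetical per-phase timings; the crate's PhaseDurations may differ.
#[derive(Debug, Default)]
pub struct PhaseDurations {
    pub quiesce: Duration,
    pub transfer: Duration,
    pub swap: Duration,
}

/// Hypothetical hooks for one migration; a mock can implement these in tests.
pub trait MigrationSteps {
    fn quiesce(&mut self) -> Result<(), String>;
    fn transfer(&mut self) -> Result<(), String>;
    fn swap(&mut self) -> Result<(), String>;
}

/// Runs quiesce, transfer, and swap strictly in that order, timing each phase.
/// An error returns immediately, so later phases never run. Rollback (for
/// example resuming a quiesced source kernel) is out of scope for this sketch.
pub fn run_migration<S: MigrationSteps>(steps: &mut S) -> Result<PhaseDurations, String> {
    let mut durations = PhaseDurations::default();

    let start = Instant::now();
    steps.quiesce()?;
    durations.quiesce = start.elapsed();

    let start = Instant::now();
    steps.transfer()?;
    durations.transfer = start.elapsed();

    let start = Instant::now();
    steps.swap()?;
    durations.swap = start.elapsed();

    Ok(durations)
}
```

Routing every phase through a trait is what makes ordering testable without hardware: a mock records the calls it receives, in the spirit of the GpuBackend abstraction the commit describes.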
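The migration_controller.rs items (atomic concurrency cap, AtomicU64 byte budget, MigrationPermit RAII release) fit together roughly as follows. This is a hedged sketch: the method names `try_acquire`, `in_flight`, and `bytes_reserved` are hypothetical, and the 8GB default is taken here as 8 GiB, which the commit message does not specify.

```rust
use std::sync::atomic::{AtomicU64, AtomicUsize, Ordering};
use std::sync::Arc;

/// Hypothetical controller: caps concurrent migrations and total staged bytes.
pub struct MigrationController {
    max_concurrent: usize,
    budget_bytes: u64,
    in_flight: AtomicUsize,
    bytes_reserved: AtomicU64,
}

/// RAII permit: dropping it returns both the slot and the reserved bytes.
pub struct MigrationPermit {
    controller: Arc<MigrationController>,
    bytes: u64,
}

impl MigrationController {
    pub fn new(max_concurrent: usize, budget_bytes: u64) -> Arc<Self> {
        Arc::new(Self {
            max_concurrent,
            budget_bytes,
            in_flight: AtomicUsize::new(0),
            bytes_reserved: AtomicU64::new(0),
        })
    }

    /// Defaults from the commit message: 1000 concurrent, 8GB (assumed GiB) budget.
    pub fn with_defaults() -> Arc<Self> {
        Self::new(1000, 8 * 1024 * 1024 * 1024)
    }

    /// Reserves one slot and `bytes` of budget, or returns None without side effects.
    pub fn try_acquire(self: &Arc<Self>, bytes: u64) -> Option<MigrationPermit> {
        // CAS loop: increment only while below the cap, so it is never exceeded.
        let slot = self.in_flight.fetch_update(Ordering::AcqRel, Ordering::Acquire, |n| {
            if n < self.max_concurrent { Some(n + 1) } else { None }
        });
        if slot.is_err() {
            return None;
        }
        // Same pattern for the byte budget; checked_add guards against overflow.
        let reserved = self.bytes_reserved.fetch_update(Ordering::AcqRel, Ordering::Acquire, |used| {
            used.checked_add(bytes).filter(|&total| total <= self.budget_bytes)
        });
        if reserved.is_err() {
            // Undo the slot taken above so a budget failure leaks nothing.
            self.in_flight.fetch_sub(1, Ordering::AcqRel);
            return None;
        }
        Some(MigrationPermit { controller: Arc::clone(self), bytes })
    }

    pub fn in_flight(&self) -> usize {
        self.in_flight.load(Ordering::Acquire)
    }

    pub fn bytes_reserved(&self) -> u64 {
        self.bytes_reserved.load(Ordering::Acquire)
    }
}

impl Drop for MigrationPermit {
    fn drop(&mut self) {
        self.controller.bytes_reserved.fetch_sub(self.bytes, Ordering::AcqRel);
        self.controller.in_flight.fetch_sub(1, Ordering::AcqRel);
    }
}
```

Because release happens in `Drop`, an early return or an unwinding panic in the middle of a migration still gives back the slot and the bytes.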
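A token bucket like the TokenBucket rate limiter in migration_controller.rs holds up to `capacity` tokens, refills continuously at a fixed rate, and admits a migration only when a whole token is available. The sketch below is illustrative: the constructor, `try_acquire_at`, and the parameters are assumptions, and the current time is passed in explicitly so the logic can be tested deterministically without sleeping.

```rust
use std::time::Instant;

/// Hypothetical token bucket: bursts up to `capacity`, refills at `refill_per_sec`.
pub struct TokenBucket {
    capacity: f64,
    tokens: f64,
    refill_per_sec: f64,
    last_refill: Instant,
}

impl TokenBucket {
    /// Starts full, so an initial burst of `capacity` acquisitions succeeds.
    pub fn new(capacity: u32, refill_per_sec: f64, now: Instant) -> Self {
        Self {
            capacity: f64::from(capacity),
            tokens: f64::from(capacity),
            refill_per_sec,
            last_refill: now,
        }
    }

    /// Refills for the time elapsed since the last call, then takes one token if possible.
    pub fn try_acquire_at(&mut self, now: Instant) -> bool {
        // saturating_duration_since yields zero if `now` is earlier than the last refill.
        let elapsed = now.saturating_duration_since(self.last_refill).as_secs_f64();
        self.tokens = (self.tokens + elapsed * self.refill_per_sec).min(self.capacity);
        self.last_refill = self.last_refill.max(now);
        if self.tokens >= 1.0 {
            self.tokens -= 1.0;
            true
        } else {
            false
        }
    }

    /// Convenience wrapper using the real clock.
    pub fn try_acquire(&mut self) -> bool {
        self.try_acquire_at(Instant::now())
    }
}
```

Capping the refill at `capacity` is what bounds the burst: a long idle period never banks more than `capacity` migrations.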
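The staging.rs bullets describe length-prefixed frames (u32 length + payload) plus a CRC-32/ISO-HDLC checksum compared before and after transfer. A minimal in-memory sketch follows. The little-endian prefix, the method names, and the bitwise CRC are assumptions, and disk spill is omitted. The CRC parameters are the standard ones for CRC-32/ISO-HDLC (reflected polynomial 0xEDB88320, initial value and final XOR 0xFFFFFFFF), whose check value for `"123456789"` is 0xCBF43926.

```rust
/// CRC-32/ISO-HDLC, bitwise for clarity (table-driven versions are faster).
pub fn crc32_iso_hdlc(data: &[u8]) -> u32 {
    let mut crc: u32 = 0xFFFF_FFFF;
    for &byte in data {
        crc ^= u32::from(byte);
        for _ in 0..8 {
            // All-ones mask when the low bit is set, zero otherwise.
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0xEDB8_8320 & mask);
        }
    }
    !crc
}

/// Hypothetical in-memory staging buffer of length-prefixed frames.
pub struct StagingBuffer {
    bytes: Vec<u8>,
    high_water: usize,
}

impl StagingBuffer {
    pub fn new() -> Self {
        Self { bytes: Vec::new(), high_water: 0 }
    }

    /// Appends one frame: a u32 little-endian length prefix, then the payload.
    pub fn push(&mut self, payload: &[u8]) {
        assert!(payload.len() <= u32::MAX as usize, "frame too large for a u32 length prefix");
        self.bytes.extend_from_slice(&(payload.len() as u32).to_le_bytes());
        self.bytes.extend_from_slice(payload);
        self.high_water = self.high_water.max(self.bytes.len());
    }

    /// Checksum over the raw framed bytes, computed before and after transfer.
    pub fn checksum(&self) -> u32 {
        crc32_iso_hdlc(&self.bytes)
    }

    /// Removes and returns every complete frame in push order.
    pub fn drain(&mut self) -> Vec<Vec<u8>> {
        let mut frames = Vec::new();
        let mut pos = 0;
        while pos + 4 <= self.bytes.len() {
            let mut len_bytes = [0u8; 4];
            len_bytes.copy_from_slice(&self.bytes[pos..pos + 4]);
            let len = u32::from_le_bytes(len_bytes) as usize;
            let start = pos + 4;
            if start + len > self.bytes.len() {
                break; // an incomplete trailing frame stays buffered
            }
            frames.push(self.bytes[start..start + len].to_vec());
            pos = start + len;
        }
        self.bytes.drain(..pos);
        frames
    }

    /// Largest number of bytes the buffer has held at once.
    pub fn high_water(&self) -> usize {
        self.high_water
    }
}
```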
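The registry.rs design (bidirectional KernelId↔GPU index, per-GPU load, atomic location updates) can be illustrated with two maps guarded by a single lock, so a migration updates both directions in one critical section. This sketch makes several assumptions that may not match the crate: `KernelId` and `GpuId` are hypothetical integer aliases, load is simply the number of resident kernels, and a `Mutex` stands in for whatever synchronization the real registry uses.

```rust
use std::collections::{HashMap, HashSet};
use std::sync::Mutex;

pub type KernelId = u64; // hypothetical alias
pub type GpuId = usize; // hypothetical alias

#[derive(Default)]
struct Inner {
    kernel_to_gpu: HashMap<KernelId, GpuId>,
    gpu_to_kernels: HashMap<GpuId, HashSet<KernelId>>,
}

/// Both indexes live behind one lock, so readers never see them disagree.
#[derive(Default)]
pub struct Registry {
    inner: Mutex<Inner>,
}

impl Registry {
    /// Places a kernel on `gpu`, removing it from any previous GPU first.
    pub fn register(&self, kernel: KernelId, gpu: GpuId) {
        let mut g = self.inner.lock().unwrap();
        if let Some(old) = g.kernel_to_gpu.insert(kernel, gpu) {
            if let Some(set) = g.gpu_to_kernels.get_mut(&old) {
                set.remove(&kernel);
            }
        }
        g.gpu_to_kernels.entry(gpu).or_default().insert(kernel);
    }

    pub fn location(&self, kernel: KernelId) -> Option<GpuId> {
        self.inner.lock().unwrap().kernel_to_gpu.get(&kernel).copied()
    }

    /// Load approximated here as the number of kernels resident on `gpu`.
    pub fn load(&self, gpu: GpuId) -> usize {
        self.inner.lock().unwrap().gpu_to_kernels.get(&gpu).map_or(0, |s| s.len())
    }

    /// Moves `kernel` from `from` to `to` atomically; rejects a stale `from`.
    pub fn migrate(&self, kernel: KernelId, from: GpuId, to: GpuId) -> bool {
        let mut g = self.inner.lock().unwrap();
        if g.kernel_to_gpu.get(&kernel) != Some(&from) {
            return false;
        }
        g.kernel_to_gpu.insert(kernel, to);
        if let Some(set) = g.gpu_to_kernels.get_mut(&from) {
            set.remove(&kernel);
        }
        g.gpu_to_kernels.entry(to).or_default().insert(kernel);
        true
    }
}
```

Checking `from` inside the lock acts like a compare-and-swap on the location: two concurrent migrations of the same kernel cannot both succeed.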
1 parent a1226a5 commit 84f0b61

7 files changed

Lines changed: 2627 additions & 9 deletions


crates/ringkernel-cuda/Cargo.toml

Lines changed: 1 addition & 0 deletions
```diff
@@ -50,6 +50,7 @@ nvml-wrapper = { version = "0.12", optional = true }
 
 [dev-dependencies]
 tokio = { workspace = true, features = ["test-util", "macros", "rt-multi-thread"] }
+tempfile = { workspace = true }
 
 # Note: GPU execution tests require the `cuda` feature to be enabled.
 # Run with: cargo test -p ringkernel-cuda --features cuda
```
