Tolerate heterogeneous transport handles #2719

Open
rmahidhar wants to merge 3 commits into meta-pytorch:main from rmahidhar:export-D106118519

Conversation

@rmahidhar

Summary: Make Uniflow treat per-segment transport handles as capabilities instead of a batch schema. Segment import now keeps usable handles when one optional transport cannot be imported, preserves the first import error if no handle can be imported, and MultiTransport selects a single common transport across the whole batch. This is production behavior, not a benchmark workaround: for distributed KV-cache transfer, a peer may expose both NVLink and RDMA for most GPU cache segments while one process/topology can only import RDMA; the transfer should fall back to RDMA when it is common to every request and fail cleanly when no common transport exists. The RDMA DMA-BUF fallback comment also documents that DMA-BUF is the preferred GDR path while ibv_reg_mr remains the correctness fallback for valid VRAM allocations that cannot be exported as DMA-BUF.
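
The batch-wide selection described above can be sketched as follows. This is a minimal illustration, not the Uniflow implementation: `TransportKind`, `Capabilities`, and `selectCommonTransport` are hypothetical names, and the preference order (NVLink before RDMA) is an assumption.

```cpp
#include <algorithm>
#include <optional>
#include <vector>

// Hypothetical types for illustration; not the Uniflow API.
enum class TransportKind { NVLink, RDMA };

// Transports whose handles were imported successfully for one request.
using Capabilities = std::vector<TransportKind>;

// Returns the first transport in `preference` that every request in the
// batch supports, or std::nullopt when no transport is common to all.
std::optional<TransportKind> selectCommonTransport(
    const std::vector<Capabilities>& batch,
    const std::vector<TransportKind>& preference) {
  if (batch.empty()) {
    return std::nullopt;
  }
  for (TransportKind candidate : preference) {
    const bool common = std::all_of(
        batch.begin(), batch.end(), [&](const Capabilities& caps) {
          return std::find(caps.begin(), caps.end(), candidate) != caps.end();
        });
    if (common) {
      return candidate;
    }
  }
  return std::nullopt;
}
```

With a preference of NVLink then RDMA, a batch in which one request can only use RDMA resolves to RDMA for every request, and a batch with no shared transport yields an empty result that the caller can report as a clean failure.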

Differential Revision: D106118519

Mahidhar Ramesh Rajala added 3 commits May 28, 2026 10:06
Summary: Adds a unified multi-architecture `uniflow_disagg_bench_mast` fbpkg builder target for deploying the disaggregated benchmark to MAST on H100/x86_64 and GB200/aarch64 platforms. The aarch64 variant uses CUDA `13.0`, the x86_64 variant uses CUDA `12.8`, and the target leaves `hpc_comms.use_nccl` at its platform default instead of overriding it. Updates the UniFlow integration test to create agents on the main thread.

Reviewed By: saifhhasan

Differential Revision: D105276382

Summary: Fix UniFlow topology discovery to respect CUDA-visible GPUs and tolerate development RDMA topologies. GPU discovery now enumerates `CudaApi::getDeviceCount()` instead of NVML physical device count, resolves each CUDA-visible GPU to its NVML handle by normalized PCI bus ID, and treats NVML enrichment as best-effort so a CUDA-visible GPU is not dropped just because NVML link metadata is unavailable. The same topology path now also handles virtual RDMA devices such as RXE by recognizing `/sys/devices/virtual/` IB devices, representing them without PCI ancestry, and using the reported RDMA port speed as their CPU-link bandwidth with a conservative 10 Gbps fallback when ibverbs reports zero. This keeps production physical NIC/GPU behavior unchanged while allowing constrained `CUDA_VISIBLE_DEVICES` and software-RDMA/dev-test environments to build a usable topology instead of failing discovery.
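
Two of the rules in this summary lend themselves to a small sketch: matching GPUs by normalized PCI bus ID, and the bandwidth fallback for virtual RDMA devices. The helper names below are hypothetical and the normalization rule (lowercase, four-digit domain) is an assumption about how differently formatted bus IDs might be reconciled; only the `/sys/devices/virtual/` prefix and the 10 Gbps fallback come from the summary.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Assumed normalization: lowercase hex and a 4-digit PCI domain, so that
// "00000000:3B:00.0" and "0000:3b:00.0" compare equal.
std::string normalizePciBusId(const std::string& busId) {
  std::string lower;
  for (char c : busId) {
    lower.push_back(
        static_cast<char>(std::tolower(static_cast<unsigned char>(c))));
  }
  const std::size_t colon = lower.find(':');
  if (colon == std::string::npos) {
    return lower;  // Not in domain:bus:device.function form; leave as-is.
  }
  std::string domain = lower.substr(0, colon);
  const std::size_t firstNonZero = domain.find_first_not_of('0');
  domain = (firstNonZero == std::string::npos) ? std::string()
                                                : domain.substr(firstNonZero);
  if (domain.size() < 4) {
    domain.insert(0, 4 - domain.size(), '0');
  }
  return domain + lower.substr(colon);
}

// Virtual IB devices (such as RXE) resolve under /sys/devices/virtual/.
bool isVirtualIbDevice(const std::string& resolvedSysfsPath) {
  return resolvedSysfsPath.rfind("/sys/devices/virtual/", 0) == 0;
}

// Conservative fallback when the reported port speed is zero.
constexpr double kFallbackGbps = 10.0;

double virtualRdmaLinkGbps(double reportedGbps) {
  return reportedGbps > 0.0 ? reportedGbps : kFallbackGbps;
}
```

Comparing normalized strings rather than raw ones keeps the GPU lookup robust to formatting differences between the two sources of bus IDs.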

Reviewed By: saifhhasan

Differential Revision: D104611386
Summary: Make Uniflow treat per-segment transport handles as capabilities instead of a batch schema. Segment import now keeps usable handles when one optional transport cannot be imported, preserves the first import error if no handle can be imported, and `MultiTransport` selects a single common transport across the whole batch. This is production behavior, not a benchmark workaround: for distributed KV-cache transfer, a peer may expose both `NVLink` and `RDMA` for most GPU cache segments while one process/topology can only import `RDMA`; the transfer should fall back to `RDMA` when it is common to every request and fail cleanly when no common transport exists. The RDMA DMA-BUF fallback comment also documents that DMA-BUF is the preferred GDR path while `ibv_reg_mr` remains the correctness fallback for valid VRAM allocations that cannot be exported as DMA-BUF.
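
The import rule (keep whatever handles import successfully, and remember only the first error) might look like the sketch below. All names are hypothetical stand-ins rather than the Uniflow API, and the `tryImport` callback is assumed to return `std::nullopt` on success or an error message on failure.

```cpp
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for illustration; not the Uniflow types.
struct ImportedHandle {
  std::string transport;
};

struct ImportResult {
  std::vector<ImportedHandle> handles;    // Handles that imported cleanly.
  std::optional<std::string> firstError;  // First failure seen, if any.

  // The segment is usable as long as at least one handle was imported.
  bool ok() const { return !handles.empty(); }
};

// Tries every advertised transport. A failure on one transport does not
// discard the others; only the first error message is retained so it can
// be reported if nothing could be imported.
template <typename ImportFn>
ImportResult importSegment(const std::vector<std::string>& advertised,
                           ImportFn tryImport) {
  ImportResult result;
  for (const std::string& transport : advertised) {
    std::optional<std::string> error = tryImport(transport);
    if (!error) {
      result.handles.push_back(ImportedHandle{transport});
    } else if (!result.firstError) {
      result.firstError = std::move(error);
    }
  }
  return result;
}
```

A caller would treat `ok() == false` as the failure case and surface `firstError`; when `ok()` is true, the retained error is informational only.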

Differential Revision: D106118519
@meta-cla meta-cla Bot added the CLA Signed label May 28, 2026
@meta-codesync
Contributor

meta-codesync Bot commented May 28, 2026

@rmahidhar has exported this pull request. If you are a Meta employee, you can view the originating Diff in D106118519.

Labels

CLA Signed, fb-exported, meta-exported