Cluster Perf tests - EP benchmarking & RDMA Perf and Cluster Env mapping recommendation tool#734
Open
lcskrishna wants to merge 5 commits into
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Adds cluster RDMA discovery + recommendation tooling and introduces Slurm+Docker launchers for RDMA and MoRI EP microbenchmarks.
Changes:
- Add a CLI tool to map RDMA→PCI→NetDev, detect NIC vendor, and emit recommended Docker + NCCL/rocSHMEM env exports.
- Add a two-node RDMA perf test harness (`ib_write_bw`) with TCP startup barrier helpers and Slurm launch scripts.
- Add a MoRI EP bench Slurm launcher plus a “slim” Dockerfile to run intra-/inter-node MoRI microbenchmarks.
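For orientation, the RDMA→PCI→NetDev mapping mentioned above can be read from Linux sysfs. The following is a minimal sketch, not necessarily how `cluster_rdma_env_recommender.py` does it: the function name `map_rdma_devices` and its `sysfs_root` parameter are hypothetical, and it assumes the usual layout in which `/sys/class/infiniband/<dev>/device` is a symlink to the PCI function and `device/net/` lists the associated network interfaces.

```python
import os


def map_rdma_devices(sysfs_root="/sys/class/infiniband"):
    """Return {rdma_dev: {"pci": str, "netdevs": [str]}} by walking sysfs.

    Hypothetical helper for illustration; sysfs_root is a parameter so the
    function can be pointed at a fake tree in tests.
    """
    mapping = {}
    if not os.path.isdir(sysfs_root):
        return mapping  # no RDMA devices (or not a Linux host)
    for rdma_dev in sorted(os.listdir(sysfs_root)):
        device_link = os.path.join(sysfs_root, rdma_dev, "device")
        # "device" resolves to the PCI function directory; its basename
        # is the PCI address, e.g. 0000:05:00.0.
        pci_addr = os.path.basename(os.path.realpath(device_link))
        net_dir = os.path.join(device_link, "net")
        netdevs = sorted(os.listdir(net_dir)) if os.path.isdir(net_dir) else []
        mapping[rdma_dev] = {"pci": pci_addr, "netdevs": netdevs}
    return mapping


if __name__ == "__main__":
    for dev, info in map_rdma_devices().items():
        print(f"{dev} -> {info['pci']} -> {', '.join(info['netdevs']) or '-'}")
```

Reading sysfs directly avoids shelling out, but it does not cover vendor detection or GID tables, which the tool also reports.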
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 13 comments.
Show a summary per file
| File | Description |
|---|---|
| tools/cluster-rdma-env-recommender/cluster_rdma_env_recommender.py | Implements RDMA inventory + recommendation output (Docker command + env vars). |
| tools/cluster-rdma-env-recommender/README.md | Documents how to run the RDMA env recommender tool. |
| benchmark/kernel/rdma_perf/socket_wait.py | Adds a helper to wait on a remote TCP port state for coordination. |
| benchmark/kernel/rdma_perf/socket_barrier.py | Adds a TCP barrier used to synchronize container readiness across nodes. |
| benchmark/kernel/rdma_perf/run_slurm.sh | Adds Slurm launcher that runs perftest inside Docker on allocated nodes. |
| benchmark/kernel/rdma_perf/run_rdma_tests.sh | Adds in-container script that performs barrier + server/client ib_write_bw. |
| benchmark/kernel/rdma_perf/README.md | Documents how to use the RDMA perf tests and common troubleshooting. |
| benchmark/kernel/ep_bench/run_slurm.sh | Adds Slurm launcher for MoRI EP microbenchmarks inside Docker. |
| benchmark/kernel/ep_bench/run_mori_bench.sh | Adds in-container script to run MoRI intra-/inter-node microbenchmarks. |
| benchmark/kernel/ep_bench/docker/Dockerfile.mori | Adds a MoRI bench image recipe layered on vLLM ROCm base image. |
| benchmark/kernel/ep_bench/README.md | Documents building/running the MoRI EP-bench launcher and image options. |
Comments suppressed due to low confidence (1)
benchmark/kernel/rdma_perf/socket_barrier.py:1
- Closing `server_socket` while the daemon thread is blocked in `accept()` will typically raise an `OSError` in the thread (and can print a stack trace). Wrap the `accept()` loop in a try/except for `OSError` and break cleanly when the socket is closed (or use a shutdown flag + timeout).
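A minimal sketch of the suggested pattern, combining both options (a shutdown flag with an accept timeout, plus an `OSError` handler). The names `start_barrier_server` and `stop_barrier_server` are hypothetical and are not taken from `socket_barrier.py`; the real barrier logic would replace the `conn.close()` placeholder.

```python
import socket
import threading


def start_barrier_server(host="127.0.0.1", port=0):
    """Start a listener whose accept loop exits cleanly on shutdown."""
    server_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server_socket.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server_socket.bind((host, port))
    server_socket.listen()
    # A short timeout lets the loop re-check the shutdown flag regularly.
    server_socket.settimeout(0.2)
    stop_event = threading.Event()

    def accept_loop():
        while not stop_event.is_set():
            try:
                conn, _addr = server_socket.accept()
            except socket.timeout:
                continue  # no client yet; check the flag again
            except OSError:
                break  # socket was closed under us; exit without a traceback
            conn.close()  # placeholder for the real barrier handshake

    thread = threading.Thread(target=accept_loop, daemon=True)
    thread.start()
    return server_socket, stop_event, thread


def stop_barrier_server(server_socket, stop_event, thread):
    """Signal the loop to stop, close the socket, and wait for the thread."""
    stop_event.set()
    server_socket.close()
    thread.join(timeout=2.0)
```

`socket.timeout` is caught before `OSError` because it is a subclass of `OSError`; reversing the order would end the loop on the first idle interval.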
###############################################################################
Comment on lines +44 to +45
cmd = "ip route show default | awk '{print $5}'"
out = subprocess.check_output(cmd, shell=True, text=True).strip()
Comment on lines +60 to +64
pci_updated = pci.replace("0000:", "")
out = subprocess.check_output(
    ["lspci", "-s", pci, "-nn"],
    text=True
).lower()
Comment on lines +227 to +231
print (f"{bnxt_rdma:>5}")
print (f"{rdmacm:>5}")
print (f"{ibverbs:>5}")
print (f"{libnl3:>5}")
print (f"{libnl3_router:>5}")
Comment on lines +310 to +313
print (f"{ionic_rdma:>5}")
print (f"{ionic_driver:>5}")
for so_file in ionic_so:
    print (f"{so_file:>5}")
Comment on lines +389 to +391
if (len(gid_indexes) > 1):
    print (" \n WARNING: multiple GID indeces detected, please check detailed report for mapping the env variables.")
nccl_env_variables.append(f"export NCCL_IB_GID_INDEX={max(list(gid_indexes))}")
Comment on lines +433 to +434
parser.add_argument("--html", help="Generate HTML report", action="store_true")
args = parser.parse_args()
Comment on lines +81 to +91
if [[ "${NODE_RANK}" -eq 0 ]]; then
    echo "-------------------------------------------------" | tee -a "${LOG_FILE}"
    echo "[${HOST_NAME}:${HOST_IP}] Running ib_write_bw as SERVER" | tee -a "${LOG_FILE}"
    echo "-------------------------------------------------" | tee -a "${LOG_FILE}"

    ib_write_bw -d "${IBDEVICES}" -q 4 -a --report_gbits -F -p "${IB_WRITE_BW_PORT}" \
        2>&1 | tee -a "${LOG_FILE}"
else
    echo "-------------------------------------------------" | tee -a "${LOG_FILE}"
    echo "[${HOST_NAME}:${HOST_IP}] Running ib_write_bw as CLIENT against ${SERVER_IP}" | tee -a "${LOG_FILE}"
    echo "-------------------------------------------------" | tee -a "${LOG_FILE}"
Comment on lines +93 to +95
echo "[${HOST_NAME}] Waiting for server port to open..." | tee -a "${LOG_FILE}"
sleep 30
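As an alternative to the fixed `sleep 30` above, the client could poll until a port on the server accepts connections, which is the role the file summary gives to `socket_wait.py`. The sketch below is an assumption about how such a wait could look, not the PR's implementation; `wait_for_port` and its parameters are hypothetical. It also assumes a dedicated readiness port: probing the `ib_write_bw` port itself would likely be taken by the server as the real client's handshake.

```python
import socket
import time


def wait_for_port(host, port, timeout_s=60.0, interval_s=1.0):
    """Poll until host:port accepts a TCP connection or timeout_s elapses.

    Returns True once a connection succeeds, False on timeout.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            # Connect and immediately close; only reachability matters here.
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            # Refused, unreachable, or timed out: back off and retry.
            time.sleep(interval_s)
    return False
```

A bounded wait like this fails fast with a clear status when the server never comes up, instead of launching the client against a port that is not yet listening.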
Comment on lines +99 to +106
echo "[Node ${NODE_RANK}] Running MoRI INTERNODE dispatch/combine benchmark (v1, bf16)..."
torchrun --nnodes=$NNODES \
    --node_rank=$NODE_RANK \
    --nproc_per_node=1 \
    --master_addr=$MASTER_ADDR \
    --master_port=$MASTER_PORT \
    "${INTERNODE_SCRIPT}" --cmd bench \
    2>&1 | tee "${LOG_DIR}/mori_internode_v1_rank${NODE_RANK}.log"
Comment on lines +54 to +56
RUN git clone --recursive $(grep '^MORI_REPO:' versions.txt | cut -d' ' -f2) && \
    cd mori && \
    git checkout $(grep '^MORI_BRANCH:' /app/versions.txt | cut -d' ' -f2)
Author
cc: @alfuyao-amd
This PR adds the following perf tests and tools to Primus:
- `benchmark/kernel/ep_bench`: microbenchmarks large-scale expert parallelism (EP) using MoRI.
- `benchmark/kernel/rdma_perf`: validates NIC performance between two nodes of a cluster.
- `tools/cluster-rdma-env-recommender`: reports cluster details such as firmware, GID, RDMA → NetDev mapping, and NIC vendor, along with a few recommendations.

Each of these perf tests and tools has its own README.