Cluster Perf tests - EP benchmarking & RDMA Perf and Cluster Env mapping recommendation tool by lcskrishna · Pull Request #734 · AMD-AGI/Primus

lcskrishna · 2026-05-27T13:49:53Z

This PR adds the following perf tests and tools to Primus.
Here is a summary of the perf tests and tools added.

Large EP Performance tests (MoRI-EP) - benchmark/kernel/ep_bench - Used for microbenchmarking large Expert Parallelism with MoRI.
RDMA Perf (IB_write) tests - benchmark/kernel/rdma_perf - Used to validate NIC performance between two nodes of a cluster.
Cluster RDMA Env mapping Tool - tools/cluster-rdma-env-recommender - Reports cluster details such as firmware, GID, RDMA -> NETDEV mapping, and NIC vendor, along with a few other recommendations.

Each of these perf tests and tools has its own README.

Copilot

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Adds cluster RDMA discovery + recommendation tooling and introduces Slurm+Docker launchers for RDMA and MoRI EP microbenchmarks.

Changes:

Add a CLI tool to map RDMA→PCI→NetDev, detect NIC vendor, and emit recommended Docker + NCCL/rocSHMEM env exports.
Add two-node RDMA perf test harness (ib_write_bw) with TCP startup barrier helpers and Slurm launch scripts.
Add MoRI EP bench Slurm launcher plus a “slim” Dockerfile to run intra-/inter-node MoRI microbenchmarks.

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 13 comments.


File	Description
tools/cluster-rdma-env-recommender/cluster_rdma_env_recommender.py	Implements RDMA inventory + recommendation output (Docker command + env vars).
tools/cluster-rdma-env-recommender/README.md	Documents how to run the RDMA env recommender tool.
benchmark/kernel/rdma_perf/socket_wait.py	Adds a helper to wait on a remote TCP port state for coordination.
benchmark/kernel/rdma_perf/socket_barrier.py	Adds a TCP barrier used to synchronize container readiness across nodes.
benchmark/kernel/rdma_perf/run_slurm.sh	Adds Slurm launcher that runs perftest inside Docker on allocated nodes.
benchmark/kernel/rdma_perf/run_rdma_tests.sh	Adds in-container script that performs barrier + server/client `ib_write_bw`.
benchmark/kernel/rdma_perf/README.md	Documents how to use the RDMA perf tests and common troubleshooting.
benchmark/kernel/ep_bench/run_slurm.sh	Adds Slurm launcher for MoRI EP microbenchmarks inside Docker.
benchmark/kernel/ep_bench/run_mori_bench.sh	Adds in-container script to run MoRI intra-/inter-node microbenchmarks.
benchmark/kernel/ep_bench/docker/Dockerfile.mori	Adds a MoRI bench image recipe layered on vLLM ROCm base image.
benchmark/kernel/ep_bench/README.md	Documents building/running the MoRI EP-bench launcher and image options.

Comments suppressed due to low confidence (1)

benchmark/kernel/rdma_perf/socket_barrier.py:1

Closing server_socket while the daemon thread is blocked in accept() will typically raise an OSError in the thread (and can print a stack trace). Wrap the accept() loop in a try/except for OSError and break cleanly when the socket is closed (or use a shutdown flag + timeout).

###############################################################################
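The suppressed comment above suggests two ways to stop the accept thread cleanly: catch OSError, or use a shutdown flag with a timeout. A minimal sketch that combines both is shown below. The contents of socket_barrier.py are not visible here, so the names `serve_until_closed`, `stop_event`, and `on_client` are hypothetical and this is not the PR's implementation.

```python
import socket
import threading


def serve_until_closed(server_socket, stop_event, on_client, poll_s=0.2):
    """Accept connections until stop_event is set or the socket is closed.

    Hypothetical helper: illustrates the "shutdown flag + timeout" and
    "catch OSError" options from the review comment.
    """
    # A short timeout lets the loop wake up and re-check the shutdown flag.
    server_socket.settimeout(poll_s)
    while not stop_event.is_set():
        try:
            conn, addr = server_socket.accept()
        except socket.timeout:
            continue  # no client yet; loop back and check the flag
        except OSError:
            break  # listening socket was closed from another thread
        with conn:
            on_client(conn, addr)


def start_accept_thread(server_socket, on_client):
    """Start the accept loop in a daemon thread; returns (thread, stop_event)."""
    stop_event = threading.Event()
    worker = threading.Thread(
        target=serve_until_closed,
        args=(server_socket, stop_event, on_client),
        daemon=True,
    )
    worker.start()
    return worker, stop_event
```

With this shape the caller would set `stop_event`, join the thread, and only then close the listening socket, so no stack trace is printed from the thread. If the socket is closed first, the OSError branch still ends the loop quietly.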

+            cmd = "ip route show default | awk '{print $5}'"
+            out = subprocess.check_output(cmd, shell=True, text=True).strip()
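The hunk above runs a shell pipeline and takes the fifth whitespace-separated field of the default route. The sketch below does the same lookup without `shell=True` and finds the interface by the `dev` keyword rather than by position. It is an illustration only: `parse_default_interfaces` and `default_interface` are hypothetical names, and how the tool should treat multiple default routes is an assumption.

```python
import subprocess


def parse_default_interfaces(route_output):
    """Return the device names found in `ip route show default` output.

    Looks for the token after "dev" on each line instead of relying on a
    fixed field position.
    """
    devices = []
    for line in route_output.splitlines():
        fields = line.split()
        if "dev" in fields:
            idx = fields.index("dev")
            if idx + 1 < len(fields):
                devices.append(fields[idx + 1])
    return devices


def default_interface():
    """Run `ip route show default` without a shell.

    Returns the first default-route device, or None if there is no
    default route. Picking the first one is an assumption of this sketch.
    """
    out = subprocess.check_output(["ip", "route", "show", "default"], text=True)
    devices = parse_default_interfaces(out)
    return devices[0] if devices else None
```

Keyword-based parsing also handles a route such as `default dev tun0 scope link`, where the fifth field would be `link` rather than the interface name.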


+            pci_updated = pci.replace("0000:", "")
+            out = subprocess.check_output(
+                ["lspci", "-s", pci, "-nn"],
+                text=True
+            ).lower()


+        print (f"{bnxt_rdma:>5}")
+        print (f"{rdmacm:>5}")
+        print (f"{ibverbs:>5}")
+        print (f"{libnl3:>5}")
+        print (f"{libnl3_router:>5}")


+        print (f"{ionic_rdma:>5}")
+        print (f"{ionic_driver:>5}")
+        for so_file in ionic_so:
+            print (f"{so_file:>5}")


+        if (len(gid_indexes) > 1):
+            print (" \n WARNING: multiple GID indeces detected, please check detailed report for mapping the env variables.")
+        nccl_env_variables.append(f"export NCCL_IB_GID_INDEX={max(list(gid_indexes))}")
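The hunk above exports the largest detected GID index. If `gid_indexes` holds strings (its type is not visible in this hunk, so that is an assumption), `max()` compares them lexicographically and `"3"` would beat `"10"`. A minimal sketch of a numeric selection follows. `pick_gid_index` is a hypothetical helper, not part of the PR.

```python
def pick_gid_index(gid_indexes):
    """Return (chosen_index, warning) for an iterable of GID indexes.

    Indexes are converted to int so the comparison is numeric even if
    they were collected as strings. warning is None when only one
    distinct index was seen.
    """
    numeric = sorted({int(i) for i in gid_indexes})
    if not numeric:
        raise ValueError("no GID indexes detected")
    warning = None
    if len(numeric) > 1:
        warning = (
            f"WARNING: multiple GID indices detected ({numeric}); "
            "check the detailed report before mapping the env variables."
        )
    return numeric[-1], warning


def nccl_gid_export(gid_indexes):
    """Build the export line in the same form as the hunk above."""
    chosen, _ = pick_gid_index(gid_indexes)
    return f"export NCCL_IB_GID_INDEX={chosen}"
```

Taking the highest index mirrors the hunk. Whether the highest index is the right choice on a given fabric is outside what this sketch decides.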


+    parser.add_argument("--html", help="Generate HTML report", action="store_true")
+    args = parser.parse_args()


+if [[ "${NODE_RANK}" -eq 0 ]]; then
+    echo "-------------------------------------------------" | tee -a "${LOG_FILE}"
+    echo "[${HOST_NAME}:${HOST_IP}] Running ib_write_bw as SERVER" | tee -a "${LOG_FILE}"
+    echo "-------------------------------------------------" | tee -a "${LOG_FILE}"
+
+    ib_write_bw -d "${IBDEVICES}" -q 4 -a --report_gbits -F -p "${IB_WRITE_BW_PORT}" \
+        2>&1 | tee -a "${LOG_FILE}"
+else
+    echo "-------------------------------------------------" | tee -a "${LOG_FILE}"
+    echo "[${HOST_NAME}:${HOST_IP}] Running ib_write_bw as CLIENT against ${SERVER_IP}" | tee -a "${LOG_FILE}"
+    echo "-------------------------------------------------" | tee -a "${LOG_FILE}"


+    echo "[${HOST_NAME}] Waiting for server port to open..." | tee -a "${LOG_FILE}"
+    sleep 30
+
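The hunk above logs that it is waiting for the server port but then sleeps for a fixed 30 seconds. A sketch of polling a TCP port until it accepts connections is shown below. The PR's own socket_wait.py is not visible here, so this is not its implementation, and `wait_for_port` with its parameters is hypothetical. It is also an assumption that the probed port belongs to a readiness service that tolerates throwaway connections: probing the `ib_write_bw` control port directly may be mistaken for the real client.

```python
import socket
import time


def wait_for_port(host, port, timeout_s=60.0, interval_s=1.0):
    """Poll host:port until a TCP connection succeeds or timeout_s elapses.

    Returns True as soon as a connection is accepted, False on timeout.
    The probe connection is closed immediately.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            # Refused, unreachable, or timed out: wait and retry.
            time.sleep(interval_s)
    return False
```

Compared with a fixed sleep, this returns as soon as the server is reachable and reports a clear failure if it never comes up.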


+    echo "[Node ${NODE_RANK}] Running MoRI INTERNODE dispatch/combine benchmark (v1, bf16)..."
+    torchrun --nnodes=$NNODES \
+        --node_rank=$NODE_RANK \
+        --nproc_per_node=1 \
+        --master_addr=$MASTER_ADDR \
+        --master_port=$MASTER_PORT \
+        "${INTERNODE_SCRIPT}" --cmd bench \
+        2>&1 | tee "${LOG_DIR}/mori_internode_v1_rank${NODE_RANK}.log"


+RUN git clone --recursive $(grep '^MORI_REPO:' versions.txt | cut -d' ' -f2) && \
+    cd mori && \
+    git checkout $(grep '^MORI_BRANCH:' /app/versions.txt | cut -d' ' -f2) 


lcskrishna · 2026-05-27T17:39:27Z

cc: @alfuyao-amd

lcskrishna added 5 commits May 26, 2026 09:40

add large ep & rdma perf tests

a54c974

add cluster rdma env recommender

676f9d5

add readme for the cluster tool

0366910

update ep benchmarking docker

34ee2fc

update rdma perf readme

49fd836

Copilot AI review requested due to automatic review settings May 27, 2026 13:49

lcskrishna requested review from Xiaoming-AMD, limou102 and wenxie-amd as code owners May 27, 2026 13:49

lcskrishna mentioned this pull request May 27, 2026

[Primus Preflight] Add cluster rdma env recommender tool & cluster NIC rdma perf tool from clusterSphere (distinf tools) #700

Closed

Copilot AI reviewed May 27, 2026

lcskrishna wants to merge 5 commits into AMD-AGI:main from lcskrishna:csrikris-cluster-tests


2 participants
