Support TBG inside Pipeline by caiomcbr · Pull Request #783 · microsoft/mscclpp

caiomcbr · 2026-04-11T21:23:48Z

No description provided.

Copilot

Pull request overview

This PR adds support for thread-block-group (TBG) slicing within PIPELINE execution by carrying TBG metadata (tbId, tbgSize) in each Operation and applying per-slice offsets/sizes inside the device execution handlers.

Changes:

Extend Operation with tbId / tbgSize and populate them from tbg_info in the JSON execution plan.
Add device-side calcOffset/calcSize and apply TBG slicing across multiple kernel handlers (GET/PUT/COPY/REDUCE/packet ops).
Remove host-side per-TBG pre-slicing in ExecutionPlan::Impl::setupOperation (offsets/sizes remain “full”, slicing moves to device).

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.

File	Description
`src/core/include/execution_kernel.hpp`	Implements device-side TBG slicing and applies it in multiple operation handlers, including within `PIPELINE` iterations.
`src/core/include/execution_common.hpp`	Adds `tbId` / `tbgSize` fields to `Operation` so the device kernel can apply TBG slicing.
`src/core/executor/execution_plan.cc`	Parses `tbg_info` from JSON and stores it into `Operation` instead of pre-slicing offsets/sizes on the host.

Copilot · 2026-04-11T21:29:06Z

src/core/include/execution_kernel.hpp

+MSCCLPP_DEVICE_INLINE uint32_t calcOffset(uint32_t size, uint32_t index, uint32_t slices) {
+  constexpr uint32_t alignment = 16;
+  uint32_t nelems = size / alignment;
+  uint32_t minNelems = nelems / slices;
+  uint32_t remainder = nelems % slices;
+  uint32_t off = index * minNelems + (index < remainder ? index : remainder);
+  return off * alignment;
+}
+
+MSCCLPP_DEVICE_INLINE uint32_t calcSize(uint32_t size, uint32_t index, uint32_t slices) {
+  return calcOffset(size, index + 1, slices) - calcOffset(size, index, slices);
+}


calcOffset/calcSize compute nelems = size / 16, which drops any remaining bytes (<16). In pipeline mode this can silently skip data whenever unitSize (or the remaining bytes in the last iteration) isn’t a multiple of 16, and can even make calcSize() return 0 for size < 16.

Consider either (a) enforcing/validating that unitSize and all buffer sizes processed here are multiples of 16, or (b) updating the slicing logic to distribute the tail bytes so the full size is covered across slices.

src/core/include/execution_kernel.hpp

Copilot · 2026-04-11T21:29:07Z

src/core/include/execution_kernel.hpp

+MSCCLPP_DEVICE_INLINE uint32_t calcOffset(uint32_t size, uint32_t index, uint32_t slices) {
+  constexpr uint32_t alignment = 16;
+  uint32_t nelems = size / alignment;
+  uint32_t minNelems = nelems / slices;
+  uint32_t remainder = nelems % slices;
+  uint32_t off = index * minNelems + (index < remainder ? index : remainder);
+  return off * alignment;


calcOffset hardcodes alignment = 16, but ExecutionPlan supports a configurable buffer_alignment (default 16). If a plan sets buffer_alignment != 16, host-side offsets/sizes are computed on that alignment while device-side TBG slicing is computed on 16B units, which can produce inconsistent slicing and misaligned accesses.

Consider plumbing the plan’s bufferAlignment to device code (e.g., store it in DeviceExecutionPlan/Operation) or explicitly enforcing buffer_alignment == 16 when tbg_info is used.

caiomcbr added 4 commits April 6, 2026 23:18

wip

9cbe78c

wip

0a26f9d

Merge branch 'main' into caiorocha/support_tbg_pipeline

7f5ff93

wip

bc62843

caiomcbr requested review from Binyang2014 and Copilot April 11, 2026 21:23

Copilot started reviewing on behalf of caiomcbr April 11, 2026 21:24 View session

Copilot AI reviewed Apr 11, 2026

View reviewed changes

Merge branch 'main' into caiorocha/support_tbg_pipeline

85bcc9a

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Support TBG inside Pipeline#783

Support TBG inside Pipeline#783
caiomcbr wants to merge 5 commits intomainfrom
caiorocha/support_tbg_pipeline

caiomcbr commented Apr 11, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Apr 11, 2026

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI Apr 11, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

caiomcbr commented Apr 11, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Apr 11, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI Apr 11, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants