Commit b130c22

zhusy54 authored
Fix: re-guard dispatch_shape() do-while against empty core mask (#569)
Commit 9951499 refactored the two-phase dispatch helper but dropped the guard that PR #566 (af3b1db) had introduced. When a multi-block task drains all available cores mid-batch, the next task in the batch enters the do-while unconditionally, calls cores.pop_first() on an empty bitmask (which returns -1), and passes -1 as core_offset to dispatch_block(), causing an out-of-bounds (OOB) access.

Re-add the guard before dispatched_any = true in both a2a3 and a5: when cores are exhausted, push_batch() re-enqueues the current and remaining batch tasks atomically and breaks out of the for-loop.

The existing regression test (spmd_batch_dispatch_oob) covers this.

Co-authored-by: zhusy54 <zhusiyu1@hisilicon.com>
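The failure mode described in the commit message can be illustrated with a minimal sketch. `CoreMask` and `take_core_offset` are hypothetical stand-ins, since the runtime's real bitmask type is not shown in this commit; only the behaviour that `pop_first()` returns -1 on an empty mask and that `has_value()` reports whether any core is left is taken from the message and the diff.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the runtime's core bitmask (assumed layout:
// bit i set means core i is free).
struct CoreMask {
    uint64_t bits = 0;

    // True while at least one core is still available.
    bool has_value() const { return bits != 0; }

    // Removes and returns the index of the lowest set bit,
    // or -1 when the mask is empty.
    int pop_first() {
        if (bits == 0) return -1;
        int idx = 0;
        while (((bits >> idx) & 1u) == 0) ++idx;
        bits &= bits - 1;  // clear the lowest set bit
        return idx;
    }
};

// Hypothetical helper: checks the mask before popping, so that -1 is never
// handed on as a core offset. Returns false when no core is available.
bool take_core_offset(CoreMask& cores, int& core_offset) {
    if (!cores.has_value()) return false;  // the guard
    core_offset = cores.pop_first();
    assert(core_offset >= 0);  // holds because the mask was non-empty
    return true;
}
```

Without the `has_value()` check, a caller would receive -1 from `pop_first()` and could use it to index per-core state, which is the out-of-bounds access the commit fixes.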
1 parent 92155aa commit b130c22

2 files changed

Lines changed: 14 additions & 0 deletions

src/a2a3/runtime/tensormap_and_ringbuffer/aicpu/aicpu_executor.cpp

Lines changed: 7 additions & 0 deletions
@@ -1097,6 +1097,13 @@ struct AicpuExecutor {
 }
 }
 
+// Guard: a preceding task in this batch may have drained all cores;
+// re-enqueue the rest of the batch instead of popping an empty mask.
+if (!cores.has_value()) {
+    rt->scheduler.ready_queues[static_cast<int32_t>(shape)].push_batch(&batch[bi], got - bi);
+    break;
+}
+
 dispatched_any = true;
 try_pushed = true;
 #if PTO2_SCHED_PROFILING

src/a5/runtime/tensormap_and_ringbuffer/aicpu/aicpu_executor.cpp

Lines changed: 7 additions & 0 deletions
@@ -1081,6 +1081,13 @@ struct AicpuExecutor {
 }
 }
 
+// Guard: a preceding task in this batch may have drained all cores;
+// re-enqueue the rest of the batch instead of popping an empty mask.
+if (!cores.has_value()) {
+    rt->scheduler.ready_queues[static_cast<int32_t>(shape)].push_batch(&batch[bi], got - bi);
+    break;
+}
+
 dispatched_any = true;
 try_pushed = true;
 #if PTO2_SCHED_PROFILING
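The control flow that the guard restores can be sketched in isolation. This is a simplified model, not the executor's actual code: `Task`, `ReadyQueue`, `dispatch_batch`, and the plain `free_cores` counter are hypothetical, and only the `push_batch(&batch[bi], got - bi)` call shape and the `break` are taken from the diff. Handling of a task that is only partially dispatched is left out.

```cpp
#include <cassert>
#include <deque>
#include <vector>

// Hypothetical task record: an id plus the number of blocks still to dispatch.
struct Task {
    int id;
    int blocks;
};

// Hypothetical ready queue; push_batch mirrors the (pointer, count) call
// shape seen in the diff.
struct ReadyQueue {
    std::deque<Task> items;
    void push_batch(const Task* tasks, int count) {
        for (int i = 0; i < count; ++i) items.push_back(tasks[i]);
    }
};

// Walks the batch and hands one core to each block. Returns how many tasks
// started dispatching. When no core is left, the current task and all tasks
// after it are re-enqueued and the loop stops.
int dispatch_batch(std::vector<Task>& batch, int& free_cores, ReadyQueue& queue) {
    const int got = static_cast<int>(batch.size());
    int started = 0;
    for (int bi = 0; bi < got; ++bi) {
        // Guard: an earlier task in this batch may have used every core.
        if (free_cores == 0) {
            queue.push_batch(&batch[bi], got - bi);
            break;
        }
        ++started;
        do {
            --free_cores;        // stands in for cores.pop_first()
            --batch[bi].blocks;  // stands in for dispatch_block()
        } while (batch[bi].blocks > 0 && free_cores > 0);
    }
    return started;
}
```

With three free cores and a batch whose first task has three blocks, the first task consumes every core and the two remaining tasks return to the queue in their original order rather than entering the do-while.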
