[UR][L0v2] Fix double zeCommandListClose() in batched queue flush #21660
ldorau wants to merge 2 commits into intel:sycl
Conversation
Please review @pbalcer @EuphoricThinking
Pull request overview
Fixes a Level Zero v2 batched-queue flush/renew corner case where reaching the submitted-batch slot limit could re-enter submission and call zeCommandListClose()/submit twice on the same command list, potentially hanging in zeCommandListHostSynchronize() on some driver versions.
Changes:
- Update `ur_queue_batched_t::renewBatchUnlocked()` to avoid delegating to `queueFinishUnlocked()` when the batch-slot limit is reached; instead directly synchronize, clean pools, and `batchFinish()` to reset state without re-submitting.
- Clarify `batch_manager::batchFinish()` comments to reflect that the active batch may already have been submitted via `queueFlushUnlocked()`.
- Unskip batched-queue execution in the multi-queue USM alloc test to allow exercising this path.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| unified-runtime/source/adapters/level_zero/v2/queue_batched.cpp | Prevents double close/double submit on slot-limit renew by changing the synchronization/reset sequence. |
| unified-runtime/test/adapters/level_zero/enqueue_alloc.cpp | Removes batched-queue skips so the multi-queue USM alloc test can run under batched submission. |
Force-pushed ce528a7 to 111db9f
The "E2E (Preview Mode, ["Linux", "gen12"])" CI job failed because of #21023
Good catch. I would also say that we could replace the content of … This way, we don't scatter the implementation of …
Force-pushed 111db9f to 27761c6
Thanks! Fixed.
Please review @EuphoricThinking @pbalcer |
Please review @pbalcer @EuphoricThinking @intel/unified-runtime-reviewers-level-zero |
Force-pushed 27761c6 to 1e4274b
My bad, I was thinking of replacing … I mean: …
Force-pushed 1e4274b to 6d299f3
@EuphoricThinking Fixed. Re-review please. |
Root cause
----------
ur_queue_batched_t::queueFlushUnlocked is called after every
enqueueUSMHostAllocExp / enqueueUSMSharedAllocExp / enqueueUSMDeviceAllocExp
/ enqueueUSMFreeExp to eagerly submit the current batch. The call sequence
is:
queueFlushUnlocked
    enqueueCurrentBatchUnlocked()       <- (1) zeCommandListClose
                                           + zeCommandListImmediateAppendCommandListsExp
    renewBatchUnlocked()
        if runBatches.size() >= initialSlotsForBatches (10):
            queueFinishUnlocked()
                if !isActiveBatchEmpty():           <- true: enqueuedOperationsCounter > 0
                    enqueueCurrentBatchUnlocked()   <- (2) BUG: double zeCommandListClose
                                                       + double submit
                hostSynchronize()                   <- may hang on some driver versions
enqueuedOperationsCounter is incremented by markIssuedCommandInBatch before
queueFlushUnlocked is called, but it is not cleared by
enqueueCurrentBatchUnlocked, so queueFinishUnlocked's !isActiveBatchEmpty()
guard does not protect against the re-entry.
With 2 queues and 256 iterations the bug fires ~92 times. On certain GPU
driver versions the immediate command list enters a state from which
zeCommandListHostSynchronize never returns, hanging the test:
UR_ADAPTERS_FORCE_LOAD=lib/libur_adapter_level_zero_v2.so \
UR_L0_V2_FORCE_BATCHED=1 \
./test/adapters/level_zero/enqueue_alloc-test \
--gtest_filter=*urL0EnqueueAllocMultiQueueSameDeviceTest.SuccessMt*
Fix
---
Remove the pre-close of the active batch from queueFlushUnlocked and move
enqueueCurrentBatchUnlocked() into renewBatchUnlocked's else branch.
This ensures that when the batch-slot limit is reached the active batch is
still open, so the existing delegation to queueFinishUnlocked closes and
submits it exactly once via its !isActiveBatchEmpty() guard:
renewBatchUnlocked()
    if runBatches.size() >= initialSlotsForBatches (10):
        queueFinishUnlocked()
            if !isActiveBatchEmpty():           <- closes + submits exactly once
                enqueueCurrentBatchUnlocked()
            hostSynchronize()
            queueFinishPoolsUnlocked()
            batchFinish()
    else:
        enqueueCurrentBatchUnlocked()           <- normal path
        renewRegularUnlocked()
This keeps all finish logic inside queueFinishUnlocked, making the code
easier to maintain and less prone to bugs if queueFinishUnlocked changes.
With queueFlushUnlocked reduced to a single call to renewBatchUnlocked,
the wrapper is no longer needed. Remove it and call renewBatchUnlocked
directly at all former call sites.
Tested-by:
UR_ADAPTERS_FORCE_LOAD=lib/libur_adapter_level_zero_v2.so \
UR_L0_V2_FORCE_BATCHED=1 \
./test/adapters/level_zero/enqueue_alloc-test
(81 passed, 6 skipped, 0 failed)
Signed-off-by: Lukasz Dorau <lukasz.dorau@intel.com>
Force-pushed 6d299f3 to d421dc7
Force-pushed d421dc7 to 7507428
Force-pushed 7507428 to c7c9a6f
… all queue types

Enable the urL0EnqueueAllocMultiQueueSameDeviceTest and parameterize it over all queue submission modes (UR_QUEUE_FLAG_SUBMISSION_BATCHED and UR_QUEUE_FLAG_SUBMISSION_IMMEDIATE) by:
- Removing SKIP_IF_BATCHED_QUEUE to enable the test for batched queues.
- Changing the base class template parameter from EnqueueAllocMultiQueueTestParam to uur::MultiQueueParam<EnqueueAllocMultiQueueTestParam> so that the queue mode becomes part of the test parameter.
- Adding getMultiQueueParam(), getAllocParam() and getQueueFlags() helpers to the fixture for clean access to the two parts of the parameter tuple. getMultiQueueParam() calls the base class getter using the fully qualified name uur::urContextTestWithParam<...>::getParam() to avoid ambiguity. The names are specific enough not to be confused with GoogleTest's conventional GetParam().
- Creating queues with the parameterized flag via ur_queue_properties_t instead of a hardcoded UR_QUEUE_FLAG_SUBMISSION_BATCHED.
- Switching the test suite macro from UUR_DEVICE_TEST_SUITE_WITH_PARAM to UUR_MULTI_QUEUE_TYPE_TEST_SUITE_WITH_PARAM and the printer to deviceTestWithParamPrinterMulti, which expands the suite to cover both queue modes automatically.
- Updating all three test bodies (SuccessMt, SuccessReuse, SuccessDependantMt) to use getAllocParam() instead of std::get<1>(this->GetParam()), and restoring the numQueues parameter in SuccessMt to getAllocParam().numQueues.

This ensures both batched and immediate queues are covered by default test runs without requiring UR_L0_V2_FORCE_BATCHED=1.

Signed-off-by: Lukasz Dorau <lukasz.dorau@intel.com>
  auto batchLocked = currentCmdLists.lock();
  if (batchLocked->isCurrentGeneration(batch_generation)) {
-   return queueFlushUnlocked(batchLocked);
+   return renewBatchUnlocked(batchLocked);
Why would we like to rename queueFlushUnlocked? I think it is consistent with the Func() and FuncUnlocked() naming convention.
We do not rename it. We replace all calls to queueFlushUnlocked() with a call to renewBatchUnlocked().