Commit 5cba485

fix(webapp): guard failBackgroundWorkerDeployment with BUILDING predicate
Symmetric with the BUILDING → DEPLOYING `updateMany` guard. Without this, two attempts running side-by-side (idempotency lets the retry proceed past worker creation) can produce: A succeeds and flips to DEPLOYING; B hits a transient file/resource/schedule error and unconditionally flips to FAILED; A's subsequent guarded update finds 0 rows; deployment is stuck in FAILED with a worker registered but never deployed.

Now the failure handler only flips status on rows that are still BUILDING. The timeout dequeue is also gated on the actual flip so we don't cancel a sibling attempt's just-enqueued timeout.
1 parent a3d837f commit 5cba485

1 file changed

Lines changed: 20 additions & 2 deletions

apps/webapp/app/v3/services/createDeploymentBackgroundWorkerV4.server.ts

@@ -232,9 +232,14 @@ export class CreateDeploymentBackgroundWorkerServiceV4 extends BaseService {
   }
 
   async #failBackgroundWorkerDeployment(deployment: WorkerDeployment, error: Error) {
-    await this._prisma.workerDeployment.update({
+    // Guarded BUILDING → FAILED transition, symmetric with the BUILDING → DEPLOYING
+    // transition in `call()`. With idempotent retries, two attempts can run side-by-side;
+    // without the predicate, one attempt's failure could downgrade the deployment after
+    // the other already flipped it to DEPLOYING, leaving it stuck in FAILED with a worker.
+    const { count: updatedCount } = await this._prisma.workerDeployment.updateMany({
       where: {
         id: deployment.id,
+        status: "BUILDING",
       },
       data: {
         status: "FAILED",
@@ -246,7 +251,20 @@ export class CreateDeploymentBackgroundWorkerServiceV4 extends BaseService {
       },
     });
 
-    await TimeoutDeploymentService.dequeue(deployment.id, this._prisma);
+    if (updatedCount === 0) {
+      logger.warn(
+        "failBackgroundWorkerDeployment: deployment moved out of BUILDING during call, skipping FAILED transition",
+        {
+          deploymentId: deployment.id,
+          originalError: error.message,
+        }
+      );
+    } else {
+      // Only dequeue the timeout if we actually flipped to FAILED — otherwise a
+      // sibling attempt may have just enqueued it as part of a successful
+      // BUILDING → DEPLOYING transition.
+      await TimeoutDeploymentService.dequeue(deployment.id, this._prisma);
+    }
 
     throw error;
   }
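
The race described in the commit message can be replayed in isolation. The following is a minimal sketch, not the webapp's code: `InMemoryDeployments`, `TimeoutQueue`, `markDeploying`, and `markFailed` are hypothetical stand-ins, and the in-memory `updateMany` only mimics the `{ count }` result shape that the diff destructures. It also assumes, following the commit message, that the successful BUILDING → DEPLOYING transition is what enqueues the timeout.

```typescript
type Status = "BUILDING" | "DEPLOYING" | "FAILED";

interface Deployment {
  id: string;
  status: Status;
}

// Hypothetical in-memory stand-in for the deployments table.
class InMemoryDeployments {
  private rows = new Map<string, Deployment>();

  insert(row: Deployment): void {
    this.rows.set(row.id, { ...row });
  }

  statusOf(id: string): Status | undefined {
    return this.rows.get(id)?.status;
  }

  // Conditional update: only touches the row if it matches BOTH id and status.
  // Returns the number of rows changed, like the `{ count }` used in the diff.
  updateMany(where: { id: string; status: Status }, data: { status: Status }): { count: number } {
    const row = this.rows.get(where.id);
    if (!row || row.status !== where.status) {
      return { count: 0 };
    }
    row.status = data.status;
    return { count: 1 };
  }
}

// Hypothetical stand-in for the timeout queue.
class TimeoutQueue {
  private pending = new Set<string>();
  enqueue(id: string): void {
    this.pending.add(id);
  }
  dequeue(id: string): void {
    this.pending.delete(id);
  }
  has(id: string): boolean {
    return this.pending.has(id);
  }
}

// Guarded BUILDING -> DEPLOYING; enqueues the timeout only if the flip happened.
function markDeploying(store: InMemoryDeployments, timeouts: TimeoutQueue, id: string): boolean {
  const { count } = store.updateMany({ id, status: "BUILDING" }, { status: "DEPLOYING" });
  if (count > 0) {
    timeouts.enqueue(id);
  }
  return count > 0;
}

// Guarded BUILDING -> FAILED; dequeues the timeout only if the flip happened.
function markFailed(store: InMemoryDeployments, timeouts: TimeoutQueue, id: string): boolean {
  const { count } = store.updateMany({ id, status: "BUILDING" }, { status: "FAILED" });
  if (count > 0) {
    timeouts.dequeue(id);
  }
  return count > 0;
}

// Replay the scenario: attempt A succeeds first, then attempt B fails.
const store = new InMemoryDeployments();
const timeouts = new TimeoutQueue();
store.insert({ id: "dep_1", status: "BUILDING" });

const attemptAFlipped = markDeploying(store, timeouts, "dep_1"); // true
const attemptBFlipped = markFailed(store, timeouts, "dep_1"); // false: row is no longer BUILDING

console.log(store.statusOf("dep_1")); // prints "DEPLOYING"
console.log(timeouts.has("dep_1")); // prints true: A's timeout was not cancelled
```

With an unconditional update in `markFailed`, the same replay would end with the row in FAILED and A's timeout removed, which is the stuck state the commit message describes.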
