
shutdown Celery workers do not have an opportunity to reset their task status #388

@MoralCode

We observed in production that Flower was showing 0 core tasks running for an extended period of time, yet the collection_status operations table had ~40 core tasks listed as Collecting (with UUIDs).

After confirming (by searching Flower for the URLs of those repos) that these tasks were neither currently running nor already completed, we concluded that they were stale: the DB's picture of collection status had gotten out of sync with Flower.

Changing a few of these tasks from Collecting with a UUID to Pending (and nulling out the UUID) seems to have partially cleared the block on core workers.
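For reference, the manual fix could be scripted roughly as below. This is only a sketch: it uses an in-memory SQLite table as a stand-in for the real operations DB, and the column names `repo_id`, `core_status`, and `core_task_id` are assumptions for illustration, not confirmed against the actual schema.

```python
import sqlite3


def reset_stale_core_tasks(conn, stale_task_ids):
    """Move rows stuck in Collecting back to Pending and clear their task UUID.

    Only rows whose task id is in stale_task_ids are touched.
    Returns the number of rows changed.
    """
    ids = list(stale_task_ids)
    if not ids:
        return 0
    placeholders = ",".join("?" for _ in ids)
    cur = conn.execute(
        "UPDATE collection_status "
        "SET core_status = 'Pending', core_task_id = NULL "
        "WHERE core_status = 'Collecting' "
        f"AND core_task_id IN ({placeholders})",
        ids,
    )
    conn.commit()
    return cur.rowcount


def demo():
    # Hypothetical schema and rows, for illustration only.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE collection_status ("
        "repo_id INTEGER PRIMARY KEY, core_status TEXT, core_task_id TEXT)"
    )
    conn.executemany(
        "INSERT INTO collection_status VALUES (?, ?, ?)",
        [(1, "Collecting", "uuid-a"), (2, "Collecting", "uuid-b"), (3, "Success", None)],
    )
    changed = reset_stale_core_tasks(conn, ["uuid-a"])
    rows = conn.execute(
        "SELECT repo_id, core_status, core_task_id "
        "FROM collection_status ORDER BY repo_id"
    ).fetchall()
    return changed, rows
```

Keeping the `core_status = 'Collecting'` condition in the `WHERE` clause means a task that finished between the check and the update would not be overwritten.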

When asked to identify bugs that could cause this out-of-sync state and the resulting stale tasks, generative AI identified 4 issues. I deemed three of these to be not relevant and/or intended behavior, but the one that seemed promising was:

No on_revoke / task revocation handler

celery_app.py Lines 1-10
from celery.signals import worker_process_init, worker_process_shutdown
There's no task_revoked or task_prerun/task_postrun signal handler that would reset COLLECTING to ERROR/PENDING when tasks are revoked during worker shutdown. The on_failure on CoreRepoCollectionTask is only triggered by exceptions raised inside the task function — not by SIGKILL, OOM kills, or Celery task revocation.
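A handler along these lines might look like the sketch below. It connects a receiver to Celery's `task_revoked` signal, which passes the revoked task's `request` along with `terminated`, `signum`, and `expired`. The `reset_status` callback is hypothetical and stands in for whatever DB update the project would actually use; here it just appends to a list. Celery is imported in a guarded block so the logic can be exercised without it installed.

```python
from types import SimpleNamespace


def make_revoke_handler(reset_status):
    """Build a task_revoked receiver that calls reset_status(task_id)."""

    def on_task_revoked(sender=None, request=None, terminated=None,
                        signum=None, expired=None, **kwargs):
        # request.id is the UUID of the revoked task.
        task_id = getattr(request, "id", None)
        if task_id is not None:
            reset_status(task_id)

    return on_task_revoked


# Hypothetical stand-in for "set Collecting back to Pending in the DB".
reset_log = []
handle_revoked = make_revoke_handler(reset_log.append)

try:
    from celery.signals import task_revoked

    # The module-level name keeps the receiver alive for the signal.
    task_revoked.connect(handle_revoked)
except ImportError:
    pass  # Celery not installed; the handler can still be called directly.

# Simulate what the worker would send for a terminated task.
handle_revoked(request=SimpleNamespace(id="uuid-a"),
               terminated=True, signum=15, expired=False)
```

One caveat: a signal receiver can only run if the worker process survives long enough to dispatch it, so this would not cover SIGKILL or OOM kills by itself.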

This seems like a very plausible error chain:

  1. a Celery worker gets shut down or recreated for some reason
  2. we have no handler to catch this when that worker's tasks are revoked
  3. we subsequently never change the status of those now-revoked tasks in the DB
  4. even if/when the new workers come back, the collection monitor doesn't pick up any more tasks because it thinks core collection is already running at full capacity
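One way to break this chain regardless of how the worker died would be a periodic reconciliation pass: compare the task UUIDs the DB lists as Collecting against the tasks the workers actually report. The sketch below assumes the replies have the shape returned by Celery's `inspect().active()` and `inspect().reserved()` (a dict mapping worker name to a list of task dicts with an `id` key, or `None` when no worker answers). The function names are hypothetical.

```python
def live_task_ids(*inspect_replies):
    """Collect task ids from replies shaped like Celery's inspect().active()."""
    ids = set()
    for reply in inspect_replies:
        for tasks in (reply or {}).values():
            ids.update(task["id"] for task in tasks)
    return ids


def find_stale_task_ids(db_collecting_ids, active_reply, reserved_reply):
    """Return DB task ids that no worker reports as active or reserved.

    If no worker answered at all, return an empty list: with no information
    about running tasks, resetting everything would be unsafe.
    """
    if active_reply is None and reserved_reply is None:
        return []
    live = live_task_ids(active_reply, reserved_reply)
    return sorted(set(db_collecting_ids) - live)


# Illustrative data only: one task still running, two left over in the DB.
active = {"core@host": [{"id": "uuid-a", "name": "core_collection"}]}
reserved = {"core@host": []}
stale = find_stale_task_ids(["uuid-a", "uuid-b", "uuid-c"], active, reserved)
```

In a live setup the two replies would presumably come from something like `celery_app.control.inspect().active()` and `.reserved()`, where `celery_app` is assumed to be the project's Celery application, and the resulting stale ids would then be reset to Pending.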
