Add celery beat task, archive_pending_files, to automatically archive files that have been created but not made live after 24 hours. #4875

Open
rparke wants to merge 10 commits into main from rp-cleanup-pending-files-scheduled-task

Conversation

@rparke rparke commented Jun 8, 2026

Contributor

This is vital cleanup work: it ensures that stale files don't persist. Once archived, they are eventually removed from S3 as well, when the scheduled task remove-archived-template-email-files-from-s3 picks them up.

24 hours has been selected because that's the session timeout for our users and should be long enough that if they were going to make the file live, they'd have done it.
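
The selection rule described above can be sketched in plain Python. This is only an illustration, not the code in this PR: the `PendingFile` class, its `pending` and `archived_at` fields, and the `archive_stale_pending_files` helper are hypothetical names, and the 24-hour constant stands in for the `TEMPLATE_EMAIL_FILE_ARCHIVE_PERIOD_IN_HOURS` config value.

```python
import datetime
from dataclasses import dataclass
from typing import List, Optional

# Assumed value: the PR description picks 24 hours to match the session timeout.
ARCHIVE_PERIOD_IN_HOURS = 24


@dataclass
class PendingFile:
    # Hypothetical stand-in for a TemplateEmailFile row.
    filename: str
    created_at: datetime.datetime
    pending: bool = True
    archived_at: Optional[datetime.datetime] = None


def archive_stale_pending_files(
    files: List[PendingFile], now: datetime.datetime
) -> List[PendingFile]:
    """Archive files that are still pending and older than the archive period."""
    cutoff = now - datetime.timedelta(hours=ARCHIVE_PERIOD_IN_HOURS)
    archived = []
    for file in files:
        # Only files never made live, and created before the cutoff, qualify.
        if file.pending and file.created_at < cutoff:
            file.pending = False
            file.archived_at = now
            archived.append(file)
    return archived
```

Passing `now` in as an argument, rather than reading the clock inside the function, is a choice made here so the rule can be tested with fixed timestamps.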

@rparke rparke force-pushed the rp-cleanup-pending-files-scheduled-task branch 2 times, most recently from b4a0df2 to 8ed0f9d Compare June 8, 2026 10:04
Comment thread app/config.py Outdated
Comment thread tests/app/celery/test_scheduled_tasks.py Outdated
Comment thread app/dao/template_email_files_dao.py Outdated
…e files that have been created but not made live after 24 hours.

This is vital cleanup work that ensures that these stale files don't persist and are properly cleaned up, eventually being removed from S3 too.
@rparke rparke force-pushed the rp-cleanup-pending-files-scheduled-task branch from 8ed0f9d to 1775df8 Compare June 8, 2026 10:22
Comment thread app/dao/template_email_files_dao.py Outdated

@CrystalPea CrystalPea left a comment

Contributor

Nice work! I left a few comments 🙌🏼

Comment thread app/celery/scheduled_tasks.py Outdated
Comment thread app/celery/scheduled_tasks.py Outdated
datetime.datetime.utcnow() - TemplateEmailFile.created_at
> datetime.timedelta(hours=current_app.config.get("TEMPLATE_EMAIL_FILE_ARCHIVE_PERIOD_IN_HOURS")),
).all()
for file in files_in_pending:

Contributor

Could we archive all pending files in a single transaction, instead of looping? That will make the query much, much faster.

Contributor Author

I don't know how much faster this will make the query, because we don't get thousands of new files a day (at the moment). Can update if needed.
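
For reference, the single-statement approach suggested in this thread might look roughly like the sketch below. It uses the standard-library `sqlite3` module so it runs on its own. The table name, the `pending` and `archived_at` columns, and the `archive_pending_in_one_statement` helper are assumptions for illustration, not the schema or DAO in this repository. In SQLAlchemy, the closest equivalent would be a bulk `Query.update()` on the filtered query instead of iterating over `.all()`.

```python
import datetime
import sqlite3


def archive_pending_in_one_statement(conn, now, archive_period_hours=24):
    """Archive every stale pending file with one UPDATE inside one transaction."""
    cutoff = now - datetime.timedelta(hours=archive_period_hours)
    # The connection context manager commits on success and rolls back on error.
    with conn:
        cursor = conn.execute(
            # Hypothetical table and column names.
            "UPDATE template_email_files "
            "SET archived_at = ?, pending = 0 "
            "WHERE pending = 1 AND archived_at IS NULL AND created_at < ?",
            (now.isoformat(), cutoff.isoformat()),
        )
    # Number of rows the UPDATE modified.
    return cursor.rowcount
```

The database filters and updates the rows itself, so the cost is one round trip regardless of how many files qualify. With only a handful of new files a day the difference from the loop is likely small, which matches the reply above.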

Comment thread app/config.py
Comment thread tests/app/celery/test_scheduled_tasks.py
Comment thread tests/app/celery/test_scheduled_tasks.py
Comment thread app/celery/scheduled_tasks.py Outdated
Comment thread tests/app/db.py Outdated
Comment thread app/celery/scheduled_tasks.py
Comment thread app/dao/template_email_files_dao.py
Comment thread app/dao/template_email_files_dao.py
rparke and others added 4 commits June 12, 2026 13:53
Co-authored-by: joybytes <171517790+joybytes@users.noreply.github.com>
…alphagov/notifications-api into rp-cleanup-pending-files-scheduled-task
3 participants