Backup fails with stale Restic lock when apply-updates runs during a backup #8077

Description

@piviul

Steps to reproduce

  • Configure a scheduled backup (e.g. hourly) so that its start minute coincides with apply-updates.timer (default 03:12)
  • Wait for both timers to fire at the same time on a node where the backup of a large module (e.g. mail) is still running when the update starts
  • Observe the following backup runs for that module

Expected behavior

Module updates and backup runs do not interfere: either the update waits for the running backup to complete, or the backup is stopped gracefully so that Restic releases its repository lock. Subsequent backup runs succeed.
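The first option above (the update waits for the running backup) could be sketched with an advisory file lock that both the backup and the update take before touching a module. This is only an illustration of the idea, not how core implements either job: the lock path and the `module_lock` and `backup_is_running` helpers are hypothetical names.

```python
import fcntl
import os
from contextlib import contextmanager


@contextmanager
def module_lock(path, blocking=True):
    """Hold an exclusive advisory flock on `path` for the duration of the with-block.

    `path` is a hypothetical per-module lock file, not an existing core file.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        flags = fcntl.LOCK_EX | (0 if blocking else fcntl.LOCK_NB)
        # Blocks until the lock is free; with blocking=False it raises
        # BlockingIOError when another holder already has the lock.
        fcntl.flock(fd, flags)
        yield
    finally:
        # Closing the descriptor releases the lock, even if the body raised.
        os.close(fd)


def backup_is_running(path):
    """Return True if another holder (e.g. a running backup) owns the lock."""
    try:
        with module_lock(path, blocking=False):
            return False
    except BlockingIOError:
        return True
```

With this pattern the backup would wrap its whole Restic run in `with module_lock(path):`, and the update would either do the same (and so wait) or call `backup_is_running(path)` and postpone the module restart.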

Actual behavior

The module update restarts the module's containers while backup1.service is mid-operation. The restic-<module>-<pid> container is killed with SIGTERM (backup1.service: Main process exited, code=killed, status=15/TERM), so Restic cannot release its lock. A stale lock file remains in the repository's locks/ directory, and every subsequent backup run for that module fails during forget --prune with:

[ERROR] restic backup failed. Command '[...restic...] forget --prune --keep-last=360' returned non-zero exit status 11.

Exit status 11 is the code Restic returns when it fails to lock the repository. Backups keep failing until the lock is removed manually (restic unlock). Modules with short backup times are rarely affected; large modules (e.g. mail with ~400 GB of data) are hit every time the two jobs overlap.
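As a possible mitigation, the backup wrapper could treat exit status 11 as "repository locked", run `restic unlock` once, and retry. The sketch below is an assumption about how that might look, not the existing core code: `run_with_unlock_retry` and `base_cmd` are hypothetical, and the real Restic invocation (elided as `[...restic...]` in the log) would have to be passed in.

```python
import subprocess

# Exit status Restic (0.17 and later) uses for "failed to lock repository".
RESTIC_LOCK_FAILED = 11


def run_with_unlock_retry(restic_args, run=subprocess.run, base_cmd=("restic",)):
    """Run a restic command; on a lock failure, unlock once and retry.

    `base_cmd` is a placeholder for the real invocation, which is elided
    in the log. `run` is injectable so the logic can be tested without Restic.
    """
    cmd = [*base_cmd, *restic_args]
    try:
        return run(cmd, check=True)
    except subprocess.CalledProcessError as exc:
        if exc.returncode != RESTIC_LOCK_FAILED:
            raise  # any other failure is not a lock problem
    # Plain `restic unlock` removes only locks that Restic considers stale.
    run([*base_cmd, "unlock"], check=True)
    # Retry once; a second failure propagates to the caller.
    return run(cmd, check=True)
```

For example, `run_with_unlock_retry(["forget", "--prune", "--keep-last=360"])`. Plain `unlock` is used rather than `unlock --remove-all` so that a lock held by a backup that is genuinely still running is left alone; whether the leftover lock from a killed container is recognized as stale right away would need to be verified.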

Components

  • core 3.19.2 (ghcr.io/nethserver/restic:3.19.2)
