CrashLoopBackOff on Kubernetes: DuckDB WAL replay fails on fresh database initialization #36

@marcochiodo

Description
Environment

  • Liwan version: 1.4 (image ghcr.io/explodingcamera/liwan:1.4, SHA sha256:80c696af40b84abb1a008ee307e297a7722c178e654ab88dee5e953e2f93f661)
  • DuckDB version: 1.5.0 (bundled in liwan 1.4)
  • Kubernetes: k3s v1.33.6+k3s1
  • Container runtime: containerd 2.1.5-k3s1.33
  • Host OS: Debian GNU/Linux 13 (trixie)
  • Kernel: 6.12.38+deb13-cloud-amd64
  • CPU: AMD EPYC with AVX2 support
  • Filesystem: ext4
  • Storage: local-path provisioner (k3s default), bind-mount into pod

Problem

Liwan crashes immediately (<1 second) on every startup inside a Kubernetes pod,
even on a completely clean /data directory. The error is:

WARN liwan::app::db: Failed to create DuckDB connection. If you've just upgraded
to Liwan 1.2, please downgrade to version 1.1.1 first, start and stop the server,
and then upgrade to 1.2 again.
Error: Failed to create DuckDB connection: INTERNAL Error: Failure while replaying
WAL file "/data/liwan-events.duckdb.wal": Calling DatabaseManager::GetDefaultDatabase
with no default database set
This error signals an assertion failure within DuckDB.

DuckDB creates the WAL file on fresh init, then immediately fails to replay it as
part of the initial checkpoint. The result is a CrashLoopBackOff with a deterministic
4494-byte WAL file created every time.

What We Tested

| Scenario | Result |
| --- | --- |
| `ctr run` with overlay filesystem (no volume) | ✅ Works |
| `ctr run` with bind-mount to the same PVC path (clean dir) | ✅ Works |
| Kubernetes pod, clean PVC, default security context | ❌ Crashes |
| Kubernetes pod + `seccompProfile: Unconfined` | ❌ Crashes |
| Kubernetes pod + `capabilities: add: ["ALL"]` | ❌ Crashes |
| Kubernetes pod + `runAsUser: 0` | ❌ Crashes |
| Kubernetes pod + memory limit 512Mi | ❌ Crashes |
| Kubernetes pod + `LIWAN_DUCKDB_THREADS=1` | ❌ Crashes |

The crash is deterministic and always produces the same 4494-byte WAL file before failing.
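For reference, a minimal pod spec approximating the failing "clean PVC, default security context" case (the pod and PVC names here are illustrative placeholders, not taken from the actual deployment):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: liwan-repro            # hypothetical name for the repro pod
spec:
  containers:
    - name: liwan
      image: ghcr.io/explodingcamera/liwan:1.4
      volumeMounts:
        - name: data
          mountPath: /data     # liwan writes liwan-events.duckdb(.wal) here
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: liwan-data  # hypothetical PVC backed by the k3s local-path provisioner
```

No securityContext tweaks are needed to trigger the crash; per the table above, loosening seccomp, capabilities, or the user ID made no difference.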

Additional Findings

  • LIWAN_LISTEN cannot be set via environment variable: doing so causes Error: duplicate field 'listen', which suggests the distroless image ships a TOML config with listen already set.
  • LIWAN_BASE_URL must point to a resolvable domain before the pod starts, otherwise liwan
    panics with failed to lookup address information: Name does not resolve (src/web/mod.rs:143).
  • DuckDB 1.5.1 release notes mention a fix for "WAL corruption related to
    MarkBlockAsCheckpointed on fresh database initialization", which matches this
    exact failure. Liwan 1.4 bundles DuckDB 1.5.0.
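To illustrate the environment-variable constraints above, the container env block ends up looking roughly like this (the domain is a placeholder):

```yaml
env:
  - name: LIWAN_BASE_URL
    # Must resolve via DNS before the pod starts, or liwan panics with
    # "failed to lookup address information: Name does not resolve"
    # (src/web/mod.rs:143).
    value: "https://liwan.example.com"
  # LIWAN_LISTEN is intentionally omitted: setting it triggers
  # "Error: duplicate field 'listen'", since the image's bundled TOML
  # config already defines `listen`.
```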

Suspected Root Cause

This appears to be a DuckDB 1.5.0 bug in fresh database initialization that was fixed in
DuckDB 1.5.1. It surfaces specifically in the Kubernetes pod execution context (possibly
related to cgroup constraints or subtle runtime differences versus bare ctr run). Upgrading
the bundled DuckDB to ≥1.5.1 would likely fix the issue.

Workaround

None found. The application is currently not usable on Kubernetes with Liwan 1.4.
