
feat: require self-healing + bounded resources for systemd container services (profile 1.2.0)#3

Merged: fatherlinux merged 1 commit into main from feat/container-runtime-resilience on Jun 1, 2026


@fatherlinux (Member)

The Container Image profile now mandates `Restart=on-failure` for internal services and bounded worker pools (such as php-fpm) sized to the container memory limit.

Codifies the lesson from the 2026-05-27 crunchtools.com OOM outage, in which a stock php-fpm configuration and no `Restart=` directive caused a silent ~5-day blog outage.
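A profile requirement like `Restart=on-failure` can also be checked mechanically. The sketch below reads a systemd unit file with Python's standard-library `configparser` and reports whether its `[Service]` section sets `Restart=on-failure`. The helper name `has_self_healing_restart` and the sample unit are hypothetical, not part of the profile. It also does not merge drop-in files, and it handles repeated keys only loosely: the last `Restart=` assignment wins, which matches systemd's behavior for this key.

```python
import configparser


def has_self_healing_restart(unit_text: str) -> bool:
    """Return True if the unit's [Service] section sets Restart=on-failure.

    Hypothetical lint helper for illustration; drop-in files are not merged.
    """
    parser = configparser.ConfigParser(
        strict=False,        # tolerate repeated keys; the last assignment wins
        interpolation=None,  # systemd uses %-specifiers, not configparser syntax
    )
    parser.optionxform = str  # preserve key case (systemd keys are case-sensitive)
    parser.read_string(unit_text)
    # A missing [Service] section or Restart= key means systemd's default: "no".
    restart = parser.get("Service", "Restart", fallback="no")
    return restart.strip() == "on-failure"


SAMPLE_UNIT = """
[Unit]
Description=PHP FastCGI Process Manager

[Service]
ExecStart=/usr/sbin/php-fpm --nodaemonize
Restart=on-failure

[Install]
WantedBy=multi-user.target
"""

print(has_self_healing_restart(SAMPLE_UNIT))                          # True
print(has_self_healing_restart("[Service]\nExecStart=/bin/true\n"))  # False
```

A check like this could run in CI against every unit file shipped in an image, so a container that would silently stay down after a crash fails the build instead.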

🤖 Generated with Claude Code

…r services (profile 1.2.0)

Container Image profile now mandates Restart=on-failure on internal services
and bounded worker pools (php-fpm) sized to the container memory limit.
Codifies the lesson from the crunchtools.com 2026-05-27 OOM outage.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

@gemini-code-assist (bot) left a comment


Code Review

This pull request updates the Container Image Profile to version 1.2.0, introducing guidelines for systemd-based containers to self-heal internal services using Restart=on-failure and to bound resource consumption by capping worker pool concurrency (such as pm.max_children for php-fpm). The review feedback recommends adjusting the formula for pm.max_children to subtract the memory overhead of other running services, preventing OOM starvation in multi-service environments.

Comment on lines +105 to +106
prefer `pm = ondemand` with an explicit `pm.max_children` sized to
`(memory_limit ÷ avg_worker_RSS)`, plus `pm.max_requests` to recycle workers.

Priority: medium

In a multi-service systemd container (e.g., running systemd, httpd, MariaDB, and php-fpm together), allocating the entire container memory limit to the php-fpm worker pool formula can still lead to OOM starvation of other essential services. The formula should subtract the memory overhead of other running processes to ensure safe boundaries.

Suggested change

Original:
prefer `pm = ondemand` with an explicit `pm.max_children` sized to
`(memory_limit ÷ avg_worker_RSS)`, plus `pm.max_requests` to recycle workers.

Suggested:
prefer `pm = ondemand` with an explicit `pm.max_children` sized to
`((container_memory_limit - other_services_memory) ÷ avg_worker_RSS)`, plus `pm.max_requests` to recycle workers.
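The difference between the two sizing formulas in this thread is easiest to see with numbers. The sketch below computes `pm.max_children` as the floor of available memory divided by average worker RSS, where available memory is either the whole container limit (the original wording) or the limit minus the other services' footprint (the suggested change), and then renders the resulting pool directives. The function names, all memory figures, and the `pm.max_requests` value of 500 are hypothetical placeholders, not values from the PR; real numbers would come from measuring worker RSS inside the running container.

```python
def max_children(container_memory_limit_mib: int,
                 avg_worker_rss_mib: int,
                 other_services_memory_mib: int = 0) -> int:
    """Size pm.max_children as floor((limit - other services) / avg worker RSS).

    With other_services_memory_mib=0 this reduces to the original
    memory_limit / avg_worker_RSS formula.
    """
    if avg_worker_rss_mib <= 0:
        raise ValueError("average worker RSS must be positive")
    budget = container_memory_limit_mib - other_services_memory_mib
    if budget < avg_worker_rss_mib:
        # Design choice of this sketch: refuse rather than emit max_children = 0.
        raise ValueError("memory budget cannot fit even one php-fpm worker")
    return budget // avg_worker_rss_mib


def render_pool(children: int, max_requests: int = 500) -> str:
    """Render php-fpm pool directives; max_requests=500 is a placeholder."""
    return "\n".join([
        "pm = ondemand",
        f"pm.max_children = {children}",
        f"pm.max_requests = {max_requests}",  # recycle workers to cap leak growth
    ])


# Hypothetical container: 2048 MiB limit, ~64 MiB per php-fpm worker, and
# ~768 MiB used by systemd, httpd, and MariaDB together.
naive = max_children(2048, 64)       # 32 workers: php-fpm may claim the whole limit
safe = max_children(2048, 64, 768)   # 20 workers: leaves room for the other services
print(naive, safe)                   # 32 20
print(render_pool(safe))
```

With these placeholder numbers, the original formula lets php-fpm alone claim all 2048 MiB at full concurrency, which is the starvation scenario the review describes; subtracting the other services' footprint caps the pool at 20 workers.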

@fatherlinux merged commit 5eb6916 into main on Jun 1, 2026
1 check passed
@fatherlinux deleted the feat/container-runtime-resilience branch on June 1, 2026 at 15:19