feat: require self-healing + bounded resources for systemd container services (profile 1.2.0) #3
…r services (profile 1.2.0)

Container Image profile now mandates Restart=on-failure on internal services and bounded worker pools (php-fpm) sized to the container memory limit.

Codifies the lesson from the crunchtools.com 2026-05-27 OOM outage.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Code Review
This pull request updates the Container Image Profile to version 1.2.0, introducing guidelines for systemd-based containers to self-heal internal services using Restart=on-failure and to bound resource consumption by capping worker pool concurrency (such as pm.max_children for php-fpm). The review feedback recommends adjusting the formula for pm.max_children to subtract the memory overhead of other running services, preventing OOM starvation in multi-service environments.
prefer `pm = ondemand` with an explicit `pm.max_children` sized to
`(memory_limit ÷ avg_worker_RSS)`, plus `pm.max_requests` to recycle workers.
In a multi-service systemd container (e.g., running systemd, httpd, MariaDB, and php-fpm together), allocating the entire container memory limit to the php-fpm worker pool formula can still lead to OOM starvation of other essential services. The formula should subtract the memory overhead of other running processes to ensure safe boundaries.
Suggested change, from:

prefer `pm = ondemand` with an explicit `pm.max_children` sized to
`(memory_limit ÷ avg_worker_RSS)`, plus `pm.max_requests` to recycle workers.

to:

prefer `pm = ondemand` with an explicit `pm.max_children` sized to
`((container_memory_limit - other_services_memory) ÷ avg_worker_RSS)`, plus `pm.max_requests` to recycle workers.
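
To make the adjusted formula concrete, here is a worked sizing sketch. All three input figures are hypothetical placeholders, not measurements from the affected container; substitute the container's actual memory limit, the measured footprint of the other services, and the observed average php-fpm worker RSS. The `[www]` pool name and the `pm.process_idle_timeout` and `pm.max_requests` values are likewise illustrative assumptions.

```ini
; Hypothetical sizing inputs (assumptions, not measured values):
;   container_memory_limit = 2048 MiB
;   other_services_memory  =  768 MiB   (systemd + httpd + MariaDB)
;   avg_worker_RSS         =   64 MiB
;
; pm.max_children = (2048 - 768) / 64 = 1280 / 64 = 20

[www]
; Spawn workers only when requests arrive, up to the cap below.
pm = ondemand
pm.max_children = 20
; Reap workers that have been idle this long (illustrative value).
pm.process_idle_timeout = 10s
; Recycle each worker after this many requests to contain slow leaks
; (illustrative value).
pm.max_requests = 500
```

When the division is not exact, rounding down keeps the worst-case pool footprint inside the remaining budget.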
Container Image profile now mandates `Restart=on-failure` on internal services and bounded worker pools (php-fpm) sized to the container memory limit.

Codifies the lesson from the crunchtools.com 2026-05-27 OOM outage, where stock php-fpm config + no `Restart=` caused a silent ~5-day blog outage.
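
A minimal sketch of the self-healing half, assuming the internal service is `php-fpm.service` and that the image ships a systemd drop-in at the hypothetical path shown in the first comment. The `RestartSec` value is illustrative and not taken from the profile.

```ini
# /etc/systemd/system/php-fpm.service.d/10-restart.conf  (hypothetical drop-in path)
[Service]
# Restart the service when it exits with a non-zero status, is terminated
# by a signal (such as the OOM killer's SIGKILL), or hits a timeout.
Restart=on-failure
# Wait briefly between restart attempts (illustrative value).
RestartSec=5s
```

A drop-in leaves the packaged unit file untouched, so the override survives package updates inside the image.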
🤖 Generated with Claude Code