Hello FrankenPHP team!
We are currently tuning our production environment on Kubernetes and finalizing our internal documentation.
We have had several internal discussions about how FrankenPHP handles concurrency, specifically the math behind workers, threads, and auto-scaling (max_threads).
We would love to get official clarification on our assumptions to ensure we are configuring things optimally. This will likely help other users tuning their production environments too!
1. Workers vs Threads & Sizing Math
Our assumptions:
- A "Worker" in FrankenPHP is essentially a lightweight goroutine running the PHP script in a loop.
- A "Thread" is a real OS/POSIX thread initialized via cgo.
Given the following configuration:
name: FRANKENPHP_CONFIG
value: num_threads 33 max_threads 65
name: FRANKENPHP_WORKER_CONFIG
value: num 8
Our mathematical deduction:
Since 8 workers are permanently loaded in RAM, we calculated that the maximum number of threads available to be dynamically spawned or used for other tasks is 40, based on this formula from our internal docs:
Max Available Threads = (max_threads + num) - num_threads
For example, (65 + 8) - 33 = 40. This matches the load-test screenshot above, where busy threads never go over 40.
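To make the deduction concrete, here is a minimal sketch of the arithmetic exactly as our internal docs state it. The function name is ours and purely illustrative; it encodes our assumption, not confirmed FrankenPHP behavior.

```python
def max_available_threads(num_threads: int, max_threads: int, num_workers: int) -> int:
    """Formula from our internal docs: (max_threads + num) - num_threads.

    This encodes our assumption only; it is not taken from FrankenPHP itself.
    """
    return (max_threads + num_workers) - num_threads


# Values from our configuration: num_threads 33, max_threads 65, num 8.
available = max_available_threads(num_threads=33, max_threads=65, num_workers=8)
print(available)  # 40
```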
Question 1: Is this formula correct? Do workers permanently occupy a thread from the num_threads pool, or do they grab threads dynamically?
Question 2: Are there any other hidden mathematical rules, constraints, or ratios (similar to num_threads * memory_limit + GOMEMLIMIT for pod memory) that we should be aware of when sizing these parameters?
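For reference, this is the kind of rule we mean. The sketch below computes the pod memory estimate num_threads * memory_limit + GOMEMLIMIT. The 128 MiB memory_limit and 512 MiB GOMEMLIMIT are hypothetical placeholders rather than our real settings, and plugging in max_threads for a worst case is our own guess, not something we found documented.

```python
MIB = 1024 * 1024


def pod_memory_estimate(threads: int, php_memory_limit: int, gomemlimit: int) -> int:
    """Estimate pod memory in bytes as threads * memory_limit + GOMEMLIMIT."""
    return threads * php_memory_limit + gomemlimit


php_memory_limit = 128 * MIB  # hypothetical PHP memory_limit
gomemlimit = 512 * MIB        # hypothetical GOMEMLIMIT

# Baseline uses num_threads (33); worst case assumes all max_threads (65) are alive.
baseline = pod_memory_estimate(33, php_memory_limit, gomemlimit)
worst_case = pod_memory_estimate(65, php_memory_limit, gomemlimit)
print(baseline // MIB, worst_case // MIB)  # 4736 8832
```

If the worst case should be sized differently (for example, if scaled threads are not expected to reach memory_limit at the same time), that is exactly the kind of rule we would like to learn about.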
2. The utility of max_threads vs. Kubernetes Pod Scaling
We are debating the best approach to handle traffic spikes.
Approach A: max_threads isn't particularly useful in a Kubernetes context. It is better to strictly limit the number of threads per pod (to heavily control memory/CPU usage) and rely entirely on Kubernetes HPA (Horizontal Pod Autoscaler) to add more pods when the load increases.
Approach B: max_threads is extremely useful because it acts as a fast buffer. Spawning an additional thread at runtime happens in milliseconds, which makes the server more resilient to sudden latency spikes. Relying solely on K8s HPA means waiting seconds or minutes for a new pod to become ready, risking queued or dropped requests during a sudden spike.
Question 3: Could you confirm whether max_threads is indeed designed to act as this fast buffer for traffic spikes? And how do you recommend balancing max_threads (vertical scaling within the pod) against horizontal pod scaling?
3. Understanding Metrics for HPA
We are using your Prometheus metrics to build our dashboards.
We noticed frankenphp_busy_threads accurately tracks our active threads.
We also track frankenphp_worker_busy_workers and frankenphp_queue_depth.
Question 4: For Kubernetes HPA, would you recommend scaling based primarily on frankenphp_queue_depth to ensure pods scale up before requests start piling up and timing out?
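To show what we have in mind, here is a rough sketch of the replica calculation documented for the Kubernetes HPA, desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue), applied to an average queue depth per pod. The target of 5 queued requests and the replica bounds are hypothetical, and the sketch ignores the tolerance and stabilization behavior of the real controller.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_avg_queue_depth: float,
    target_avg_queue_depth: float,
    min_replicas: int,
    max_replicas: int,
) -> int:
    """Apply the documented HPA ratio, then clamp to the configured bounds."""
    raw = math.ceil(current_replicas * current_avg_queue_depth / target_avg_queue_depth)
    return max(min_replicas, min(max_replicas, raw))


# Hypothetical spike: 3 pods, each with 12 queued requests on average, target 5.
print(desired_replicas(3, 12, 5, min_replicas=2, max_replicas=20))  # 8

# With an empty queue the ratio is 0, so the result falls back to min_replicas.
print(desired_replicas(3, 0, 5, min_replicas=2, max_replicas=20))  # 2
```

The second call illustrates our concern: as far as we understand, the queue depth stays at 0 until requests are already waiting, so it may react late compared with a metric such as frankenphp_busy_threads.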
Thanks for this project and for taking the time to clarify these points!