Clarification on Concurrency: Workers vs Threads, scaling math, and max_threads

Hello FrankenPHP team!

We are currently tuning our production environment on Kubernetes and finalizing our internal documentation.
We have had a few internal discussions about how FrankenPHP handles concurrency, specifically the math behind workers, threads, and auto-scaling (max_threads).

We would love to get official clarification on our assumptions to ensure we are configuring things optimally. This will likely help other users tuning their production environments too!

**1. Workers vs Threads & Sizing Math**
Our assumptions:
- A "Worker" in FrankenPHP is essentially a lightweight goroutine running the PHP script in a loop.
- A "Thread" is a real OS/POSIX thread initialized via cgo.

If we have the following configuration:

<img width="1024" height="424" alt="Image" src="https://github.com/user-attachments/assets/83eb5346-a739-4bb1-8b34-33b780811f48" />


- name: FRANKENPHP_CONFIG
  value: num_threads 33 max_threads 65
- name: FRANKENPHP_WORKER_CONFIG
  value: num 8

Our mathematical deduction:
Since 8 workers are permanently loaded in RAM, we calculated that at most 40 threads are available to be spawned dynamically or used for other tasks, based on this ratio from our internal docs:
Max Available Threads = (max_threads + num) - num_threads, e.g. (65 + 8) - 33 = 40. The load-test screenshot above supports this: busy threads never go above 40.
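
To make the arithmetic explicit, here is a minimal Python sketch of the ratio from our internal docs. The function name `max_available_threads` is ours, and the formula itself is only our assumption, not a documented FrankenPHP rule.

```python
def max_available_threads(num_threads: int, max_threads: int, num_workers: int) -> int:
    """Ratio from our internal docs (an assumption, not confirmed FrankenPHP behavior)."""
    return (max_threads + num_workers) - num_threads


# Values from the configuration above: num_threads 33, max_threads 65, num 8.
print(max_available_threads(num_threads=33, max_threads=65, num_workers=8))  # prints 40
```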

**Question 1**: Is this mathematical ratio correct? Do workers permanently occupy a thread from the num_threads pool, or do they dynamically grab them?

**Question 2**: Are there any other hidden mathematical rules, constraints, or ratios (similar to num_threads * memory_limit + GOMEMLIMIT for pod memory) that we should be aware of when sizing these parameters?
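
For context, this is how we currently apply the pod memory rule mentioned above, as a rough Python sketch. The function name and the 128 MiB / 512 MiB values are hypothetical placeholders chosen for illustration; only num_threads 33 comes from our configuration.

```python
def pod_memory_estimate_mib(num_threads: int, memory_limit_mib: int, gomemlimit_mib: int) -> int:
    """Rule of thumb we use: num_threads * memory_limit + GOMEMLIMIT (all in MiB)."""
    return num_threads * memory_limit_mib + gomemlimit_mib


# num_threads is from our config; memory_limit and GOMEMLIMIT are hypothetical values.
print(pod_memory_estimate_mib(num_threads=33, memory_limit_mib=128, gomemlimit_mib=512))  # prints 4736
```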

**2. The utility of max_threads vs. Kubernetes Pod Scaling**
We are debating the best approach to handle traffic spikes.

Approach A: max_threads isn't particularly useful in a Kubernetes context. It is better to strictly limit the number of threads per pod (to heavily control memory/CPU usage) and rely entirely on Kubernetes HPA (Horizontal Pod Autoscaler) to add more pods when the load increases.

Approach B: max_threads is extremely useful because it acts as a fast buffer. Spawning an additional thread at runtime happens in milliseconds, which makes the server more resilient to sudden latency spikes. Relying solely on K8s HPA means waiting seconds or minutes for a new pod to become ready, risking queued or dropped requests during a sudden spike.

**Question 3**: Could you confirm whether max_threads is indeed designed to act as this fast buffer for traffic spikes? And how do you recommend balancing max_threads (vertical scaling within the pod) against horizontal pod scaling?

**3. Understanding Metrics for HPA**
We are using your Prometheus metrics to build our dashboards.
We noticed that frankenphp_busy_threads accurately tracks our active threads.

We also track frankenphp_worker_busy_workers and frankenphp_queue_depth.

**Question 4**: For Kubernetes HPA, would you recommend scaling based primarily on frankenphp_queue_depth to ensure pods scale up before requests start piling up and timing out?
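
To show what we have in mind, here is a minimal Python sketch of the core replica calculation that the Kubernetes HPA documentation describes, applied to a per-pod average of frankenphp_queue_depth. The replica count, observed queue depth, and target value are hypothetical, and the sketch deliberately ignores the HPA tolerance, min/max replica bounds, and stabilization windows.

```python
import math


def desired_replicas(current_replicas: int, current_value: float, target_value: float) -> int:
    """Core HPA formula: ceil(currentReplicas * currentMetricValue / desiredMetricValue)."""
    if target_value <= 0:
        raise ValueError("target_value must be positive")
    return math.ceil(current_replicas * current_value / target_value)


# Hypothetical numbers: 4 pods, average queue depth of 6 per pod, target average of 2 per pod.
print(desired_replicas(current_replicas=4, current_value=6.0, target_value=2.0))  # prints 12
```

One consequence of this formula is that the target cannot be zero, so a queue-depth target would have to tolerate at least some queuing before the HPA reacts.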

Thanks for this project and for taking the time to clarify these points!
