---
title: "Why isn't Quarkus 2x faster than Spring on my machine?"
date: 2026-04-09T00:00:00Z
categories: ['performance', 'benchmarking', 'containers']
summary: 'Our perf-lab shows Quarkus 2x faster than Spring, but a community member only sees 1.19x locally. The culprit: a userspace TCP proxy hidden inside rootless podman.'
image: 'diff-flamegraph.png'
authors:
- Francesco Nigro
---

A community member ran our https://github.com/quarkusio/spring-quarkus-perf-comparison[Quarkus vs Spring CRUD benchmark] on their bare-metal Fedora workstation and asked:

[quote]
____
[.lead]
_Why do I see only 1.19x instead of 2x?_
____

**Our perf-lab shows Quarkus at 2.08x Spring's throughput, but locally the gap nearly disappears.**

This post walks through the investigation that found the culprit.

== The gap

The benchmark is a REST/CRUD application backed by PostgreSQL. The app runs on the host, postgres in a rootless podman container. Each HTTP request executes 2 SQL queries (confirmed via https://www.postgresql.org/docs/current/pgstatstatements.html[pg_stat_statements]).
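The per-request statement count can be verified directly against the database. A minimal sketch (host, user, and database names are illustrative, and the `pg_stat_statements` extension must already be enabled):

[source,shell]
----
# Show the hottest statements and how often each was called.
# With N HTTP requests processed, the two CRUD statements should each
# show roughly N calls.
psql -h 127.0.0.1 -U app -d appdb -c \
  "SELECT calls, left(query, 60) AS query
     FROM pg_stat_statements
    ORDER BY calls DESC LIMIT 5;"
----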

image::throughput-gap.svg[Throughput comparison: Local vs Perf-lab]

Spring delivers roughly the same throughput in both environments (~12-13K TPS). Quarkus swings from 15.5K to 24.5K -- it is being held back locally. **Something between the app and postgres is penalizing Quarkus specifically.**

== mpstat: where is the CPU going?

The benchmark collects https://man7.org/linux/man-pages/man1/mpstat.1.html[mpstat] data during every run — per-CPU utilization split into `%usr` (application code), `%sys` (kernel), `%soft` (softirq, mainly network packet processing), and `%idle`. This is part of our https://github.com/quarkusio/spring-quarkus-perf-comparison/issues/62[active benchmarking practice]: observing the system _while it runs_, not just collecting final TPS numbers.
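A minimal sketch of the collection step (the interval, sample count, and output path here are illustrative — the benchmark scripts drive this automatically):

[source,shell]
----
# Per-CPU utilization in 1-second samples for 30 seconds, saved
# alongside the run's other artifacts; -P ALL reports every CPU,
# including the %usr/%sys/%soft/%idle columns discussed below
mpstat -P ALL 1 30 > mpstat.log
----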

Both environments run Quarkus at 2.3GHz with the same workload and CPU pinning. The mpstat profiles could not be more different:

[cols="2,1,1,1,1", options="header"]
|===
| Environment | %usr | %sys | %soft | %idle

| Local (Fedora, 15,504 TPS) | 39-50% | 34-41% | 9-17% | 3-5%
| Perf-lab (RHEL, 24,472 TPS) | 87-94% | 5-11% | 0-2% | 0%
|===

`%usr` is time running application code. `%sys` is time in the kernel. On perf-lab, over 85% of CPU goes to the application. Locally, nearly half goes to the kernel — and the application has idle CPU it cannot use. Same application, same clock speed, same workload: **the local environment is burning CPU in the kernel instead of running the app.** We isolated the network path next.

== Isolating the network layer with pgbench

To confirm the network path was the bottleneck, we ran `pgbench` with the same 2-query workload (50 clients, prepared statements, 30 seconds) over different network paths. We also tested with Fedora's https://wiki.nftables.org/[nftables] firewall disabled, since the JFR flamegraph showed `nft_do_chain` in the kernel stacks:

[cols="2,1,1", options="header"]
|===
| Network path | TPS | Stmt latency

| Host -> container (pasta + nftables) | 18,106 | 1.38ms
| Host -> container (pasta, no nftables) | 20,402 | 1.22ms
| Host -> container (`--network=host`) | 53,262 | 0.47ms
|===

With `--network=host`, statement latency drops from 1.38ms to 0.47ms — a 3x reduction. With 2 statements per HTTP request, that overhead adds up on every request.
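A sketch of the pgbench invocation (the script contents, thread count, and connection details are illustrative, not the benchmark's exact schema):

[source,shell]
----
# Hypothetical custom script mirroring the app's 2 statements per request
cat > crud.sql <<'EOF'
SELECT id, name FROM fruit WHERE id = 1;
UPDATE fruit SET name = 'apple' WHERE id = 1;
EOF

# 50 clients, prepared statements, 30 seconds -- matching the text above;
# point -h/-p at the container's published port or the host socket
pgbench -n -M prepared -c 50 -j 8 -T 30 \
  -h 127.0.0.1 -p 5432 -U app -f crud.sql appdb
----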

== The flamegraph tells the story

JFR CPU profiles (collected via https://github.com/async-profiler/async-profiler[async-profiler]) from the default and host-networking Quarkus runs were compared using a https://www.brendangregg.com/flamegraphs.html[differential flamegraph]. Red frames appear more in the default (pasta) configuration; blue frames appear more with host networking.

image::diff-flamegraph.png[Differential flamegraph: pasta vs host networking]

The red stacks split into two groups. The pasta proxy overhead shows up as extra `tcp_sendmsg`, `ip_output`, and softirq `net_rx_action` frames from the two additional kernel/userspace boundary crossings; the firewall overhead shows up as `nf_hook_slow` and `nft_do_chain` frames from Fedora's 973 nftables rules. Both disappear with `--network=host`, because the app and postgres share the same network namespace and packets never leave the kernel.

Per-request CPU cost confirms the picture:

[cols="2,1", options="header"]
|===
| Configuration | CPU ms/req

| Default pasta (15,504 TPS) | 0.231
| Host networking (24,116 TPS) | 0.158
| Perf-lab (24,472 TPS) | 0.158
|===

With host networking, per-request cost **matches the perf-lab exactly**: 0.158 ms/req.

== Root cause: pasta, the userspace TCP proxy

Rootless podman on Fedora uses https://passt.top/passt/[pasta (passt)] to forward container ports. Unlike rootful podman (which uses kernel-level port forwarding), pasta is a userspace process that proxies every TCP packet:

----
With pasta (default rootless):
App --> kernel --> pasta (userspace) --> kernel --> container netns --> postgres

With --network=host:
App --> kernel --> postgres (same network namespace)
----

Every JDBC packet traverses two extra kernel/userspace boundary crossings plus a userspace copy in the pasta process. For a chatty protocol like JDBC with small, frequent packets, this is devastating.

=== Bonus: nftables firewall overhead

Fedora's `firewalld` maintains 973 https://wiki.nftables.org/[nftables] rules that every packet traverses (`nf_hook_slow` -> `nft_do_chain`). This is independent of pasta — it affects any network traffic on the host. Disabling the firewall recovers another ~10% throughput. This matches findings from https://talawah.io/blog/extreme-http-performance-tuning-one-point-two-million/[prior work on extreme HTTP tuning] where iptables `nf_hook_slow` consumed ~18% of CPU in benchmarks.
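To get a feel for your own ruleset size, a rough sketch (counting lines of `nft` output is only a proxy — the exact rule count depends on how you count chains versus rules):

[source,shell]
----
# Dump the active nftables ruleset and approximate its size by line count
sudo nft list ruleset | wc -l
----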

== Why Quarkus is affected but Spring is not

[cols="2,1,1,1", options="header"]
|===
| Configuration | Quarkus TPS | Spring TPS | Ratio

| Default (pasta + nftables) | 15,504 | 13,062 | 1.19x
| `--network=host` | 24,116 | 13,368 | 1.80x
| Perf-lab (RHEL 9.6) | 24,472 | 11,783 | 2.08x
|===

Removing pasta boosts Quarkus by 55% but Spring by only 2.3%. **The reason is where each framework spends its CPU time.**

**Quarkus is I/O-efficient**: its per-request framework overhead is small, so DB round-trip latency dominates the profile. When pasta adds 0.9ms per statement, that overhead becomes a large fraction of Quarkus's total per-request cost. Remove pasta, and Quarkus unlocks all the CPU it was wasting on proxy overhead.

**Spring is CPU-bound on framework overhead**: deeper call stacks and more instructions per request mean DB latency is a smaller fraction of Spring's per-request cost. Removing pasta barely moves the needle.

In other words, **pasta was masking Quarkus's I/O efficiency advantage** -- the very thing that makes it 2x faster on the perf-lab.

== The fix

Run the postgres container with `--network=host` instead of port-mapping (`-p 5432:5432`). We added `DB_HOST_NETWORK=true` to the benchmark's https://github.com/quarkusio/spring-quarkus-perf-comparison/blob/main/scripts/infra.sh[infrastructure script].
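As a sketch, the difference between the two invocations (image tag and credentials are illustrative placeholders):

[source,shell]
----
# Default rootless port mapping: every packet crosses the pasta proxy
podman run -d --name pg -e POSTGRES_PASSWORD=secret \
  -p 5432:5432 postgres:16

# Host networking: postgres shares the host's network namespace,
# so there is no userspace proxy in the JDBC path
podman run -d --name pg -e POSTGRES_PASSWORD=secret \
  --network=host postgres:16
----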

[cols="2,1,1,1", options="header"]
|===
| Configuration | Quarkus TPS | vs Perf-lab | Ratio Q/S

| Default (pasta + nftables) | 15,504 | 63.4% | 1.19x
| No nftables only | 16,105 | 65.8% | --
| `--network=host` | 24,116 | 98.5% | 1.80x
| `--network=host` + no nftables | 26,039 | 106.4% | --
| Perf-lab (RHEL 9.6) | 24,472 | 100% | 2.08x
|===

**With host networking, the local Fedora workstation matches the perf-lab.** The remaining gap to the perf-lab's 2.08x ratio is accounted for by nftables (Fedora's 973 rules vs RHEL's minimal ruleset) and minor kernel differences.

== Takeaways

* **A benchmark that doesn't stress what it claims to stress will deliver misleading results.** This is a textbook case of what Brendan Gregg calls https://www.brendangregg.com/activebenchmarking.html[active benchmarking]:
+
[quote, Brendan Gregg]
____
_You benchmark A, but actually measure B, and conclude you've measured C._
____
+
We thought we were measuring framework throughput, but we were actually measuring pasta proxy overhead. Only by observing the system _while the benchmark was running_ — https://man7.org/linux/man-pages/man1/mpstat.1.html[mpstat], pgbench, flamegraphs, as https://github.com/quarkusio/spring-quarkus-perf-comparison/issues/62[required by our benchmarking practice] — did the real bottleneck emerge. The https://quarkus.io/blog/new-benchmarks/[published benchmark] was designed to isolate framework performance from infrastructure variables -- but rootless container networking silently violated that isolation.

* **The impact is asymmetric.** I/O-efficient frameworks like Quarkus are disproportionately penalized because DB latency is a larger fraction of their per-request cost. CPU-bound frameworks like Spring are barely affected, which compresses the apparent gap.

* **Check your networking path.** Run `podman info | grep rootlessNetworkCmd` to see your backend. If it says `pasta` and your benchmark talks to a containerized database, use `--network=host` for the database container.

* **Firewall rules add up.** Nearly 1000 nftables rules cost ~10% throughput on a chatty workload. For benchmarking, consider temporarily disabling the firewall or using a minimal ruleset.
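For a one-off benchmark run, the firewall can be paused like this (a sketch — remember to restore it afterwards, and only do this on a machine where that is acceptable):

[source,shell]
----
# Stop firewalld for the duration of the run, then bring it back
sudo systemctl stop firewalld
# ... run the benchmark ...
sudo systemctl start firewalld
----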

== Known upstream issues

Our findings are consistent with several known issues in the podman/pasta ecosystem:

* **pasta is single-threaded by design** and degrades above ~8 concurrent connections. At higher concurrency, even the older slirp4netns backend can outperform it. (https://github.com/containers/podman/discussions/22559[Podman Discussion #22559])

* **pasta consuming 90-100% CPU** has been reported under sustained network load, e.g. Wireguard tunnels on kernel 6.x. (https://github.com/containers/podman/issues/23686[Podman Issue #23686])

* **Java + PostgreSQL hang** -- a Spring app running PostgreSQL `COPY FROM STDIN` via pasta consistently freezes mid-transfer. `--network=host` fixes it. (https://github.com/containers/podman/issues/22593[Podman Issue #22593])

* **Throughput far below host capacity** -- rootless containers on multi-gigabit hosts achieving only ~100 Mbit/s through pasta. (https://github.com/containers/podman/issues/17865[Podman Issue #17865])

* **Traffic stalls under sustained load** -- TCP downloads through pasta start normally then halt, with pasta pinned at high CPU. (https://github.com/containers/podman/issues/17703[Podman Issue #17703])

* The official https://github.com/containers/podman/blob/main/docs/tutorials/performance.md[Podman performance tutorial] documents `--network=host` and socket activation as workarounds for network-sensitive workloads.