
Blog/hidden cost rootless container networking #27

Open
franz1981 wants to merge 18 commits into RedHatPerf:dev from franz1981:blog/hidden-cost-rootless-container-networking


franz1981 (Contributor) commented Apr 9, 2026

Covers how podman's pasta userspace proxy silently reduces Java
benchmark throughput by 40% when talking to a containerized
PostgreSQL, and the USE method investigation that found it.

The "Inside container (unix socket)" row was not comparable to the other
rows because pgbench shared the 3 DB cores with postgres, while the other
tests ran pgbench on 4 separate app cores. This made the inside-container
result bottlenecked by core contention rather than measuring network overhead.

Use --network=host as the baseline instead, and make the overhead
calculation explicit (1.38 ms vs 0.47 ms = ~0.9 ms pasta overhead).

Reframe around the community question "Why isn't Quarkus 2x faster
than Spring on my machine?" Add Quarkus vs Spring comparison tables,
a new section explaining asymmetric pasta impact, known upstream issues
with links, and tighten data claims.

Link to Brendan Gregg's active benchmarking methodology and the
Quarkus benchmark blog post. Key point: if a benchmark doesn't
stress what it claims to, results are misleading.

- Fix mpstat section: loopback does generate softirqs; the real
  clue was CPUs saturated at 97% for 18K TPS vs 60% for 48K TPS
  on pure loopback. Extra %sys is the tell, not %soft.
- Link differential flamegraph to Brendan Gregg's flamegraph page
- Link infra.sh reference to actual file on GitHub
- Set post date to today (Hugo skips future-dated posts)

We didn't know about the loopback comparison at the mpstat stage.
Honest narrative: high kernel overhead felt wrong, so we isolated
the network path next.

Rewrite mpstat section to explain what was genuinely visible
before finding pasta, add SVG throughput chart, link tools
(mpstat, pg_stat_statements, async-profiler, nftables), and
bold key findings for scannability.

Add diff flamegraph comparing perf-lab vs local (both unpatched)
right after mpstat to show where kernel time goes. Move existing
before/after flamegraph to after the fix as visual confirmation.
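
The two numeric claims in the description above can be checked with a few lines of arithmetic. This is a minimal sketch, not part of the post: the function names are hypothetical, the inputs are the figures quoted in the commit messages (reading 0.47 ms as the `--network=host` baseline and 1.38 ms as the latency through pasta), and the CPU comparison assumes both utilization figures were taken over the same set of cores.

```python
# Figures quoted in the PR description; nothing here is newly measured.
PASTA_LATENCY_MS = 1.38  # assumed: per-transaction latency through pasta
HOST_LATENCY_MS = 0.47   # assumed: per-transaction latency with --network=host


def pasta_overhead_ms(with_pasta: float, baseline: float) -> float:
    """Latency added on top of the baseline network path, in milliseconds."""
    return with_pasta - baseline


def cpu_pct_per_ktps(cpu_busy_pct: float, tps: float) -> float:
    """CPU utilization (percent) spent per 1,000 transactions per second."""
    return cpu_busy_pct / (tps / 1000.0)


overhead = pasta_overhead_ms(PASTA_LATENCY_MS, HOST_LATENCY_MS)

# "97% for 18K TPS vs 60% for 48K TPS on pure loopback"
observed_cost = cpu_pct_per_ktps(97.0, 18_000)
loopback_cost = cpu_pct_per_ktps(60.0, 48_000)

print(f"pasta overhead: {overhead:.2f} ms per transaction")
print(f"CPU per 1K TPS: {observed_cost:.2f}% observed vs {loopback_cost:.2f}% on loopback")
print(f"relative CPU cost per transaction: {observed_cost / loopback_cost:.1f}x")
```

Normalizing utilization by throughput is what makes the two runs comparable: the raw 97% and 60% figures alone hide the fact that the saturated run also delivered far fewer transactions.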
@franz1981 franz1981 marked this pull request as ready for review April 9, 2026 13:38

Copilot AI left a comment

Pull request overview

Adds a new blog post explaining why Quarkus’ performance advantage over Spring can appear smaller on a local Fedora workstation due to rootless Podman networking (pasta) and nftables overhead, including supporting visuals.

Changes:

  • Adds a new AsciiDoc post detailing the investigation (mpstat, differential flamegraphs, pgbench) and recommended mitigation (--network=host).
  • Adds a new SVG chart visualizing the throughput gap between local and perf-lab environments.

Reviewed changes

Copilot reviewed 1 out of 4 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| content/post/hidden-cost-rootless-container-networking/index.adoc | New post content with benchmark data, analysis, and mitigation guidance. |
| content/post/hidden-cost-rootless-container-networking/throughput-gap.svg | New chart used by the post to summarize throughput differences and ratios. |


summary: 'Our perf-lab shows Quarkus 2x faster than Spring, but a community member only sees 1.19x locally. The culprit: a userspace TCP proxy hidden inside rootless podman.'
image: 'diff-flamegraph.png'
authors:
- Francesco Nigro

Copilot AI Apr 9, 2026

Front matter in other blog posts includes a related: [''] field; this new post omits it. If the site/theme expects consistent front-matter keys (even when empty), add the related field here to match the established pattern across existing posts.

Suggested change
- Francesco Nigro
- Francesco Nigro
related: ['']

| Perf-lab (RHEL, 24,472 TPS) | 87-94% | 5-11% | 0-2% | 0%
|===

`%usr` is time running application code. `%sys` is time in the kernel. On perf-lab, over 85% of CPU goes to the application. Locally, nearly half goes to the kernel — and the application has idle CPU it cannot use. Same application, same clock speed, same workload: **the local environment is burning CPU in the kernel instead of running the app.** We isolated the network path next.
Contributor

> nearly half goes to the kernel

Explain which column you are looking at to draw this conclusion.

Contributor Author

> `%usr` is time running application code. `%sys` is time in the kernel.

It should be there already
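
To make the column question concrete, here is a minimal sketch of how the kernel share could be read off an mpstat row. It uses only the perf-lab row visible in this thread; the column order (`%usr`, `%sys`, `%soft`) is an assumption, because the table header is not part of the excerpt, and the fourth cell is left out for the same reason. `kernel_range` is a hypothetical helper, not something from the post.

```python
# Ranges from the one mpstat row shown in this thread
# (perf-lab, RHEL, 24,472 TPS).
# ASSUMPTION: the first three cells are %usr, %sys and %soft, in that order.
PERF_LAB = {
    "usr": (87.0, 94.0),   # application code
    "sys": (5.0, 11.0),    # kernel time on behalf of syscalls
    "soft": (0.0, 2.0),    # kernel time servicing softirqs
}


def kernel_range(row):
    """Return (low, high) bounds on percent of CPU spent in the kernel.

    Kernel time is taken here as %sys + %soft; adding the upper ends gives
    a bound, not a value that any single CPU necessarily reached.
    """
    low = row["sys"][0] + row["soft"][0]
    high = row["sys"][1] + row["soft"][1]
    return low, high


low, high = kernel_range(PERF_LAB)
usr_low, usr_high = PERF_LAB["usr"]
print(f"perf-lab kernel time: {low:.0f}-{high:.0f}%")
print(f"perf-lab application time: {usr_low:.0f}-{usr_high:.0f}%")
```

Applying the same sum to the local row (not shown in this excerpt) is what would back the "nearly half goes to the kernel" statement with a named column.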

Add SVG versions of both diff flamegraphs (clickable from PNGs),
explain _raw_spin_unlock_irqrestore cascade through Agroal connection
pool, reframe Quarkus/Spring asymmetry as CPU budget fraction, and
rename postgres to PostgreSQL throughout.
…clarify Spring's reduced sensitivity to networking gains
@franz1981
Contributor Author

I have to:

  • add Spring numbers, since we have disabled the firewall and moved away from pasta
  • show what happens locally at 4.3 GHz
  • make clear that pasta is both costly from a CPU point of view and a cap on network latency: that makes Quarkus (with pasta) unable to exceed a specific TPS (which, surprisingly, is the same at 2.3 GHz and 4.3 GHz)

@franz1981
Contributor Author

Also, I have to trim the content, as I see a few repetitive parts 🙏

stalep previously approved these changes Apr 10, 2026

@stalep stalep left a comment

The core narrative and data are strong. The main things missing are perhaps reproducibility (commands, versions, pinning details) and depth on the latency distribution.
The article currently proves that pasta is the problem; if we could add something that lets readers reproduce it locally, that would be awesome (but it is not needed for this article if you think it would make it too long).

… clarify nftables overhead reduction benefits
4 participants