Description
Context
I've been using the Balloons plugin for quite some time and it does a great job pinning containers to CPUs.
So far I haven't found a way to manage interrupt affinity or other sources of interference on those cores using only NRI plugins, or more specifically Balloons.
My use case is HPC-like workloads: single-threaded processes that busy-loop when idle and are meant to run continuously without preemption. For these, pinning alone is not enough.
Currently I handle this out-of-band via kernel cmdline:
# Example
isolcpus=nohz,domain,managed_irq,1-23,25-47 nohz_full=1-23,25-47 irqaffinity=0,24 rcu_nocbs=1-23,25-47 mitigations=off
This is close to working as expected (we are exploring a kernel patch to remove the ticks/kick), but it is fully manual and static, requires updating the boot command line, and is decoupled from whatever Balloons decides at runtime.
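To illustrate how the values in that command line relate to each other, here is a minimal Python sketch that derives the housekeeping CPUs (the `irqaffinity=` value) from the isolated list. The helper names and the total of 48 CPUs are assumptions inferred from the 0-47 range in the example, and only the plain `a-b,c` list syntax is handled, not the kernel's stride form.

```python
def parse_cpu_list(spec: str) -> set[int]:
    """Parse a kernel-style CPU list such as "1-23,25-47" into a set of CPU ids."""
    cpus: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-", 1)
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def format_cpu_list(cpus: set[int]) -> str:
    """Format a set of CPU ids back into the compact list syntax."""
    ordered = sorted(cpus)
    ranges = []
    i = 0
    while i < len(ordered):
        start = end = ordered[i]
        # Extend the current range while the next CPU id is consecutive.
        while i + 1 < len(ordered) and ordered[i + 1] == end + 1:
            i += 1
            end = ordered[i]
        ranges.append(str(start) if start == end else f"{start}-{end}")
        i += 1
    return ",".join(ranges)


TOTAL_CPUS = 48  # assumption: implied by the 0-47 range in the example above
isolated = parse_cpu_list("1-23,25-47")
housekeeping = set(range(TOTAL_CPUS)) - isolated
print(format_cpu_list(housekeeping))  # prints "0,24"
```

With a computation like this driven by the plugin's runtime CPU assignment, the isolated and housekeeping sets would not have to be kept in sync by hand.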
Proposal/request for guidance
An "exclusive" mode for balloon types — either via a new annotation or a config flag — that signals the plugin to treat those CPUs as fully isolated. This would ideally cover:
- Moving IRQ affinity away from balloon CPUs (/proc/irq/*/smp_affinity)
- Suppressing or relocating any other kernel interference (timers, RCU callbacks, etc.)
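As a rough sketch of the first item, the snippet below computes the hex bitmask that would keep IRQs off a set of balloon CPUs, using comma-separated 32-bit groups, the layout the kernel uses when printing `smp_affinity`. The function names and the CPU sets are hypothetical, nothing here is part of Balloons or NRI today, and actually writing the mask to /proc/irq/*/smp_affinity (which needs root and error handling) is deliberately left out.

```python
def affinity_mask(cpus: set[int]) -> str:
    """Return a hex bitmask for the given CPUs as comma-separated 32-bit groups."""
    value = 0
    for cpu in cpus:
        value |= 1 << cpu
    groups = []
    while True:
        # Emit the lowest 32 bits, then shift them out.
        groups.append(f"{value & 0xFFFFFFFF:08x}")
        value >>= 32
        if value == 0:
            break
    # The most significant group comes first.
    return ",".join(reversed(groups))


def mask_excluding(all_cpus: set[int], balloon_cpus: set[int]) -> str:
    """Mask of CPUs that may still receive IRQs once balloon CPUs are excluded."""
    allowed = all_cpus - balloon_cpus
    if not allowed:
        raise ValueError("no CPU left to handle interrupts")
    return affinity_mask(allowed)


# Hypothetical example: 48 CPUs, with everything except 0 and 24 in balloons.
all_cpus = set(range(48))
balloon_cpus = all_cpus - {0, 24}
print(mask_excluding(all_cpus, balloon_cpus))  # prints "01000001"
```

The guard against an empty result matters in practice: if balloons ever covered every CPU, there would be nowhere left to route interrupts.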
Any guidance on the recommended approach — or confirmation of what Balloons is and isn't expected to own — would already be valuable.
Environment
Cc @klihub @kad
Rationale
Without this, users targeting HPC-like or real-time workloads need to statically pre-configure the kernel boot command line independently of the plugin, with no coordination between the two layers. On top of that, the boot configuration has to be re-signed when Secure Boot is in use.
Our process is extremely sensitive to any interruption — even a single IRQ landing on its core causes a measurable latency spike. We have a tight running budget and a predefined execution deadline.