Package: amazon-ec2-net-utils 2.7.1-1.amzn2023.0.1 (bug confirmed present in main / 2.7.2)
OS: Amazon Linux 2023
Upstream: https://github.com/amazonlinux/amazon-ec2-net-utils
Severity: High (causes 100% CPU saturation on long-running ECS hosts running IDS/IPS software)
Discovered: 2026-03-27
Contributors: Dinko Dermendzhiev, William Pharr, Jonathan Clark
Summary
setup-policy-routes start contains an infinite sleep 0.1 loop waiting for an ENI's sysfs node to appear (bin/setup-policy-routes.sh#L53-L59). If the ENI is detached before the sysfs node appears, the process loops forever holding a per-ENI lockfile. Any refresh invocation for the same ENI spins for up to 1000 seconds trying to acquire the lock, then exits, but the per-ENI timer immediately fires a new refresh, repeating the cycle. Over time, every ECS task lifecycle event that races start against ENI detach adds a permanently stuck process. On a host with heavy ECS task churn this accumulates to hundreds or thousands of processes, growing linearly with the number of failed ECS tasks. The stuck processes themselves are not CPU-intensive as each is blocked in sleep 0.1, but they generate a continuous stream of syscalls (at least one per process every 100ms). eBPF-based security sensors that instrument the kernel at the syscall level intercept these events, and at sufficient process counts the sensor's processing pipelines are fully saturated by this legitimate (from the sensor's perspective) syscall telemetry. The sensor ends up consuming 80–100% CPU doing exactly what it was designed to do, thus masking the root cause.
Affected Files
- bin/setup-policy-routes.sh -> infinite sysfs wait loop in start action
- lib/lib.sh -> register_networkd_reloader() lock mechanism with no deadlock detection
The Bug
1. Infinite sysfs wait loop (bin/setup-policy-routes.sh#L52-L59)
```sh
start)
    register_networkd_reloader # acquires per-ENI lockfile
    counter=0
    while [ ! -e "/sys/class/net/${iface}" ]; do
        if ((counter % 1000 == 0)); then
            debug "Waiting for sysfs node to exist for ${iface} (iteration $counter)"
        fi
        sleep 0.1
        ((counter++))
    done
    /lib/systemd/systemd-networkd-wait-online -i "$iface"
    do_setup
    ;;
```
No timeout. If the ENI is detached before the sysfs node appears, this loop runs indefinitely at 0.1s intervals, holding the lockfile for the lifetime of the host.
2. Lock never released (lib/lib.sh#L628-L664)
register_networkd_reloader() acquires a per-ENI lockfile at /run/amazon-ec2-net-utils/setup-policy-routes/<iface> using noclobber (set globally at the top of the script via set -eo pipefail -o noclobber -o nounset):
```sh
register_networkd_reloader() {
    local -i registered=1 cnt=0
    local -i max=10000
    local -r lockfile="${lockdir}/${iface}"
    ...
    while [ $cnt -lt $max ]; do
        cnt+=1
        2>/dev/null echo $$ > "${lockfile}" # fails if file exists (noclobber)
        registered=$?
        [ $registered -eq 0 ] && break
        sleep 0.1 # 10,000 * 0.1s = up to 1000 seconds
        if (( $cnt % 100 == 0 )); then
            debug "Unable to lock ${iface} after ${cnt} tries."
        fi
    done
    if [ $registered -ne 0 ]; then
        error "Unable to lock configuration for ${iface}. Check pid $(cat "${lockfile}")"
        exit 1 # ← exits after ~1000s, but the timer immediately fires a new refresh
    fi
}
```
The stuck start process holds the lock indefinitely. There is no check whether the PID in the lockfile is still alive. A kill -0 $lock_pid check would allow recovery from a dead lock owner.
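The noclobber acquisition can be reproduced in isolation. A minimal sketch (throwaway temp path, not the package's real lock directory):

```sh
# Minimal sketch of noclobber-based lockfile acquisition, mirroring the
# mechanism above. The lockfile path is a throwaway temp file.
set -o noclobber

lockfile="$(mktemp -u)"              # a path that does not exist yet

first=lost
second=lost
# First writer wins: the redirection creates the file.
if echo $$ > "$lockfile" 2>/dev/null; then
    first=acquired
fi
# Second writer loses: noclobber refuses to truncate an existing file.
if ! echo $$ > "$lockfile" 2>/dev/null; then
    second=blocked
fi
echo "first=$first second=$second"   # prints: first=acquired second=blocked
rm -f "$lockfile"
```

This is also why a crashed lock owner wedges the mechanism: the file outlives the process, and no later writer ever succeeds.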
3. refresh cycle (bin/setup-policy-routes.sh#L44-L49)

```sh
refresh)
    register_networkd_reloader
    [ -e "/sys/class/net/${iface}" ] || exit 0 # exits immediately if ENI is gone
    do_setup
    ;;
```
refresh exits immediately if the ENI no longer exists in sysfs, but only after it acquires the lock. Because start holds the lock, refresh spins for up to 1000 seconds in register_networkd_reloader, then calls exit 1. The per-ENI timer (refresh-policy-routes@<eni>.timer, firing every 60s) immediately spawns a new refresh, which spins again. This produces a continuous stream of spinning processes per stuck ENI.
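To put numbers on that cycle (worst case, assuming the full 60s timer interval elapses between one refresh exiting and the next starting):

```sh
# Worst-case duty cycle of the spinning refresh process, per stuck ENI:
# spin for up to 1000s (10,000 tries * 0.1s), exit 1, wait up to 60s for
# the timer, spin again.
spin=1000
gap=60
pct=$(( 100 * spin / (spin + gap) ))
echo "refresh is spinning ${pct}% of the time"   # prints: refresh is spinning 94% of the time
```

In practice the gap is shorter than 60s, so per stuck ENI a spinning refresh process is effectively always present alongside the stuck start.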
4. udev remove event does not fire reliably for ECS ENIs
The udev rule (udev/99-vpc-policy-routes.rules) calls systemctl disable --now on both the timer and service on clean ENI removal:
```
SUBSYSTEM=="net", ACTION=="remove", ..., RUN+="/usr/bin/systemctl disable --now refresh-policy-routes@$name.timer policy-routes@$name.service"
```
On clean detach this would clean up correctly. The bug occurs because ECS ENI detach does not reliably produce a udev remove event before the sysfs node disappears, leaving start stuck in the wait loop with no cleanup path.
Trigger Condition
- ECS attaches ENI → udev add fires → policy-routes@<eni>.service starts → setup-policy-routes <eni> start
- start acquires lockfile, enters infinite sysfs wait loop
- ECS task fails → ENI detached → sysfs node never appears or disappears mid-loop
- udev remove event does not fire (or fires after start is already stuck) → no cleanup
- start loops forever, holding the lockfile
- refresh-policy-routes@<eni>.timer fires → refresh spins ~1000s trying to acquire lock → exit 1 → timer fires again → repeat
We observed this sequence when:
- ECS task health check failures causing repeated task replacement
These scenarios may also trigger it (unconfirmed):
- Rapid ECS deployments (rolling updates, blue-green)
- High-frequency autoscaling events
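The stuck-lock state can be rehearsed without racing ECS at all, by planting a lockfile owned by a long-lived process. This sketch uses a temp directory and a made-up interface name (ecsfake0) so it runs without root; on a real host the lock directory is /run/amazon-ec2-net-utils/setup-policy-routes:

```sh
# Rehearse the stuck-lock state: a live process holds the per-ENI
# lockfile for an interface that will never appear in /sys/class/net.
LOCKDIR="${LOCKDIR:-$(mktemp -d)}"   # real host: /run/amazon-ec2-net-utils/setup-policy-routes
mkdir -p "$LOCKDIR"

sleep 600 &                          # stand-in for the stuck start process
stuck_pid=$!
echo "$stuck_pid" > "$LOCKDIR/ecsfake0"

echo "lock for ecsfake0 held by pid $(cat "$LOCKDIR/ecsfake0")"
# With the real units installed, a refresh for ecsfake0 would now spin
# in register_networkd_reloader for up to 1000s before exiting 1.

kill "$stuck_pid"                    # end the rehearsal
```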
Evidence From Affected Hosts
Many ECS hosts were confirmed affected; two examples:
| Host   | Uptime  | Stuck ENIs | Peak Processes | Peak Load Avg |
|--------|---------|------------|----------------|---------------|
| host-A | 9 days  | 112        | ~214           | ~107          |
| host-B | 14 days | 766        | 1787+          | 414           |
```sh
# All ENIs confirmed missing from sysfs
ps aux | grep "setup-policy-routes" | grep -v grep | awk '{print $(NF-1)}' | sort -u | while read iface; do
    [ -e "/sys/class/net/$iface" ] && echo "EXISTS: $iface" || echo "MISSING: $iface"
done
# Result: ALL MISSING

# start processes own locks, refresh processes are waiting
ps -eo pid,cmd | grep setup-policy-routes | grep -v grep | while read pid cmd; do
    iface=$(echo "$cmd" | awk '{print $(NF-1)}')
    action=$(echo "$cmd" | awk '{print $NF}')
    lockfile="/run/amazon-ec2-net-utils/setup-policy-routes/$iface"
    lock_pid=$(cat "$lockfile" 2>/dev/null)
    echo "$action $iface lock_owner=$lock_pid this_pid=$pid $([ "$lock_pid" = "$pid" ] && echo OWNS || echo WAITING)"
done | sort | head -20

# systemd unit count
systemctl list-units 'policy-routes@*' --no-legend | wc -l
```
Systemd journal (logged every 1000 iterations, i.e. every ~100 seconds per stuck process; the ~107s spacing observed below is 1000 × 0.1s sleeps plus per-iteration fork/exec overhead):
```
Mar 27 16:21:57 ec2net[3312864]: Waiting for sysfs node to exist for ecse1a2b3c (iteration 0)
Mar 27 16:23:44 ec2net[3312864]: Waiting for sysfs node to exist for ecse1a2b3c (iteration 1000)
Mar 27 16:25:31 ec2net[3312864]: Waiting for sysfs node to exist for ecse1a2b3c (iteration 2000)
[repeating indefinitely]
```
Impact
- Direct: Load average 414 on a host with 8 vCPUs. Host effectively unusable.
- Indirect: Each stuck process issues a nanosleep syscall every 100ms. eBPF-based security sensors instrument the kernel at the syscall level and intercept every one of these events. At sufficient process counts the sensor's kernel probe handlers and userspace event pipelines are fully saturated processing what is, from the sensor's perspective, legitimate telemetry. The symptom presents as the security sensor consuming 80%+ CPU while doing exactly what it was designed to do, masking the root cause.
- Silent accumulation: Count grows with uptime × ECS deploy frequency. A host may take days or weeks to saturate. By the time CPU spikes, hundreds of units are stuck.
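A lower-bound back-of-envelope for that indirect load, using host-B's peak process count from the evidence above (each process completes a 0.1s sleep ten times per second; the real event rate is higher, since every iteration also forks and execs a new sleep):

```sh
# Lower bound on sleep wakeups per second across all stuck processes.
procs=1787        # host-B peak process count
per_proc_hz=10    # one 0.1s sleep completing 10x per second
total=$(( procs * per_proc_hz ))
echo "at least ${total} wakeups/s for the sensor to process"   # prints: at least 17870 wakeups/s ...
```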
Proposed Fix
Fix 1 (primary): Add timeout to sysfs wait loop -> bin/setup-policy-routes.sh#L52
This is the root-cause fix. Without it, Fix 2 alone has no effect: the stuck start process is still alive, so its lock is not stale and the stale-lock check in register_networkd_reloader never triggers.
A timeout of 5 minutes (max_wait=3000, i.e. 3000 × 0.1s) is conservative enough to not false-positive on a slow or congested host while still bounding accumulation to one stuck process per ENI rather than an indefinitely running one:
```sh
start)
    register_networkd_reloader
    counter=0       # plain assignments: this case runs at script top level, where `local` is invalid
    max_wait=3000   # 5 minute timeout (3000 * 0.1s)
    while [ ! -e "/sys/class/net/${iface}" ]; do
        if ((counter % 1000 == 0)); then
            debug "Waiting for sysfs node to exist for ${iface} (iteration $counter)"
        fi
        sleep 0.1
        ((counter++))
        if ((counter >= max_wait)); then
            error "Timed out waiting for sysfs node for ${iface} after ${counter} iterations, giving up"
            exit 1
        fi
    done
    ...
    ;;
```
Note: the timeout value is a judgment call. Five minutes is generous; on a healthy host the sysfs node appears within milliseconds. There may be a documented SLA for how quickly a newly attached ENI appears in sysfs; if the upstream maintainers have data showing a shorter bound is safe, a tighter value is fine.
Fix 2 (secondary): Deadlock detection in register_networkd_reloader -> lib/lib.sh#L628
After Fix 1, start exits on timeout, but it holds the lockfile until exit. A refresh that was already mid-spin waiting for the lock may then acquire it and run do_setup for a non-existent ENI. Adding a stale-lock check lets any subsequent invocation recover immediately rather than inheriting the full spin period:
```sh
# Check if existing lock owner is still alive; if not, remove stale lock
local -r lockfile="${lockdir}/${iface}"
if [ -f "${lockfile}" ]; then
    existing_pid=$(cat "${lockfile}" 2>/dev/null)
    if [ -n "$existing_pid" ] && ! kill -0 "$existing_pid" 2>/dev/null; then
        debug "Removing stale lock from dead process $existing_pid for ${iface}"
        rm -f "${lockfile}"
    fi
fi
```
Temporary Workaround (for affected running hosts)
Does not persist across reboots. Replace the instance for a permanent fix.
pkill does not work as systemd respawns processes immediately (Restart=on-failure on policy-routes@.service, per-ENI timers on refresh-policy-routes@.timer). Must stop and mask via systemd.
```sh
# 1. Stop services and timers
systemctl stop 'policy-routes@*.service'
systemctl stop 'refresh-policy-routes@*.service'
systemctl stop 'refresh-policy-routes@*.timer'

# 2. Mask to prevent respawn of known units
systemctl mask 'policy-routes@*.service' 2>/dev/null || \
    systemctl list-units 'policy-routes@*.service' --no-legend | awk '{print $1}' | xargs systemctl mask
systemctl mask 'refresh-policy-routes@*.service' 2>/dev/null || \
    systemctl list-units 'refresh-policy-routes@*.service' --no-legend | awk '{print $1}' | xargs systemctl mask
systemctl mask 'refresh-policy-routes@*.timer' 2>/dev/null || \
    systemctl list-units 'refresh-policy-routes@*.timer' --no-legend | awk '{print $1}' | xargs systemctl mask

# 3. Verify
ps aux | grep setup-policy-routes | grep -v grep | wc -l # should be 0

# WARNING: new ECS tasks deployed after masking will spawn new unmasked units.
# Do NOT mask the templates (policy-routes@.service etc.) as that disables ENI
# routing for all new ECS tasks.
```
Note: refresh-policy-routes@.service has SuccessExitStatus=SIGTERM, so systemctl stop (which sends SIGTERM) exits cleanly.
After masking, you will still find a high unit count. Masked/stopped units remain as inactive records in systemd state. The count reflects accumulated history, not active processes. To zero it: replace the instance.
Detection
On-instance:
```sh
systemctl list-units 'policy-routes@*' --no-legend | wc -l
# 1 = healthy (one active task). >1 = accumulation in progress.
```
Fleet-wide (via SSM Run Command or equivalent remote execution):
```sh
# 1. Check unit count per host (>1 = accumulation in progress)
systemctl list-units 'policy-routes@*' --no-legend | wc -l

# 2. Confirm stuck processes are looping against missing ENIs
ps aux | grep "setup-policy-routes" | grep -v grep | awk '{print $(NF-1)}' | sort -u | while read iface; do
    [ -e "/sys/class/net/$iface" ] && echo "EXISTS: $iface" || echo "MISSING: $iface"
done

# 3. Check host load average
uptime
```
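For fleet-wide aggregation, check 2 can be collapsed into one integer per host. A sketch assuming only ps, awk, and sysfs (count_stuck is a hypothetical name, not part of the package):

```sh
# Count setup-policy-routes processes whose interface argument no longer
# exists in sysfs; emits a single integer. The [s] in the awk pattern
# keeps the pipeline from matching itself.
count_stuck() {
    ps -eo args | awk '/[s]etup-policy-routes/ {print $(NF-1)}' | sort -u |
    while read -r iface; do
        [ -e "/sys/class/net/$iface" ] || echo "$iface"
    done | wc -l
}
count_stuck
```

Any value above 0 warrants the full on-instance checks above.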
Thresholds observed:
- 112 units → load avg ~107, IDS/IPS agent at 80% CPU (9 days uptime)
- 766 units → load avg 414, IDS/IPS agent at 80% CPU (14 days uptime)
References
- bin/setup-policy-routes.sh -> infinite sysfs wait loop at line 53
- lib/lib.sh -> register_networkd_reloader() at line 628
- udev/99-vpc-policy-routes.rules -> remove event handler
- systemd/system/policy-routes@.service
- systemd/system/refresh-policy-routes@.timer
- amazon-ec2-net-utils 2.7.1-1.amzn2023.0.1 (bug present in main / 2.7.2)
- BindsTo=%i.device (systemd mechanism for stopping units when a device disappears). This would not fix this bug: BindsTo only triggers when a device unit goes away, but in this race the ENI is detached before the sysfs node ever appears, so no device unit exists for systemd to bind to. Mentioned as related context on unit lifecycle cleanup.