resctrl-mon: revert CRI-O pidfile fallback due to security concerns by cmcantalupo · Pull Request #671 · containers/nri-plugins

cmcantalupo · 2026-06-07T15:08:08Z

Summary

Revert the CRI-O pidfile fallback added in f7883b9, and add a runtime
version check that rejects CRI-O < 1.36 at startup.

Context

CRI-O <= 1.35 does not populate Container.Pid in NRI events. The
reverted commit attempted to work around this by reading PID from
CRI-O's internal pidfile. However, enabling this requires a host mount
that violates the principle of least privilege.

The upstream fix (CRI-O commit 19d319695) populates the PID field
directly in NRI and is available in CRI-O >= 1.36.

Impact

Runtime	Effect
containerd	No impact — PID is always provided via NRI
CRI-O >= 1.36	No impact — PID is provided via NRI
CRI-O < 1.36	Plugin refuses to start with a clear error message; DaemonSet enters CrashLoopBackOff

The hard fail is intentional: without PIDs, mon_groups are created but
remain empty, silently reporting zeros for all metrics. This is worse
than a crash because it produces data indistinguishable from "this pod
uses no LLC/bandwidth," leading to incorrect capacity decisions. A loud
crash with a clear log message ("CRI-O 1.35.x does not provide
container PIDs via NRI, requires >= 1.36") gives the admin immediate
actionable feedback.

Security analysis

The current deployment in origin/main is not actively insecure.
The default Helm daemonset template mounts only:

/var/run/nri (NRI socket)
/sys/fs/resctrl (resctrl filesystem)

There is no volume or volumeMount for /run/containers/storage in the
shipped template. Without that mount, readContainerPID() calls
os.ReadFile on a path that does not exist inside the container, gets
ENOENT, and returns 0. The fallback is a complete no-op in the
default deployment — no information is leaked and no error is raised.

However, the code cannot function without the insecure mount, and
the only way to make it function is to add a hostPath volume for
/run/containers/storage. That directory contains:

/run/containers/storage/overlay-containers/<id>/userdata/pidfile
(the target file)
/run/containers/storage/overlay-containers/<id>/userdata/config.json
(full OCI runtime spec including process environment variables)

Mounting this path grants the monitoring plugin read access to secrets
and environment variables of every container on the node. This is
an unacceptable privilege escalation for a read-only monitoring plugin.

In summary: the upstream code is dead code that can only be
activated via an insecure deployment modification. Rather than ship
non-functional code whose only activation path is insecure, we remove
it entirely.

This reverts commit f7883b9 ("resctrl-mon: read container PID from CRI-O pidfile"). The readContainerPID() fallback reads container PIDs from CRI-O's internal storage at /run/containers/storage/overlay-containers/. The default Helm daemonset template does not mount this path, so the function always receives ENOENT and returns 0 — the code is a complete no-op in the shipped deployment and no information is leaked. However, the fallback cannot function without adding a hostPath volume for /run/containers/storage. That directory also contains the full OCI runtime spec (config.json) for every container on the node, including process environment variables and secrets. Mounting it grants the monitoring plugin read access to credentials of all host containers — an unacceptable privilege escalation for a read-only monitoring plugin. Rather than ship non-functional code whose only activation path is insecure, remove it entirely. The fallback was added to work around CRI-O <= 1.35 not populating Container.Pid in NRI events. The proper fix is upstream in CRI-O (commit 19d319695) and will be available in CRI-O >= 1.36. Until then, the plugin is effectively a no-op on affected CRI-O versions: mon_group directories are created but remain empty since no PIDs can be assigned. containerd users are unaffected — PID is always populated via NRI. Signed-off-by: Christopher M. Cantalupo <christopher.m.cantalupo@intel.com>

CRI-O versions before 1.36 do not populate Container.Pid in NRI events (the fix is CRI-O commit 19d319695, not yet in a release). Without a PID, the plugin cannot assign container tasks to monitoring groups, and empty mon_groups silently report zero for all metrics. Rather than degrade silently, fail hard during Configure when the runtime identifies as CRI-O with a version below 1.36. The error message directs the user to upgrade CRI-O or use containerd. This makes the failure mode explicit: the DaemonSet will enter CrashLoopBackOff with a clear log message rather than appearing healthy while producing no useful monitoring data. Signed-off-by: Christopher M. Cantalupo <christopher.m.cantalupo@intel.com>

cmcantalupo added 2 commits June 5, 2026 16:46

cmcantalupo force-pushed the revert-pidfile-fallback branch from a71bccd to 94c0cce Compare June 7, 2026 15:12

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

resctrl-mon: revert CRI-O pidfile fallback due to security concerns#671

resctrl-mon: revert CRI-O pidfile fallback due to security concerns#671
cmcantalupo wants to merge 2 commits into
containers:mainfrom
cmcantalupo:revert-pidfile-fallback

cmcantalupo commented Jun 7, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

cmcantalupo commented Jun 7, 2026

Summary

Context

Impact

Security analysis

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant