[REVIEW] container-security: add Windows scheduling and HostProcess identity evidence gates #2555

@hgm1111

Skill Being Reviewed

Skill name: container-security
Skill path: skills/cloud/container-security/

False Positive Analysis

Benign code that triggers a false positive:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: windows-api
  namespace: apps
spec:
  replicas: 2
  selector:
    matchLabels:
      app: windows-api
  template:
    metadata:
      labels:
        app: windows-api
    spec:
      os:
        name: windows
      securityContext:
        windowsOptions:
          runAsUserName: "ContainerUser"
      nodeSelector:
        kubernetes.io/os: windows
      containers:
        - name: api
          image: mcr.microsoft.com/windows/servercore/iis:windowsservercore-ltsc2022
          ports:
            - containerPort: 80

Why this is a false positive:

The current Pod Security Standards quick reference in SKILL.md is Linux-centric: it lists allowPrivilegeEscalation, seccompProfile, Linux capabilities, and non-root UID-style controls without an OS-specific branch. For a Windows pod with spec.os.name: windows, Kubernetes treats several Linux security context fields differently or rejects them outright. A review that blindly requires Linux-only fields such as seccomp or Linux capabilities would incorrectly mark a valid Windows workload as non-compliant, and a remediation that adds those fields can break admission for Windows pods.

The skill should branch on spec.os.name and, for Windows workloads, validate Windows-specific controls such as windowsOptions.runAsUserName, HostProcess usage, node scheduling, Windows build compatibility, and gMSA authorization instead of forcing Linux-only hardening.
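
To make the branch concrete, the fragment below annotates the benign pod template with the Linux-only fields a review should not demand. It is a sketch, assuming the documented API validation for pods with spec.os.name: windows, which to my understanding rejects these fields when they are set. The exact list may differ between Kubernetes versions, so treat it as illustrative rather than exhaustive.

```yaml
# Pod template fragment for a Windows workload (sketch, not a complete manifest).
spec:
  os:
    name: windows
  securityContext:
    windowsOptions:
      runAsUserName: "ContainerUser"   # Windows identity evidence
    # Linux-only pod-level fields; do not require or add these for Windows pods:
    #   runAsUser, runAsGroup, fsGroup, supplementalGroups,
    #   seLinuxOptions, seccompProfile, sysctls
  nodeSelector:
    kubernetes.io/os: windows          # placement evidence
  containers:
    - name: api
      image: mcr.microsoft.com/windows/servercore/iis:windowsservercore-ltsc2022
      # Linux-only container-level fields; the same rule applies:
      #   allowPrivilegeEscalation, capabilities, privileged,
      #   readOnlyRootFilesystem, seccompProfile, runAsUser, runAsGroup
```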

Coverage Gaps

Missed variant 1: HostProcess pod identity and scheduling context

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: windows-node-agent
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: windows-node-agent
  template:
    metadata:
      labels:
        app: windows-node-agent
    spec:
      os:
        name: windows
      hostNetwork: true
      serviceAccountName: windows-node-agent
      securityContext:
        windowsOptions:
          hostProcess: true
          runAsUserName: "NT AUTHORITY\\SYSTEM"
      containers:
        - name: collector
          image: ghcr.io/example/windows-node-agent:v1.0.0

Why it should be caught:

cis-benchmarks.md currently says only to check windowsOptions.hostProcess: true. That catches the flag, but it does not require the evidence needed to judge the risk:

  • whether hostProcess is set at the pod level or container level
  • whether hostNetwork: true is present as required for HostProcess pods
  • which Windows identity is used through runAsUserName
  • whether NT AUTHORITY\SYSTEM is justified, or whether a lower-privilege identity such as LocalService, NetworkService, or a local user group would suffice
  • whether the workload is intentionally isolated to Windows nodes through nodeSelector, tolerations, or RuntimeClass
  • whether the service account / RBAC grants match the host-level capability of the workload

HostProcess containers run with host access and have much weaker isolation than ordinary Windows containers. The skill should classify HostProcess findings by identity and scheduling evidence, not just by the presence of the hostProcess flag.
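
For comparison, here is a sketch of how the same pod template might look once the evidence listed above is present. It assumes the agent can run without SYSTEM, which has to be verified against what the collector actually does on the host. The account string follows the form used in the Kubernetes HostProcess documentation.

```yaml
# Pod template fragment for the windows-node-agent DaemonSet (sketch).
spec:
  os:
    name: windows
  hostNetwork: true                        # expected alongside HostProcess
  serviceAccountName: windows-node-agent   # RBAC scope still needs its own review
  nodeSelector:
    kubernetes.io/os: windows              # explicit Windows placement evidence
  securityContext:
    windowsOptions:
      hostProcess: true
      # Assumption: the agent does not need SYSTEM. Confirm before adopting.
      runAsUserName: "NT AUTHORITY\\Local service"
  containers:
    - name: collector
      image: ghcr.io/example/windows-node-agent:v1.0.0
```

If the workload does need NT AUTHORITY\SYSTEM, the manifest stays as written and the finding should record the justification instead.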

Missed variant 2: spec.os.name: windows without effective Windows node placement

apiVersion: batch/v1
kind: CronJob
metadata:
  name: windows-maintenance
spec:
  schedule: "0 2 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          os:
            name: windows
          containers:
            - name: job
              image: mcr.microsoft.com/windows/servercore:ltsc2022
              command: ["powershell.exe", "-File", "maintenance.ps1"]
          restartPolicy: OnFailure

Why it should be caught:

The Kubernetes scheduler does not use spec.os.name to place pods on matching nodes. Windows workloads need explicit placement evidence, such as nodeSelector: kubernetes.io/os: windows, matching taints/tolerations, or a RuntimeClass whose scheduling section targets Windows nodes. Without that evidence, the pod can be scheduled onto a Linux node, where the kubelet refuses to run it, or it can bypass the intended Windows node pool controls.

This is especially important for Helm charts and Kustomize overlays, where a chart may set spec.os.name: windows but leave node placement to values. The review should require rendered-manifest evidence or values evidence for Windows placement.
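
One form of placement evidence is a RuntimeClass with a scheduling section, sketched below together with the CronJob pod template that references it. The RuntimeClass name, the handler value, and the os=windows taint are assumptions for illustration; the real values depend on how the cluster's Windows nodes and container runtime are configured.

```yaml
# Hypothetical RuntimeClass that carries Windows placement rules.
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: windows-ltsc2022                 # illustrative name
handler: runhcs-wcow-process             # assumption: containerd process-isolation handler
scheduling:
  nodeSelector:
    kubernetes.io/os: windows
    node.kubernetes.io/windows-build: "10.0.20348"   # Windows Server 2022 build
  tolerations:
    - key: os                            # assumption: nodes are tainted os=windows:NoSchedule
      operator: Equal
      value: windows
      effect: NoSchedule
---
# CronJob pod template fragment (jobTemplate.spec.template.spec) that uses it.
spec:
  os:
    name: windows
  runtimeClassName: windows-ltsc2022
  containers:
    - name: job
      image: mcr.microsoft.com/windows/servercore:ltsc2022
      command: ["powershell.exe", "-File", "maintenance.ps1"]
  restartPolicy: OnFailure
```

A plain nodeSelector with kubernetes.io/os: windows on the pod template is the simpler alternative when build pinning and tolerations are not needed.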

Missed variant 3: gMSA credential spec authorization

apiVersion: apps/v1
kind: Deployment
metadata:
  name: domain-integrated-api
spec:
  selector:
    matchLabels:
      app: domain-integrated-api
  template:
    metadata:
      labels:
        app: domain-integrated-api
    spec:
      os:
        name: windows
      serviceAccountName: app-sa
      securityContext:
        windowsOptions:
          gmsaCredentialSpecName: payroll-api-gmsa
          runAsUserName: "DOMAIN\\payroll-api$"
      nodeSelector:
        kubernetes.io/os: windows
      containers:
        - name: api
          image: ghcr.io/example/payroll-api:2.4.1

Why it should be caught:

Windows pods can use Group Managed Service Accounts (gMSA) for domain access. That creates a separate identity and authorization surface: the cluster needs the GMSACredentialSpec CRD, the gMSA mutating and validating admission webhooks, and RBAC that authorizes the pod's service account to use the referenced credential spec. The current skill covers Kubernetes RBAC and secrets generally, but it does not prompt reviewers to verify gMSA credential-spec authorization or to check whether a Windows workload obtains domain privileges beyond its intended scope.
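
The RBAC evidence a reviewer would look for can be sketched as follows. It follows the pattern in the Kubernetes gMSA documentation: a ClusterRole granting the use verb on one named credential spec, bound to the workload's service account. The role and binding names are illustrative, and the default namespace is an assumption, since the manifest above does not set one.

```yaml
# ClusterRole allowing use of exactly one credential spec.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: use-payroll-api-gmsa             # illustrative name
rules:
  - apiGroups: ["windows.k8s.io"]
    resources: ["gmsacredentialspecs"]
    verbs: ["use"]
    resourceNames: ["payroll-api-gmsa"]  # matches gmsaCredentialSpecName in the pod
---
# Bind it only to the service account the Deployment runs as.
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: app-sa-use-payroll-api-gmsa      # illustrative name
  namespace: default                     # assumption: workload namespace
subjects:
  - kind: ServiceAccount
    name: app-sa
    namespace: default
roleRef:
  kind: ClusterRole
  name: use-payroll-api-gmsa
  apiGroup: rbac.authorization.k8s.io
```

A rule without resourceNames, or a binding to a broad group such as all service accounts in a namespace, would be worth flagging, because any pod in scope could then reference the credential spec.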

Edge Cases

  • windowsOptions can be defined at pod level or container level; container-level values override pod-level values. The review should inspect both regular containers and init containers.
  • HostProcess pods are not just "privileged containers for Windows"; they require Windows-specific evidence including hostNetwork, runAsUserName, Windows node placement, and a justification for the selected Windows account.
  • For Windows pods, remediation that adds Linux-only fields such as seccomp, Linux capabilities, or numeric runAsUser can break the manifest rather than harden it.
  • spec.os.name is useful evidence for Pod Security Standards evaluation, but it is not scheduling evidence. Node selector, taints/tolerations, or RuntimeClass scheduling must be checked separately.
  • Windows Server build compatibility matters when multiple Windows node versions exist in the same cluster; node.kubernetes.io/windows-build or RuntimeClass scheduling evidence can prevent workloads landing on incompatible nodes.
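
The first edge case is easy to miss in review, so here is a minimal sketch of it. The init container and its elevated account are hypothetical; the point is only that the effective identity differs per container even though the pod-level value looks safe.

```yaml
# Pod template fragment showing a container-level override (sketch).
spec:
  os:
    name: windows
  securityContext:
    windowsOptions:
      runAsUserName: "ContainerUser"              # pod-level default
  initContainers:
    - name: setup                                 # hypothetical init container
      image: mcr.microsoft.com/windows/servercore:ltsc2022
      securityContext:
        windowsOptions:
          runAsUserName: "ContainerAdministrator" # overrides the pod-level value
  containers:
    - name: api                                   # inherits ContainerUser
      image: mcr.microsoft.com/windows/servercore/iis:windowsservercore-ltsc2022
```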

Remediation Quality

  • Fix resolves the vulnerability
  • Fix doesn't introduce new security issues
  • Fix doesn't break functionality
  • Issues found: Existing remediation is strong for Linux Kubernetes hardening, but it should add an explicit Windows branch. For Windows workloads, recommend runAsUserName, node placement evidence, gMSA authorization checks, and HostProcess-specific least-privilege guidance. Avoid recommending Linux-only fields for pods with spec.os.name: windows.

Suggested remediation additions:

  1. Add a "Windows workload branch" to Pod Security Standards evaluation:
    • if spec.os.name: windows, do not require Linux-only seccomp/capability/allowPrivilegeEscalation evidence
    • require Windows identity evidence through windowsOptions.runAsUserName
    • require node placement evidence via nodeSelector, tolerations, or RuntimeClass
  2. Expand CIS 5.2.11 from "check for hostProcess: true" to a HostProcess evidence matrix:
    • pod-level and container-level windowsOptions.hostProcess
    • hostNetwork: true
    • runAsUserName identity and privilege level
    • service account / RBAC scope
    • dedicated namespace and privileged PSA exception justification
    • Windows node placement and Windows build compatibility
  3. Add gMSA checks:
    • gmsaCredentialSpecName / gmsaCredentialSpec usage
    • gMSA admission webhook presence
    • RBAC authorizing only approved service accounts to use each credential spec
    • domain privilege review for the selected account

Comparison to Other Tools

| Tool | Catches this? | Notes |
| --- | --- | --- |
| Semgrep | Partial | Custom YAML rules can find hostProcess, missing nodeSelector, or gmsaCredentialSpecName, but cross-field reasoning and OS-specific false-positive handling need custom policy logic. |
| CodeQL | No/Partial | CodeQL is not the natural fit for Kubernetes manifest policy evaluation. |
| Checkov / Trivy / Kubescape | Partial | These can detect many Kubernetes misconfigurations, but Windows-specific PSS branching, HostProcess identity severity, and gMSA authorization usually need policy tuning. |
| Kyverno / OPA Gatekeeper | Yes/Partial | Admission policies can enforce these checks, but the skill should still guide reviewers to collect the right evidence and avoid Linux-only false positives. |
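
As an illustration of the admission-policy row, the sketch below uses the built-in ValidatingAdmissionPolicy API rather than Kyverno or Gatekeeper, which is a substitution on my part and not something the table evaluates. It only checks the nodeSelector form of placement evidence, so pods placed purely through tolerations or node affinity would be rejected. The policy and binding names are illustrative.

```yaml
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicy
metadata:
  name: windows-pods-need-os-node-selector   # illustrative name
spec:
  failurePolicy: Fail
  matchConstraints:
    resourceRules:
      - apiGroups: [""]
        apiVersions: ["v1"]
        operations: ["CREATE", "UPDATE"]
        resources: ["pods"]
  validations:
    # Pass if the pod is not declared as Windows, or if it carries the OS nodeSelector.
    - expression: >-
        !has(object.spec.os) || object.spec.os.name != 'windows' ||
        (has(object.spec.nodeSelector) &&
        'kubernetes.io/os' in object.spec.nodeSelector &&
        object.spec.nodeSelector['kubernetes.io/os'] == 'windows')
      message: "Pods with spec.os.name: windows must set nodeSelector kubernetes.io/os: windows."
---
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicyBinding
metadata:
  name: windows-pods-need-os-node-selector
spec:
  policyName: windows-pods-need-os-node-selector
  validationActions: ["Deny"]
```

Even with a policy like this in place, the review still needs the identity and gMSA evidence described earlier, since those depend on cluster state that a single-object check cannot see.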

Overall Assessment

Strengths:

  • Broad Docker and Kubernetes coverage with concrete CIS / NIST mapping.
  • Good discovery patterns for Dockerfiles, manifests, Helm, Kustomize, and RBAC resources.
  • Helpful common pitfalls for init containers, Helm overrides, default namespaces, NetworkPolicy behavior, and secrets.

Needs improvement:

  • Add OS-specific Pod Security Standards handling for Windows pods.
  • Expand HostProcess review beyond a single hostProcess: true flag.
  • Require effective Windows scheduling evidence instead of treating spec.os.name as sufficient.
  • Add gMSA credential-spec authorization checks for domain-integrated Windows workloads.

Priority recommendations:

  1. Add a Windows PSS branch that avoids Linux-only false positives and checks windowsOptions instead.
  2. Add a HostProcess evidence matrix covering identity, hostNetwork, RBAC, namespace exception, and Windows node placement.
  3. Add Windows placement and gMSA checks to the output format so reviewers record evidence consistently.
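
One possible shape for such an evidence record is sketched below, filled in for the HostProcess DaemonSet from missed variant 1. Every field name is hypothetical; the skill's actual output format is not shown in this issue, so this only illustrates which facts would be captured.

```yaml
# Hypothetical evidence record; all field names are illustrative.
finding: windows-hostprocess-review
workload: kube-system/DaemonSet/windows-node-agent
os:
  specOsName: windows
placement:
  nodeSelector: missing            # none in the manifest as written
  tolerations: missing
  runtimeClassName: missing
hostProcess:
  level: pod                       # set in the pod-level securityContext
  hostNetwork: true
  runAsUserName: "NT AUTHORITY\\SYSTEM"
  systemJustification: not provided
rbac:
  serviceAccount: windows-node-agent
  scopeReviewed: false
gmsa:
  credentialSpecName: none
```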

Bounty Info

  • I have read and agree to the CONTRIBUTING.md bounty terms
  • Preferred payment method: GitHub Sponsors
