Commit 61d507e

Add Kubernetes Pod Disruption Budget (PDB) Violation Rule (#99)
* Add Kubernetes Pod Disruption Budget (PDB) Violation Rule
  - Introduced a new rule (CRE-2025-0108) to detect violations of Kubernetes Pod Disruption Budgets during rolling updates, including detailed descriptions of causes, impacts, and mitigation strategies.
  - Added a test log file demonstrating various scenarios of PDB violations and their consequences.
* fix tags issue
* Corrected tag formatting for operational error in Kubernetes PDB violation rule
* fix cre number
1 parent d3b0b55 commit 61d507e
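
The conflict that CRE-2025-0108 targets comes down to simple arithmetic: a rollout may take `maxUnavailable` pods out of service at once, and if that leaves fewer than the PDB's `minAvailable`, the budget can be breached. The following is a minimal sketch of such a pre-deployment check, not part of this commit. The function names are hypothetical, and the rounding rules are an assumption based on the Kubernetes documentation (percentages round down for a Deployment's `maxUnavailable` and round up for a PDB's `minAvailable`).

```python
def resolve(value, replicas, round_up):
    """Convert an int or an 'N%' string into an absolute pod count."""
    if isinstance(value, int):
        return value
    if isinstance(value, str) and value.endswith("%"):
        scaled = replicas * int(value[:-1])
        # Integer arithmetic avoids floating-point rounding surprises.
        return -(-scaled // 100) if round_up else scaled // 100
    raise ValueError(f"expected an int or a percentage string, got {value!r}")


def rollout_can_breach_pdb(replicas, max_unavailable, min_available):
    """Return True if a rollout may leave fewer pods than the PDB requires."""
    # Assumption: Deployment percentages round down, PDB percentages round up.
    unavailable = resolve(max_unavailable, replicas, round_up=False)
    required = resolve(min_available, replicas, round_up=True)
    return replicas - unavailable < required


if __name__ == "__main__":
    # 4 replicas, 2 may be unavailable, PDB wants 3: only 2 are guaranteed.
    print(rollout_can_breach_pdb(4, 2, 3))
    # 4 replicas, 1 may be unavailable, PDB wants 3: 3 are guaranteed.
    print(rollout_can_breach_pdb(4, 1, 3))
```

A check like this could run in CI before a manifest is applied; it only compares the two settings and does not account for surge pods or pods that are already unhealthy.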

2 files changed

Lines changed: 349 additions & 0 deletions

Lines changed: 101 additions & 0 deletions
@@ -0,0 +1,101 @@
+rules:
+  - cre:
+      id: CRE-2025-0108
+      severity: 1
+      title: "Kubernetes Pod Disruption Budget (PDB) Violation During Rolling Updates"
+      category: kubernetes-problem
+      author: Community
+      description: |
+        During rolling updates, when a deployment's maxUnavailable setting conflicts with
+        a Pod Disruption Budget's minAvailable requirement, it can cause service outages
+        by terminating too many pods simultaneously, violating the availability guarantees.
+        This can also occur during node drains, cluster autoscaling, or maintenance operations.
+      cause: |
+        - Deployment's rolling update strategy sets maxUnavailable higher than PDB allows
+        - PDB requires minAvailable pods but rolling update violates this constraint
+        - Concurrent pod terminations exceed the allowed disruption threshold
+        - Deployment configuration conflicts with PDB policy
+        - Node drains or cluster autoscaling events trigger multiple simultaneous pod evictions
+        - Resource pressure or node failures force pod relocations
+        - Maintenance operations affecting multiple nodes simultaneously
+      impact: |
+        - Service availability drops below guaranteed minimum
+        - Potential service outages during rolling updates
+        - Load balancer health checks may fail
+        - Cascading failures in dependent services
+        - Degraded application performance
+        - Extended recovery time due to pod rescheduling constraints
+        - Potential deadlock in deployment rollouts
+        - Service Level Objective (SLO) violations
+      impactScore: 8
+      tags:
+        - k8s
+        - known-problem
+        - misconfiguration
+        - operational error
+        - high-availability
mitigation: |
38+
**Immediate Actions:**
39+
1. Pause the rolling update:
40+
```
41+
kubectl rollout pause deployment/<deployment-name>
42+
```
43+
2. Verify PDB and deployment settings:
44+
```
45+
kubectl get pdb
46+
kubectl get deployment <deployment-name> -o yaml
47+
```
48+
3. Adjust maxUnavailable to respect PDB:
49+
```
50+
kubectl patch deployment/<deployment-name> -p '{"spec":{"strategy":{"rollingUpdate":{"maxUnavailable":"1"}}}}'
51+
```
52+
4. Check node conditions and drain status:
53+
```
54+
kubectl get nodes
55+
kubectl get pods -o wide
56+
```
57+
58+
**Long-term fixes:**
59+
- Ensure deployment's maxUnavailable setting respects PDB requirements
60+
- Implement pre-deployment validation checks
61+
- Use progressive delivery (canary/blue-green) for critical services
62+
- Monitor PDB violations through metrics/alerts
63+
- Configure cluster autoscaler to respect PDBs
64+
- Implement node maintenance windows
65+
- Use pod anti-affinity to spread critical workloads
66+
- Set up automated rollback triggers on PDB violations
67+
mitigationScore: 7
+      references:
+        - https://kubernetes.io/docs/tasks/run-application/configure-pdb/
+        - https://kubernetes.io/docs/concepts/workloads/pods/disruptions/
+        - https://kubernetes.io/docs/tasks/administer-cluster/safely-drain-node/
+        - https://kubernetes.io/docs/concepts/scheduling-eviction/pod-priority-preemption/
+      applications:
+        - name: kubernetes
+          version: ">=1.25.0"
+    metadata:
+      kind: prequel
+      id: 6XwKpQmNzRvTyLsHdBgFcA
+      gen: 1
+    rule:
+      set:
+        event:
+          source: cre.log.kubernetes
+        window: 5m
+        match:
+          - regex: "Warning\\s+PodDisruptionBudgetViolation.+Pod disruption budget violation detected: maxUnavailable: \\d+ conflicts with minAvailable: \\d+"
+          - regex: "Normal\\s+Killing\\s+pod/.+\\s+Stopping container"
+          - regex: "Warning\\s+Unhealthy\\s+pod/.+\\s+Readiness probe failed"
+          - regex: "Warning\\s+FailedScheduling.+cannot enforce pod-disruption-budget.+"
+          - regex: "Warning\\s+EvictionWarning.+evicting pod.+disruption budget denied"
+          - regex: "Warning\\s+NodeDrainPDBViolation.+cannot drain node.+PDB violation"
+      sequence:
+        window: 10m
+        event:
+          source: cre.log.kubernetes
+        order:
+          - regex: "Warning\\s+ScalingReplicaSet.+Scaled up replica set.+"
+          - regex: "Warning\\s+PodDisruptionBudgetViolation.+"
+            count: 3
+          - regex: "Warning\\s+Unhealthy\\s+pod/.+\\s+Readiness probe failed"
+            count: 2
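
One way to sanity-check the `set` patterns above is to run them against sample log lines before relying on the rule. The sketch below is not part of this commit: it uses Python's `re` module as a stand-in for the rule engine's regex dialect (an assumption; these patterns use only syntax common to both), and it assumes the `set` fires only once every pattern has matched inside the window. The sample lines are hypothetical and are not taken from the test log file added by this commit.

```python
import re

# The six `set` patterns from the rule, with YAML escaping removed.
SET_PATTERNS = [
    r"Warning\s+PodDisruptionBudgetViolation.+Pod disruption budget violation "
    r"detected: maxUnavailable: \d+ conflicts with minAvailable: \d+",
    r"Normal\s+Killing\s+pod/.+\s+Stopping container",
    r"Warning\s+Unhealthy\s+pod/.+\s+Readiness probe failed",
    r"Warning\s+FailedScheduling.+cannot enforce pod-disruption-budget.+",
    r"Warning\s+EvictionWarning.+evicting pod.+disruption budget denied",
    r"Warning\s+NodeDrainPDBViolation.+cannot drain node.+PDB violation",
]

# Hypothetical log lines, written only to exercise the patterns.
SAMPLE_LINES = [
    "Warning   PodDisruptionBudgetViolation   deployment/web   Pod disruption "
    "budget violation detected: maxUnavailable: 2 conflicts with minAvailable: 3",
    "Normal   Killing   pod/web-abc12   Stopping container web",
    "Warning   Unhealthy   pod/web-def34   Readiness probe failed: connection refused",
    "Warning   FailedScheduling   pod/web-ghi56   cannot enforce "
    "pod-disruption-budget for deployment/web",
    "Warning   EvictionWarning   node/worker-1   evicting pod web-abc12: "
    "disruption budget denied",
    "Warning   NodeDrainPDBViolation   node/worker-1   cannot drain node "
    "worker-1: PDB violation",
]


def unmatched_patterns(patterns, lines):
    """Return the patterns that match none of the given lines."""
    return [p for p in patterns if not any(re.search(p, line) for line in lines)]


if __name__ == "__main__":
    missing = unmatched_patterns(SET_PATTERNS, SAMPLE_LINES)
    print("all patterns matched" if not missing else f"unmatched: {missing}")
```

Because the set requires all six patterns, a log that contains only some of these events would not trigger it under the stated assumption; `unmatched_patterns` makes that gap visible for a given log sample.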
