Commit 059ffa9

Fix for delete lb and stale lb dsr vfp rules.
1 parent 114e2bb commit 059ffa9

4 files changed

Lines changed: 484 additions & 0 deletions

Lines changed: 132 additions & 0 deletions
@@ -0,0 +1,132 @@
# Stale LB DSR Rules Cleanup

## Overview

This mitigation script automatically detects and removes stale Load Balancer Direct Server Return (LB DSR) rules from VFP (Virtual Filtering Platform) that reference non-existent backend endpoints. It runs continuously to maintain network health by cleaning up orphaned rules that can cause connectivity issues.

## Problem Statement

When backend endpoints are removed or become unavailable, the corresponding LB DSR rules in VFP may not be cleaned up properly. These stale rules can:
- Cause packet routing failures
- Lead to connection timeouts
- Create unnecessary overhead in the networking stack
- Result in traffic being sent to non-existent endpoints

## Solution

The `cleanup-stale-lb-rules.ps1` script:
1. Checks and sets the required registry configuration for LB DSR feature management
2. Continuously monitors VFP LB DSR rules (both IPv4 and IPv6)
3. Compares rule destination IPs (DIPs) against active HNS endpoints
4. Automatically removes rules that reference non-existent endpoints

## Prerequisites

- Windows Server with HNS (Host Network Service) enabled
- VFP control utilities (`vfpctrl.exe`) available
- PowerShell with administrator privileges
- HNS PowerShell module

## Usage

### Running the Script on a Single Node

```powershell
.\cleanup-stale-lb-rules.ps1
```

The script will:
1. Read the registry value `140377743` under `HKLM:\SYSTEM\CurrentControlSet\Policies\Microsoft\FeatureManagement\Overrides`
2. If the value is 1, set it to 0 and restart the node. This disables PR 13179278, which causes delete LB RPC calls from KubeProxy to fail with an Invalid IP error (ICM: 719903780)
3. Start a continuous monitoring loop with 10-second intervals
4. Clean up any stale LB DSR rules found

**Note:** This approach fixes issues on a single node. If the issue is widespread across the cluster, deploy the solution using a DaemonSet:

```powershell
kubectl create -f cleanup-stale-lb-rules.yaml
```

This runs the mitigation script in HostProcess container (HPC) pods on all affected nodes.

### Configuration

You can modify these parameters at the top of the script:

- **`$groups`**: VFP groups to monitor (default: `LB_DSR_IPv4_OUT`, `LB_DSR_IPv6_OUT`)
- **`$refreshIntervalSeconds`**: Time between cleanup iterations (default: 10 seconds)

## How It Works

### 1. Registry Check
The script first reads the feature flag registry value (140377743). If the value is 1, the script sets it to 0 and restarts the node. Any other value, or a missing value, lets the script continue.

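The registry read can be made tolerant of a missing key or value. The sketch below is an illustration under stated assumptions, not the script's own code: `Get-FeatureOverrideValue` is a hypothetical helper name, and it returns `$null` instead of writing an error when nothing is found.

```powershell
# Hypothetical helper (not part of the script): read a registry value and
# return $null if the key or the value cannot be read.
function Get-FeatureOverrideValue {
    param(
        [Parameter(Mandatory)][string]$Path,
        [Parameter(Mandatory)][string]$Name
    )
    try {
        # -ErrorAction Stop turns "not found" into a catchable exception.
        $item = Get-ItemProperty -Path $Path -Name $Name -ErrorAction Stop
        return $item.$Name
    } catch {
        return $null
    }
}

$overridesPath = 'HKLM:\SYSTEM\CurrentControlSet\Policies\Microsoft\FeatureManagement\Overrides'
$value = Get-FeatureOverrideValue -Path $overridesPath -Name '140377743'

# Only a value of exactly 1 takes the reset-and-restart path.
$restartNeeded = ($value -eq 1)
Write-Host "Restart needed: $restartNeeded"
```

Treating a missing value the same as 0 matches the behavior described above, where only a value of 1 triggers the restart.
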
### 2. Endpoint Collection
- Retrieves all HNS policies
- Extracts endpoint references
- Builds a dictionary of valid endpoint IP addresses

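The dictionary-building step can be sketched as a small function. This is a minimal illustration, assuming endpoint objects expose the `IPAddress` and `IPv6Address` properties that the script reads from `Get-HnsEndpoint`. The function name `New-EndpointIpTable` and the sample objects are hypothetical stand-ins so the sketch runs without HNS.

```powershell
# Sketch: build the lookup table of valid endpoint IPs from endpoint objects.
# New-EndpointIpTable is a hypothetical name, not part of the script.
function New-EndpointIpTable {
    param([object[]]$Endpoints)

    $table = @{}
    foreach ($endpoint in $Endpoints) {
        foreach ($ip in @($endpoint.IPAddress, $endpoint.IPv6Address)) {
            # Skip $null and empty strings so they never become keys.
            if ($ip) { $table[$ip] = $true }
        }
    }
    return $table
}

# Stand-in objects; on a real node these would come from Get-HnsEndpoint.
$sample = @(
    [pscustomobject]@{ IPAddress = '10.244.0.25'; IPv6Address = 'fdf5:5d67:b9ce:b28f::13f' },
    [pscustomobject]@{ IPAddress = '10.244.0.26'; IPv6Address = $null }
)
$validIps = New-EndpointIpTable -Endpoints $sample
$validIps.ContainsKey('10.244.0.25')   # True
$validIps.ContainsKey('10.244.0.99')   # False
```

A hashtable keeps each later membership check constant-time, which matters because every rule's DIPs are checked on every iteration.
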
### 3. Rule Validation
For each VFP port and LB DSR group:
- Lists all rules in the `LB_DSR` layer
- Extracts DIP (Destination IP) ranges from each rule
- Compares DIPs against the valid endpoint dictionary

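The DIP extraction depends on a greedy regular expression. The sketch below applies the same pattern to the two sample lines that appear in the script's comments. Real `vfpctrl` output is assumed to follow that `{ ip : port }` shape and is not reproduced here.

```powershell
# Sample lines in the shape shown in the script's comments.
$lines = @(
    '{ 10.244.0.25 : 53 }',
    '{ fdf5:5d67:b9ce:b28f::13f : 4445 }'
)

$extracted = foreach ($line in $lines) {
    # Strip the surrounding braces and whitespace.
    $clean = $line.Trim().Trim('{', '}').Trim()
    # Greedy (.+) backtracks to the LAST colon, so IPv6 colons stay in the IP.
    if ($clean -match '^(.+)\s*:\s*\d+$') {
        $Matches[1].Trim()
    }
}
$extracted   # 10.244.0.25 and fdf5:5d67:b9ce:b28f::13f
```

Splitting on the first colon would cut the IPv6 address short, which is why the pattern anchors on the trailing port number instead.
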
### 4. Cleanup
- A rule is flagged as stale if any of its DIPs is not found among the active endpoints
- Stale rules are automatically deleted using `vfpctrl /remove-rule`

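The stale check itself is a set-membership test. The following is a minimal sketch with a hypothetical helper, `Get-MissingDips`, and made-up sample data. It only reports the result and does not call `vfpctrl`, so it is safe to run anywhere.

```powershell
# Hypothetical helper: return the DIPs of one rule that are absent from the
# table of valid endpoint IPs. A non-empty result marks the rule as stale.
function Get-MissingDips {
    param(
        [string[]]$Dips,
        [hashtable]$ValidIps
    )
    # "$_ -and" skips null or empty entries before the lookup.
    $Dips | Where-Object { $_ -and -not $ValidIps.ContainsKey($_) }
}

$validIps = @{ '10.244.0.25' = $true }

# @() guarantees an array, so .Count is reliable for 0 or 1 results.
$missing = @(Get-MissingDips -Dips @('10.244.0.25', '10.244.0.99') -ValidIps $validIps)

if ($missing.Count -gt 0) {
    # The script deletes the rule at this point with vfpctrl /remove-rule.
    Write-Host "Rule is stale; missing DIPs: $($missing -join ', ')"
}
```

Logging the decision before deleting anything, as done here, is one cautious way to review what the cleanup would remove.
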
## Output Examples

### Healthy State
```
All DIP ranges are present in the dictionary.
```

### Stale Rules Detected
```
Missing DIP ranges:
- 10.244.0.25
- fdf5:5d67:b9ce:b28f::13f
Deleting rule : ruleId: ABC123, port: Port1, group: LB_DSR_IPv4_OUT
```

## Monitoring

The script provides color-coded output:
- **Green**: Healthy state, all rules valid
- **Yellow**: Configuration changes or rule deletion in progress
- **Red**: Stale rules detected
- **Cyan**: Status updates and iteration markers

## Important Notes

- The script runs indefinitely until manually stopped (Ctrl+C)
- The node restarts on the first run if the registry value is 1
- Ensure no legitimate endpoint updates are in progress during cleanup. A rule whose endpoint is momentarily absent from HNS is treated as stale (a false positive)
- The script requires elevated privileges to modify VFP rules and registry settings

## Troubleshooting

### Script doesn't detect stale rules
- Verify VFP and HNS are functioning correctly
- Check that `vfpctrl.exe` is accessible in the system PATH
- Ensure HNS endpoints are properly registered

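A quick pre-flight check can confirm that the commands the script relies on resolve in the current session. This is a sketch, not part of the script: `Test-CommandAvailable` is a hypothetical helper built on `Get-Command`, and the command names checked are the ones the script invokes.

```powershell
# Hypothetical pre-flight check: confirm a command can be resolved
# (executable on PATH, or a cmdlet/function from a loaded module).
function Test-CommandAvailable {
    param([Parameter(Mandatory)][string]$Name)
    return [bool](Get-Command -Name $Name -ErrorAction SilentlyContinue)
}

foreach ($name in 'vfpctrl.exe', 'Get-HnsPolicyList', 'Get-HnsEndpoint') {
    if (-not (Test-CommandAvailable -Name $name)) {
        Write-Warning "Command not found: $name"
    }
}
```

A warning for `Get-HnsPolicyList` or `Get-HnsEndpoint` suggests the HNS PowerShell module is not imported in the session.
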
### Node restarts unexpectedly
- This is expected behavior if the registry value was 1 when the script started
- After the restart, run the script again and it will proceed to the monitoring loop

### Permission errors
- Run PowerShell as Administrator
- Verify the account has rights to modify VFP rules and the registry

## Related Documentation

- [VFP Documentation](../../helper/VFP.psm1)
- [HNS Module](../HNS/)
- [Network Health Monitoring](../../networkhealth/)

## Support

For issues or questions, please refer to the main repository documentation or open an issue.
Lines changed: 102 additions & 0 deletions
@@ -0,0 +1,102 @@
$groups = @("LB_DSR_IPv4_OUT", "LB_DSR_IPv6_OUT")
$refreshIntervalSeconds = 10
$overridesPath = "HKLM:\SYSTEM\CurrentControlSet\Policies\Microsoft\FeatureManagement\Overrides"
$overrideName = "140377743"

function Get-DipRangesFromRuleText {
    param([string[]]$RuleText)

    $collect = $false
    $dips = @()

    foreach ($line in $RuleText) {

        # Detect beginning of DIP Range block
        if ($line -match "DIP Range") {
            $collect = $true
            continue
        }

        # Stop collecting once the FlagsEx line is reached
        if ($collect -and $line -match "FlagsEx") {
            break
        }

        # Process lines like:
        #   { 10.244.0.25 : 53 }
        #   { fdf5:5d67:b9ce:b28f::13f : 4445 }
        if ($collect -and $line.Trim().StartsWith("{")) {

            # Remove surrounding { } then trim
            $clean = $line.Trim().Trim('{','}').Trim()
            # Greedy match keeps everything before the last " : " as the IP
            if ($clean -match '^(.+)\s*:\s*\d+$') {
                $dips += $Matches[1].Trim()
            }
        }
    }

    return $dips
}

# A missing key or value yields $null, which is treated like 0
$regKeyVal = (Get-ItemProperty -Path $overridesPath -Name $overrideName -ErrorAction SilentlyContinue).$overrideName
if ($regKeyVal -eq 1) {
    Write-Host "Registry value is 1. Setting it to 0 and restarting the node." -ForegroundColor Yellow
    Set-ItemProperty -Path $overridesPath -Name $overrideName -Value 0 -Type DWord
    Restart-Computer -Force
    # Wait so the monitoring loop does not start while the restart is pending
    Start-Sleep -Seconds 30
} else {
    Write-Host "Registry value is not 1. Continuing the script." -ForegroundColor Green
}

while ($true) {
    Write-Host "Waiting for $refreshIntervalSeconds seconds for the next iteration..." -ForegroundColor Cyan
    Start-Sleep -Seconds $refreshIntervalSeconds
    Write-Host "Starting new iteration to check for stale LB DSR rules..." -ForegroundColor Cyan
    $dictDstIPs = @{}

    $policies = Get-HnsPolicyList

    $endpointIds = $policies.References |
        Where-Object { $_ -like "/endpoints/*" } |
        ForEach-Object { ($_ -split "/")[-1] } |
        Sort-Object -Unique

    foreach ($endpointId in $endpointIds) {
        # Query each endpoint once and record both address families
        $endpoint = Get-HnsEndpoint -Id $endpointId
        if ($endpoint.IPAddress) {
            $dictDstIPs[$endpoint.IPAddress] = $true
        }
        if ($endpoint.IPv6Address) {
            $dictDstIPs[$endpoint.IPv6Address] = $true
        }
    }

    $ports = (vfpctrl.exe /list-vmswitch-port /format 1 | ConvertFrom-Json).Ports.Name
    foreach ($port in $ports) {
        foreach ($group in $groups) {
            $rules = (vfpctrl.exe /port $port /layer LB_DSR /group $group /list-rule /format 1 | ConvertFrom-Json).Rules
            foreach ($rule in $rules) {
                $ruleId = $rule.Id
                $ruleText = vfpctrl.exe /get-rule-info /port $port /layer LB_DSR /group $group /rule $ruleId 2>&1
                if (-not $ruleText) {
                    Write-Host "No output from vfpctrl"
                    continue
                }

                $dips = @(Get-DipRangesFromRuleText -RuleText $ruleText)
                # Check which DIPs are missing in the dictionary;
                # @() keeps .Count reliable for zero or one result
                $missingDIPs = @($dips | Where-Object { -not $dictDstIPs.ContainsKey($_) })

                if ($missingDIPs.Count -eq 0) {
                    Write-Host "All DIP ranges are present in the dictionary." -ForegroundColor Green
                } else {
                    Write-Host "Missing DIP ranges:" -ForegroundColor Red
                    $missingDIPs | ForEach-Object { Write-Host " - $_" }
                    Write-Host "Deleting rule : ruleId: $ruleId, port: $port, group: $group" -ForegroundColor Yellow
                    vfpctrl.exe /remove-rule /port $port /layer LB_DSR /group $group /rule $ruleId
                }
            }
        }
    }
}