
client: increase default latency probe interval from 30s to 5m #3532

Merged
ben-dz merged 2 commits into main from bdz/slower-cli-latency-interval on Apr 15, 2026

ben-dz (Contributor) commented Apr 14, 2026

Summary of Changes

  • Increase the default -probe-interval for doublezerod from 30s to 300s (5 minutes), reducing steady-state ICMP control-plane load on DZDs by ~10x. With ~1700 clients each pinging ~90 devices, the old 30s interval contributed 170+ ICMP pps per device, enough to compete with TWAMP telemetry packets under Arista CoPP rate limiting and cause spurious asymmetric link-down events.
  • When the first probe finds no reachable devices (e.g. daemon starts before network is ready), retry at a fast interval (min of configured interval and 30s) until a device responds, then switch to the steady-state interval. The first probe still fires immediately on startup, and probeReady is set unconditionally after the first pass so the CLI can proceed.

Diff Breakdown

| Category | Files | Lines (+/-) | Net |
|---|---|---|---|
| Core logic | 2 | +25 / -2 | +23 |
| Tests | 1 | +79 / -0 | +79 |
| Docs | 1 | +5 / -0 | +5 |
| Generated | 1 | +2 / -0 | +2 |

Mostly test coverage for the new fast-retry behavior; core logic change is compact.

Testing Verification

  • New unit test TestLatencyManager_FastRetryWhenUnreachable verifies that probeReady is set after the first probe even when all devices are unreachable, that the second probe fires at the fast interval (~30s) rather than the steady-state interval (1h in the test), and that the probe count advances as expected
  • All existing latency manager tests pass (14/14)
  • E2E tests unaffected — client entrypoint hardcodes -probe-interval 5
  • Operators can still override with -probe-interval <seconds>

ben-dz force-pushed the bdz/slower-cli-latency-interval branch from f381bb2 to 413643f on April 14, 2026 12:14
ben-dz marked this pull request as ready for review on April 14, 2026 12:14
ben-dz force-pushed the bdz/slower-cli-latency-interval branch 2 times, most recently from 7809d78 to e740e85 on April 15, 2026 20:11
When the first latency probe finds no reachable devices (e.g. transient
network issue at startup), keep probing at a fast interval (<=30s)
instead of waiting the full steady-state interval (5m). Once a device
responds, set probeReady and switch to the configured probe interval.
ben-dz force-pushed the bdz/slower-cli-latency-interval branch from e740e85 to 24e918c on April 15, 2026 21:01
ben-dz merged commit 558ff24 into main on Apr 15, 2026
33 checks passed
ben-dz deleted the bdz/slower-cli-latency-interval branch on April 15, 2026 21:37
2 participants