client: increase default latency probe interval from 30s to 5m #3532
Merged
snormore approved these changes on Apr 15, 2026
When the first latency probe finds no reachable devices (e.g. transient network issue at startup), keep probing at a fast interval (<=30s) instead of waiting the full steady-state interval (5m). Once a device responds, set probeReady and switch to the configured probe interval.
Summary of Changes
- Increase the default `-probe-interval` for `doublezerod` from 30s to 300s (5 minutes), reducing steady-state ICMP control-plane load on DZDs by ~10x. With ~1700 clients each pinging ~90 devices, the old 30s interval contributed ~170+ ICMP pps per device, enough to compete with TWAMP telemetry packets under Arista COPP rate-limiting and cause spurious asymmetric link-down events.
- When the first probe pass finds no reachable devices, keep probing at a fast interval (<=30s) instead of waiting the full steady-state interval. `probeReady` is set unconditionally after the first pass so the CLI can proceed.

Diff Breakdown
Mostly test coverage for the new fast-retry behavior; core logic change is compact.
Key files
- `client/doublezerod/internal/latency/manager_test.go`: new test verifying fast retry when the first probe finds no reachable devices, and that `probeReady` is set regardless
- `client/doublezerod/internal/latency/manager.go`: add `converged` flag and `hasReachable()` check to control the fast/slow probe interval; add `maxInitialProbeInterval` constant
- `client/doublezerod/cmd/doublezerod/main.go`: change `-probe-interval` default from 30 to 300

Testing Verification
- `TestLatencyManager_FastRetryWhenUnreachable` verifies that `probeReady` is set after the first probe even when all devices are unreachable, that the second probe fires at the fast interval (~30s) rather than the steady-state interval (1h in the test), and that the probe count advances as expected
- `-probe-interval 5`
- `-probe-interval <seconds>`