[ntp] collect debugging information #10489
// which CRDB panics early during control plane startup because the clocks
// are not synchronized well-enough. We're adding this as part of a
// two-phase rollout to get around #9290 for now.
(3, ADD_DEBUG_ENDPOINT),
I realise the text above this says not to add new versions. If anyone has a better idea of how to surface all of this information, I'm all ears.
Chatted about this during an update watercooler. The scope of this work will focus on connectivity diagnostics, since every NTP issue to date has essentially been a connectivity problem.

The three checks NTP admin should run on an interval (written in Rust rather than shelling out): Can I reach the DNS server?

A couple of caveats:
- The DNS resolution check has to handle the case where the upstream is specified by IP rather than by name (the SMF properties allow either).
- The ping check should track "last successful", because some customer NTP servers don't respond to ping at all, so a "no" only means something if there was a previous "yes".

Implementation:
To consider: The chrony.conf file should probably just go into the support bundle rather than being logged on an interval.
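For illustration only, here is a minimal sketch of how the two caveats above might be handled. It is not the code in this PR: `Upstream`, `parse_upstream`, `resolve`, `PingStatus`, and `PingTracker` are hypothetical names, and the resolution step assumes the system resolver via `std::net::ToSocketAddrs` rather than a query against a specific DNS server.

```rust
use std::net::{IpAddr, ToSocketAddrs};
use std::time::{Duration, Instant};

/// Hypothetical: an upstream NTP source as configured, either a literal IP
/// or a hostname (the SMF properties allow either).
#[derive(Debug, PartialEq)]
enum Upstream {
    Ip(IpAddr),
    Name(String),
}

/// Classify the configured value: anything that parses as an IP address is
/// treated as one, everything else is treated as a name to resolve.
fn parse_upstream(s: &str) -> Upstream {
    let s = s.trim();
    match s.parse::<IpAddr>() {
        Ok(ip) => Upstream::Ip(ip),
        Err(_) => Upstream::Name(s.to_string()),
    }
}

/// Resolve an upstream to addresses. Literal IPs skip DNS entirely, so the
/// resolution check cannot fail for them.
/// Assumption: uses the system resolver; 123 is the standard NTP port.
fn resolve(upstream: &Upstream) -> std::io::Result<Vec<IpAddr>> {
    match upstream {
        Upstream::Ip(ip) => Ok(vec![*ip]),
        Upstream::Name(name) => Ok((name.as_str(), 123u16)
            .to_socket_addrs()?
            .map(|addr| addr.ip())
            .collect()),
    }
}

/// Hypothetical result of one ping check, interpreted against history.
#[derive(Debug, PartialEq)]
enum PingStatus {
    Reachable,
    /// Failed now but succeeded earlier: a meaningful "no".
    Lost { since_last_ok: Duration },
    /// Has never succeeded: the server may simply not answer ping.
    NeverReachable,
}

/// Remembers the last successful ping so a failure can be put in context.
#[derive(Default)]
struct PingTracker {
    last_success: Option<Instant>,
}

impl PingTracker {
    /// Record the outcome of one ping attempt made at time `now`.
    fn record(&mut self, ok: bool, now: Instant) -> PingStatus {
        if ok {
            self.last_success = Some(now);
            return PingStatus::Reachable;
        }
        match self.last_success {
            Some(t) => PingStatus::Lost {
                since_last_ok: now.duration_since(t),
            },
            None => PingStatus::NeverReachable,
        }
    }
}
```

In this sketch a failed ping is only reported as `Lost` when there was an earlier success, which matches the "a no only means something if there was a previous yes" caveat. How the ping itself is sent is left out on purpose.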
Closes: #10407