Add ereports for host panic/boot failure #2503
I would only put ereport type definitions in
hubris/drv/gimlet-seq-server/src/main.rs, lines 1637 to 1651 in 33a6acb
hubris/drv/psc-seq-server/src/main.rs, lines 1087 to 1094 in 33a6acb
If the intended use of the macro wasn't clear in its docs, I should probably go improve that!
hawkw left a comment
Some suggestions about the ereport messages and data contained therein.
I also wonder whether we might want to include which BSU (host boot slot) we were booting from in the boot failure (and also panic) messages. See hubris/task/host-sp-comms/src/main.rs, lines 954 to 966 in f80c52d.
@hawkw should be ready for re-review now! I think I've addressed all comments.
Co-authored-by: Eliza Weisman <eliza@elizas.website>
Updated from 07e7912 to 5e68f88.
@hawkw CI is passing now!
This PR adds an ereport for host panics, as well as for boot failures. It does not yet implement the interface needed to retrieve the panic message; I will do that in a follow-up commit/PR.

For @rmustacc, I've chosen the ereport classes hw.host.panic and hw.host.btfail. Open to suggestions on this. Also worth noting: we will truncate any host panic or boot failure message larger than MAX_HOST_FAIL_MESSAGE_LEN, which is currently 4 KiB. I will probably need to figure out how to paginate this for access in a follow-up issue, but I wanted to make sure the 4 KiB limit wasn't concerning to you (either too small or excessive).

For @hawkw, is there a guideline for whether declare_ereporter should be invoked in the relevant task, or in a central place like lib/ereports? The current hw.cpu ereports are there, but the drv/xxx-seq-server tasks do it in their own crates (like this PR). I can move this to lib/ereports if that's the preferred approach.

I intend to include the ttl_ct/panic_len fields in the host comms API, either by having the host include them in the "get" request or by including them in the send, to ensure that we actually retrieve the expected panic. We might also cross-check the SP's boot count, to make sure we aren't serving any extra-stale requests. I'll note that in the follow-up PR.

Closes #2140.
Related to #2337.
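A minimal sketch of the truncation and staleness check described in the PR description, assuming the SP keeps the most recent failure in a fixed 4 KiB buffer (no heap, as is typical for a Hubris task). HostFailRecord, its fields, and get() are hypothetical names for illustration, not the PR's actual types or the host comms API; the width and meaning of ttl_ct, and where the boot count comes from, are assumptions.

```rust
/// Limit taken from the PR description (4 KiB); the real constant may be defined differently.
pub const MAX_HOST_FAIL_MESSAGE_LEN: usize = 4096;

/// Hypothetical SP-side record of the most recent host panic or boot failure.
pub struct HostFailRecord {
    /// Hypothetical: opaque identifier for this failure (type assumed to be u32).
    ttl_ct: u32,
    /// Full message length as reported by the host, before truncation.
    panic_len: usize,
    /// Hypothetical: SP boot count at the time the failure was recorded.
    sp_boot_count: u32,
    /// Fixed-size storage holding at most MAX_HOST_FAIL_MESSAGE_LEN bytes.
    buf: [u8; MAX_HOST_FAIL_MESSAGE_LEN],
    /// Number of valid bytes in `buf`.
    stored_len: usize,
}

impl HostFailRecord {
    /// Records a failure, keeping only the first MAX_HOST_FAIL_MESSAGE_LEN bytes.
    pub fn new(ttl_ct: u32, sp_boot_count: u32, msg: &[u8]) -> Self {
        let stored_len = msg.len().min(MAX_HOST_FAIL_MESSAGE_LEN);
        let mut buf = [0u8; MAX_HOST_FAIL_MESSAGE_LEN];
        buf[..stored_len].copy_from_slice(&msg[..stored_len]);
        Self { ttl_ct, panic_len: msg.len(), sp_boot_count, buf, stored_len }
    }

    /// The retained (possibly truncated) message bytes.
    pub fn message(&self) -> &[u8] {
        &self.buf[..self.stored_len]
    }

    /// True if the host's message exceeded the limit and was cut short.
    pub fn truncated(&self) -> bool {
        self.panic_len > self.stored_len
    }

    /// Returns the message only if every identifying field in the request
    /// matches the stored record; a request for a different or older
    /// failure gets None instead of the wrong message.
    pub fn get(&self, ttl_ct: u32, panic_len: usize, sp_boot_count: u32) -> Option<&[u8]> {
        let matches = self.ttl_ct == ttl_ct
            && self.panic_len == panic_len
            && self.sp_boot_count == sp_boot_count;
        if matches {
            Some(self.message())
        } else {
            None
        }
    }
}
```

Keeping panic_len as the untruncated length lets a reader detect that the stored copy is incomplete, which is also the information a later pagination scheme would need to know how many bytes exist beyond the 4 KiB window.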