Clean up ereport data flow in bundle collection by smklein · Pull Request #10500 · oxidecomputer/omicron

smklein · 2026-05-27T23:58:14Z

This PR addresses feedback from #10376 (comment)

It performs a few small-but-related changes that clean up the data flow for support bundle collection:

- We distinguish between `SupportBundleCollectionReport` and `SupportBundleActivationReport`. The "collection report" is the sequence of steps that were involved in collecting the bundle itself. The activation report includes that, but also auxiliary information about "storing and activating the bundle" durably within Nexus. By distinguishing these two, we can (and now do) store the "collection report" within the bundle itself. For context, see `write_report_file`.
- We no longer treat "ereport status" as a special case in the bundle collection steps, nor in parsing the collection report itself. Instead, the `CollectionStepOutput` enum adds a variant called `Details` that allows any JSON value. We convert ereports to use this, and pretty-print that JSON. This allows parsers of bundle output, like omdb, to act only on the generic "bundle step" structures, rather than embedding knowledge about step-specific output (as the ereport handling did before).
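
A rough sketch of the shape described above, not the PR's actual types. Only the names `CollectionStepOutput` and `Details` come from the description; `Step`, `CollectionReport`, `ActivationReport`, the `Empty` variant, the `activated` field, and `render` are hypothetical. To stay dependency-free, `Details` holds pre-serialized JSON text here, whereas the PR describes it as holding any JSON value.

```rust
/// Simplified stand-in for the PR's `CollectionStepOutput`.
#[derive(Debug, Clone, PartialEq)]
enum CollectionStepOutput {
    /// Hypothetical variant: the step produced no extra output.
    Empty,
    /// Free-form, step-specific output. Assumption: pre-rendered JSON
    /// text stands in for "any JSON value".
    Details(String),
}

/// Hypothetical generic step record.
#[derive(Debug, Clone, PartialEq)]
struct Step {
    name: String,
    output: CollectionStepOutput,
}

/// Hypothetical collection report: only the steps taken while collecting.
#[derive(Debug, Clone, PartialEq)]
struct CollectionReport {
    steps: Vec<Step>,
}

/// Hypothetical activation report: wraps the collection report and adds
/// information about storing and activating the bundle.
#[derive(Debug, Clone, PartialEq)]
struct ActivationReport {
    collection: CollectionReport,
    activated: bool,
}

/// A parser in the style described above: it knows only the generic step
/// structure and prints any `Details` payload indented by four spaces.
fn render(report: &CollectionReport) -> String {
    let mut out = String::new();
    for step in &report.steps {
        out.push_str(&step.name);
        out.push('\n');
        if let CollectionStepOutput::Details(json) = &step.output {
            for line in json.lines() {
                out.push_str("    ");
                out.push_str(line);
                out.push('\n');
            }
        }
    }
    out
}
```

Because `render` never matches on a step-specific variant, adding a new kind of step output in this sketch would not require touching the renderer.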

Replaces `BundleCollection.bundle: SupportBundle` with a slim `BundleInfo { id, reason_for_creation }`. Moves the sled-storage chunked transfer (`store_bundle_on_sled`), zip helpers (`bundle_to_zipfile`, `recursively_add_directory_to_zipfile`, `sha2_hash`), the `CHUNK_SIZE` and `TEMPDIR` constants, and the DB-polling cancellation (`check_for_cancellation`) out of the inner `support_bundle/` module and into `support_bundle_collector.rs`. After this change the inner layer is a pure mechanism: it never reads the `support_bundle` DB row, never talks to a sled-agent's bundle storage endpoints, and treats CRDB only as a source of facts about sleds, ereports, and blueprints. The outer collector remains the manager of the bundle lifecycle. This is the first step toward a future shared crate that omdb can use to collect bundles when Nexus is down.

Lifts `nexus/src/app/background/tasks/support_bundle/` (the mechanism layer) into a new top-level crate `support-bundle-collection` so that both Nexus and omdb can call it. No logic changes; pure relocation plus import rewriting.

Wires a new subcommand on omdb that calls into the `support-bundle-collection` crate to gather a bundle locally. Unlike the Nexus background task, this path does not register a row in the `support_bundle` table, does not transfer the bundle to a sled agent, and does not require Nexus to be up — it only needs CRDB, internal DNS, MGS, and sled-agents reachable on the underlay. This is intended for incident response: when Nexus is down (the most important time to gather a bundle), an operator can still produce one locally.

hawkw

this makes sense to me! i'm not sure if erebor is actually the formatter you want for displaying things in omdb though...

hawkw · 2026-05-28T22:03:38Z

+                            "{}",
+                            erebor::Displayer::new(details)
+                                .with_initial_indent_spaces(8)
                        );


n.b. that this will display all integers in hex, which may not be what you want here --- maybe use the other indented JSON display thingy we have lying around?
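
For illustration only, one dependency-free way to indent already pretty-printed JSON without touching how integers are rendered. `Indented` is a hypothetical helper, not the existing display type referred to above, and it assumes the JSON has already been rendered to text (for example with `serde_json::to_string_pretty`).

```rust
use std::fmt;

/// Hypothetical helper: displays `text` with every line prefixed by
/// `spaces` spaces. The text itself is passed through unchanged, so
/// integers keep whatever rendering the JSON serializer chose.
struct Indented<'a> {
    text: &'a str,
    spaces: usize,
}

impl fmt::Display for Indented<'_> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        for (i, line) in self.text.lines().enumerate() {
            // Separate lines with a newline, but do not add a trailing one.
            if i > 0 {
                writeln!(f)?;
            }
            // Pad an empty string to `spaces` columns, then emit the line.
            write!(f, "{:width$}{line}", "", width = self.spaces)?;
        }
        Ok(())
    }
}
```

With this sketch, `println!("{}", Indented { text: &pretty, spaces: 8 })` would print each line of `pretty` shifted right by eight columns.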

hawkw · 2026-05-28T22:05:44Z

+                    warn!(
+                        self.log,
+                        "Failed to write report file";
+                        "error" => ?err


should this be an InlineErrorChain or something?

also, does this error make it into the bg task status output someplace? should it?
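
As a sketch of the idea behind an inline error chain, assuming it means rendering an error and each of its `source()` causes on a single line: `ContextError` and `inline_chain` below are hypothetical, standard-library-only stand-ins, not the API of any existing crate.

```rust
use std::error::Error;
use std::fmt;

/// Hypothetical wrapper that attaches a context message to an
/// underlying error, similar in spirit to adding context to a failure.
#[derive(Debug)]
struct ContextError {
    msg: String,
    source: Box<dyn Error>,
}

impl fmt::Display for ContextError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Only the outermost message; causes are reached via `source()`.
        write!(f, "{}", self.msg)
    }
}

impl Error for ContextError {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        Some(&*self.source)
    }
}

/// Renders `err` followed by every cause in its `source()` chain,
/// joined with ": " so the whole chain fits in one log field.
fn inline_chain(err: &(dyn Error + 'static)) -> String {
    let mut out = err.to_string();
    let mut cause = err.source();
    while let Some(e) = cause {
        out.push_str(": ");
        out.push_str(&e.to_string());
        cause = e.source();
    }
    out
}
```

Logging the result of `inline_chain(&err)` as a plain string, instead of a `Debug` rendering of the error, keeps the log line compact while still including every cause.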

hawkw · 2026-05-28T22:07:15Z

+        tokio::fs::create_dir_all(&meta_dir).await.with_context(|| {
+            format!("Failed to create meta directory {meta_dir}")
+        })?;
+
+        let report_path = meta_dir.join("report.json");
+        let report_content = serde_json::to_string_pretty(report)
+            .context("Failed to serialize collection report")?;
+
+        tokio::fs::write(&report_path, report_content).await.with_context(
+            || format!("Failed to write report file to {report_path}"),
+        )?;


turbo nitpick: usually errors are lowercased:

Suggested change

-        tokio::fs::create_dir_all(&meta_dir).await.with_context(|| {
-            format!("Failed to create meta directory {meta_dir}")
-        })?;
-
-        let report_path = meta_dir.join("report.json");
-        let report_content = serde_json::to_string_pretty(report)
-            .context("Failed to serialize collection report")?;
-
-        tokio::fs::write(&report_path, report_content).await.with_context(
-            || format!("Failed to write report file to {report_path}"),
-        )?;
+        tokio::fs::create_dir_all(&meta_dir).await.with_context(|| {
+            format!("failed to create meta directory {meta_dir}")
+        })?;
+
+        let report_path = meta_dir.join("report.json");
+        let report_content = serde_json::to_string_pretty(report)
+            .context("failed to serialize collection report")?;
+
+        tokio::fs::write(&report_path, report_content).await.with_context(
+            || format!("failed to write report file to {report_path}"),
+        )?;

hawkw · 2026-05-28T22:09:17Z

+    let step = collection
+        .steps
+        .iter()
+        .find(|s| s.name == SupportBundleCollectionStep::STEP_EREPORTS)
+        .expect("should have ereports step");


this makes me wonder if steps should be an iddqd::IdOrdMap sorted by step names, so that we can look things up in it by key. this would also allow us to serialize it as an array while ensuring that it's always in lexicographic order, which seems maybe nice for making the omdb output consistent every time? dunno if this really matters though.
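
To make the suggestion concrete, here is a sketch that uses the standard library's `BTreeMap` as a stand-in for a map keyed by step name. It is not `iddqd::IdOrdMap` itself, and the `Step` fields, the `Steps` wrapper, and the step names used with it are hypothetical.

```rust
use std::collections::BTreeMap;

/// Hypothetical step record; the name doubles as the lookup key.
#[derive(Debug, Clone, PartialEq)]
struct Step {
    name: String,
    status: String,
}

/// Steps keyed by name. A `BTreeMap` keeps keys sorted, so iteration
/// order is lexicographic by step name regardless of insertion order.
#[derive(Debug, Default)]
struct Steps(BTreeMap<String, Step>);

impl Steps {
    fn new() -> Self {
        Steps(BTreeMap::new())
    }

    /// Inserts a step under its own name, returning any step it replaced.
    fn insert(&mut self, step: Step) -> Option<Step> {
        self.0.insert(step.name.clone(), step)
    }

    /// Looks a step up by name instead of scanning a list.
    fn get(&self, name: &str) -> Option<&Step> {
        self.0.get(name)
    }

    /// Iterates the steps in lexicographic order of their names.
    fn iter(&self) -> impl Iterator<Item = &Step> {
        self.0.values()
    }
}
```

In this sketch the `.iter().find(...)` lookup above would become a single `get` call, and collecting `iter()` into a `Vec` before serializing would give an array in a stable order.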

hawkw · 2026-05-28T22:14:53Z


-    Ok(CollectionStepOutput::Ereports(status))
+    let details = serde_json::to_value(&status)
+        .context("failed to serialize ereport collection status")?;


this really shouldn't happen, right? not that i think we should panic here, but...

smklein added 9 commits May 7, 2026 17:09

Move support bundle collection into a shared crate

863c24b

Lifts `nexus/src/app/background/tasks/support_bundle/` (the mechanism layer) into a new top-level crate `support-bundle-collection` so that both Nexus and omdb can call it. No logic changes; pure relocation plus import rewriting.

merge

641724e

Re-use BundleDataCategory enum

e3e45e4

Allow streaming to stdout, avoid intermediate file

c1a72bb

Add destructive argument

7e4e6bb

merge

707d566

Clean up ereport data flow in bundle collection

ba3a858

smklein mentioned this pull request May 27, 2026

[3/3] Add omdb support-bundle collect subcommand #10376

Merged

Base automatically changed from omdb-support-bundle-collect to main May 28, 2026 00:40

merge

7692d04

hawkw self-requested a review May 28, 2026 16:58

smklein mentioned this pull request May 28, 2026

test failed in CI: oximeter-collector agent::tests::test_self_stat_error_counter #10503

Open

hawkw approved these changes May 28, 2026

smklein wants to merge 10 commits into main from bundle-ereport-cleanup
