Skip to content

Clean up ereport data flow in bundle collection#10500

Open
smklein wants to merge 10 commits into
mainfrom
bundle-ereport-cleanup
Open

Clean up ereport data flow in bundle collection#10500
smklein wants to merge 10 commits into
mainfrom
bundle-ereport-cleanup

Conversation

@smklein
Copy link
Copy Markdown
Collaborator

@smklein smklein commented May 27, 2026

This PR addresses feedback from #10376 (comment)

It performs a few small-but-related changes that clean up the data flow for support bundle collection:

  • We distinguish between SupportBundleCollectionReport and SupportBundleActivationReport. The "collection report" is the sequence of steps that were involved in collecting the bundle itself. The activation report includes that, but also auxiliary information about "storing and activating the bundle" durably within Nexus. By distinguishing these two, we can (and now, do) store the "collection report" within the bundle itself. For context: See write_report_file.
  • We no longer treat "ereport status" as a special case in the bundle collection steps, nor in parsing the collection report itself. Instead, the CollectionStepOutput enum adds a variant called Details that allows any JSON value. We convert ereports to use this, and pretty-print that JSON. This allows parsers of bundle output - like omdb - to only act on the generic "bundle step" structures, rather than embedding knowledge about more step-specific output (like ereports were doing before).

smklein added 9 commits May 7, 2026 17:09
Replaces `BundleCollection.bundle: SupportBundle` with a slim
`BundleInfo { id, reason_for_creation }`. Moves the sled-storage
chunked transfer (`store_bundle_on_sled`), zip helpers
(`bundle_to_zipfile`, `recursively_add_directory_to_zipfile`,
`sha2_hash`), the `CHUNK_SIZE` and `TEMPDIR` constants, and the
DB-polling cancellation (`check_for_cancellation`) out of the inner
`support_bundle/` module and into `support_bundle_collector.rs`.

After this change the inner layer is a pure mechanism: it never reads
the `support_bundle` DB row, never talks to a sled-agent's bundle
storage endpoints, and treats CRDB only as a source of facts about
sleds, ereports, and blueprints. The outer collector remains the
manager of the bundle lifecycle.

This is the first step toward a future shared crate that omdb can use
to collect bundles when Nexus is down.
Lifts `nexus/src/app/background/tasks/support_bundle/` (the mechanism
layer) into a new top-level crate `support-bundle-collection` so that
both Nexus and omdb can call it. No logic changes; pure relocation
plus import rewriting.
Wires a new subcommand on omdb that calls into the
`support-bundle-collection` crate to gather a bundle locally. Unlike
the Nexus background task, this path does not register a row in the
`support_bundle` table, does not transfer the bundle to a sled
agent, and does not require Nexus to be up — it only needs CRDB,
internal DNS, MGS, and sled-agents reachable on the underlay.

This is intended for incident response: when Nexus is down (the most
important time to gather a bundle), an operator can still produce one
locally.
Base automatically changed from omdb-support-bundle-collect to main May 28, 2026 00:40
Copy link
Copy Markdown
Member

@hawkw hawkw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this makes sense to me! i'm not sure if erebor is actually the formatter you want for displaying things in omdb though...

"{}",
erebor::Displayer::new(details)
.with_initial_indent_spaces(8)
);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

n.b. that this will display all integers in hex, which may not be what you want here --- maybe use the other indented JSON display thingy we have lying around?

warn!(
self.log,
"Failed to write report file";
"error" => ?err
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this be an InlineErrorChain or something?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also, does this error make it into the bg task status output someplace? should it?

Comment on lines +345 to +355
tokio::fs::create_dir_all(&meta_dir).await.with_context(|| {
format!("Failed to create meta directory {meta_dir}")
})?;

let report_path = meta_dir.join("report.json");
let report_content = serde_json::to_string_pretty(report)
.context("Failed to serialize collection report")?;

tokio::fs::write(&report_path, report_content).await.with_context(
|| format!("Failed to write report file to {report_path}"),
)?;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

turbo nitpick: usually errors are lowercased:

Suggested change
tokio::fs::create_dir_all(&meta_dir).await.with_context(|| {
format!("Failed to create meta directory {meta_dir}")
})?;
let report_path = meta_dir.join("report.json");
let report_content = serde_json::to_string_pretty(report)
.context("Failed to serialize collection report")?;
tokio::fs::write(&report_path, report_content).await.with_context(
|| format!("Failed to write report file to {report_path}"),
)?;
tokio::fs::create_dir_all(&meta_dir).await.with_context(|| {
format!("failed to create meta directory {meta_dir}")
})?;
let report_path = meta_dir.join("report.json");
let report_content = serde_json::to_string_pretty(report)
.context("failed to serialize collection report")?;
tokio::fs::write(&report_path, report_content).await.with_context(
|| format!("failed to write report file to {report_path}"),
)?;

Comment on lines +365 to +369
let step = collection
.steps
.iter()
.find(|s| s.name == SupportBundleCollectionStep::STEP_EREPORTS)
.expect("should have ereports step");
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this makes me wonder if steps should be an iddqd::IdOrdMap sorted by step names, so that we can look things up in it by key. this would also allow us to serialize it as an array while ensuring that it's always in lexicographic order, which seems maybe nice for making the omdb output consistent every time? dunno if this really matters though.


Ok(CollectionStepOutput::Ereports(status))
let details = serde_json::to_value(&status)
.context("failed to serialize ereport collection status")?;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this really shouldn't happen, right? not that i think we should panic here, but...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants