[4/3] Place SP task dumps under sp_task_dumps/ in support bundles#10411

Merged: smklein merged 5 commits into main from fix-sp-task-dumps-location on May 28, 2026

@smklein (Collaborator) commented May 8, 2026

While I was working on #10376, I noticed that SP dumps were being placed in the top-level directory of the bundle, not within the intended subdirectory.

This PR fixes that issue.

smklein added 4 commits May 7, 2026 17:09
Replaces `BundleCollection.bundle: SupportBundle` with a slim
`BundleInfo { id, reason_for_creation }`. Moves the sled-storage
chunked transfer (`store_bundle_on_sled`), zip helpers
(`bundle_to_zipfile`, `recursively_add_directory_to_zipfile`,
`sha2_hash`), the `CHUNK_SIZE` and `TEMPDIR` constants, and the
DB-polling cancellation (`check_for_cancellation`) out of the inner
`support_bundle/` module and into `support_bundle_collector.rs`.

After this change the inner layer is a pure mechanism: it never reads
the `support_bundle` DB row, never talks to a sled-agent's bundle
storage endpoints, and treats CRDB only as a source of facts about
sleds, ereports, and blueprints. The outer collector remains the
manager of the bundle lifecycle.

This is the first step toward a future shared crate that omdb can use
to collect bundles when Nexus is down.

Lifts `nexus/src/app/background/tasks/support_bundle/` (the mechanism
layer) into a new top-level crate `support-bundle-collection` so that
both Nexus and omdb can call it. No logic changes; pure relocation
plus import rewriting.

Wires a new subcommand on omdb that calls into the
`support-bundle-collection` crate to gather a bundle locally. Unlike
the Nexus background task, this path does not register a row in the
`support_bundle` table, does not transfer the bundle to a sled
agent, and does not require Nexus to be up — it only needs CRDB,
internal DNS, MGS, and sled-agents reachable on the underlay.

This is intended for incident response: when Nexus is down (the most
important time to gather a bundle), an operator can still produce one
locally.

`spawn_collection_steps` creates an `sp_task_dumps/` directory in the
bundle root, but the per-SP closure it spawned captured the framework's
outer `dir` (the bundle root) instead of the subdirectory. As a result
the dumps landed at `<root>/sled_0/dump-0.zip`, `<root>/switch_0/...`,
etc., while `sp_task_dumps/` was created and left empty.

Capture `sp_dumps_dir` into the spawned closure so dumps land at
`sp_task_dumps/{sp.type}_{sp.slot}/dump-{i}.zip` as the surrounding
code intended.
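
The capture mistake described in that last commit message can be illustrated with a small standalone sketch. This is not the code from the PR: `Sp`, `dump_path`, and `collect_sp_dumps` are hypothetical names, plain `std::thread` workers stand in for whatever task framework `spawn_collection_steps` actually uses, and an empty file stands in for a real dump. Only the `sp_dumps_dir` name and the `sp_task_dumps/{sp.type}_{sp.slot}/dump-{i}.zip` layout are taken from the commit message.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::thread;

/// Hypothetical stand-in for a service processor identity.
#[derive(Clone, Copy)]
struct Sp {
    kind: &'static str, // e.g. "sled" or "switch"
    slot: u32,
}

/// Builds `<base>/{kind}_{slot}/dump-{i}.zip`.
fn dump_path(base: &Path, sp: &Sp, i: usize) -> PathBuf {
    base.join(format!("{}_{}", sp.kind, sp.slot))
        .join(format!("dump-{i}.zip"))
}

/// Writes one placeholder dump per SP and returns the paths written.
fn collect_sp_dumps(root: &Path, sps: &[Sp]) -> io::Result<Vec<PathBuf>> {
    let sp_dumps_dir = root.join("sp_task_dumps");
    fs::create_dir_all(&sp_dumps_dir)?;

    let handles: Vec<_> = sps
        .iter()
        .copied()
        .map(|sp| {
            // The bug pattern would move a copy of `root` into the worker here.
            // The fix is to move the subdirectory in instead.
            let dir = sp_dumps_dir.clone();
            thread::spawn(move || -> io::Result<PathBuf> {
                let path = dump_path(&dir, &sp, 0);
                fs::create_dir_all(path.parent().expect("dump path has a parent"))?;
                fs::write(&path, b"")?; // empty placeholder, not a real dump
                Ok(path)
            })
        })
        .collect();

    // Join every worker and surface the first I/O error, if any.
    handles
        .into_iter()
        .map(|h| h.join().expect("worker thread panicked"))
        .collect()
}
```

Calling `collect_sp_dumps` with a sled in slot 0 should place its dump under `sp_task_dumps/sled_0/` rather than directly under the bundle root. If the worker captured `root` instead of `dir`, the code would still compile and run, which is likely why an empty `sp_task_dumps/` directory was the only visible symptom.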
@smklein force-pushed the omdb-support-bundle-collect branch from b26398c to d180d48 on May 8, 2026 01:08
@smklein force-pushed the fix-sp-task-dumps-location branch from 06786a0 to 8c1006c on May 8, 2026 01:08
@smklein requested a review from wfchandler on May 8, 2026 01:10
@smklein marked this pull request as ready for review on May 8, 2026 01:11
@hawkw (Member) left a comment

whoopsie, this seems like a good idea!

@wfchandler (Contributor) left a comment

Nice catch!

Base automatically changed from omdb-support-bundle-collect to main on May 28, 2026 00:40
@smklein merged commit e25247e into main on May 28, 2026
18 checks passed
@smklein deleted the fix-sp-task-dumps-location branch on May 28, 2026 18:00
3 participants