Add time-range filtering to support bundle log collection#10423

Draft
smklein wants to merge 4 commits into main from support-bundle-time-range

@smklein smklein commented May 8, 2026

Adds a bundle-wide time-range concept that flows from the omdb CLI down through BundleDataSelection, the support-bundle-collection mechanism, the sled-agent API, and into sled-diagnostics, where the per-file mtime gate is applied to archived and extra log files (current logs are always included regardless of the window).

Highlights:

  • New BundleTimeRange { start, end } type in nexus_types::support_bundle. The window is bundle-wide: a single BundleDataSelection::time_range field, applied to both host-info log collection and ereport queries.
  • BundleDataSelection::all() defaults to a 7-day window — same default as ereports already had; logs gain the same cap where they had none.
  • Sled-agent API gets a new VERSION_ADD_LOG_TIME_RANGE (v38) that reshapes SledDiagnosticsLogsDownloadQueryParam: max_rotated becomes optional; start_time/end_time are added.
  • sled_diagnostics::LogsHandle::get_zone_logs keeps oxlog's date_range: None and applies the time gate explicitly inside get_zone_logs_inner. Files with unknown mtime are over-inclusively kept.
  • Sim sled-agent gains a SimLogEntry injection surface so tests can construct synthetic log files with controlled mtimes; default behavior (no entries injected) remains an empty zone list.
  • omdb support-bundle collect gains --since <duration> and --until <duration> flags (humantime, both relative to now).
  • Persistence: schema v257 adds bundle-level support_bundle_data_selection_time_range and the FM equivalent, keyed by bundle / sitrep+request id. Existing per-category time columns on the ereports tables are migrated and dropped.
  • Integration test in nexus/tests/integration_tests/support_bundles.rs injects entries on the sim sled-agent at 30 m, 6 h, and 30 d ago, collects with a 24 h window and a 7 d window, and asserts only the recent entries land in the bundle.
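The per-file gate described in the highlights can be sketched as follows. This is a minimal illustration, not the code in `get_zone_logs_inner`: `TimeRange`, `LogKind`, `keep_log`, and `last` are hypothetical names, `SystemTime` stands in for whatever timestamp type the real `BundleTimeRange` uses, and treating both bounds as inclusive is an assumption.

```rust
use std::time::{Duration, SystemTime};

/// Hypothetical stand-in for `BundleTimeRange { start, end }`; the real type
/// may use a different timestamp representation.
#[derive(Clone, Copy, Debug)]
pub struct TimeRange {
    pub start: SystemTime,
    pub end: SystemTime,
}

/// Which kind of log file is being considered (illustrative names).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum LogKind {
    Current,
    Archived,
    Extra,
}

/// Decide whether a log file belongs in the bundle.
pub fn keep_log(
    kind: LogKind,
    mtime: Option<SystemTime>,
    range: Option<&TimeRange>,
) -> bool {
    // Current logs are always included, regardless of the window.
    if kind == LogKind::Current {
        return true;
    }
    match (range, mtime) {
        // No window requested: keep everything.
        (None, _) => true,
        // Unknown mtime: err on the side of including the file.
        (Some(_), None) => true,
        // Known mtime: keep only files inside the window
        // (inclusive bounds are an assumption of this sketch).
        (Some(r), Some(t)) => r.start <= t && t <= r.end,
    }
}

/// Convenience constructor: the window covering the last `len` before `now`.
pub fn last(now: SystemTime, len: Duration) -> TimeRange {
    TimeRange { start: now - len, end: now }
}
```

With a 24-hour window, archived files modified 30 minutes or 6 hours ago pass the gate and a file modified 30 days ago does not, which matches the shape of the integration test described above.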

Fixes #10372.
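For the `--since` / `--until` flags, here is a rough sketch of how two "ago" durations could be resolved into absolute bounds. `Window`, `resolve_window`, and `DEFAULT_WINDOW` are hypothetical names. The fallback behavior (a missing `--since` means 7 days, a missing `--until` means now) is an assumption based on the 7-day default mentioned above, not something confirmed by the omdb code.

```rust
use std::time::{Duration, SystemTime};

/// Hypothetical default, mirroring the 7-day window described in this PR.
pub const DEFAULT_WINDOW: Duration = Duration::from_secs(7 * 24 * 60 * 60);

/// Absolute bounds of a collection window (illustrative type).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Window {
    pub start: SystemTime,
    pub end: SystemTime,
}

/// Turn "how long ago" durations into absolute bounds relative to `now`.
pub fn resolve_window(
    now: SystemTime,
    since: Option<Duration>,
    until: Option<Duration>,
) -> Result<Window, String> {
    // Assumed fallbacks: 7 days back for the start, "now" for the end.
    let since = since.unwrap_or(DEFAULT_WINDOW);
    let until = until.unwrap_or(Duration::ZERO);
    // Both values count backwards from now, so `until` must be the smaller.
    if until > since {
        return Err(format!(
            "--until ({until:?}) must not reach further back than --since ({since:?})"
        ));
    }
    let start = now
        .checked_sub(since)
        .ok_or("--since is too far in the past")?;
    let end = now
        .checked_sub(until)
        .ok_or("--until is too far in the past")?;
    Ok(Window { start, end })
}
```

Passing `now` in explicitly, rather than reading the clock inside the function, keeps both bounds anchored to the same instant and makes the conversion easy to test.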

smklein added 3 commits May 7, 2026 17:09
Replaces `BundleCollection.bundle: SupportBundle` with a slim
`BundleInfo { id, reason_for_creation }`. Moves the sled-storage
chunked transfer (`store_bundle_on_sled`), zip helpers
(`bundle_to_zipfile`, `recursively_add_directory_to_zipfile`,
`sha2_hash`), the `CHUNK_SIZE` and `TEMPDIR` constants, and the
DB-polling cancellation (`check_for_cancellation`) out of the inner
`support_bundle/` module and into `support_bundle_collector.rs`.

After this change the inner layer is a pure mechanism: it never reads
the `support_bundle` DB row, never talks to a sled-agent's bundle
storage endpoints, and treats CRDB only as a source of facts about
sleds, ereports, and blueprints. The outer collector remains the
manager of the bundle lifecycle.
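
The split between the two layers can be pictured with a small sketch. Only `BundleInfo { id, reason_for_creation }` comes from the commit message; `SupportBundleRow`, its extra fields, the `u128` id type, and `describe` are hypothetical stand-ins for illustration.

```rust
/// Hypothetical stand-in for the full `support_bundle` DB row that only the
/// outer collector is meant to see.
#[derive(Clone, Debug)]
pub struct SupportBundleRow {
    pub id: u128,
    pub reason_for_creation: String,
    // Lifecycle details (illustrative) that the mechanism layer never needs.
    pub state: String,
    pub assigned_nexus: Option<u128>,
}

/// The slim view handed to the mechanism layer.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct BundleInfo {
    pub id: u128,
    pub reason_for_creation: String,
}

impl From<&SupportBundleRow> for BundleInfo {
    fn from(row: &SupportBundleRow) -> Self {
        // Copy only the two fields the inner layer uses; lifecycle state
        // stays with the outer collector.
        BundleInfo {
            id: row.id,
            reason_for_creation: row.reason_for_creation.clone(),
        }
    }
}

/// Example of mechanism-layer code that depends only on `BundleInfo`.
pub fn describe(info: &BundleInfo) -> String {
    format!("bundle {} ({})", info.id, info.reason_for_creation)
}
```

Because the mechanism only ever receives `BundleInfo`, a caller with no DB row at all can construct one directly.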

This is the first step toward a future shared crate that omdb can use
to collect bundles when Nexus is down.

Lifts `nexus/src/app/background/tasks/support_bundle/` (the mechanism
layer) into a new top-level crate `support-bundle-collection` so that
both Nexus and omdb can call it. No logic changes; pure relocation
plus import rewriting.

Wires a new subcommand on omdb that calls into the
`support-bundle-collection` crate to gather a bundle locally. Unlike
the Nexus background task, this path does not register a row in the
`support_bundle` table, does not transfer the bundle to a sled
agent, and does not require Nexus to be up — it only needs CRDB,
internal DNS, MGS, and sled-agents reachable on the underlay.

This is intended for incident response: when Nexus is down (the most
important time to gather a bundle), an operator can still produce one
locally.
@smklein smklein force-pushed the support-bundle-time-range branch 2 times, most recently from 00ec361 to 2ea3ecc Compare May 8, 2026 21:37
result = client.support_logs_download(zone, DEFAULT_MAX_ROTATED_LOGS) => result,
result = client.support_logs_download(
zone,
/* end_time */ end.as_ref(),

Query parameters are apparently supplied in alphabetical order, which is why "end time" comes before "start time" 🫠
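
One way to guard against swapping positional arguments is a small named-field wrapper, sketched below. It assumes, as the comment suggests, that the query parameters are passed in sorted name order. `LogQuery`, `sorted_names`, and `positional` are hypothetical and are not part of the generated client, and the `String` timestamps are placeholders for the real type.

```rust
/// Hypothetical named-field wrapper around the three query parameters.
#[derive(Clone, Debug, Default, PartialEq, Eq)]
pub struct LogQuery {
    pub start_time: Option<String>,
    pub end_time: Option<String>,
    pub max_rotated: Option<u32>,
}

impl LogQuery {
    /// Parameter names in sorted order, which puts `end_time` first.
    pub fn sorted_names() -> Vec<&'static str> {
        let mut names = vec!["start_time", "end_time", "max_rotated"];
        names.sort_unstable();
        names
    }

    /// Values in the same sorted order: (end_time, max_rotated, start_time).
    pub fn positional(&self) -> (Option<&str>, Option<u32>, Option<&str>) {
        (
            self.end_time.as_deref(),
            self.max_rotated,
            self.start_time.as_deref(),
        )
    }
}
```

Call sites then fill in fields by name, and the alphabetical ordering lives in one place instead of in `/* end_time */` comments at each call.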

@smklein smklein force-pushed the support-bundle-time-range branch from 2ea3ecc to 74bc50e Compare May 8, 2026 22:23
Base automatically changed from omdb-support-bundle-collect to main May 28, 2026 00:40