| 1 | +--- |
| 2 | +name: add-background-task |
| 3 | +description: Add a new Nexus background task. Use when the user wants to create a periodic background task in Nexus that runs on a timer. |
| 4 | +--- |
| 5 | + |
| 6 | +# Add a Nexus background task |
| 7 | + |
| 8 | +All background tasks live in Nexus. A task implements the `BackgroundTask` trait (`nexus/src/app/background/mod.rs`), runs on a configurable period, and reports status as `serde_json::Value`. |
| 9 | + |
| 10 | +## General approach |
| 11 | + |
| 12 | +There are many existing background tasks in `nexus/src/app/background/tasks/`. Before writing anything, read a few tasks that are similar in shape to the one you're adding (e.g., a simple periodic cleanup vs. a task that watches a channel). Use those as models for structure, naming, logging, error handling, and status reporting. The goal is to conform to the patterns already in use, not to invent new ones. |
| 13 | + |
| 14 | +## Checklist |
| 15 | + |
| 16 | +These are the touch points for adding a new background task. Follow them in order. |
| 17 | + |
| 18 | +### 1. Status type (`nexus/types/src/internal_api/background.rs`) |
| 19 | + |
| 20 | +Define a struct for the task's activation status. Derive `Clone, Debug, Deserialize, Serialize, PartialEq, Eq`. For errors, use `Option<String>` if the task can only fail in one way per activation, or `Vec<String>` if it accumulates multiple independent errors. Match what similar tasks do. |
| 21 | + |
| 22 | +### 2. Task implementation (`nexus/src/app/background/tasks/<name>.rs`) |
| 23 | + |
| 24 | +Create the task module. The struct holds whatever state it needs (typically `Arc<DataStore>` plus config). Implement `BackgroundTask::activate` by delegating to an `actually_activate` helper, then serialize the status to `serde_json::Value`. The `actually_activate` pattern makes unit testing easy without going through the trait. |
| 25 | + |
| 26 | +`actually_activate` can either build and return the status (`async fn actually_activate(&mut self, opctx) -> YourStatus`), or take a mutable reference to one (`async fn actually_activate(&mut self, opctx, status: &mut YourStatus) -> Result<(), Error>`). The first is simpler and works well when the task either fully succeeds or fully fails. The second is better when the task can partially complete (e.g., it loops over work items): `activate` creates the status struct up front, passes it in, and serializes it afterward regardless of `Ok`/`Err`, so any progress already recorded in `status` (items processed, partial counts, earlier errors) is preserved even if the method bails out with `?` later. |
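The second shape can be sketched as follows. This is a simplified, synchronous stand-in, not code from Nexus: the real `activate` is async, takes an `opctx`, and serializes the status to `serde_json::Value`. `CleanupTask`, `CleanupStatus`, and the `work` queue are hypothetical names used only for illustration.

```rust
// Simplified stand-in: no async, no opctx, no serde. All names are hypothetical.

#[derive(Debug, Default, PartialEq, Eq)]
struct CleanupStatus {
    items_processed: usize,
    error: Option<String>,
}

struct CleanupTask {
    // Hypothetical work queue; an Err entry simulates an item that fails.
    work: Vec<Result<u32, String>>,
}

impl CleanupTask {
    // Records progress in `status` as it goes and bails out with `?`.
    fn actually_activate(
        &mut self,
        status: &mut CleanupStatus,
    ) -> Result<(), String> {
        for item in self.work.drain(..) {
            let _value = item?; // early return on the first failure
            status.items_processed += 1;
        }
        Ok(())
    }

    // Stand-in for `BackgroundTask::activate`: the status is created up
    // front and returned regardless of Ok/Err.
    fn activate(&mut self) -> CleanupStatus {
        let mut status = CleanupStatus::default();
        if let Err(err) = self.actually_activate(&mut status) {
            status.error = Some(err);
        }
        status
    }
}

fn main() {
    let mut task = CleanupTask {
        work: vec![
            Ok(1),
            Ok(2),
            Err("database unavailable".to_string()),
            Ok(4),
        ],
    };
    let status = task.activate();
    // Two items were counted before the failure, and the error is recorded.
    println!("{status:?}");
}
```

Because `activate` owns the status and only lends it out, the count of items processed before the failure survives the early return.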
| 27 | + |
| 28 | +Logging conventions: `debug` when there's nothing to do, `info` when routine work was done, `warn` when the work done indicates something is wrong (e.g., cleaning up after a crash), `error` on failure. Log errors as structured fields with the `; &err` slog syntax (which uses the `SlogInlineError` trait), not by interpolating into the message string. For the error string in the status struct, use `InlineErrorChain::new(&err).to_string()` (from `slog_error_chain`) to capture the full cause chain. Status error strings should not repeat the task name — omdb already shows which task you're looking at. |
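To illustrate why the full cause chain matters for the status string, here is a std-only sketch. It is not how `slog_error_chain` is implemented; `cause_chain_string`, the error types, and the `": "` separator are assumptions made for this example. It only shows the difference between a top-level `to_string()` and a string that walks `source()`.

```rust
use std::error::Error;
use std::fmt;

// Hypothetical low-level error.
#[derive(Debug)]
struct QueryError;

impl fmt::Display for QueryError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "connection reset")
    }
}

impl Error for QueryError {}

// Hypothetical task-level error that wraps the low-level one.
#[derive(Debug)]
struct TaskError {
    source: QueryError,
}

impl fmt::Display for TaskError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "failed to list expired rows")
    }
}

impl Error for TaskError {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        Some(&self.source)
    }
}

// Illustrative helper: joins the error and all of its causes into one string.
fn cause_chain_string(err: &dyn Error) -> String {
    let mut out = err.to_string();
    let mut current = err.source();
    while let Some(cause) = current {
        out.push_str(": ");
        out.push_str(&cause.to_string());
        current = cause.source();
    }
    out
}

fn main() {
    let err = TaskError { source: QueryError };
    // Top-level message only; the underlying cause is lost.
    println!("{}", err);
    // Full chain, which is what the status string should carry.
    println!("{}", cause_chain_string(&err));
}
```

Note that neither string mentions the task name, in line with the convention above.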
| 29 | + |
| 30 | +If the task takes config values that need conversion or validation (e.g., converting a `Duration` to `TimeDelta`, or checking a numeric range), do it once in `new()` and store the validated form. Don't re-validate on every activation — if the config is invalid, panic in `new()` with a message that includes the invalid value. |
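A minimal sketch of validating once in `new()`, assuming a hypothetical `ReaperTask` with two config values. The real task would typically also hold an `Arc<DataStore>`, and the conversion target here is plain `i64` seconds rather than `TimeDelta`, so that the example needs only the standard library.

```rust
use std::convert::TryFrom;
use std::time::Duration;

// Hypothetical task; fields hold the already-validated form of the config.
struct ReaperTask {
    max_age_secs: i64,
    max_deleted_per_activation: u32,
}

impl ReaperTask {
    // Converts and validates once. Invalid config panics here, with the
    // offending value in the message, rather than on every activation.
    fn new(max_age: Duration, max_deleted_per_activation: u32) -> Self {
        let max_age_secs = i64::try_from(max_age.as_secs())
            .unwrap_or_else(|_| {
                panic!(
                    "invalid max_age (does not fit in i64 seconds): {:?}",
                    max_age
                )
            });
        assert!(
            max_deleted_per_activation > 0,
            "invalid max_deleted_per_activation (must be nonzero): {}",
            max_deleted_per_activation
        );
        ReaperTask { max_age_secs, max_deleted_per_activation }
    }
}

fn main() {
    let task = ReaperTask::new(Duration::from_secs(3600), 100);
    println!(
        "max_age_secs={} max_deleted_per_activation={}",
        task.max_age_secs, task.max_deleted_per_activation
    );
}
```

With this shape, the activation path can use the stored fields directly and never needs an error branch for bad config.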
| 31 | + |
| 32 | +Include a unit test in the same file using `TestDatabase::new_with_datastore` that calls `actually_activate` directly. If the task has a datastore method, a single test exercising the task end-to-end (including the limit/batching behavior) is sufficient — don't add a redundant test for the datastore method separately unless it has complex logic worth testing in isolation. |
| 33 | + |
| 34 | +### 3. Register the module (`nexus/src/app/background/tasks/mod.rs`) |
| 35 | + |
| 36 | +Add `pub mod <name>;` in alphabetical order. |
| 37 | + |
| 38 | +### 4. Activator (`nexus/background-task-interface/src/init.rs`) |
| 39 | + |
| 40 | +Add `pub task_<name>: Activator` to the `BackgroundTasks` struct, maintaining alphabetical order among the task fields. |
| 41 | + |
| 42 | +### 5. Config (`nexus-config/src/nexus_config.rs`) |
| 43 | + |
| 44 | +Add a config struct (e.g., `YourTaskConfig`) with at minimum `period_secs: Duration` (using `#[serde_as(as = "DurationSeconds<u64>")]`). If the task does bounded work per activation, name the limit field `max_<past_tense_verb>_per_activation` (e.g., `max_deleted_per_activation`, `max_timed_out_per_activation`) to match existing conventions. Add a field of the new config type to `BackgroundTaskConfig`. Update the test config literal and expected parse output at the bottom of the file. |
| 45 | + |
| 46 | +### 6. Config files |
| 47 | + |
| 48 | +Add the new config fields to all of these: |
| 49 | +- `nexus/examples/config.toml` |
| 50 | +- `nexus/examples/config-second.toml` |
| 51 | +- `nexus/tests/config.test.toml` |
| 52 | +- `smf/nexus/single-sled/config-partial.toml` |
| 53 | +- `smf/nexus/multi-sled/config-partial.toml` |
| 54 | + |
| 55 | +### 7. Wire up in `nexus/src/app/background/init.rs` |
| 56 | + |
| 57 | +- Import the task module. |
| 58 | +- Add `Activator::new()` in the `BackgroundTasks` constructor. |
| 59 | +- Destructure it in the `start` method. |
| 60 | +- Call `driver.register(TaskDefinition { ... })` with the task. Only the last task registered should pass `datastore` by move; every earlier task passes `datastore.clone()`. If your task becomes the last one registered, change the previously last task to use `.clone()`. |
| 61 | +- If extra data is needed from `BackgroundTasksData`, add the field there and plumb it from `nexus/src/app/mod.rs`. |
| 62 | + |
| 63 | +### 8. Schema migration (if needed) |
| 64 | + |
| 65 | +If the task needs a new index or schema change to support its query, add a migration under `schema/crdb/`. See `schema/crdb/README.adoc` for the procedure. Also update `dbinit.sql` and bump the version in `nexus/db-model/src/schema_versions.rs`. |
| 66 | + |
| 67 | +### 9. Datastore method (if needed) |
| 68 | + |
| 69 | +If the task needs a new query, add it in the appropriate `nexus/db-queries/src/db/datastore/` file. Prefer the Diesel typed DSL over raw SQL (`diesel::sql_query`) for queries and test helpers. Only fall back to raw SQL when the DSL genuinely can't express the query. |
| 70 | + |
| 71 | +If the task modifies rows that other code paths also modify, think about races: what happens if both run concurrently on the same row? Both paths should typically guard their writes so only one succeeds. |
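The idea of a guarded write can be illustrated with an in-memory stand-in. This is not Diesel code and not a datastore method: `Table`, `RowState`, and `update_if_state` are hypothetical, and the two callers run sequentially rather than truly concurrently. The point is only that each write names the state it expects, so the second writer observes that it lost.

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum RowState {
    Pending,
    TimedOut,
    Completed,
}

// Hypothetical in-memory table keyed by row id.
struct Table {
    rows: HashMap<u32, RowState>,
}

impl Table {
    // Roughly analogous to
    //   UPDATE t SET state = <new> WHERE id = <id> AND state = <expected>
    // returning whether a row was actually updated.
    fn update_if_state(
        &mut self,
        id: u32,
        expected: RowState,
        new: RowState,
    ) -> bool {
        match self.rows.get_mut(&id) {
            Some(state) if *state == expected => {
                *state = new;
                true
            }
            _ => false,
        }
    }
}

fn main() {
    let mut table = Table { rows: HashMap::new() };
    table.rows.insert(1, RowState::Pending);

    // The background task times the row out first.
    let task_won =
        table.update_if_state(1, RowState::Pending, RowState::TimedOut);
    // Another code path then tries to complete the same row and must
    // notice that its precondition no longer holds.
    let other_won =
        table.update_if_state(1, RowState::Pending, RowState::Completed);

    println!("task_won={task_won} other_won={other_won}");
    println!("final state: {:?}", table.rows[&1]);
}
```

Whichever path loses should treat the `false` result as "someone else already handled this row" rather than as an error.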
| 72 | + |
| 73 | +### 10. omdb output (`dev-tools/omdb/src/bin/omdb/nexus.rs`) |
| 74 | + |
| 75 | +Add a `print_task_<name>` function and wire it into the match in `print_task_details` (alphabetical order). Import the status type. Use the `const_max_len` + `WIDTH` pattern to align columns: |
| 76 | + |
| 77 | +```rust |
| 78 | +const LABEL: &str = "label:"; |
| 79 | +const WIDTH: usize = const_max_len(&[LABEL, ...]) + 1; |
| 80 | +println!(" {LABEL:<WIDTH$}{}", status.field); |
| 81 | +``` |
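A fuller, self-contained version of the alignment pattern is sketched below. The `const_max_len` defined here is a stand-in written for this example; omdb has its own helper of that name, and its body is assumed rather than copied. The labels and `format_row` are hypothetical.

```rust
// Stand-in for omdb's helper of the same name (body is an assumption):
// returns the length of the longest string, usable in const context.
const fn const_max_len(strs: &[&str]) -> usize {
    let mut max = 0;
    let mut i = 0;
    while i < strs.len() {
        let len = strs[i].len();
        if len > max {
            max = len;
        }
        i += 1;
    }
    max
}

// Hypothetical labels for a task's status fields.
const PROCESSED: &str = "items processed:";
const ERROR: &str = "error:";
// Longest label plus one space of padding before the value.
const WIDTH: usize = const_max_len(&[PROCESSED, ERROR]) + 1;

// Left-aligns the label in a WIDTH-wide column, then appends the value.
fn format_row(label: &str, value: &str) -> String {
    format!("    {label:<WIDTH$}{value}")
}

fn main() {
    println!("{}", format_row(PROCESSED, "3"));
    println!("{}", format_row(ERROR, "none"));
}
```

Listing every label in the `const_max_len` call keeps the value column aligned when a longer label is added later.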
| 82 | + |
| 83 | +### 11. Update test output (`dev-tools/omdb/tests/`) |
| 84 | + |
| 85 | +Run the omdb tests with `EXPECTORATE=overwrite` to update the expected output snapshots (`env.out` and `successes.out`): |
| 86 | + |
| 87 | +``` |
| 88 | +EXPECTORATE=overwrite cargo nextest run -p omicron-omdb |
| 89 | +``` |
| 90 | + |
| 91 | +Review the diff to make sure only your new task's output was added. |
| 92 | + |
| 93 | +### 12. Verify |
| 94 | + |
| 95 | +- `cargo check -p omicron-nexus --all-targets` |
| 96 | +- `cargo fmt` |
| 97 | +- `cargo xtask clippy` |
| 98 | +- Run the new task's unit tests |
| 99 | +- Run the omdb tests: `cargo nextest run -p omicron-omdb` |