Skip to content

[fm] Add guardrails for fact payload representations#10508

Draft
smklein wants to merge 1 commit into
fm-disk-diagnoserfrom
fm-fact-guardrails
Draft

[fm] Add guardrails for fact payload representations#10508
smklein wants to merge 1 commit into
fm-disk-diagnoserfrom
fm-fact-guardrails

Conversation

@smklein
Copy link
Copy Markdown
Collaborator

@smklein smklein commented May 29, 2026

Persisted diagnosis-engine fact payloads (the fm_fact.payload JSONB column) had no signal when their serialized shape changed. Because deployed databases already hold rows in the old shape, such a change is a data migration, and CockroachDB's schema-diff tests do not catch it (the column is unchanged JSONB; only the data changes).

Add payload-stability tests in nexus/fm/src/diagnosis:

  • schemars JSON-schema snapshot per DE payload (structure: variants, fields, types, and embedded-enum domains, including cross-crate types like ZpoolHealth)
  • a central every_de_kind_is_registered_or_exempt check over DiagnosisEngineKind so a future DE cannot silently skip the guardrail

The schemars/strum derives are cfg(test)-gated, so they stay dev-only with no production footprint. Module docs in diagnosis/mod.rs explain additive (serde-compatible) vs breaking changes and how to write a DE-scoped JSONB data migration that reuses the schema/crdb mechanism.

@smklein smklein force-pushed the fm-fact-guardrails branch from adb0d8a to 012096a Compare May 29, 2026 00:47
@smklein smklein changed the title [fm] Add guardrail pinning persisted fact payload representations [fm] Add guardrails for fact payload representations May 29, 2026
@smklein smklein force-pushed the fm-fact-guardrails branch from 012096a to ddeac01 Compare May 29, 2026 15:40
//! directory, bump `KNOWN_VERSIONS` and `dbinit.sql` (see
//! `schema/crdb/README.adoc`) — even though, from CockroachDB's point of view,
//! the column is unchanged JSONB and only the data changes — add a case to
//! `validate_data_migration()` in `nexus/tests/integration_tests/schema.rs`, and
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do still feel like this is perhaps the most critical piece here - validating data migrations, with:

"Shove old format into DB"
"Run migration"
"Migrated old data can deserialize as new data"

But hopefully the rest of scaffolding here helps catch "when we should do that".

Persisted diagnosis-engine fact payloads (the fm_fact.payload JSONB column)
had no signal when their serialized shape changed. Because deployed databases
already hold rows in the old shape, such a change is a data migration, not
just a code change, and CockroachDB's schema-diff tests do not catch it (the
column is unchanged JSONB; only the data changes).

Add payload-stability tests in nexus/fm/src/diagnosis:
- schemars JSON-schema snapshot per DE payload (structure: variants, fields,
  types, and embedded-enum domains, including cross-crate types like
  ZpoolHealth)
- strum-enforced serde byte sample per variant (exact on-disk bytes), plus a
  round-trip test proving the pinned format still deserializes
- a central every_de_payload_is_pinned check over DiagnosisEngineKind so a
  future DE cannot silently skip the guardrail

The schemars/strum derives are cfg(test)-gated, so they stay dev-only with no
production footprint. Module docs in diagnosis/mod.rs explain additive
(serde-compatible) vs breaking changes and how to write a DE-scoped JSONB
data migration that reuses the schema/crdb mechanism.
@smklein smklein force-pushed the fm-fact-guardrails branch from ddeac01 to d7de080 Compare May 29, 2026 17:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant