Problem
@lde/search couples two artefacts that change together but deploy independently: the SearchSchema (which drives the derived GraphQL SDL via @lde/search-api-graphql) and the physical search index (the Typesense collection built by the engine). A config change ships the new SDL the instant the API binary deploys, but the index only reflects the change once a reindex completes. During that window SearchSchema and the live index drift, and there is currently no shared guidance or primitive for doing the change safely, so every consumer (Dataset Register first) has to reinvent it.
The drift is asymmetric and direction-dependent:
Adding a field — the index must lead the SDL. An API that exposes a field the index lacks is the dangerous direction (queries fail or return nulls).
Removing a field — the SDL must lead. Stop exposing it, then drop it from the index.
In-place upsert (where supported) makes the drift worse than blue/green does: it leaves a long-lived mixed population in which some documents carry the new field and some don’t, with no clean cutover instant. Blue/green keeps each collection internally consistent (wholly vN+1) and gives a single alias-swap cutover point.
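The direction-dependent ordering above can be written down as a small lookup. This is a minimal sketch only: `FieldChange`, `cutoverSteps` and the `license` field are hypothetical names for illustration and are not part of @lde/search.

```typescript
// Hypothetical names for illustration; not part of @lde/search.
type FieldChange = "add" | "remove";

// Returns the ordered steps for a single-field change.
// add:    the index leads, so the field is indexed before the SDL exposes it.
// remove: the SDL leads, so nothing can query the field when the index drops it.
function cutoverSteps(change: FieldChange, field: string): string[] {
  if (change === "add") {
    return [
      `reindex so the collection includes "${field}"`,
      `deploy the API that exposes "${field}" in the SDL`,
    ];
  }
  return [
    `deploy the API that no longer exposes "${field}"`,
    `reindex without "${field}"`,
  ];
}

console.log(cutoverSteps("add", "license").join(" -> "));
console.log(cutoverSteps("remove", "license").join(" -> "));
```

The point of the sketch is that the two directions are mirror images: swapping the steps for either case recreates the dangerous state in which the API exposes a field the index lacks.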
Framing: which coupling to keep, which to relax
Two different kinds of coupling are bundled here, and the design should treat them oppositely:
Derivation coupling — one SearchSchema definition projects into both the GraphQL non-null flag (build-schema.ts) and the Typesense optional flag (collection-schema.ts). The same required bit drives both. Keep this. It is the single source of truth that prevents semantic drift; hand-maintaining a GraphQL schema and a Typesense schema separately guarantees they diverge (e.g. a field marked non-null in the API but not required in the index → runtime null-violation). This is a static, build-time coupling — the benign kind.
Temporal coupling — today all projections must flip at the same instant, with only one schema version ever live (simultaneity coupling). This is what blocks migrations. The goal is not to eliminate temporal coupling (a rename inherently requires “backfill before cutover before retire”) but to relax rigid simultaneity into explicit ordered sequencing (connascence of execution order). The cutover-ordering contract below is a specification of that ordered temporal coupling.
One-line statement of intent: keep the static derivation coupling; relax the rigid simultaneous temporal coupling into explicit ordered temporal coupling, with a transitional window where two versions coexist.
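To make the derivation coupling concrete, the following sketch projects one `required` bit into both targets. It assumes a simplified, hypothetical `FieldDef` shape and stand-in projection functions; the real logic in build-schema.ts and collection-schema.ts may look quite different.

```typescript
// Hypothetical, simplified field definition; the real SearchSchema shape may differ.
interface FieldDef {
  name: string;
  type: "string" | "int32";
  required: boolean;
}

// GraphQL projection: a required field becomes non-null (trailing "!") in the SDL.
function toGraphQLFieldSDL(field: FieldDef): string {
  const gqlType = field.type === "string" ? "String" : "Int";
  return `${field.name}: ${gqlType}${field.required ? "!" : ""}`;
}

// Typesense projection: a required field becomes `optional: false`.
function toTypesenseField(field: FieldDef): { name: string; type: string; optional: boolean } {
  return { name: field.name, type: field.type, optional: !field.required };
}

// One definition drives both projections, so they cannot disagree.
const title: FieldDef = { name: "title", type: "string", required: true };
console.log(toGraphQLFieldSDL(title));
console.log(JSON.stringify(toTypesenseField(title)));
```

Because both functions read the same `required` flag, a field cannot end up non-null in the API while optional in the index, which is the divergence that hand-maintained schemas invite.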
Proposal
Treat a SearchSchema change like a database migration. Provide, in @lde/search, the model and helpers that make this the obvious path:
Classify a schema change as additive-nullable (safe to ship the SDL immediately) vs. breaking (rename, retype, remove, tighten-to-required, changed match/filter/sort semantics — must span ≥2 releases via expand/contract). Classify at the SearchSchema level, not the GraphQL level: that single diff is what derives both projections, so it is the only place that can reason about API and index consequences together and emit a cutover order (index-first / api-first / multi-release).
graphql-js already ships findBreakingChanges / findDangerousChanges. Running them over buildSearchSchema(prev) vs buildSearchSchema(next) gives the GraphQL-contract half almost for free, so wire it in as a belt-and-suspenders cross-check on the GraphQL projection. It also catches a future buildSearchSchema regression that silently alters the SDL, which is otherwise the job of the existing printSearchSchema snapshot test.
Note the limits: printSchema is a printer, not a differ — the snapshot test is a tripwire that detects-but-doesn’t-classify. And findBreakingChanges reasons only in GraphQL-consumer terms; it knows nothing about the index half (Typesense optional: false rejecting documents, the data-completeness gate, the sh:minCount precondition). That index half has no graphql-js equivalent and is the net-new piece to write.
New fields are optional in both the collection schema and the GraphQL type — never mocked with empty strings (indistinguishable from real empty values; corrupts facet counts and sorts). This is also the concrete fix for the engine erroring when a sort/facet field is absent on some documents.
Tightening an optional field to required is a breaking change, not an additive one — it is the search-index equivalent of adding NOT NULL. It is gated on data completeness, not on “a reindex finished”: marking required sets Typesense optional: false (the next rebuild rejects any document missing the value) and wraps the GraphQL field in GraphQLNonNull (query-time null-violation for any document missing it). Safe path: ship the field optional, complete a rebuild, verify zero nulls across the whole collection — ideally backed by the source SHACL shape enforcing sh:minCount ≥ 1, which is the real “if applicable” gate — then flip to required. (required is moot for arrays/booleans/id, which are non-null regardless.)
Document the cutover-ordering contract (index-leads-for-add, SDL-leads-for-remove) and the recommendation to reindex into the new collection, then cut over API + alias together — collapsing the drift window to ~zero for fast rebuilds.
Keep the alias swap as the single cutover point so a later async-reindex orchestrator (background reindex, queue the schema change, go live on completion) can be layered on without redesign. That orchestrator is out of scope here — it only earns its keep once a rebuild is too long to block a deploy and the additive-nullable drift window becomes unacceptable.
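The data-completeness gate for tightening a field to required (ship optional, rebuild, verify zero nulls, then flip) might be checked along these lines. This is a sketch under stated assumptions: it scans an in-memory array, whereas a real check would query the rebuilt collection, and `SearchDocument`, `CompletenessReport` and `checkCompleteness` are hypothetical names.

```typescript
// Hypothetical document shape: a flat record of indexed values.
type SearchDocument = Record<string, unknown>;

interface CompletenessReport {
  field: string;
  total: number;
  missing: number;
  safeToRequire: boolean;
}

// Counts documents where the field is absent or null. Empty strings count as
// present: they are real values, not placeholders for missing data.
function checkCompleteness(docs: SearchDocument[], field: string): CompletenessReport {
  const missing = docs.filter((doc) => {
    const value = doc[field];
    return value === undefined || value === null;
  }).length;
  // Design choice in this sketch: an empty collection proves nothing, so it
  // never passes the gate.
  const safeToRequire = docs.length > 0 && missing === 0;
  return { field, total: docs.length, missing, safeToRequire };
}

const report = checkCompleteness(
  [{ id: "1", title: "Census" }, { id: "2" }],
  "title",
);
console.log(report);
```

A passing report only describes the data at rebuild time. Keeping it true across later rebuilds still depends on the source guaranteeing the value, which is the role the text assigns to the SHACL sh:minCount constraint.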
Scope: build the classifier, not transform hooks
Distinguish classify/verify functions (pure, cheap, valuable now) from transform/execute hooks (stateful, premature):
Build the classifier now. It is a pure function over two SearchSchemas; it encodes this contract as executable code rather than prose, gates CI (a breaking diff without the version bump / multi-release plan fails the build), and is the substrate every later capability builds on. The classifier is the one piece worth front-running, because it is a contract specification, not an abstraction over instances.
Do not add Rails-style up/down transform hooks yet. Under blue/green full rebuild there is no data to migrate — every document is re-derived from source through the new schema and the old collection is discarded, so a transform hook would be dead code. Even a new derived field needs none (derivation already lives in the schema; a rebuild computes it for free). A transform hook only has a job under in-place upserts.
General rule: don’t generalise migration machinery until there are ≥2–3 real migrations to generalise from — otherwise it abstracts a pattern of one.
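A classifier of the kind described, a pure function over two schema versions, could start out roughly as follows. Everything here is an assumption made for illustration: the `Schema` model is a simplified stand-in for SearchSchema, and the mapping of change kinds to cutover orders (removals as api-first, other breaking changes as multi-release) is one reading of the contract, not a settled design. Changed match/filter/sort semantics and required-to-optional relaxations are not modelled.

```typescript
// Hypothetical, simplified model; the real SearchSchema in @lde/search may differ.
interface FieldDef {
  type: string;
  required: boolean;
}
type Schema = Record<string, FieldDef>;
type CutoverOrder = "index-first" | "api-first" | "multi-release";

interface Change {
  field: string;
  kind: "additive-nullable" | "breaking";
  reason: string;
  order: CutoverOrder;
}

// Pure function: compares two schema versions field by field.
// A rename shows up as one removal plus one addition in this name-keyed diff.
function classifySchemaChange(prev: Schema, next: Schema): Change[] {
  const changes: Change[] = [];
  for (const field of Object.keys(next)) {
    const before = prev[field];
    const after = next[field];
    if (after === undefined) continue;
    if (before === undefined) {
      changes.push(
        after.required
          ? { field, kind: "breaking", reason: "added as required", order: "multi-release" }
          : { field, kind: "additive-nullable", reason: "added as optional", order: "index-first" },
      );
    } else if (before.type !== after.type) {
      changes.push({ field, kind: "breaking", reason: "retyped", order: "multi-release" });
    } else if (!before.required && after.required) {
      changes.push({ field, kind: "breaking", reason: "tightened to required", order: "multi-release" });
    }
  }
  for (const field of Object.keys(prev)) {
    if (next[field] === undefined) {
      changes.push({ field, kind: "breaking", reason: "removed", order: "api-first" });
    }
  }
  return changes;
}

// CI gate: any breaking change needs an explicit migration plan.
function needsMigrationPlan(changes: Change[]): boolean {
  return changes.some((change) => change.kind === "breaking");
}

const v1: Schema = { title: { type: "string", required: true } };
const v2: Schema = {
  title: { type: "string", required: true },
  license: { type: "string", required: false },
};
console.log(classifySchemaChange(v1, v2), needsMigrationPlan(classifySchemaChange(v1, v2)));
```

Because the function takes two values and returns a list, it can run in CI against the previous and proposed schema without touching the engine or the API.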
When the heavier machinery starts paying off
Today (single consumer, full blue/green rebuilds) even a breaking change is just “deploy new schema → full rebuild → swap alias,” with the only exposure being rebuild duration. The transitional-superset / async-orchestrator / transform-hook investment earns its keep only when one of these fires:
Multiple API consumers that can’t redeploy in lockstep (the GraphQL contract then needs a real deprecation window).
Incremental / in-place upserts replace full rebuilds (the mixed-population problem above, plus an actual transform step to hook).
Rebuild time becomes a real outage (large index → the swap window stops being negligible).
Notes
Dataset Register already runs exclusively as blue/green full rebuild (fresh ${alias}_${timestamp} collection, atomic alias swap, drop previous; concurrent triggers skipped, not queued). It is the first consumer that needs this and a good proving ground.
A migration runner must confirm the rebuild it depends on actually completed (alias points at the new timestamped collection) before flipping the API live, because a trigger arriving mid-rebuild is silently skipped.
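The completion check a migration runner needs could be as small as the sketch below. It relies on loud assumptions: the text only fixes the `${alias}_${timestamp}` naming pattern, so treating the timestamp as epoch milliseconds taken when the rebuild starts is a guess, and `rebuildIsLive` is a hypothetical helper. The `aliasTarget` argument stands for whatever collection name the engine currently reports for the alias.

```typescript
// Assumption: collections are named `${alias}_${timestamp}`, where the timestamp
// is epoch milliseconds taken when the rebuild starts. The format is a guess.
function rebuildIsLive(alias: string, aliasTarget: string, requestedAtMs: number): boolean {
  const prefix = alias + "_";
  if (aliasTarget.indexOf(prefix) !== 0) {
    return false; // the alias points at something that is not one of our rebuilds
  }
  const startedAtMs = Number(aliasTarget.slice(prefix.length));
  // A skipped trigger leaves the alias on (or lets it move to) a collection
  // whose rebuild started before our request; that rebuild is assumed not to
  // include the new schema.
  return isFinite(startedAtMs) && startedAtMs >= requestedAtMs;
}

const requestedAt = 1700000000000;
console.log(rebuildIsLive("datasets", "datasets_1700000005000", requestedAt));
console.log(rebuildIsLive("datasets", "datasets_1699999990000", requestedAt));
```

Comparing against the request time rather than just checking that the alias changed matters here: an in-flight rebuild that caused the trigger to be skipped will also move the alias when it finishes, but to a collection built before the request.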