Migrate to elide toolkit; build engine resource model + orchestrator #287

Open
martsokha wants to merge 7 commits into main from migrate/elide

@martsokha

Member

Summary

Switches the runtime off the in-tree nvisy-{core,context,pattern,ner,llm,codec,toolkit} crates and onto the upstream elide toolkit, then builds the engine — multi-tenant resource registry + run orchestrator — on top of it.

End-state on this branch:

  • elide is a workspace dep (elide, elide-core, elide-ner, elide-ocr). Recognition primitives, modality types, codec, anonymizer, and provenance all come from upstream.
  • nvisy-core slims to {error, health, policy, plan, schema} — primitives, modality, codec, recognition no longer live here; they come from elide-core / elide.
  • nvisy-engine is the new home for everything runtime-shaped:
    • analyzer/ — compile a per-request AnalyzerSpec into elide::Analyzer<M> per modality (text / tabular / image / audio).
    • anonymizer/ — compile Policy sets into elide::Anonymizer<M> per modality; pattern-matches single-label / single-tag Predicate shapes onto elide's indexed fast paths, falls through to with_catalog_predicate for composites.
    • registry/ provides RegistryHandle over fjall: six keyspaces (policies, contexts, run_headers, run_docs, run_artifacts, run_inputs) under (actor, …) keys, Arc-backed for cheap clone.
    • policies/ and contexts/ — symmetric CRUD over versioned resource blobs.
    • runs/ implements a two-phase run lifecycle. start mints a UUIDv7 run, persists inputs, fans the analyzer out per doc under futures::buffer_unordered with a hard tokio::time::timeout, then flips the header to AwaitingReview. apply resolves policies, fans the anonymizer out the same way, and layers reviewer overrides as high-precedence per-entity rules in front of the policy chain (decorator pattern; first-match-wins gives them priority). It then writes the redacted bytes to run_artifacts and flips the header to Applied / PartiallyApplied. override_entity lets reviewers stamp a RuleAction onto one recognised entity by id.
  • elide-bento is a new local crate carrying the BentoML NER + OCR backends that aren't suitable for upstream (vendor-specific).
  • Policy module redesign. UUID identity end-to-end. Rule carries one composable Predicate instead of the old Label/Tag/Predicate three-way kind enum. Reviewer overrides + policy rules both flow through the same Attribution slot — policy_id from the policy's UUID, reason from the rule's UUID (or the entity id for overrides) — so audits trace every redaction back to the exact rule that fired.

Pairs with nvisycom/elide#93 (already merged) which added Anonymizer::with_catalog_predicate so closures can see the per-anonymizer LabelCatalog — unblocks Predicate::TagOneOf inside composite predicates.

Commits

  1. cb4c7445 deps: add elide as upstream toolkit; wire engine to elide deps
  2. 7b848254 workspace: delete superseded crates; add elide-bento; slim nvisy-core
  3. 1033d9aa core: redesign policy module on elide-core types
  4. dc167145 core,engine: pivot to schema-types-as-wire, conversions in core
  5. 786f4ad8 core,engine,bento: full operator set, analyzer plan, attribution wiring
  6. ad7e9418 core,engine: engine resource model + run orchestrator

Test plan

  • cargo check -p nvisy-engine green
  • cargo build -p nvisy-engine green
  • cargo clippy -p nvisy-engine --no-deps green (three pre-existing too_many_arguments lints on the orchestrator fan-out helpers — flagged in code, fixed by the EngineHandle bundle in the next slice)
  • Out of scope this PR: elide-fake still references the pre-migration nvisy_core API surface (nvisy_core::primitive::LanguageTag, nvisy_core::redaction::Anonymizer, etc.) and won't compile. Tracked separately as E4.1 — delete legacy crates after this lands.
  • Out of scope this PR: nvisy-server and nvisy-cli still pin to the old toolkit deps. Switching them is E3.3 / E3.4 — they'll land after the engine surface stabilises.

Follow-ups

  • E3.3 / E3.4 — switch nvisy-server + nvisy-cli to the new engine surface
  • E3.6 — workspace gates green (after E4.1 deletes the legacy crates)
  • E4.1 / E4.2 — delete crates/nvisy-{core,context,pattern,ner,llm,codec,toolkit} directories and elide-fake
  • EngineHandle bundle — fold RegistryHandle + FormatRegistry into one handle, eliminates the three too_many_arguments lints on the orchestrator helpers

🤖 Generated with Claude Code

martsokha and others added 6 commits June 22, 2026 16:18
Workspace gains `elide`, `elide-core`, `elide-llm` as git deps tracking
`nvisycom/elide`'s main branch. `nvisy-engine`/`nvisy-server`/`nvisy-cli`
drop the per-modality `rich` feature (gone from elide; collapsed into
parent modalities with sub-handlers) and the LLM provider toggles
(`openai`/`anthropic`/`google`/`bento`) — all provider backends now
enabled by default through elide-llm.

Engine's manifest now consumes elide+elide-core+elide-llm in place of
the local `nvisy-{core,context,pattern,ner,llm,codec,toolkit}` crates.
Those local crates remain on disk and as workspace path-deps so the
not-yet-migrated consumers (server/cli/fake/ocr/stt/toolkit) keep
parsing; each leaves the workspace alongside its consumer's migration.

Engine source still imports `nvisy_*` paths and will not compile until
the import rewire pass lands (E3.2c).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…sy-core

Bulk teardown of the runtime's toolkit half now that elide ships
upstream equivalents.

Deleted:
  - nvisy-{pattern,llm,ner,ocr,stt}: superseded by elide-{pattern,llm,
    ner,ocr,stt} on nvisycom/elide main
  - nvisy-codec: superseded by elide-codec
  - nvisy-context: superseded by elide-context
  - nvisy-toolkit: superseded by the elide umbrella crate (Analyzer +
    Anonymizer + deduplication layers + operators all ship there)

Renamed:
  - nvisy-fake -> elide-fake (runtime-owned extension over elide types;
    source still uses nvisy_core paths and will be reworked in its own
    pass)

Created:
  - elide-bento: shared BentoML HTTP client wrapper for elide backends
    (per E0.3 plan; minimal boilerplate -- BentoClient + BentoParams +
    BentoError; per-modality backends compose from this in the consuming
    crates)

Slimmed nvisy-core to {error, health, policy}:
  - dropped entity/extraction/modality/primitive/recognition/redaction
    (all re-exported from elide-core at consumer sites)
  - moved nvisy-engine/src/policy/ -> nvisy-core/src/policy/ (policy is
    the runtime's public governance contract; engine consumes it)
  - added elide-core as nvisy-core's only upstream dep so Policy types
    reference elide_core::entity::Label directly

State of the migration: workspace parses; elide-bento is the only crate
that compiles end-to-end. nvisy-core/engine/server/cli/elide-fake source
still imports deleted nvisy-* paths and will be redesigned crate-by-
crate on top of elide's Analyzer/Anonymizer/Orchestrator surface.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
nvisy-core trimmed to {service, policy, schema, context-gated-out}.
Policy is now the runtime's serializable governance spec, structurally
intact from before but rewired to reference elide vocabulary:

  - Label / LabelRef / LabelCatalog from elide_core::entity (renamed
    from EntityLabel*)
  - ConfidenceThreshold from elide_core::primitive
  - Modality + per-modality types (Text/Tabular/Image/Audio) from
    elide_core::modality
  - OperatorId from elide_core::redaction
  - HashAlgorithm mirrored locally (elide's enum is not serializable
    upstream by design; nvisy-core's wire spec owns its vocabulary)

PolicyModality replaces DocumentModality: a runtime-side extension of
elide_core::modality::Modality that pairs each modality with its
serializable redaction spec enum (TextRedaction, ImageRedaction, …).

TextRedaction::Redact renamed to Erase aligning with elide's operator
vocabulary (commit 3598798). No legacy alias — wire format is the
elide vocabulary.

Encrypt operator dropped from the wire enum: reversible AES-256-GCM
needs raw key material, can't safely live in declarative config.
Deployments register custom encrypt operators by OperatorId.

JsonSchema derives kept on policy types. Elide stays free of schemars
(schema generation is an HTTP concern the toolkit doesn't model);
nvisy-core proxies elide types via #[schemars(with = "...")] pointing
at lightweight proxy structs in nvisy-core::schema. Lifted schema to a
top-level module so anything in nvisy-core embedding elide types can
reuse it.

Service module added at crates/nvisy-core/src/service/{mod,error,health}
grouping the runtime's error + healthcheck concerns under one roof.

Context module gated out (// pub mod context) until it gets its own
redesign pass on elide types: its old body referenced deleted
nvisy_core::entity/primitive paths and was missing the jiff dep.

nvisy-core now compiles standalone against elide + elide-core.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…core

nvisy-core wire shapes now embed `*Schema` types directly (no
`#[schemars(with = "...")]` overrides). schema.rs gains round-trip
`From`/`Into` impls so engine consumes nvisy-core specs as elide
types via `.into()`. The dual model (embed elide + proxy for
schema) is gone; one type owns each field.

- LabelSchema, OperatorIdSchema, PointSchema, BoundingBoxSchema,
  PolygonSchema, TimeSpanSchema, LanguageTagSchema in
  nvisy_core::schema, each with From/Into to its elide-core
  counterpart (LanguageTag uses TryFrom for parse failures).
- Policy.labels: Vec<LabelSchema>; selector confidence is f32;
  selector labels and tags become Vec<String> (HipStr/LabelRef are
  internal vocabulary, not wire vocabulary).
- TextRedaction/ImageRedaction/AudioRedaction/TabularRedaction
  `Custom { id }` carries OperatorIdSchema.
- PolicyModality trait removed: engine reads `redactions.text` etc.
  directly per modality; the projection trait was dispatch sugar
  not worth a trait hierarchy.
- EntitySelector::matches removed: matching runs in engine where
  the catalog + entity live; nvisy-core only owns the spec.
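
The round-trip conversions listed above could look roughly like the sketch below. It is not the code in `nvisy_core::schema`: `Point` and `LanguageTag` are local stand-ins for the elide-core types, the field layouts are assumed, and the tag validation is deliberately loose and only illustrates why `LanguageTag` needs `TryFrom` where the geometry types can use plain `From`.

```rust
use std::convert::TryFrom;

// Stand-ins for elide-core types (hypothetical shapes, for illustration only).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Point {
    pub x: f32,
    pub y: f32,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct LanguageTag(String);

impl LanguageTag {
    /// Loose check (assumption): non-empty, ASCII alphanumerics and '-' only.
    pub fn parse(raw: &str) -> Result<Self, String> {
        let ok = !raw.is_empty()
            && raw.chars().all(|c| c.is_ascii_alphanumeric() || c == '-');
        if ok {
            Ok(LanguageTag(raw.to_ascii_lowercase()))
        } else {
            Err(format!("invalid language tag: {raw:?}"))
        }
    }

    pub fn as_str(&self) -> &str {
        &self.0
    }
}

// Wire-side schema types: plain data, one per upstream type.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct PointSchema {
    pub x: f32,
    pub y: f32,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct LanguageTagSchema(pub String);

// Infallible in both directions, so `.into()` works at the call site.
impl From<PointSchema> for Point {
    fn from(s: PointSchema) -> Self {
        Point { x: s.x, y: s.y }
    }
}

impl From<Point> for PointSchema {
    fn from(p: Point) -> Self {
        PointSchema { x: p.x, y: p.y }
    }
}

// Wire -> domain can fail on a malformed tag, hence TryFrom.
impl TryFrom<LanguageTagSchema> for LanguageTag {
    type Error = String;

    fn try_from(s: LanguageTagSchema) -> Result<Self, Self::Error> {
        LanguageTag::parse(&s.0)
    }
}

// Domain -> wire never fails.
impl From<LanguageTag> for LanguageTagSchema {
    fn from(t: LanguageTag) -> Self {
        LanguageTagSchema(t.0)
    }
}
```

With this shape the engine side can write `let p: Point = schema_point.into();`, and a malformed language tag surfaces as an error at the conversion boundary.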

Engine teardown:

- Wiped crates/nvisy-engine/src/{core,detection,document,modality,
  redaction}. Old detection/redaction/document machinery is
  superseded by elide's Analyzer/Anonymizer/Orchestrator/Report.
- Salvaged registry/{composite_key, fjall_ext, paged} as
  multi-tenant storage primitives (no upstream equivalent;
  CompositeKey actor scoping is genuine engine value).
- Engine src/ skeleton now: lib.rs, registry/, policy_compile.rs
  placeholder for the upcoming compile pass.
- Engine Cargo.toml stripped to its actual deps: nvisy-core,
  elide+elide-core+elide-llm, derive_more, uuid, bytes, fjall.
- Old engine tests deleted (rebuilt as new modules land).

nvisy-core also gains:

- src/source.rs (ContentSource) restored from git history as a
  top-level module — content lineage tracking is service-level,
  not entity-vocabulary.
- src/service/{mod,error,health} groups the runtime's error +
  healthcheck concerns.

`cargo check -p nvisy-core` green; `cargo check -p elide-bento`
green; `cargo machete` reports no unused deps in nvisy-core. Engine
will be filled in module-by-module on top of the new compile seam.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…n wiring

nvisy-core:
  - schema/ becomes a folder module split by elide type: label,
    operator, geometry (Point/BoundingBox/Polygon), time, language,
    color, waveform. Each carries round-trip From/Into to its
    elide-core counterpart so engine consumes specs as elide types
    via `.into()`.
  - policy/redaction expands to the full operator catalogue elide
    ships: Text + Tabular get Erase/Keep/Mask/Replace/Hash/
    Pseudonymize/Encrypt (Tabular adds DropRow/DropColumn); Image
    gets Erase/Keep/Blur/Pixelate/Blackbox; Audio gets Erase/Keep/
    Silence/Beep. `Custom` escape hatches removed -- every wire
    operator is predefined and surfaces here when it lands in elide.
  - new plan/ module sibling to policy/: AnalyzerSpec ties together
    RecognizerSpec (Pattern/Ner/Llm with inline backend configs),
    EnricherSpec (Language/Ocr), DeduplicationSpec (calibrate +
    fusion strategy + resolution strategy + min_confidence), and
    ScopeSpec (languages + jurisdictions). Pure data; JsonSchema
    derives across.
  - ContentSource restored as top-level `source` module; context
    module re-enabled with elide-core primitive imports + schema
    proxies (BoundingBoxSchema, TimeSpanSchema, LanguageTagSchema,
    PolygonSchema, PointSchema) embedded as field types.

nvisy-engine:
  - registry/ salvaged from the pre-rebuild engine: CompositeKey
    actor scoping + fjall_ext utility wrappers + paged. Higher-
    level stores (content/audit/run) get redesigned alongside the
    request/result types they hold.
  - anonymizer/ folder module + per-modality compile_text/tabular/
    image/audio. Walks policies in precedence order; rules attach
    operators via the shared selector::attach helper (single-label/
    tag fast-path, predicate fallback). Stateful operators
    (Pseudonymize, Encrypt) reject with a clear "infrastructure
    not wired" error until vault + KeyProvider plumbing lands.
  - analyzer/ folder module + per-modality compile_*. Pattern
    (always Enhanced-wrapped, modality-generic across
    TextRecognizable), NER (Mock or Bento), LLM (Mock today; real
    providers reject pending credential wiring). Image gets the
    OCR enricher path so a Layout can be stamped before
    recognition runs. Tabular/Audio reject LLM (no upstream
    LlmModality impl).
  - .because(...) on every attached rule: per-rule attribution is
    `Attribution::new("{policy}#{rule}")`; per-policy fallback is
    `"{policy}#<default>"`. Engine threads it via selector::
    {rule_attribution, default_attribution}. Note: this conflates
    PolicyDecisionRef-shaped engine provenance into elide's
    author-facing Attribution slot -- the semantics need a real
    pass before this ships (TBD).
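
A minimal sketch of the two attribution formats described above. `Attribution` here is a local stand-in for elide's type, and `split_attribution` is a hypothetical helper added only to show that the string can be taken apart again for audits; only the two format strings come from this commit message.

```rust
/// Local stand-in for elide's author-facing attribution slot (assumed shape).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Attribution(String);

impl Attribution {
    pub fn new(text: impl Into<String>) -> Self {
        Attribution(text.into())
    }

    pub fn as_str(&self) -> &str {
        &self.0
    }
}

/// Per-rule attribution: "{policy}#{rule}".
pub fn rule_attribution(policy: &str, rule: &str) -> Attribution {
    Attribution::new(format!("{policy}#{rule}"))
}

/// Per-policy fallback: "{policy}#<default>".
pub fn default_attribution(policy: &str) -> Attribution {
    Attribution::new(format!("{policy}#<default>"))
}

/// Hypothetical audit helper: recovers (policy, Some(rule)) or (policy, None)
/// for the default. Assumes the policy id itself contains no '#'.
pub fn split_attribution(a: &Attribution) -> Option<(&str, Option<&str>)> {
    let (policy, rule) = a.as_str().split_once('#')?;
    Some((policy, if rule == "<default>" { None } else { Some(rule) }))
}
```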

elide-bento:
  - dropped speculative BentoClient + BentoParams wrappers --
    backends now cache `bentoml::Endpoint` directly + clone per
    call to layer `x-request-id`.
  - BentoError moves to pub(crate); every public surface returns
    `Result<_, elide_core::Error>`; internal `?` keeps working via
    the existing From impl.
  - ner/ and ocr/ each split into mod.rs + request.rs + response.rs.
    `WireNerResponse::decode` and `WireOcrResponse::decode` are
    methods; `post_recognize` is a `BentoNer` method.
  - elide-bento depends on elide-ner + elide-ocr directly (it
    implements their backend traits); the umbrella reach is for
    consumers wiring recognizers, not impls.

Engine + bento + core all compile clean and machete-clean. elide-fake
deferred (source still uses pre-rework `nvisy_core::*` paths).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
End-to-end engine over the elide toolkit: persisted policies and
contexts as versioned resources, a run lifecycle that owns analyze
and apply per-document, and a policy module collapsed onto one
composable predicate per rule.

**nvisy-core policy redesign.** `Rule` carries one `predicate:
Predicate` instead of the old `RuleKind` three-way split
(Label/Tag/Predicate). One shape on the wire; the engine
recognises degenerate single-label / single-tag predicates at
compile time and routes them back through elide's `with_label`
and `with_tag` fast paths. Composite predicates (`All`, `Any`,
`Not`) over `LabelOneOf`, `TagOneOf`, `Confidence`, `CoRef`
compose freely — `TagOneOf` inside `All` now evaluates correctly
because the closure receives the per-anonymizer `LabelCatalog`
(elide change in nvisycom/elide#93). `DocumentPredicate` (label /
metadata gating at the doc level) lives alongside in
`policy/document.rs`. Identity is UUID-keyed end to end — every
`Policy`, every `Rule`, every reviewer override lands in the
redaction's `Attribution` so audits trace back to the exact rule
that fired.
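
To make the routing concrete, here is a self-contained sketch of a composable predicate, its evaluation against a catalog, and the compile-time check for degenerate single-label / single-tag shapes. All types are simplified stand-ins with assumed layouts, not the `nvisy-core` or elide definitions, and `CoRef` is left out because its semantics are not described here. `Route` is a hypothetical name for "which elide entry point the rule would take".

```rust
use std::collections::{HashMap, HashSet};

/// Stand-in catalog: label name -> tags carried by that label.
#[derive(Default)]
pub struct LabelCatalog {
    tags_by_label: HashMap<String, HashSet<String>>,
}

impl LabelCatalog {
    pub fn with(mut self, label: &str, tags: &[&str]) -> Self {
        let set = tags.iter().map(|t| t.to_string()).collect();
        self.tags_by_label.insert(label.to_string(), set);
        self
    }

    pub fn has_tag(&self, label: &str, tag: &str) -> bool {
        self.tags_by_label
            .get(label)
            .map_or(false, |tags| tags.contains(tag))
    }
}

pub struct Entity {
    pub label: String,
    pub confidence: f32,
}

#[derive(Debug, Clone, PartialEq)]
pub enum Predicate {
    LabelOneOf(Vec<String>),
    TagOneOf(Vec<String>),
    Confidence { min: f32 },
    All(Vec<Predicate>),
    Any(Vec<Predicate>),
    Not(Box<Predicate>),
}

impl Predicate {
    /// Tag checks need the catalog, which is why the closure must receive it.
    pub fn eval(&self, e: &Entity, catalog: &LabelCatalog) -> bool {
        match self {
            Predicate::LabelOneOf(ls) => ls.iter().any(|l| l == &e.label),
            Predicate::TagOneOf(ts) => ts.iter().any(|t| catalog.has_tag(&e.label, t)),
            Predicate::Confidence { min } => e.confidence >= *min,
            Predicate::All(ps) => ps.iter().all(|p| p.eval(e, catalog)),
            Predicate::Any(ps) => ps.iter().any(|p| p.eval(e, catalog)),
            Predicate::Not(p) => !p.eval(e, catalog),
        }
    }
}

/// Where a rule would be attached: an indexed fast path or the general closure.
#[derive(Debug, PartialEq)]
pub enum Route<'a> {
    Label(&'a str),
    Tag(&'a str),
    Composite,
}

/// Recognise degenerate predicates; everything else falls through.
pub fn route(p: &Predicate) -> Route<'_> {
    match p {
        Predicate::LabelOneOf(ls) if ls.len() == 1 => Route::Label(&ls[0]),
        Predicate::TagOneOf(ts) if ts.len() == 1 => Route::Tag(&ts[0]),
        _ => Route::Composite,
    }
}
```

In this sketch `Route::Label` and `Route::Tag` correspond to the `with_label` / `with_tag` fast paths named above, and `Route::Composite` to the catalog-aware predicate closure.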

**nvisy-engine registry.** `RegistryHandle` opens the fjall
database and pre-opens six keyspaces — policies, contexts,
run_headers, run_docs, run_artifacts, run_inputs. Cheaply cloneable
(`Arc`-backed). Keys are `CompositeKey(actor, id)`, `TripleKey(actor,
run, doc)`, or `VersionedKey(actor, id, semver)` depending on the
resource shape.

**Resources.** `policies::{put, get, latest, list, delete}` and
`contexts::*` are symmetric: immutable per `(actor, id, version)`;
duplicate writes return `Conflict`; lookups by `(id, version)` or
the latest version via a prefix range scan.
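
The resource semantics above (immutable per version, `Conflict` on duplicate writes, latest via a prefix range scan) can be sketched with an ordered map standing in for a fjall keyspace. Everything here is an assumption made for illustration: `u128` stands in for UUIDs, the key layout is a fixed-width big-endian concatenation so that byte order matches `(actor, id, version)` order, and pre-release / build metadata in semver is ignored.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct Version {
    pub major: u32,
    pub minor: u32,
    pub patch: u32,
}

#[derive(Debug, PartialEq, Eq)]
pub enum StoreError {
    Conflict,
}

/// Ordered byte-keyed map standing in for one keyspace.
#[derive(Default)]
pub struct Store {
    rows: BTreeMap<Vec<u8>, Vec<u8>>,
}

/// Big-endian, fixed-width encoding keeps lexicographic byte order equal
/// to numeric order, so 1.10.0 sorts after 1.2.0.
fn versioned_key(actor: u128, id: u128, v: Version) -> Vec<u8> {
    let mut key = Vec::with_capacity(44);
    key.extend_from_slice(&actor.to_be_bytes());
    key.extend_from_slice(&id.to_be_bytes());
    for part in [v.major, v.minor, v.patch] {
        key.extend_from_slice(&part.to_be_bytes());
    }
    key
}

impl Store {
    /// Versions are immutable: writing an existing (actor, id, version) fails.
    pub fn put(&mut self, actor: u128, id: u128, v: Version, blob: Vec<u8>) -> Result<(), StoreError> {
        let key = versioned_key(actor, id, v);
        if self.rows.contains_key(&key) {
            return Err(StoreError::Conflict);
        }
        self.rows.insert(key, blob);
        Ok(())
    }

    pub fn get(&self, actor: u128, id: u128, v: Version) -> Option<&[u8]> {
        self.rows.get(&versioned_key(actor, id, v)).map(Vec::as_slice)
    }

    /// Scan the (actor, id) key range and take its last (highest) entry.
    pub fn latest(&self, actor: u128, id: u128) -> Option<&[u8]> {
        let lo = versioned_key(actor, id, Version { major: 0, minor: 0, patch: 0 });
        let hi = versioned_key(
            actor,
            id,
            Version { major: u32::MAX, minor: u32::MAX, patch: u32::MAX },
        );
        self.rows.range(lo..=hi).next_back().map(|(_, blob)| blob.as_slice())
    }
}
```

Because the actor is the leading key component, one tenant's scan can never reach another tenant's rows in this layout.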

**Run orchestrator.** `runs::start(handle, formats, actor, batch)`
mints a UUIDv7 run, persists every input's bytes plus a Queued per-
doc row, writes the run header in `Analyzing`, then fans the
analyzer out per document under `futures::buffer_unordered` with
a hard `tokio::time::timeout`. Each per-doc task decodes via the
codec, picks the modality from `handle.is::<M>()`, compiles the
modality-specific analyzer from the `AnalyzerSpec`, recognises
entities via `analyze_stream`, persists them as `EntityRecord<M>`,
and transitions the row to `AwaitingReview` (or `Failed{reason}`
/ `TimedOut`). When the fan-out drains the header flips to
`AwaitingReview`.
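
The per-document outcomes of the analyze fan-out can be illustrated with a std-only analog. This is not the orchestrator's code: it swaps `futures::buffer_unordered` and `tokio::time::timeout` for OS threads, fixed-size batches, and `recv_timeout`, so unlike `buffer_unordered` it does not refill a slot as soon as one task finishes, and a timed-out worker is abandoned rather than cancelled. `toy_analyze` and the `entities` count are hypothetical.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::{Duration, Instant};

/// Per-document row status after the analyze phase.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum DocStatus {
    AwaitingReview { entities: usize },
    Failed { reason: String },
    TimedOut,
}

/// Hypothetical analyzer: counts whitespace-separated tokens as "entities".
pub fn toy_analyze(doc: &str) -> Result<usize, String> {
    match doc {
        "" => Err("empty document".to_string()),
        "slow" => {
            thread::sleep(Duration::from_secs(5)); // simulates a stuck backend
            Ok(0)
        }
        text => Ok(text.split_whitespace().count()),
    }
}

/// Runs `analyze` over `docs`, at most `concurrency` at a time, giving each
/// batch a hard deadline of `timeout`.
pub fn fan_out(
    docs: &[String],
    concurrency: usize,
    timeout: Duration,
    analyze: fn(&str) -> Result<usize, String>,
) -> Vec<(String, DocStatus)> {
    let mut rows = Vec::with_capacity(docs.len());
    for batch in docs.chunks(concurrency.max(1)) {
        let deadline = Instant::now() + timeout;
        // One worker thread and one channel per document in this batch.
        let pending: Vec<_> = batch
            .iter()
            .map(|doc| {
                let (tx, rx) = mpsc::channel();
                let owned = doc.clone();
                thread::spawn(move || {
                    // The receiver may be gone after a timeout; ignore send errors.
                    let _ = tx.send(analyze(&owned));
                });
                (doc.clone(), rx)
            })
            .collect();
        for (doc, rx) in pending {
            let remaining = deadline.saturating_duration_since(Instant::now());
            let status = match rx.recv_timeout(remaining) {
                Ok(Ok(entities)) => DocStatus::AwaitingReview { entities },
                Ok(Err(reason)) => DocStatus::Failed { reason },
                Err(mpsc::RecvTimeoutError::Timeout) => DocStatus::TimedOut,
                Err(mpsc::RecvTimeoutError::Disconnected) => DocStatus::Failed {
                    reason: "worker exited without a result".to_string(),
                },
            };
            rows.push((doc, status));
        }
    }
    rows
}
```

Once `fan_out` returns, every row is in a terminal analyze-phase state, which is the point at which the run header would flip to `AwaitingReview`.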

`runs::apply` resolves every referenced policy, fans per-doc
anonymise out the same way, layers reviewer overrides as
high-precedence per-entity rules in front of the policy chain
(decorator pattern — first-match-wins gives overrides priority
without rewriting the policy set), filters policies by
`applies_when` against the merged descriptor + per-request
metadata, runs `Anonymizer::anonymize(&mut handle, &mut entities)`,
encodes back to bytes, writes the redacted artifact to
`run_artifacts`, and transitions the header to `Applied` or
`PartiallyApplied`. `runs::override_entity` lets reviewers stamp a
`RuleAction` onto a single recognised entity by id.
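
The override layering can be sketched as a first-match-wins lookup in which the reviewer override is consulted before any policy rule, over borrowed policies so that nothing is cloned. The types below are simplified stand-ins: `u128` replaces UUIDs, rules match on a bare label instead of a `Predicate`, and the `RuleAction` variants are illustrative. Leaving `policy_id` empty for overrides is an assumption; the PR only states that the reason is the entity id in that case.

```rust
use std::collections::HashMap;

pub type Id = u128; // stand-in for a UUID

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RuleAction {
    Erase,
    Keep,
    Mask,
}

pub struct Entity {
    pub id: Id,
    pub label: String,
}

pub struct Rule {
    pub id: Id,
    pub label: String, // simplified matcher; the real rule carries a Predicate
    pub action: RuleAction,
}

pub struct Policy {
    pub id: Id,
    pub rules: Vec<Rule>,
}

/// Which policy and rule (or override) produced a redaction.
#[derive(Debug, PartialEq, Eq)]
pub struct Attribution {
    pub policy_id: Option<Id>, // None for reviewer overrides (assumption)
    pub reason: Id,            // rule id, or entity id for overrides
}

#[derive(Default)]
pub struct Overrides {
    by_entity: HashMap<Id, RuleAction>,
}

impl Overrides {
    /// Stamp an action onto one recognised entity by id.
    pub fn override_entity(&mut self, entity: Id, action: RuleAction) {
        self.by_entity.insert(entity, action);
    }
}

/// First match wins: the override (if any) sits in front of the policy chain.
pub fn resolve(
    entity: &Entity,
    overrides: &Overrides,
    policies: &[Policy],
) -> Option<(RuleAction, Attribution)> {
    let from_override = overrides.by_entity.get(&entity.id).map(|action| {
        (*action, Attribution { policy_id: None, reason: entity.id })
    });
    // Lazy: policy rules are only walked when no override matched.
    let from_policies = policies
        .iter()
        .flat_map(|p| p.rules.iter().map(move |r| (p, r)))
        .filter(|(_, r)| r.label == entity.label)
        .map(|(p, r)| (r.action, Attribution { policy_id: Some(p.id), reason: r.id }));
    from_override.into_iter().chain(from_policies).next()
}
```

Because the policy slice is only borrowed, adding or removing an override changes the outcome for one entity without touching the policy set.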

The four per-modality `compile_*` and per-modality `attach_*`
helpers under `engine::anonymizer/` keep the analyzer- and
anonymizer-compile surfaces split for clarity; the apply pipeline
reaches the `attach_policies_*` and `attach_override_*` entry
points directly so it can layer overrides before policies without
cloning the policy set.

Workspace `cargo check`/`build`/`clippy` are green on
`nvisy-engine`; pre-existing `elide-fake` breakage against the
old toolkit nvisy-core API surface is tracked separately under
the E4.1 deletion of legacy crates.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@martsokha martsokha self-assigned this Jun 24, 2026
@martsokha martsokha added labels on Jun 24, 2026: feat (request for or implementation of a new feature), refactor (code restructuring without behavior change), core (content model, errors, shared types), engine (redaction engine, pipeline runtime, orchestration, configuration), architecture (architectural decision records and cross-cutting design issues)