Migrate to elide toolkit; build engine resource model + orchestrator by martsokha · Pull Request #287 · nvisycom/runtime

martsokha · 2026-06-24T16:59:25Z

Summary

Switches the runtime off the in-tree nvisy-{core,context,pattern,ner,llm,codec,toolkit} crates and onto the upstream elide toolkit, then builds the engine — multi-tenant resource registry + run orchestrator — on top of it.

End-state on this branch:

- elide is a workspace dep (elide, elide-core, elide-ner, elide-ocr). Recognition primitives, modality types, codec, anonymizer, and provenance all come from upstream.
- nvisy-core slims to {error, health, policy, plan, schema}: primitives, modality, codec, and recognition no longer live here; they come from elide-core / elide.
- nvisy-engine is the new home for everything runtime-shaped:
  - analyzer/: compiles a per-request AnalyzerSpec into elide::Analyzer<M> per modality (text / tabular / image / audio).
  - anonymizer/: compiles Policy sets into elide::Anonymizer<M> per modality; pattern-matches single-label / single-tag Predicate shapes onto elide's indexed fast paths, falls through to with_catalog_predicate for composites.
  - registry/: RegistryHandle over fjall with six keyspaces (policies, contexts, run_headers, run_docs, run_artifacts, run_inputs) under (actor, …) keys, Arc-backed for cheap clone.
  - policies/ and contexts/: symmetric CRUD over versioned resource blobs.
  - runs/: two-phase run lifecycle. start mints a UUIDv7 run, persists inputs, fans the analyzer out per doc under futures::buffer_unordered with a hard tokio::time::timeout, then flips the header to AwaitingReview. apply resolves policies, fans the anonymiser out the same way, layers reviewer overrides as high-precedence per-entity rules in front of the policy chain (decorator pattern, first-match-wins gives them priority), writes the redacted bytes to run_artifacts, and flips the header to Applied / PartiallyApplied. override_entity lets reviewers stamp a RuleAction onto one recognised entity by id.
- elide-bento is a new local crate carrying the BentoML NER + OCR backends that aren't suitable for upstream (vendor-specific).
- Policy module redesign. UUID identity end-to-end. Rule carries one composable Predicate instead of the old Label/Tag/Predicate three-way kind enum. Reviewer overrides + policy rules both flow through the same Attribution slot (policy_id from the policy's UUID, reason from the rule's UUID, or the entity id for overrides), so audits trace every redaction back to the exact rule that fired.
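
The policies/ and contexts/ behaviour described above (immutable per version, latest version found by scanning one key prefix) can be sketched with an ordered map. This is a minimal illustration under stated assumptions, not the engine's code: `ResourceStore`, `PutError`, and the tuple key are hypothetical stand-ins, a `BTreeMap` replaces the fjall keyspace, and a `(major, minor, patch)` tuple replaces a real semver type.

```rust
use std::collections::BTreeMap;

/// Hypothetical version type: (major, minor, patch). The real engine keys on semver.
type Version = (u64, u64, u64);

/// Hypothetical stand-in for a versioned key: (actor, id, version).
/// Tuples order lexicographically, so all versions of one (actor, id) sit next to each other.
type VersionedKey = (String, String, Version);

#[derive(Debug, PartialEq)]
pub enum PutError {
    /// The exact (actor, id, version) already exists; stored blobs are immutable.
    Conflict,
}

/// In-memory stand-in for one keyspace (an assumption; the PR uses fjall).
#[derive(Default)]
pub struct ResourceStore {
    blobs: BTreeMap<VersionedKey, Vec<u8>>,
}

impl ResourceStore {
    /// Insert a new version; refuse to overwrite an existing one.
    pub fn put(&mut self, actor: &str, id: &str, version: Version, blob: Vec<u8>) -> Result<(), PutError> {
        let key = (actor.to_owned(), id.to_owned(), version);
        if self.blobs.contains_key(&key) {
            return Err(PutError::Conflict);
        }
        self.blobs.insert(key, blob);
        Ok(())
    }

    /// Exact lookup by (actor, id, version).
    pub fn get(&self, actor: &str, id: &str, version: Version) -> Option<&Vec<u8>> {
        self.blobs.get(&(actor.to_owned(), id.to_owned(), version))
    }

    /// Latest version: scan the (actor, id) range and take its last entry.
    pub fn latest(&self, actor: &str, id: &str) -> Option<(Version, &Vec<u8>)> {
        let lo = (actor.to_owned(), id.to_owned(), (0, 0, 0));
        let hi = (actor.to_owned(), id.to_owned(), (u64::MAX, u64::MAX, u64::MAX));
        self.blobs.range(lo..=hi).next_back().map(|(k, v)| (k.2, v))
    }
}
```

Putting the actor first in the key means one tenant's resources never fall inside another tenant's range, which is the property the (actor, …) key layout relies on.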

Pairs with nvisycom/elide#93 (already merged) which added Anonymizer::with_catalog_predicate so closures can see the per-anonymizer LabelCatalog — unblocks Predicate::TagOneOf inside composite predicates.
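
To illustrate why a tag predicate needs the catalog, here is a small sketch of composite predicate evaluation. Everything in it is an assumption made for illustration: `LabelCatalog`, `Entity`, and the `Predicate` variants are simplified stand-ins, not the elide or nvisy-core definitions, and `CoRef` is omitted because its shape is not described here.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a per-anonymizer label catalog: label name -> tags.
#[derive(Default)]
pub struct LabelCatalog {
    tags_by_label: HashMap<String, Vec<String>>,
}

impl LabelCatalog {
    pub fn insert(&mut self, label: &str, tags: &[&str]) {
        self.tags_by_label
            .insert(label.to_owned(), tags.iter().map(|t| t.to_string()).collect());
    }

    pub fn has_tag(&self, label: &str, tag: &str) -> bool {
        self.tags_by_label
            .get(label)
            .map_or(false, |tags| tags.iter().any(|t| t.as_str() == tag))
    }
}

/// Simplified entity: only the fields the predicates below inspect.
pub struct Entity {
    pub label: String,
    pub confidence: f32,
}

/// Simplified composable predicate (field shapes are assumptions).
pub enum Predicate {
    LabelOneOf(Vec<String>),
    TagOneOf(Vec<String>),
    Confidence { min: f32 },
    All(Vec<Predicate>),
    Any(Vec<Predicate>),
    Not(Box<Predicate>),
}

impl Predicate {
    /// Tags live on labels, not on entities, so TagOneOf can only be
    /// answered when the catalog is passed down through composites.
    pub fn matches(&self, entity: &Entity, catalog: &LabelCatalog) -> bool {
        match self {
            Predicate::LabelOneOf(labels) => labels.iter().any(|l| l == &entity.label),
            Predicate::TagOneOf(tags) => tags.iter().any(|t| catalog.has_tag(&entity.label, t)),
            Predicate::Confidence { min } => entity.confidence >= *min,
            Predicate::All(ps) => ps.iter().all(|p| p.matches(entity, catalog)),
            Predicate::Any(ps) => ps.iter().any(|p| p.matches(entity, catalog)),
            Predicate::Not(p) => !p.matches(entity, catalog),
        }
    }
}
```

In this sketch, an `All` containing `TagOneOf(["pii"])` and `Confidence { min: 0.8 }` matches a 0.9-confidence entity whose label carries the `pii` tag in the catalog, and rejects the same entity at 0.5.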

Commits

cb4c7445 deps: add elide as upstream toolkit; wire engine to elide deps
7b848254 workspace: delete superseded crates; add elide-bento; slim nvisy-core
1033d9aa core: redesign policy module on elide-core types
dc167145 core,engine: pivot to schema-types-as-wire, conversions in core
786f4ad8 core,engine,bento: full operator set, analyzer plan, attribution wiring
ad7e9418 core,engine: engine resource model + run orchestrator

Test plan

cargo check -p nvisy-engine green
cargo build -p nvisy-engine green
cargo clippy -p nvisy-engine --no-deps green (three pre-existing too_many_arguments lints on the orchestrator fan-out helpers — flagged in code, fixed by the EngineHandle bundle in the next slice)
Out of scope this PR: elide-fake still references the pre-migration nvisy_core API surface (nvisy_core::primitive::LanguageTag, nvisy_core::redaction::Anonymizer, etc.) and won't compile. Tracked separately as E4.1 — delete legacy crates after this lands.
Out of scope this PR: nvisy-server and nvisy-cli still pin to the old toolkit deps. Switching them is E3.3 / E3.4 — they'll land after the engine surface stabilises.

Follow-ups

E3.3 / E3.4 — switch nvisy-server + nvisy-cli to the new engine surface
E3.6 — workspace gates green (after E4.1 deletes the legacy crates)
E4.1 / E4.2 — delete crates/nvisy-{core,context,pattern,ner,llm,codec,toolkit} directories and elide-fake
EngineHandle bundle — fold RegistryHandle + FormatRegistry into one handle, eliminates the three too_many_arguments lints on the orchestrator helpers

🤖 Generated with Claude Code

Workspace gains `elide`, `elide-core`, `elide-llm` as git deps tracking `nvisycom/elide`'s main branch.

`nvisy-engine`/`nvisy-server`/`nvisy-cli` drop the per-modality `rich` feature (gone from elide; collapsed into parent modalities with sub-handlers) and the LLM provider toggles (`openai`/`anthropic`/`google`/`bento`); all provider backends are now enabled by default through elide-llm.

Engine's manifest now consumes elide+elide-core+elide-llm in place of the local `nvisy-{core,context,pattern,ner,llm,codec,toolkit}` crates. Those local crates remain on disk and as workspace path-deps so the not-yet-migrated consumers (server/cli/fake/ocr/stt/toolkit) keep parsing; each leaves the workspace alongside its consumer's migration.

Engine source still imports `nvisy_*` paths and will not compile until the import rewire pass lands (E3.2c).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…sy-core

Bulk teardown of the runtime's toolkit half now that elide ships upstream equivalents.

Deleted:
- nvisy-{pattern,llm,ner,ocr,stt}: superseded by elide-{pattern,llm,ner,ocr,stt} on nvisycom/elide main
- nvisy-codec: superseded by elide-codec
- nvisy-context: superseded by elide-context
- nvisy-toolkit: superseded by the elide umbrella crate (Analyzer + Anonymizer + deduplication layers + operators all ship there)

Renamed:
- nvisy-fake -> elide-fake (runtime-owned extension over elide types; source still uses nvisy_core paths and will be reworked in its own pass)

Created:
- elide-bento: shared BentoML HTTP client wrapper for elide backends (per E0.3 plan; minimal boilerplate -- BentoClient + BentoParams + BentoError; per-modality backends compose from this in the consuming crates)

Slimmed nvisy-core to {error, health, policy}:
- dropped entity/extraction/modality/primitive/recognition/redaction (all re-exported from elide-core at consumer sites)
- moved nvisy-engine/src/policy/ -> nvisy-core/src/policy/ (policy is the runtime's public governance contract; engine consumes it)
- added elide-core as nvisy-core's only upstream dep so Policy types reference elide_core::entity::Label directly

State of the migration: workspace parses; elide-bento is the only crate that compiles end-to-end. nvisy-core/engine/server/cli/elide-fake source still imports deleted nvisy-* paths and will be redesigned crate-by-crate on top of elide's Analyzer/Anonymizer/Orchestrator surface.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

nvisy-core trimmed to {service, policy, schema, context-gated-out}.

Policy is now the runtime's serializable governance spec, structurally intact from before but rewired to reference elide vocabulary:
- Label / LabelRef / LabelCatalog from elide_core::entity (renamed from EntityLabel*)
- ConfidenceThreshold from elide_core::primitive
- Modality + per-modality types (Text/Tabular/Image/Audio) from elide_core::modality
- OperatorId from elide_core::redaction
- HashAlgorithm mirrored locally (elide's enum is not serializable upstream by design; nvisy-core's wire spec owns its vocabulary)

PolicyModality replaces DocumentModality: a runtime-side extension of elide_core::modality::Modality that pairs each modality with its serializable redaction spec enum (TextRedaction, ImageRedaction, …).

TextRedaction::Redact renamed to Erase aligning with elide's operator vocabulary (commit 3598798). No legacy alias: the wire format is the elide vocabulary.

Encrypt operator dropped from the wire enum: reversible AES-256-GCM needs raw key material, can't safely live in declarative config. Deployments register custom encrypt operators by OperatorId.

JsonSchema derives kept on policy types. Elide stays free of schemars (schema generation is an HTTP concern the toolkit doesn't model); nvisy-core proxies elide types via #[schemars(with = "...")] pointing at lightweight proxy structs in nvisy-core::schema. Lifted schema to a top-level module so anything in nvisy-core embedding elide types can reuse it.

Service module added at crates/nvisy-core/src/service/{mod,error,health} grouping the runtime's error + healthcheck concerns under one roof.

Context module gated out (// pub mod context) until it gets its own redesign pass on elide types: its old body referenced deleted nvisy_core::entity/primitive paths and missed the jiff dep.

nvisy-core now compiles standalone against elide + elide-core.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…core

nvisy-core wire shapes now embed `*Schema` types directly (no `#[schemars(with = "...")]` overrides). schema.rs gains round-trip `From`/`Into` impls so engine consumes nvisy-core specs as elide types via `.into()`. The dual model (embed elide + proxy for schema) is gone; one type owns each field.

- LabelSchema, OperatorIdSchema, PointSchema, BoundingBoxSchema, PolygonSchema, TimeSpanSchema, LanguageTagSchema in nvisy_core::schema, each with From/Into to its elide-core counterpart (LanguageTag uses TryFrom for parse failures).
- Policy.labels: Vec<LabelSchema>; selector confidence is f32; selector labels and tags become Vec<String> (HipStr/LabelRef are internal vocabulary, not wire vocabulary).
- TextRedaction/ImageRedaction/AudioRedaction/TabularRedaction `Custom { id }` carries OperatorIdSchema.
- PolicyModality trait removed: engine reads `redactions.text` etc. directly per modality; the projection trait was dispatch sugar not worth a trait hierarchy.
- EntitySelector::matches removed: matching runs in engine where the catalog + entity live; nvisy-core only owns the spec.

Engine teardown:
- Wiped crates/nvisy-engine/src/{core,detection,document,modality,redaction}. Old detection/redaction/document machinery is superseded by elide's Analyzer/Anonymizer/Orchestrator/Report.
- Salvaged registry/{composite_key, fjall_ext, paged} as multi-tenant storage primitives (no upstream equivalent; CompositeKey actor scoping is genuine engine value).
- Engine src/ skeleton now: lib.rs, registry/, policy_compile.rs placeholder for the upcoming compile pass.
- Engine Cargo.toml stripped to its actual deps: nvisy-core, elide+elide-core+elide-llm, derive_more, uuid, bytes, fjall.
- Old engine tests deleted (rebuilt as new modules land).

nvisy-core also gains:
- src/source.rs (ContentSource) restored from git history as a top-level module; content lineage tracking is service-level, not entity-vocabulary.
- src/service/{mod,error,health} groups the runtime's error + healthcheck concerns.

`cargo check -p nvisy-core` green; `cargo check -p elide-bento` green; `cargo machete` reports no unused deps in nvisy-core. Engine will be filled in module-by-module on top of the new compile seam.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
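
The wire-to-toolkit conversion pattern this commit describes (infallible `From`/`Into` for most schema types, `TryFrom` where parsing can fail) might look roughly like the sketch below. The `Point` and `LanguageTag` types here are hypothetical stand-ins rather than the elide-core definitions, and the language-tag check is a toy validation, not real BCP 47 parsing.

```rust
use std::convert::TryFrom;

/// Hypothetical stand-in for a toolkit-side geometry type.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Point {
    pub x: f32,
    pub y: f32,
}

/// Wire-side mirror: the shape that would carry serde/JsonSchema derives.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct PointSchema {
    pub x: f32,
    pub y: f32,
}

// Infallible round trip: every wire value is a valid toolkit value and vice versa.
impl From<PointSchema> for Point {
    fn from(wire: PointSchema) -> Self {
        Point { x: wire.x, y: wire.y }
    }
}

impl From<Point> for PointSchema {
    fn from(p: Point) -> Self {
        PointSchema { x: p.x, y: p.y }
    }
}

/// Hypothetical validated toolkit-side tag; the inner field is private on purpose.
#[derive(Debug, Clone, PartialEq)]
pub struct LanguageTag(String);

/// Wire-side mirror: an unvalidated string.
#[derive(Debug, Clone, PartialEq)]
pub struct LanguageTagSchema(pub String);

#[derive(Debug, PartialEq)]
pub struct InvalidLanguageTag(pub String);

// Fallible direction: the wire string may not parse.
impl TryFrom<LanguageTagSchema> for LanguageTag {
    type Error = InvalidLanguageTag;

    fn try_from(wire: LanguageTagSchema) -> Result<Self, Self::Error> {
        // Toy check (assumption): non-empty, ASCII alphanumerics and '-' only.
        let ok = !wire.0.is_empty()
            && wire.0.chars().all(|c| c.is_ascii_alphanumeric() || c == '-');
        if ok {
            Ok(LanguageTag(wire.0))
        } else {
            Err(InvalidLanguageTag(wire.0))
        }
    }
}

// Going back to the wire never fails.
impl From<LanguageTag> for LanguageTagSchema {
    fn from(tag: LanguageTag) -> Self {
        LanguageTagSchema(tag.0)
    }
}
```

With impls like these, a consumer can write `let p: Point = schema_point.into();`, while a language tag goes through `LanguageTag::try_from(..)` and surfaces the parse error at the boundary.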

…n wiring

nvisy-core:
- schema/ becomes a folder module split by elide type: label, operator, geometry (Point/BoundingBox/Polygon), time, language, color, waveform. Each carries round-trip From/Into to its elide-core counterpart so engine consumes specs as elide types via `.into()`.
- policy/redaction expands to the full operator catalogue elide ships: Text + Tabular get Erase/Keep/Mask/Replace/Hash/Pseudonymize/Encrypt (Tabular adds DropRow/DropColumn); Image gets Erase/Keep/Blur/Pixelate/Blackbox; Audio gets Erase/Keep/Silence/Beep. `Custom` escape hatches removed -- every wire operator is predefined and surfaces here when it lands in elide.
- new plan/ module sibling to policy/: AnalyzerSpec ties together RecognizerSpec (Pattern/Ner/Llm with inline backend configs), EnricherSpec (Language/Ocr), DeduplicationSpec (calibrate + fusion strategy + resolution strategy + min_confidence), and ScopeSpec (languages + jurisdictions). Pure data; JsonSchema derives across.
- ContentSource restored as top-level `source` module; context module re-enabled with elide-core primitive imports + schema proxies (BoundingBoxSchema, TimeSpanSchema, LanguageTagSchema, PolygonSchema, PointSchema) embedded as field types.

nvisy-engine:
- registry/ salvaged from the pre-rebuild engine: CompositeKey actor scoping + fjall_ext utility wrappers + paged. Higher-level stores (content/audit/run) get redesigned alongside the request/result types they hold.
- anonymizer/ folder module + per-modality compile_text/tabular/image/audio. Walks policies in precedence order; rules attach operators via the shared selector::attach helper (single-label/tag fast-path, predicate fallback). Stateful operators (Pseudonymize, Encrypt) reject with a clear "infrastructure not wired" error until vault + KeyProvider plumbing lands.
- analyzer/ folder module + per-modality compile_*. Pattern (always Enhanced-wrapped, modality-generic across TextRecognizable), NER (Mock or Bento), LLM (Mock today; real providers reject pending credential wiring). Image gets the OCR enricher path so a Layout can be stamped before recognition runs. Tabular/Audio reject LLM (no upstream LlmModality impl).
- .because(...) on every attached rule: per-rule attribution is `Attribution::new("{policy}#{rule}")`; per-policy fallback is `"{policy}#<default>"`. Engine threads it via selector::{rule_attribution, default_attribution}. Note: this conflates PolicyDecisionRef-shaped engine provenance into elide's author-facing Attribution slot -- the semantics need a real pass before this ships (TBD).

elide-bento:
- dropped speculative BentoClient + BentoParams wrappers -- backends now cache `bentoml::Endpoint` directly + clone per call to layer `x-request-id`.
- BentoError moves to pub(crate); every public surface returns `Result<_, elide_core::Error>`; internal `?` keeps working via the existing From impl.
- ner/ and ocr/ each split into mod.rs + request.rs + response.rs. `WireNerResponse::decode` and `WireOcrResponse::decode` are methods; `post_recognize` is a `BentoNer` method.
- elide-bento depends on elide-ner + elide-ocr directly (it implements their backend traits); the umbrella reach is for consumers wiring recognizers, not impls.

Engine + bento + core all compile clean and machete-clean. elide-fake deferred (source still uses pre-rework `nvisy_core::*` paths).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
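
The attribution strings named in this commit (`"{policy}#{rule}"` per rule, `"{policy}#<default>"` as the per-policy fallback) could be produced by helpers along these lines. This is only a sketch: the `Attribution` struct below is a hypothetical stand-in for elide's type, and the two function bodies are a guess at what `selector::{rule_attribution, default_attribution}` do based on the format strings quoted above.

```rust
/// Hypothetical stand-in for elide's Attribution: an opaque reason string.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Attribution(String);

impl Attribution {
    pub fn new(reason: impl Into<String>) -> Self {
        Attribution(reason.into())
    }

    pub fn as_str(&self) -> &str {
        &self.0
    }
}

/// Per-rule attribution: "<policy>#<rule>".
pub fn rule_attribution(policy: &str, rule: &str) -> Attribution {
    Attribution::new(format!("{}#{}", policy, rule))
}

/// Per-policy fallback when no specific rule fired: "<policy>#<default>".
pub fn default_attribution(policy: &str) -> Attribution {
    Attribution::new(format!("{}#<default>", policy))
}
```

For a policy named `gdpr` and a rule named `mask-email` (both made-up names), these yield `gdpr#mask-email` and `gdpr#<default>`.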

End-to-end engine over the elide toolkit: persisted policies and contexts as versioned resources, a run lifecycle that owns analyze and apply per-document, and a policy module collapsed onto one composable predicate per rule.

**nvisy-core policy redesign.** `Rule` carries one `predicate: Predicate` instead of the old `RuleKind` three-way split (Label/Tag/Predicate). One shape on the wire; the engine recognises degenerate single-label / single-tag predicates at compile time and routes them back through elide's `with_label` and `with_tag` fast paths. Composite predicates (`All`, `Any`, `Not`) over `LabelOneOf`, `TagOneOf`, `Confidence`, `CoRef` compose freely; `TagOneOf` inside `All` now evaluates correctly because the closure receives the per-anonymizer `LabelCatalog` (elide change in nvisycom/elide#93). `DocumentPredicate` (label / metadata gating at the doc level) lives alongside in `policy/document.rs`. Identity is UUID-keyed end to end: every `Policy`, every `Rule`, every reviewer override lands in the redaction's `Attribution` so audits trace back to the exact rule that fired.

**nvisy-engine registry.** `RegistryHandle` opens the fjall database and pre-opens six keyspaces: policies, contexts, run_headers, run_docs, run_artifacts, run_inputs. Cheaply cloneable (`Arc`-backed). Keys are `CompositeKey(actor, id)`, `TripleKey(actor, run, doc)`, or `VersionedKey(actor, id, semver)` depending on the resource shape.

**Resources.** `policies::{put, get, latest, list, delete}` and `contexts::*` are symmetric: immutable per `(actor, id, version)`; duplicate writes return `Conflict`; lookups by `(id, version)` or the latest version via a prefix range scan.

**Run orchestrator.** `runs::start(handle, formats, actor, batch)` mints a UUIDv7 run, persists every input's bytes plus a Queued per-doc row, writes the run header in `Analyzing`, then fans the analyzer out per document under `futures::buffer_unordered` with a hard `tokio::time::timeout`. Each per-doc task decodes via the codec, picks the modality from `handle.is::<M>()`, compiles the modality-specific analyzer from the `AnalyzerSpec`, recognises entities via `analyze_stream`, persists them as `EntityRecord<M>`, and transitions the row to `AwaitingReview` (or `Failed{reason}` / `TimedOut`). When the fan-out drains the header flips to `AwaitingReview`.

`runs::apply` resolves every referenced policy, fans per-doc anonymise out the same way, layers reviewer overrides as high-precedence per-entity rules in front of the policy chain (decorator pattern: first-match-wins gives overrides priority without rewriting the policy set), filters policies by `applies_when` against the merged descriptor + per-request metadata, runs `Anonymizer::anonymize(&mut handle, &mut entities)`, encodes back to bytes, writes the redacted artifact to `run_artifacts`, and transitions the header to `Applied` or `PartiallyApplied`. `runs::override_entity` lets reviewers stamp a `RuleAction` onto a single recognised entity by id.

The four per-modality `compile_*` and per-modality `attach_*` helpers under `engine::anonymizer/` keep the analyzer- and anonymizer-compile surfaces split for clarity; the apply pipeline reaches the `attach_policies_*` and `attach_override_*` entry points directly so it can layer overrides before policies without cloning the policy set.

Workspace `cargo check`/`build`/`clippy` are green on `nvisy-engine`; pre-existing `elide-fake` breakage against the old toolkit nvisy-core API surface is tracked separately under the E4.1 deletion of legacy crates.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
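
The override layering described for `runs::apply` comes down to ordering: if resolution is first-match-wins, placing per-entity override rules ahead of the policy rules gives them priority without touching the policy set. A minimal sketch of that idea follows; the `Rule`, `Entity`, and `RuleAction` shapes and the `layer`/`resolve` helpers are hypothetical simplifications, not the engine's or elide's types.

```rust
/// Hypothetical action set; the real RuleAction variants are not listed in this PR.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RuleAction {
    Erase,
    Mask,
    Keep,
}

/// Simplified recognised entity.
pub struct Entity {
    pub id: u32,
    pub label: &'static str,
}

/// A rule pairs a matcher with an action; `source` records where it came from.
pub struct Rule {
    pub source: &'static str,
    pub action: RuleAction,
    pub matches: Box<dyn Fn(&Entity) -> bool>,
}

/// Build the chain: one rule per reviewer override (matching a single entity id),
/// followed by the policy rules in their existing order.
pub fn layer(overrides: Vec<(u32, RuleAction)>, policy_rules: Vec<Rule>) -> Vec<Rule> {
    let mut chain: Vec<Rule> = overrides
        .into_iter()
        .map(|(id, action)| Rule {
            source: "override",
            action,
            matches: Box::new(move |e: &Entity| e.id == id),
        })
        .collect();
    chain.extend(policy_rules);
    chain
}

/// First-match-wins resolution: the earliest matching rule decides.
pub fn resolve<'a>(chain: &'a [Rule], entity: &Entity) -> Option<&'a Rule> {
    chain.iter().find(|r| (r.matches)(entity))
}
```

Because the override for entity 7 sits ahead of a policy rule that erases every `email`, entity 7 resolves to the override's action while every other email still resolves to the policy rule.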

martsokha and others added 6 commits June 22, 2026 16:18

martsokha self-assigned this Jun 24, 2026

Merge remote-tracking branch 'origin/main' into migrate/elide

5010636

# Conflicts: # Cargo.lock

martsokha wants to merge 7 commits into main from migrate/elide
