Migrate to elide toolkit; build engine resource model + orchestrator#287
Open
martsokha wants to merge 7 commits into
Workspace gains `elide`, `elide-core`, `elide-llm` as git deps tracking
`nvisycom/elide`'s main branch. `nvisy-engine`/`nvisy-server`/`nvisy-cli`
drop the per-modality `rich` feature (gone from elide; collapsed into
parent modalities with sub-handlers) and the LLM provider toggles
(`openai`/`anthropic`/`google`/`bento`) — all provider backends now
enabled by default through elide-llm.
Engine's manifest now consumes elide+elide-core+elide-llm in place of
the local `nvisy-{core,context,pattern,ner,llm,codec,toolkit}` crates.
Those local crates remain on disk and as workspace path-deps so the
not-yet-migrated consumers (server/cli/fake/ocr/stt/toolkit) keep
parsing; each leaves the workspace alongside its consumer's migration.
Engine source still imports `nvisy_*` paths and will not compile until
the import rewire pass lands (E3.2c).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…sy-core
Bulk teardown of the runtime's toolkit half now that elide ships
upstream equivalents.
Deleted:
- nvisy-{pattern,llm,ner,ocr,stt}: superseded by elide-{pattern,llm,
ner,ocr,stt} on nvisycom/elide main
- nvisy-codec: superseded by elide-codec
- nvisy-context: superseded by elide-context
- nvisy-toolkit: superseded by the elide umbrella crate (Analyzer +
Anonymizer + deduplication layers + operators all ship there)
Renamed:
- nvisy-fake -> elide-fake (runtime-owned extension over elide types;
source still uses nvisy_core paths and will be reworked in its own
pass)
Created:
- elide-bento: shared BentoML HTTP client wrapper for elide backends
(per E0.3 plan; minimal boilerplate -- BentoClient + BentoParams +
BentoError; per-modality backends compose from this in the consuming
crates)
Slimmed nvisy-core to {error, health, policy}:
- dropped entity/extraction/modality/primitive/recognition/redaction
(all re-exported from elide-core at consumer sites)
- moved nvisy-engine/src/policy/ -> nvisy-core/src/policy/ (policy is
the runtime's public governance contract; engine consumes it)
- added elide-core as nvisy-core's only upstream dep so Policy types
reference elide_core::entity::Label directly
State of the migration: workspace parses; elide-bento is the only crate
that compiles end-to-end. nvisy-core/engine/server/cli/elide-fake source
still imports deleted nvisy-* paths and will be redesigned crate-by-
crate on top of elide's Analyzer/Anonymizer/Orchestrator surface.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
nvisy-core trimmed to {service, policy, schema, context-gated-out}.
Policy is now the runtime's serializable governance spec, structurally
intact from before but rewired to reference elide vocabulary:
- Label / LabelRef / LabelCatalog from elide_core::entity (renamed
from EntityLabel*)
- ConfidenceThreshold from elide_core::primitive
- Modality + per-modality types (Text/Tabular/Image/Audio) from
elide_core::modality
- OperatorId from elide_core::redaction
- HashAlgorithm mirrored locally (elide's enum is not serializable
upstream by design; nvisy-core's wire spec owns its vocabulary)
PolicyModality replaces DocumentModality: a runtime-side extension of
elide_core::modality::Modality that pairs each modality with its
serializable redaction spec enum (TextRedaction, ImageRedaction, …).
TextRedaction::Redact renamed to Erase aligning with elide's operator
vocabulary (commit 3598798). No legacy alias — wire format is the
elide vocabulary.
Encrypt operator dropped from the wire enum: reversible AES-256-GCM
needs raw key material, can't safely live in declarative config.
Deployments register custom encrypt operators by OperatorId.
JsonSchema derives kept on policy types. Elide stays free of schemars
(schema generation is an HTTP concern the toolkit doesn't model);
nvisy-core proxies elide types via #[schemars(with = "...")] pointing
at lightweight proxy structs in nvisy-core::schema. Lifted schema to a
top-level module so anything in nvisy-core embedding elide types can
reuse it.
Service module added at crates/nvisy-core/src/service/{mod,error,health}
grouping the runtime's error + healthcheck concerns under one roof.
Context module gated out (// pub mod context) until it gets its own
redesign pass on elide types — its old body referenced deleted
nvisy_core::entity/primitive paths and missed the jiff dep.
nvisy-core now compiles standalone against elide + elide-core.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…core
nvisy-core wire shapes now embed `*Schema` types directly (no
`#[schemars(with = "...")]` overrides). schema.rs gains round-trip
`From`/`Into` impls so engine consumes nvisy-core specs as elide
types via `.into()`. The dual model (embed elide + proxy for
schema) is gone; one type owns each field.
- LabelSchema, OperatorIdSchema, PointSchema, BoundingBoxSchema,
PolygonSchema, TimeSpanSchema, LanguageTagSchema in
nvisy_core::schema, each with From/Into to its elide-core
counterpart (LanguageTag uses TryFrom for parse failures).
- Policy.labels: Vec<LabelSchema>; selector confidence is f32;
selector labels and tags become Vec<String> (HipStr/LabelRef are
internal vocabulary, not wire vocabulary).
- TextRedaction/ImageRedaction/AudioRedaction/TabularRedaction
`Custom { id }` carries OperatorIdSchema.
- PolicyModality trait removed: engine reads `redactions.text` etc.
directly per modality; the projection trait was dispatch sugar
not worth a trait hierarchy.
- EntitySelector::matches removed: matching runs in engine where
the catalog + entity live; nvisy-core only owns the spec.
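The schema-to-elide conversions described in this commit can be sketched roughly as follows. This is a minimal illustration only: `Point` and `LanguageTag` here are hypothetical stand-ins for the elide-core types (their real fields and parsing rules are not shown in this PR), and the tag validation is a toy check, not real language-tag parsing. It shows the infallible `From` round-trip for a geometry type and the fallible `TryFrom` direction for `LanguageTagSchema`.

```rust
use std::convert::TryFrom;

// Hypothetical stand-ins for elide-core types (assumed shapes, not the real API).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Point {
    pub x: f32,
    pub y: f32,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct LanguageTag(String);

impl LanguageTag {
    /// Toy validation (assumption): non-empty, ASCII alphanumerics and '-'.
    pub fn parse(raw: &str) -> Result<Self, String> {
        let ok = !raw.is_empty()
            && raw.chars().all(|c| c.is_ascii_alphanumeric() || c == '-');
        if ok {
            Ok(LanguageTag(raw.to_string()))
        } else {
            Err(format!("invalid language tag: {:?}", raw))
        }
    }

    pub fn as_str(&self) -> &str {
        &self.0
    }
}

// Wire-side schema types, as they might look in nvisy_core::schema.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct PointSchema {
    pub x: f32,
    pub y: f32,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct LanguageTagSchema(pub String);

// Infallible round-trip: the schema and the toolkit type carry the same data.
impl From<PointSchema> for Point {
    fn from(s: PointSchema) -> Self {
        Point { x: s.x, y: s.y }
    }
}

impl From<Point> for PointSchema {
    fn from(p: Point) -> Self {
        PointSchema { x: p.x, y: p.y }
    }
}

// Wire -> toolkit can fail, because the wire string may not parse.
impl TryFrom<LanguageTagSchema> for LanguageTag {
    type Error = String;

    fn try_from(s: LanguageTagSchema) -> Result<Self, Self::Error> {
        LanguageTag::parse(&s.0)
    }
}

// Toolkit -> wire is always possible.
impl From<LanguageTag> for LanguageTagSchema {
    fn from(tag: LanguageTag) -> Self {
        LanguageTagSchema(tag.0)
    }
}
```

With impls like these, an engine-side consumer can write `let p: Point = spec_point.into();` and handle only the language-tag case with `?`.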
Engine teardown:
- Wiped crates/nvisy-engine/src/{core,detection,document,modality,
redaction}. Old detection/redaction/document machinery is
superseded by elide's Analyzer/Anonymizer/Orchestrator/Report.
- Salvaged registry/{composite_key, fjall_ext, paged} as
multi-tenant storage primitives (no upstream equivalent;
CompositeKey actor scoping is genuine engine value).
- Engine src/ skeleton now: lib.rs, registry/, policy_compile.rs
placeholder for the upcoming compile pass.
- Engine Cargo.toml stripped to its actual deps: nvisy-core,
elide+elide-core+elide-llm, derive_more, uuid, bytes, fjall.
- Old engine tests deleted (rebuilt as new modules land).
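To make the "CompositeKey actor scoping" mentioned in the teardown list concrete, here is a small sketch of the general idea: prefix every key with the actor so that one ordered range scan returns only that tenant's rows. The byte layout (16 actor bytes followed by 16 id bytes) and the `list_for_actor` helper are assumptions for illustration, and a `BTreeMap` stands in for the ordered fjall keyspace; none of this is taken from the actual `registry/` code.

```rust
use std::collections::BTreeMap;

/// Hypothetical layout: 16 actor bytes followed by 16 id bytes.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct CompositeKey {
    actor: [u8; 16],
    id: [u8; 16],
}

impl CompositeKey {
    pub fn new(actor: [u8; 16], id: [u8; 16]) -> Self {
        CompositeKey { actor, id }
    }

    /// Actor first, so keys of one actor sort contiguously.
    pub fn to_bytes(&self) -> [u8; 32] {
        let mut out = [0u8; 32];
        out[..16].copy_from_slice(&self.actor);
        out[16..].copy_from_slice(&self.id);
        out
    }
}

/// In-memory stand-in for an ordered keyspace: list every value stored
/// under `actor` by scanning the key range that shares the actor prefix.
pub fn list_for_actor<'a>(
    store: &'a BTreeMap<[u8; 32], String>,
    actor: [u8; 16],
) -> Vec<&'a str> {
    let lo = CompositeKey::new(actor, [0x00; 16]).to_bytes();
    let hi = CompositeKey::new(actor, [0xff; 16]).to_bytes();
    store.range(lo..=hi).map(|(_, v)| v.as_str()).collect()
}
```

Because the actor is the leading component, one tenant can never observe another tenant's rows through a scan, which is the property the commit describes as engine-specific value.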
nvisy-core also gains:
- src/source.rs (ContentSource) restored from git history as a
top-level module — content lineage tracking is service-level,
not entity-vocabulary.
- src/service/{mod,error,health} groups the runtime's error +
healthcheck concerns.
`cargo check -p nvisy-core` green; `cargo check -p elide-bento`
green; `cargo machete` reports no unused deps in nvisy-core. Engine
will be filled in module-by-module on top of the new compile seam.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…n wiring
nvisy-core:
- schema/ becomes a folder module split by elide type: label,
operator, geometry (Point/BoundingBox/Polygon), time, language,
color, waveform. Each carries round-trip From/Into to its
elide-core counterpart so engine consumes specs as elide types
via `.into()`.
- policy/redaction expands to the full operator catalogue elide
ships: Text + Tabular get Erase/Keep/Mask/Replace/Hash/
Pseudonymize/Encrypt (Tabular adds DropRow/DropColumn); Image
gets Erase/Keep/Blur/Pixelate/Blackbox; Audio gets Erase/Keep/
Silence/Beep. `Custom` escape hatches removed -- every wire
operator is predefined and surfaces here when it lands in elide.
- new plan/ module sibling to policy/: AnalyzerSpec ties together
RecognizerSpec (Pattern/Ner/Llm with inline backend configs),
EnricherSpec (Language/Ocr), DeduplicationSpec (calibrate +
fusion strategy + resolution strategy + min_confidence), and
ScopeSpec (languages + jurisdictions). Pure data; JsonSchema
derives across.
- ContentSource restored as top-level `source` module; context
module re-enabled with elide-core primitive imports + schema
proxies (BoundingBoxSchema, TimeSpanSchema, LanguageTagSchema,
PolygonSchema, PointSchema) embedded as field types.
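As a reading aid for the operator catalogue in the `policy/redaction` bullet above, the per-modality sets can be pictured as one enum per modality. The sketch below is payload-free by assumption (the real wire variants presumably carry parameters that this PR text does not spell out), and `compile_text` is a hypothetical helper that mirrors the engine behaviour described in this commit, where the stateful operators are rejected until their infrastructure exists.

```rust
// Payload-free mirrors of the operator sets named in the commit message.
// Assumption: the real variants carry per-operator parameters.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TextRedaction {
    Erase, Keep, Mask, Replace, Hash, Pseudonymize, Encrypt,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TabularRedaction {
    Erase, Keep, Mask, Replace, Hash, Pseudonymize, Encrypt, DropRow, DropColumn,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ImageRedaction {
    Erase, Keep, Blur, Pixelate, Blackbox,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AudioRedaction {
    Erase, Keep, Silence, Beep,
}

impl TextRedaction {
    /// Operators that need a vault or key material at apply time.
    pub fn is_stateful(self) -> bool {
        matches!(self, TextRedaction::Pseudonymize | TextRedaction::Encrypt)
    }
}

/// Hypothetical compile step: maps a wire operator to an operator name,
/// rejecting stateful operators with a clear error.
pub fn compile_text(op: TextRedaction) -> Result<&'static str, String> {
    if op.is_stateful() {
        return Err(format!("{:?}: infrastructure not wired", op));
    }
    Ok(match op {
        TextRedaction::Erase => "erase",
        TextRedaction::Keep => "keep",
        TextRedaction::Mask => "mask",
        TextRedaction::Replace => "replace",
        TextRedaction::Hash => "hash",
        TextRedaction::Pseudonymize | TextRedaction::Encrypt => {
            unreachable!("stateful operators are rejected above")
        }
    })
}
```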
nvisy-engine:
- registry/ salvaged from the pre-rebuild engine: CompositeKey
actor scoping + fjall_ext utility wrappers + paged. Higher-
level stores (content/audit/run) get redesigned alongside the
request/result types they hold.
- anonymizer/ folder module + per-modality compile_text/tabular/
image/audio. Walks policies in precedence order; rules attach
operators via the shared selector::attach helper (single-label/
tag fast-path, predicate fallback). Stateful operators
(Pseudonymize, Encrypt) reject with a clear "infrastructure
not wired" error until vault + KeyProvider plumbing lands.
- analyzer/ folder module + per-modality compile_*. Pattern
(always Enhanced-wrapped, modality-generic across
TextRecognizable), NER (Mock or Bento), LLM (Mock today; real
providers reject pending credential wiring). Image gets the
OCR enricher path so a Layout can be stamped before
recognition runs. Tabular/Audio reject LLM (no upstream
LlmModality impl).
- .because(...) on every attached rule: per-rule attribution is
`Attribution::new("{policy}#{rule}")`; per-policy fallback is
`"{policy}#<default>"`. Engine threads it via selector::
{rule_attribution, default_attribution}. Note: this conflates
PolicyDecisionRef-shaped engine provenance into elide's
author-facing Attribution slot -- the semantics need a real
pass before this ships (TBD).
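The attribution strings described in the last bullet can be sketched as two small helpers. `Attribution` below is a hypothetical stand-in that models only the string payload; the real elide type and the exact signatures of `selector::rule_attribution` and `selector::default_attribution` are not shown in this PR, so treat the shapes as assumptions.

```rust
/// Hypothetical stand-in for elide's `Attribution`; only the string is modelled.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Attribution(String);

impl Attribution {
    pub fn new(value: impl Into<String>) -> Self {
        Attribution(value.into())
    }

    pub fn as_str(&self) -> &str {
        &self.0
    }
}

/// Per-rule attribution: "{policy}#{rule}".
pub fn rule_attribution(policy: &str, rule: &str) -> Attribution {
    Attribution::new(format!("{}#{}", policy, rule))
}

/// Per-policy fallback attribution: "{policy}#<default>".
pub fn default_attribution(policy: &str) -> Attribution {
    Attribution::new(format!("{}#<default>", policy))
}
```

For a policy named `gdpr` and a rule named `email`, this yields `gdpr#email` for the rule and `gdpr#<default>` for the policy-level fallback.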
elide-bento:
- dropped speculative BentoClient + BentoParams wrappers --
backends now cache `bentoml::Endpoint` directly + clone per
call to layer `x-request-id`.
- BentoError moves to pub(crate); every public surface returns
`Result<_, elide_core::Error>`; internal `?` keeps working via
the existing From impl.
- ner/ and ocr/ each split into mod.rs + request.rs + response.rs.
`WireNerResponse::decode` and `WireOcrResponse::decode` are
methods; `post_recognize` is a `BentoNer` method.
- elide-bento depends on elide-ner + elide-ocr directly (it
implements their backend traits); the umbrella reach is for
consumers wiring recognizers, not impls.
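The error-boundary arrangement in the elide-bento list (a crate-private `BentoError`, public functions returning the toolkit error, and `?` bridging the two through a `From` impl) follows a common Rust pattern, sketched below. `CoreError` is a hypothetical stand-in for `elide_core::Error`, and the variants, messages, and the `decode_labels` / `recognize` helpers are invented for illustration.

```rust
use std::fmt;

/// Hypothetical stand-in for `elide_core::Error`.
#[derive(Debug)]
pub struct CoreError {
    message: String,
}

impl fmt::Display for CoreError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.message)
    }
}

impl std::error::Error for CoreError {}

/// Crate-private backend error; never appears in a public signature.
#[derive(Debug)]
pub(crate) enum BentoError {
    Status(u16),
    Decode(String),
}

/// The bridge that lets internal `?` convert into the public error type.
impl From<BentoError> for CoreError {
    fn from(err: BentoError) -> Self {
        let message = match err {
            BentoError::Status(code) => format!("bento backend returned HTTP {}", code),
            BentoError::Decode(why) => format!("bento response decode failed: {}", why),
        };
        CoreError { message }
    }
}

/// Internal helper: speaks `BentoError` only.
pub(crate) fn decode_labels(status: u16, body: &str) -> Result<Vec<String>, BentoError> {
    if status != 200 {
        return Err(BentoError::Status(status));
    }
    if body.trim().is_empty() {
        return Err(BentoError::Decode("empty body".to_string()));
    }
    Ok(body.split(',').map(|s| s.trim().to_string()).collect())
}

/// Public surface: returns the toolkit error; `?` applies the `From` impl.
pub fn recognize(status: u16, body: &str) -> Result<Vec<String>, CoreError> {
    let labels = decode_labels(status, body)?;
    Ok(labels)
}
```

Keeping the backend error private means callers depend on a single error vocabulary, while the crate is free to reshape its internal variants.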
Engine + bento + core all compile clean and machete-clean. elide-fake
deferred (source still uses pre-rework `nvisy_core::*` paths).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
End-to-end engine over the elide toolkit: persisted policies and contexts as versioned resources, a run lifecycle that owns analyze and apply per-document, and a policy module collapsed onto one composable predicate per rule.

**nvisy-core policy redesign.** `Rule` carries one `predicate: Predicate` instead of the old `RuleKind` three-way split (Label/Tag/Predicate). One shape on the wire; the engine recognises degenerate single-label / single-tag predicates at compile time and routes them back through elide's `with_label` and `with_tag` fast paths. Composite predicates (`All`, `Any`, `Not`) over `LabelOneOf`, `TagOneOf`, `Confidence`, `CoRef` compose freely; `TagOneOf` inside `All` now evaluates correctly because the closure receives the per-anonymizer `LabelCatalog` (elide change in nvisycom/elide#93). `DocumentPredicate` (label / metadata gating at the doc level) lives alongside in `policy/document.rs`. Identity is UUID-keyed end to end: every `Policy`, every `Rule`, every reviewer override lands in the redaction's `Attribution` so audits trace back to the exact rule that fired.

**nvisy-engine registry.** `RegistryHandle` opens the fjall database and pre-opens six keyspaces: policies, contexts, run_headers, run_docs, run_artifacts, run_inputs. Cheaply cloneable (`Arc`-backed). Keys are `CompositeKey(actor, id)`, `TripleKey(actor, run, doc)`, or `VersionedKey(actor, id, semver)` depending on the resource shape.

**Resources.** `policies::{put, get, latest, list, delete}` and `contexts::*` are symmetric: immutable per `(actor, id, version)`; duplicate writes return `Conflict`; lookups by `(id, version)` or the latest version via a prefix range scan.

**Run orchestrator.** `runs::start(handle, formats, actor, batch)` mints a UUIDv7 run, persists every input's bytes plus a Queued per-doc row, writes the run header in `Analyzing`, then fans the analyzer out per document under `futures::buffer_unordered` with a hard `tokio::time::timeout`. Each per-doc task decodes via the codec, picks the modality from `handle.is::<M>()`, compiles the modality-specific analyzer from the `AnalyzerSpec`, recognises entities via `analyze_stream`, persists them as `EntityRecord<M>`, and transitions the row to `AwaitingReview` (or `Failed{reason}` / `TimedOut`). When the fan-out drains the header flips to `AwaitingReview`.

`runs::apply` resolves every referenced policy, fans per-doc anonymise out the same way, layers reviewer overrides as high-precedence per-entity rules in front of the policy chain (decorator pattern: first-match-wins gives overrides priority without rewriting the policy set), filters policies by `applies_when` against the merged descriptor + per-request metadata, runs `Anonymizer::anonymize(&mut handle, &mut entities)`, encodes back to bytes, writes the redacted artifact to `run_artifacts`, and transitions the header to `Applied` or `PartiallyApplied`.

`runs::override_entity` lets reviewers stamp a `RuleAction` onto a single recognised entity by id.

The four per-modality `compile_*` and per-modality `attach_*` helpers under `engine::anonymizer/` keep the analyzer- and anonymizer-compile surfaces split for clarity; the apply pipeline reaches the `attach_policies_*` and `attach_override_*` entry points directly so it can layer overrides before policies without cloning the policy set.

Workspace `cargo check`/`build`/`clippy` are green on `nvisy-engine`; pre-existing `elide-fake` breakage against the old toolkit nvisy-core API surface is tracked separately under the E4.1 deletion of legacy crates.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
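The override layering described for `runs::apply` (reviewer overrides placed in front of the policy chain, with first-match-wins resolution) can be illustrated with a small sketch. Everything here is a simplified assumption: `Entity`, `Rule`, `override_rule`, `label_rule`, and `resolve` are hypothetical names, the real engine attaches rules to an elide `Anonymizer` rather than resolving them in a free function, and the `RuleAction` variants are invented.

```rust
/// Hypothetical action set; the real `RuleAction` variants are not listed in this PR.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RuleAction {
    Erase,
    Keep,
    Mask,
}

#[derive(Debug, Clone)]
pub struct Entity {
    pub id: u32,
    pub label: String,
}

/// A rule pairs a predicate over entities with the action to take.
pub struct Rule {
    pub matches: Box<dyn Fn(&Entity) -> bool>,
    pub action: RuleAction,
}

/// Reviewer override: targets exactly one recognised entity by id.
pub fn override_rule(entity_id: u32, action: RuleAction) -> Rule {
    Rule {
        matches: Box::new(move |e| e.id == entity_id),
        action,
    }
}

/// Policy rule: targets every entity carrying the given label.
pub fn label_rule(label: &str, action: RuleAction) -> Rule {
    let label = label.to_string();
    Rule {
        matches: Box::new(move |e| e.label == label),
        action,
    }
}

/// First-match-wins over overrides chained in front of policy rules.
/// Both slices are only borrowed, so the policy set is never cloned or rewritten.
pub fn resolve(entity: &Entity, overrides: &[Rule], policies: &[Rule]) -> Option<RuleAction> {
    overrides
        .iter()
        .chain(policies.iter())
        .find(|rule| (rule.matches)(entity))
        .map(|rule| rule.action)
}
```

Because the overrides come first in the chain, an override for entity 7 wins over a label rule that would otherwise erase it, while other entities with the same label still fall through to the policy.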
# Conflicts: # Cargo.lock
Summary
Switches the runtime off the in-tree `nvisy-{core,context,pattern,ner,llm,codec,toolkit}` crates and onto the upstream elide toolkit, then builds the engine (multi-tenant resource registry + run orchestrator) on top of it.

End-state on this branch:
- `elide` is a workspace dep (`elide`, `elide-core`, `elide-ner`, `elide-ocr`). Recognition primitives, modality types, codec, anonymizer, and provenance all come from upstream.
- `nvisy-core` slims to `{error, health, policy, plan, schema}`: primitives, modality, codec, recognition no longer live here; they come from `elide-core` / `elide`.
- `nvisy-engine` is the new home for everything runtime-shaped:
  - `analyzer/`: compile a per-request `AnalyzerSpec` into `elide::Analyzer<M>` per modality (text / tabular / image / audio).
  - `anonymizer/`: compile `Policy` sets into `elide::Anonymizer<M>` per modality; pattern-matches single-label / single-tag `Predicate` shapes onto elide's indexed fast paths, falls through to `with_catalog_predicate` for composites.
  - `registry/`: `RegistryHandle` over fjall: six keyspaces (policies, contexts, run_headers, run_docs, run_artifacts, run_inputs) under `(actor, …)` keys, `Arc`-backed for cheap clone.
  - `policies/` and `contexts/`: symmetric CRUD over versioned resource blobs.
  - `runs/`: two-phase run lifecycle. `start` mints a UUIDv7 run, persists inputs, fans the analyzer out per doc under `futures::buffer_unordered` with a hard `tokio::time::timeout`, then flips the header to `AwaitingReview`. `apply` resolves policies, fans the anonymiser out the same way, layers reviewer overrides as high-precedence per-entity rules in front of the policy chain (decorator pattern, first-match-wins gives them priority), writes the redacted bytes to `run_artifacts`, and flips the header to `Applied` / `PartiallyApplied`. `override_entity` lets reviewers stamp a `RuleAction` onto one recognised entity by id.
- `elide-bento` is a new local crate carrying the BentoML NER + OCR backends that aren't suitable for upstream (vendor-specific).
- `Rule` carries one composable `Predicate` instead of the old `Label` / `Tag` / `Predicate` three-way kind enum. Reviewer overrides + policy rules both flow through the same `Attribution` slot (`policy_id` from the policy's UUID, `reason` from the rule's UUID, or the entity id for overrides) so audits trace every redaction back to the exact rule that fired.

Pairs with nvisycom/elide#93 (already merged), which added `Anonymizer::with_catalog_predicate` so closures can see the per-anonymizer `LabelCatalog`; this unblocks `Predicate::TagOneOf` inside composite predicates.
Commits
- `cb4c7445` deps: add elide as upstream toolkit; wire engine to elide deps
- `7b848254` workspace: delete superseded crates; add elide-bento; slim nvisy-core
- `1033d9aa` core: redesign policy module on elide-core types
- `dc167145` core,engine: pivot to schema-types-as-wire, conversions in core
- `786f4ad8` core,engine,bento: full operator set, analyzer plan, attribution wiring
- `ad7e9418` core,engine: engine resource model + run orchestrator
Test plan
- `cargo check -p nvisy-engine` green
- `cargo build -p nvisy-engine` green
- `cargo clippy -p nvisy-engine --no-deps` green (three pre-existing `too_many_arguments` lints on the orchestrator fan-out helpers; flagged in code, fixed by the EngineHandle bundle in the next slice)
- `elide-fake` still references the pre-migration `nvisy_core` API surface (`nvisy_core::primitive::LanguageTag`, `nvisy_core::redaction::Anonymizer`, etc.) and won't compile. Tracked separately as E4.1: delete legacy crates after this lands.
- `nvisy-server` and `nvisy-cli` still pin to the old toolkit deps. Switching them is E3.3 / E3.4; they'll land after the engine surface stabilises.
Follow-ups
- `nvisy-server` + `nvisy-cli` to the new engine surface
- `crates/nvisy-{core,context,pattern,ner,llm,codec,toolkit}` directories and `elide-fake`
- `RegistryHandle` + `FormatRegistry` into one handle, eliminates the three `too_many_arguments` lints on the orchestrator helpers
🤖 Generated with Claude Code