feat(ner): schema-driven GLiNER2 NER service (single engine) by martsokha · Pull Request #28 · nvisycom/inference

martsokha · 2026-06-12T16:17:01Z

Reworks nvisy-ner into a self-hosted, schema-driven information-extraction service backed by a single SOTA engine — GLiNER2. The earlier commits on this branch explored a multi-engine design (GLiNER v1 + GLiNER2 + token-classification); the branch converges on GLiNER2 alone, because it tops span-level F1 for self-hosted PII (arxiv 2605.09973) and its schema interface subsumes what the other engines did separately.

History is kept intentionally — the multi-engine commits show how the design was reached. The final commit (feat(ner): rework as schema-driven GLiNER2 service) is the end state being proposed.

Contract (`nvisy_core/ner/v1.py`) — schema-driven rewrite

Request: text + schema + threshold. The schema composes three optional, verified-against-the-real-model groups:
- entities — zero-shot spans (per-label description steers the model)
- classifications — single- or multi-label text classification
- structures — named records of fields (each field: dtype, choices, description, regex pattern)
Response: entities, classifications, structures, modelId. Labels stay model-native; the runtime consumer owns taxonomy mapping.
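The sketch below illustrates how a request built from the three optional groups might look on the Python side. It is written under assumptions: the class and field names (`EntitySpec`, `FieldSpec`, `multi_label`, the `0.5` default threshold, the `validate` rules) are hypothetical stand-ins, not the actual models in `nvisy_core/ner/v1.py`, and the wire format may use camelCase keys.

```python
from dataclasses import asdict, dataclass, field
from typing import Optional


# Hypothetical mirror of the ner.v1 request; field names are illustrative.
@dataclass
class EntitySpec:
    label: str
    description: Optional[str] = None  # per-label hint that steers the model


@dataclass
class ClassificationSpec:
    name: str
    labels: list[str]
    multi_label: bool = False  # False: pick one label; True: any subset


@dataclass
class FieldSpec:
    name: str
    dtype: str = "str"
    choices: Optional[list[str]] = None
    description: Optional[str] = None
    pattern: Optional[str] = None  # regex the extracted value should match


@dataclass
class StructureSpec:
    name: str  # record name, e.g. "contact"
    fields: list[FieldSpec]


@dataclass
class Schema:
    # All three groups are optional.
    entities: list[EntitySpec] = field(default_factory=list)
    classifications: list[ClassificationSpec] = field(default_factory=list)
    structures: list[StructureSpec] = field(default_factory=list)


@dataclass
class NerRequest:
    text: str
    schema: Schema
    threshold: float = 0.5  # hypothetical default

    def validate(self) -> None:
        # Illustrative checks; the PR does not spell out the real validation rules.
        s = self.schema
        if not (s.entities or s.classifications or s.structures):
            raise ValueError("schema declares no entities, classifications or structures")
        if not 0.0 <= self.threshold <= 1.0:
            raise ValueError("threshold must be within [0, 1]")


req = NerRequest(
    text="Contact Jane Doe at jane@example.com",
    schema=Schema(
        entities=[EntitySpec("person", "full name of a person"), EntitySpec("email")],
        structures=[StructureSpec("contact", [FieldSpec("email", pattern=r"[^@\s]+@[^@\s]+")])],
    ),
    threshold=0.4,
)
req.validate()
payload = asdict(req)  # JSON-ready nested dicts
```

Because every group defaults to an empty list, a caller that only wants spans sends just `entities`, and the same request shape covers classification-only or record-only calls.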

Service (`nvisy-ner`) — single model, self-hosted

- One GLiNER2 model from NVISY_NER_MODEL (default fastino/gliner2-privacy-filter-PII-multi). No whitelist, no per-request model, no engine selection.
- Collapsed registry.py + backends/ → config.py + engine.py. engine.py translates the wire schema to gliner2.Schema, runs batch_extract, and projects the result (confidence → score).
- Dropped gliner + transformers direct deps (gliner2 keeps them transitively).
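The projection step can be pictured roughly as below. This is a sketch only: the input shape (`raw["entities"]` mapping each label to dicts with `text`, `start`, `end`, `confidence`) is an assumption for illustration, not the verified gliner2 result dict, and `Entity` / `project_entities` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Any, Mapping


@dataclass(frozen=True)
class Entity:
    label: str  # model-native label, passed through unmapped
    text: str
    start: int
    end: int
    score: float


def project_entities(raw: Mapping[str, Any]) -> list[Entity]:
    """Flatten a label -> spans mapping into wire entities.

    ASSUMPTION: raw["entities"] maps each label to a list of dicts with
    "text", "start", "end" and "confidence". This shape is illustrative,
    not the verified gliner2 result dict.
    """
    out: list[Entity] = []
    for label, spans in raw.get("entities", {}).items():
        for span in spans:
            out.append(
                Entity(
                    label=label,
                    text=span["text"],
                    start=int(span["start"]),
                    end=int(span["end"]),
                    score=float(span["confidence"]),  # confidence -> score
                )
            )
    # Reading order, independent of how the model grouped labels.
    out.sort(key=lambda e: (e.start, e.end, e.label))
    return out


text = "Contact Jane Doe at jane@example.com"
raw = {
    "entities": {
        "email": [{"text": "jane@example.com", "start": 20, "end": 36, "confidence": 0.88}],
        "person": [{"text": "Jane Doe", "start": 8, "end": 16, "confidence": 0.97}],
    }
}
entities = project_entities(raw)
```

Labels are copied through untouched, which keeps the service engine-native and leaves taxonomy mapping to the consumer.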

Security (the self-hosting differentiator)

- Offline: HF_HUB_OFFLINE / TRANSFORMERS_OFFLINE are baked into the service; verified that the model loads and infers fully offline.
- Reject, don't truncate: the encoder caps at 512 tokens and silently truncates anything longer (verified), which would drop PII in the tail. Over-length input is rejected instead (limit: NVISY_NER_MAX_TOKENS).
- No payload logging; a regression test asserts the hosted GLiNER2API/from_api path is never referenced.
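A minimal sketch of the offline guard and the reject-don't-truncate check, assuming the limit is read from `NVISY_NER_MAX_TOKENS` with 512 as the fallback. `InputTooLongError`, `check_length` and the whitespace token counter are hypothetical; a real check should count with the model's own tokenizer, special tokens included.

```python
import os
from typing import Callable, Mapping

# Force offline mode before gliner2/transformers are imported, so a missing
# local file fails loudly instead of triggering a Hub download.
os.environ["HF_HUB_OFFLINE"] = "1"
os.environ["TRANSFORMERS_OFFLINE"] = "1"

DEFAULT_MAX_TOKENS = 512  # encoder cap reported in the PR


class InputTooLongError(ValueError):
    """Hypothetical error; the service layer would map it to a 4xx response."""


def max_tokens_from_env(env: Mapping[str, str] = os.environ) -> int:
    return int(env.get("NVISY_NER_MAX_TOKENS", DEFAULT_MAX_TOKENS))


def check_length(text: str, count_tokens: Callable[[str], int], max_tokens: int) -> None:
    """Reject over-length input instead of letting the model truncate it."""
    n = count_tokens(text)
    if n > max_tokens:
        # Truncation would silently drop the tail, and any PII in it.
        raise InputTooLongError(f"input is {n} tokens; limit is {max_tokens}")


# Stand-in tokenizer for the example only.
def whitespace_count(s: str) -> int:
    return len(s.split())
```

Failing the request makes the caller split the document explicitly, rather than receiving a response that looks complete but silently missed the tail.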

Verification

- 51 default tests pass (faked: contract, config, schema translation, projection, over-length, schema grouping, security regression).
- 3 real-marked tests pass against the actual GLiNER2 weights (entities + classification + structured record, offline load + infer, over-length). Gated to the opt-in real-models CI job.
- Full service boots and serves offline end to end against the real model.
- Lint, format, OpenAPI --check, requirements --check all clean.

Every gliner2 API shape was verified against fastino/gliner2-privacy-filter-PII-multi, not assumed from docs (the docs were wrong several times).

Flags for review

- Breaking ner.v1 rewrite, no migration shim (pre-release).
- The runtime-side consumer must adopt the schema-driven request + structured response; coordinated change, not in this PR.
- Branch name (spike/ner-multi-model) is now a misnomer; left as-is to preserve PR continuity.

🤖 Generated with Claude Code

Rewrite the NER service as multi-model: load a configurable whitelist of Hugging Face backends at startup and let each request pick one by id. Spans come back with each model's own native labels and the modelId that produced them; mapping labels onto a canonical vocabulary moves to the consumer (the runtime owns that map), keeping this service engine-native.

Contract (ner.v1):
- NerRequest: drop `kinds`, add optional `model` (HF id; None = default).
- Entity.label is the model-native string, not an EntityKind; classProbs keyed by native labels.
- NerResponse.modelId reports the model that ran.

Service (nvisy-ner):
- registry.py wraps GLiNER and HF token-classification pipelines behind one Backend interface; dispatch by id prefix.
- Preload NVISY_NER_MODELS (config, not code); 400 on an unloaded id; NVISY_NER_DEFAULT_MODEL (or first listed) serves requests omitting `model`.
- Per-model + per-param batched dispatch within a BentoML batch.

Remove the taxonomy from this repo: delete nvisy_core.entity (EntityKind / EntityCategory) and nvisy_ner.label_map. Add transformers as a direct dep; regenerate requirements.txt and docs/openapi/ner.json; refresh the NER README, docs, and design doc (gliner.md -> ner.md).

Breaking change: ner.v1 wire contract changes with no migration shim. The coordinated runtime change (consume `model`, own the label map) is tracked separately in nvisy-runtime.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
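The model resolution and per-model + per-param batching described in this commit could look roughly like this. The names `resolve_model`, `group_batch` and `UnknownModelError`, the `(text, model, threshold)` request tuple and the model ids are all hypothetical; backend selection by id prefix is left out.

```python
from collections import defaultdict
from typing import Optional, Sequence


class UnknownModelError(LookupError):
    """Hypothetical; the multi-model service answered this with HTTP 400."""


def resolve_model(requested: Optional[str], loaded: Sequence[str], default: Optional[str] = None) -> str:
    if requested is None:
        # NVISY_NER_DEFAULT_MODEL if set, else the first whitelisted id.
        return default or loaded[0]
    if requested not in loaded:
        raise UnknownModelError(f"model {requested!r} is not loaded")
    return requested


def group_batch(
    requests: Sequence[tuple[str, Optional[str], float]],
    loaded: Sequence[str],
    default: Optional[str] = None,
) -> dict[tuple[str, float], list[int]]:
    """Split one batch into (model, threshold) sub-batches of request indices.

    Keeping indices lets each sub-batch's results be written back in the
    original request order.
    """
    groups: dict[tuple[str, float], list[int]] = defaultdict(list)
    for i, (_text, model, threshold) in enumerate(requests):
        groups[(resolve_model(model, loaded, default), threshold)].append(i)
    return dict(groups)


loaded = ["org/gliner-model", "org/tokcls-model"]  # hypothetical ids
batch = [
    ("a", None, 0.5),
    ("b", "org/tokcls-model", 0.5),
    ("c", None, 0.5),
    ("d", "org/tokcls-model", 0.3),
]
groups = group_batch(batch, loaded)
```

Grouping by parameters as well as model means every backend call sees one uniform setting, at the cost of smaller sub-batches when requests disagree.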

The Audit job flagged three pre-existing CVEs in the workspace lock:
- aiohttp 3.13.5 (CVE-2026-34993, CVE-2026-47265): fixed in 3.14.0. aiohttp is transitive via bentoml and not declared directly, so a lock edit alone would be reverted on the next resolve. Add a workspace-level `constraint-dependencies` floor (>=3.14.0), uv's mechanism for pinning a transitive dep without making it a direct dependency, so the fix sticks.
- torch 2.12.0 (CVE-2025-3000): torch.jit.script memory corruption, CVSS 5.3, local; no patched release exists, so no bump resolves it. Ignore it explicitly in the audit step with a comment to drop the flag once upstream ships a fix.

Relock + regenerate per-service requirements.txt.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Three bugs found by checking the registry against the real libraries; all were masked by fakes that matched the wrong assumptions.

GLiNER is zero-shot and has NO built-in label list; labels must be supplied at call time. The previous code read a nonexistent `config.labels`, fell back to `[]`, and so silently found zero entities (the default backend was a no-op). Fix: add an optional `labels` field to NerRequest; GLiNER requires it (zero-shot models without labels are rejected with 400 via MissingLabelsError -> InvalidArgument), while token-classification models carry their own labels and ignore it. Also switch GLiNER to its batched `inference()` entry point (predict_entities/batch_predict_entities are the per-item / deprecated paths) and read its real span keys (text/label/score/start/end).

HF token-classification pipeline (verified against transformers 5.1):
- start/end come back as None when the tokenizer has no offset mapping; int(None) would crash. Drop offset-less spans instead of emitting broken entities.
- the span label key is "entity_group" under aggregation but "entity" with strategy "none"; read entity_group with an "entity" fallback.

Dispatch now groups by (labels, threshold, return_class_probs) so each backend call shares a uniform label set. Tests rewritten to mirror the real APIs, plus direct parser unit tests for the None-offset / fallback-key / threshold paths.

Regenerate ner.json; refresh README + design doc to state that GLiNER labels come from the request.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
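A sketch of the token-classification span parser and the zero-shot label guard described above. The key names (`entity_group`, `entity`, `score`, `word`, `start`, `end`) follow the transformers pipeline output; `parse_token_cls`, `require_labels` and the output dict shape are hypothetical, and `MissingLabelsError` only borrows its name from the commit.

```python
from typing import Any, Iterable, Mapping, Optional


class MissingLabelsError(ValueError):
    """Zero-shot model called without request labels (mapped to a 400)."""


def require_labels(labels: Optional[list[str]]) -> list[str]:
    if not labels:
        raise MissingLabelsError("zero-shot models need labels in the request")
    return labels


def parse_token_cls(spans: Iterable[Mapping[str, Any]], threshold: float) -> list[dict]:
    out = []
    for span in spans:
        start, end = span.get("start"), span.get("end")
        if start is None or end is None:
            # No offset mapping from the tokenizer: drop, don't emit a broken entity.
            continue
        # "entity_group" under aggregation, "entity" with aggregation_strategy="none".
        label = span.get("entity_group", span.get("entity"))
        score = float(span["score"])
        if score < threshold:
            # No native threshold here: filter after inference.
            continue
        out.append({"label": label, "text": span.get("word"),
                    "start": int(start), "end": int(end), "score": score})
    return out


raw_spans = [
    {"entity_group": "PER", "score": 0.99, "word": "Jane", "start": 0, "end": 4},
    {"entity": "B-LOC", "score": 0.80, "word": "Kyiv", "start": 13, "end": 17},
    {"entity_group": "ORG", "score": 0.90, "word": "Acme", "start": None, "end": None},
    {"entity_group": "MISC", "score": 0.10, "word": "x", "start": 20, "end": 21},
]
parsed = parse_token_cls(raw_spans, threshold=0.5)
```

Each branch corresponds to one of the parser unit-test paths named in the commit: None offsets, the fallback key, and the threshold.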

Close out the remaining review corners:
- Startup: wrap each backend load so a misconfigured whitelist fails liveness with an error naming the offending model id, instead of half-loading and 500-ing on the first request (#7).
- threshold: document in the contract that it is the model's native confidence threshold for GLiNER but a post-inference score filter for token classification; same cutoff, different interaction with span selection (#6).
- Resources: README note that every whitelisted model loads into each worker, so cpu/memory and batch size must be tuned to the list (#8).
- Real-model coverage (#5): the suite fakes models by repo convention (no CI downloads). Add a manual smoke script that runs the real GLiNER and token-classification backends end to end; verified locally that both produce correct native-labelled spans with valid offsets, confirming the fakes match the real libraries. README documents how to run it.

The token-cls batched return shape (#4) was already verified against the transformers source during the previous fix; the list-normalisation path is unit-tested.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
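The startup wrapping in #7 amounts to a fail-fast loop that re-raises with the model id attached. A minimal sketch, where `load_all`, `ModelLoadError` and `fake_loader` are hypothetical and the loader stands in for whatever call actually loads the weights:

```python
from typing import Callable, Sequence, TypeVar

M = TypeVar("M")


class ModelLoadError(RuntimeError):
    """Names the model id that broke startup."""

    def __init__(self, model_id: str, cause: Exception) -> None:
        super().__init__(f"failed to load NER model {model_id!r}: {cause}")
        self.model_id = model_id


def load_all(model_ids: Sequence[str], loader: Callable[[str], M]) -> dict[str, M]:
    """Load every whitelisted model or fail fast with the offending id."""
    loaded: dict[str, M] = {}
    for model_id in model_ids:
        try:
            loaded[model_id] = loader(model_id)
        except Exception as exc:
            raise ModelLoadError(model_id, exc) from exc
    return loaded


def fake_loader(model_id: str) -> str:
    # Stand-in for a real model load.
    if model_id.startswith("bad/"):
        raise OSError("weights not found")
    return f"<model {model_id}>"
```

Raising during startup keeps the service from reporting live with only part of the whitelist loaded.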

Faked tests can't catch a fake that encodes the wrong library API, which is exactly the GLiNER bug that shipped green. Add real-model tests that close that gap without taxing every PR:
- New tests/test_real_models.py (marked `real`): load the actual GLiNER and token-classification models and assert native-labelled spans with valid offsets. Replaces the standalone smoke script.
- Register the `real` marker and exclude it by default (addopts `-m "not real"`) in both the workspace root and the nvisy-ner package config; pytest uses the nearest config, so it must be declared in both. Default suite stays fast and fully faked (53 passed, 2 deselected, <1s).
- New `real-models` CI job runs `pytest -m real`, gated to schedule (weekly) + workflow_dispatch only, never on PRs/pushes, since it downloads weights and runs inference.

Verified locally: `pytest -m real` -> 2 passed against real weights.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Replace the multi-engine NER service with a single SOTA engine, GLiNER2, and a schema-driven contract. GLiNER2 tops span-level F1 for self-hosted PII extraction (arxiv 2605.09973) and its schema interface (entities + classification + structured records) subsumes what zero-shot NER and fixed-taxonomy token classifiers did separately, so one engine serves the whole contract.

Contract (nvisy_core/ner/v1.py), a clean schema-driven rewrite:
- Request: text + Schema(entities | classifications | structures) + threshold.
- Response: entities, classifications (single or multi-label), structures (named records of field -> spans), modelId.
- Labels stay model-native; the consumer owns taxonomy mapping.

Service (nvisy-ner), single model, self-hosted:
- One GLiNER2 model from NVISY_NER_MODEL (default fastino/gliner2-privacy-filter-PII-multi). No whitelist, no per-request model.
- engine.py translates the wire schema to gliner2.Schema (incl. per-field regex validators), runs batch_extract, projects the result (confidence -> score).
- Collapse registry.py + backends/ into config.py + engine.py.
- Drop gliner + transformers direct deps (gliner2 keeps them transitively).

Security (the self-hosting differentiator):
- HF_HUB_OFFLINE / TRANSFORMERS_OFFLINE baked into the service; verified the model loads + infers fully offline.
- Reject inputs over the 512-token limit rather than letting the model silently truncate (and miss PII in the tail).
- No payload logging; regression test asserts the hosted GLiNER2API/from_api path is never referenced.

Every gliner2 API shape (batch_extract, the result dict for all three task groups, the tokenizer, offline load, quantize) was verified against the real model, not assumed from docs.

Tests: faked default suite + real-marked engine tests; README + docs/design/ner.md rewritten. OpenAPI + requirements regenerated.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
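The security regression test could be as simple as a source scan for the hosted-API identifiers. This is a hypothetical sketch (`find_hosted_api_refs` is not from the PR): it scans only the package sources, since the test file itself must name the forbidden strings, and plain substring matching will also flag comments and docstrings.

```python
import tempfile
from pathlib import Path

# Identifiers of the hosted GLiNER2 path that must never appear in service code.
FORBIDDEN = ("GLiNER2API", "from_api")


def find_hosted_api_refs(root: Path) -> list[tuple[str, int, str]]:
    """Return (relative path, line number, token) for every forbidden hit."""
    hits = []
    for path in sorted(root.rglob("*.py")):
        text = path.read_text(encoding="utf-8")
        for lineno, line in enumerate(text.splitlines(), start=1):
            for token in FORBIDDEN:
                if token in line:
                    hits.append((str(path.relative_to(root)), lineno, token))
    return hits


# Usage against a throwaway tree; in CI the root would be the service's src dir.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "engine.py").write_text("model = load_model(MODEL_ID)\n", encoding="utf-8")
    clean_hits = find_hosted_api_refs(root)
    (root / "bad.py").write_text("# calls from_api here\n", encoding="utf-8")
    dirty_hits = find_hosted_api_refs(root)
```

Inside a pytest test, the assertion is simply that the scan of the package directory returns an empty list.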

martsokha self-assigned this Jun 12, 2026

martsokha added the refactor (code restructuring without behavior change) and ner (NLP inference service: GLiNER, NER, etc.) labels Jun 12, 2026

martsokha and others added 5 commits June 13, 2026 05:40

martsokha changed the title ~~feat(ner): per-request model selection over multiple backends~~ feat(ner): schema-driven GLiNER2 NER service (single engine) Jun 13, 2026

martsokha wants to merge 6 commits into main from spike/ner-multi-model
