Future: add optional in-process PaddleOCR backend (oar-ocr) for zero-sidecar deployments #192

Description

@martsokha

Status

Deferred — depends on completion of inference externalization (#194).

Context

After #194 lands, the production OCR path will be the BentoML-backed `nvisy-inference-ocr` service called over HTTP. This works for SaaS and self-hosted Pro customers who can run a second container.

Some customers may want a zero-sidecar deployment — single Rust binary, no Python, no separate inference container. For them, an in-process Rust PaddleOCR backend remains valuable.

`oar-ocr` is the most mature option: PP-OCRv5, actively maintained, exposes raw detection polygons, Apache-2.0.

Why this is currently blocked (post-#194 perspective)

`oar-ocr` pins `ort = "2.0.0-rc.12"`. After #194:

  • The workspace no longer depends on `ort` at all (in-process inference removed).
  • So there's no version conflict anymore — adding `oar-ocr` would just pull `ort` for its own use, no resolver fight with gline-rs (which is also gone from the workspace).

However, both gline-rs and SemplificaAI/gliner2-rs deliberately stay on rc.9 because rc.11 and rc.12 have been observed to hang. If we ever want both in-process OCR and in-process NER in the same Rust binary, the `ort` version conflict returns. That is a constraint for future planning.

What this issue becomes

A future feature: add `OarOcrBackend` to `nvisy-ocr` behind an `in-process-ocr` cargo feature, as an alternative to the default `BentoMlBackend`. Customers who don't need GLiNER (or who use `BentoMlNerBackend` for NER but want in-process OCR) can enable it.
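A minimal sketch of how the feature gate could look, assuming a much-simplified `Backend` trait. The trait method `name`, the helper `in_process_backend`, and `select_backend` are hypothetical and exist only for illustration; the real `nvisy_ocr::Backend` surface is not described in this issue. Only the type names `OarOcrBackend` and `BentoMlBackend` and the feature name `in-process-ocr` come from the text above.

```rust
/// Hypothetical stand-in for `nvisy_ocr::Backend` (real methods not shown in this issue).
pub trait Backend {
    fn name(&self) -> &'static str;
}

/// Default path: client for the BentoML-backed `nvisy-inference-ocr` service.
pub struct BentoMlBackend;

impl Backend for BentoMlBackend {
    fn name(&self) -> &'static str {
        "bentoml"
    }
}

/// In-process path: only compiled when the `in-process-ocr` feature is enabled.
#[cfg(feature = "in-process-ocr")]
pub struct OarOcrBackend;

#[cfg(feature = "in-process-ocr")]
impl Backend for OarOcrBackend {
    fn name(&self) -> &'static str {
        "oar-ocr"
    }
}

/// Returns the in-process backend when it was compiled in.
#[cfg(feature = "in-process-ocr")]
pub fn in_process_backend() -> Option<Box<dyn Backend>> {
    Some(Box::new(OarOcrBackend))
}

/// Without the feature there is no in-process backend to return.
#[cfg(not(feature = "in-process-ocr"))]
pub fn in_process_backend() -> Option<Box<dyn Backend>> {
    None
}

/// Picks the in-process backend if requested and available; otherwise
/// falls back to the BentoML default.
pub fn select_backend(prefer_in_process: bool) -> Box<dyn Backend> {
    if prefer_in_process {
        if let Some(backend) = in_process_backend() {
            return backend;
        }
    }
    Box::new(BentoMlBackend)
}
```

With this shape, a build without the feature never references `oar-ocr` or `ort`, so the default dependency graph stays unchanged.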

Triggers to revisit

  • Customer demand for zero-sidecar deployment in the self-hosted Pro tier
  • `ort` 2.0.0 final ships and the rc.11/12 hang issues are resolved upstream
  • `oar-ocr` reaches a stable v1 release

Scope when implemented

  • Add `OarOcrBackend` (or similarly named) implementing `nvisy_ocr::Backend`
  • Bundled PP-OCRv5 model download via the existing `nvisy_core::Downloader` pattern (mirroring `nvisy-nlp/preset/`)
  • Per-call language selection in `RunParams`
  • Matching `Page → Block → Line → Word` output shape
  • Gated behind `in-process-ocr` cargo feature, off by default
  • BentoML path stays the default
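The `Page → Block → Line → Word` shape from the list above could be sketched roughly as follows. Every type and field here is an assumption made for illustration, not the existing `nvisy-ocr` types. The `Detection` struct is a hypothetical stand-in for one recognized text line plus its polygon, and the mapping puts all lines into a single block and splits words on whitespace, which a real implementation would likely refine.

```rust
/// Hypothetical word: text only, since per-word geometry is not assumed here.
pub struct Word {
    pub text: String,
}

/// Hypothetical line: keeps the raw detection polygon for the whole line.
pub struct Line {
    pub polygon: Vec<(f32, f32)>,
    pub words: Vec<Word>,
}

pub struct Block {
    pub lines: Vec<Line>,
}

pub struct Page {
    pub blocks: Vec<Block>,
}

/// Hypothetical stand-in for one line-level result from an OCR engine.
pub struct Detection {
    pub text: String,
    pub polygon: Vec<(f32, f32)>,
}

/// Maps flat line-level detections into the nested page shape.
/// Assumption: one block per page, words split on whitespace.
pub fn page_from_detections(detections: Vec<Detection>) -> Page {
    let lines = detections
        .into_iter()
        .map(|d| Line {
            words: d
                .text
                .split_whitespace()
                .map(|w| Word { text: w.to_string() })
                .collect(),
            polygon: d.polygon,
        })
        .collect();
    Page {
        blocks: vec![Block { lines }],
    }
}
```

Keeping the mapping in one function like this would let the in-process backend and the BentoML backend return the same structure, so callers do not need to know which one produced a page.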

Metadata

Labels: feat (request for or implementation of a new feature), ocr (OCR backends and providers)

Project status: Backlog