Future: reintroduce AWS Textract OCR backend #201

@martsokha

Status

Deferred — removed in #195 to clear the deck for the externalised inference architecture (#194). Reintroduce as opt-in when there's user demand.

Context

AwsTextractBackend was a direct cloud OCR backend in nvisy-ocr/src/provider/aws_textract/, gated behind the aws-textract cargo feature (forwarded as amazon through nvisy-engine/nvisy-server/nvisy-cli). It used the AWS SigV4 signing flow over reqwest-middleware, with the runtime calling Textract's AnalyzeDocument API directly.

The new architecture (#194) centres on externalised inference services (see nvisycom/inference) called over HTTP. Cloud OCR backends — Textract, Vision, DocAI — are conceptually closer to that path than to the in-process model backends, but each carries provider-specific request signing, retry, and error-shape work that doesn't pay off until a user actually deploys against it.

Rather than carry three dormant cloud backends through every refactor, #195 deleted them, and this issue tracks the reintroduction of the Textract backend.

What this issue becomes

Add AwsTextractBackend back to nvisy-ocr behind an aws-textract cargo feature when a customer or first-party deployment wants it.

Triggers to revisit

  • A customer requests Textract OCR for self-hosted runtime
  • We want Textract OCR as a fallback when the externalised Bento OCR is unavailable
  • We want a quick "no extra infrastructure" path for AWS-resident deployments

Scope when reintroduced

  • Restore nvisy-ocr/src/backend/aws_textract_backend.rs (mirroring the new backend/-based layout)
  • Restore the aws-textract feature on nvisy-ocr (with sha2 + hmac deps for SigV4)
  • Restore the OcrBackend::AwsTextract { … } variant + OcrExtractor::from_config dispatch
  • Restore feature forwarding nvisy-ocr/aws-textract → nvisy-engine/amazon → nvisy-server/amazon → nvisy-cli/amazon
  • Auth: pull AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY / AWS_REGION from env or config
  • Map AWS confidence (0–100) → 0.0..=1.0 on the wire types
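The last two scope items can be sketched roughly as below. This is a minimal illustration, not the deleted code: `AwsCredentials`, `from_lookup`, `from_env`, and `normalize_confidence` are hypothetical names, the error type is a plain `String` for brevity, and only the env half of "env or config" is shown. It assumes Textract reports confidence as a float in the 0 to 100 range.

```rust
use std::env;

/// Hypothetical credential bundle; names are illustrative, not the original type.
#[derive(Debug, Clone, PartialEq)]
pub struct AwsCredentials {
    pub access_key_id: String,
    pub secret_access_key: String,
    pub region: String,
}

impl AwsCredentials {
    /// Builds credentials from any key lookup, so callers (and tests)
    /// do not have to mutate the process environment.
    pub fn from_lookup<F>(lookup: F) -> Result<Self, String>
    where
        F: Fn(&str) -> Option<String>,
    {
        // Treat an empty value the same as a missing one.
        let get = |key: &str| {
            lookup(key)
                .filter(|value| !value.is_empty())
                .ok_or_else(|| format!("missing {key}"))
        };
        Ok(Self {
            access_key_id: get("AWS_ACCESS_KEY_ID")?,
            secret_access_key: get("AWS_SECRET_ACCESS_KEY")?,
            region: get("AWS_REGION")?,
        })
    }

    /// Reads the three variables named in the scope list from the environment.
    pub fn from_env() -> Result<Self, String> {
        Self::from_lookup(|key| env::var(key).ok())
    }
}

/// Maps a 0..=100 confidence onto 0.0..=1.0.
/// Out-of-range input is clamped and NaN becomes 0.0 (an assumed policy).
pub fn normalize_confidence(percent: f32) -> f32 {
    if percent.is_nan() {
        return 0.0;
    }
    (percent / 100.0).clamp(0.0, 1.0)
}
```

A backend constructor could call `AwsCredentials::from_env()` when the config carries no explicit credentials, and apply `normalize_confidence` to each block's confidence before filling the wire types. Clamping rather than rejecting out-of-range values is a design choice made here for the sketch; the reintroduced backend may prefer to surface an error.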

Reference

The deleted code is preserved in git history; the last commit including it is the parent of the removal commit on #195.

Metadata
    Labels

    feat (request for or implementation of a new feature), ocr (OCR backends and providers)

    Projects

    Status
    Backlog