DanyGoT/pocr

pocr — snip & OCR with PaddleOCR

Snip a region of the screen, OCR it with PaddleOCR, copy the result to the clipboard. Two modes:

command     output
pocr-text   recognized text (English, line-separated)
pocr-math   LaTeX source of a math formula (PP-FormulaNet)

Both fire a desktop notification with a short preview when done. Works on X11 (maim + xclip) and Wayland (grim + slurp + wl-copy); the right tools are picked at runtime.
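As a rough illustration of the runtime tool selection described above, here is a minimal Python sketch. The function name pick_tools and the detection rule (WAYLAND_DISPLAY or XDG_SESSION_TYPE) are assumptions made for this example, not necessarily how pocr itself decides.

```python
import os
from typing import Mapping


def pick_tools(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Return the screenshot, selection and clipboard tool names for this session."""
    # Assumption: a Wayland session is recognised by WAYLAND_DISPLAY or
    # XDG_SESSION_TYPE=wayland; anything else is treated as X11.
    wayland = bool(env.get("WAYLAND_DISPLAY")) or env.get("XDG_SESSION_TYPE") == "wayland"
    if wayland:
        return {"screenshot": "grim", "select": "slurp", "clipboard": "wl-copy"}
    # On X11, maim does both the region selection and the capture.
    return {"screenshot": "maim", "select": "maim", "clipboard": "xclip"}
```

Passing the environment in as a mapping keeps the choice easy to check without switching sessions, e.g. pick_tools({"XDG_SESSION_TYPE": "x11"}).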

Requirements

  • uv
  • One screenshot + clipboard pair, depending on session:
    • X11: maim, xclip, libnotify (notify-send)
    • Wayland: grim, slurp, wl-clipboard, libnotify

Install — Linux (Ubuntu / Fedora / Arch / …)

./install.sh

Runs uv sync and symlinks pocr-text / pocr-math into ~/.local/bin/. After that the commands are on your PATH like any other binary.

Install — NixOS

A proper Nix package via uv2nix: every Python dep is built into /nix/store, autoPatchelfHook fixes the prebuilt PaddlePaddle / OpenCV wheels so libGL & friends resolve, and the wrapper scripts have the screenshot/clipboard tools on PATH.

Via your NixOS flake (recommended)

Add the flake to your inputs:

inputs.pocr = {
  url = "github:DanyGoT/pocr";
  inputs.nixpkgs.follows = "nixpkgs";
};

Pass it into your modules (via specialArgs on your nixosSystem call) and add the package:

{ pkgs, pocr, ... }: {
  environment.systemPackages = [
    pocr.packages.${pkgs.system}.default
  ];
}

Then run sudo nixos-rebuild switch; afterwards pocr-text and pocr-math are on PATH.

Ad-hoc

nix run github:DanyGoT/pocr#pocr-text
nix run github:DanyGoT/pocr#pocr-math

Development

Editable, uv-managed venv inside a Nix-provided shell:

nix develop
uv sync

Usage

pocr-text          # draw a rectangle, get text
pocr-math          # draw a rectangle around a formula, get LaTeX

pocr-text --file img.png        # OCR a file, skip the snip
pocr-text --print               # also echo result to stdout

The first call to either command downloads its PaddleOCR model into ~/.paddlex/official_models/ (a few hundred MB; one-time).
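The --file and --print flags also make it possible to call the tool from a script. The sketch below is one way that might look in Python; build_argv and ocr_file are hypothetical helper names, and it assumes that --print writes only the recognized text to stdout.

```python
import subprocess


def build_argv(image_path: str) -> list[str]:
    """Build the command line for OCR-ing an existing image file."""
    # --file skips the interactive snip; --print echoes the result to stdout.
    return ["pocr-text", "--file", image_path, "--print"]


def ocr_file(image_path: str) -> str:
    """Run the client and return what it printed (assumed: the OCR result)."""
    done = subprocess.run(build_argv(image_path),
                          capture_output=True, text=True, check=True)
    return done.stdout.strip()
```

With check=True a non-zero exit status raises subprocess.CalledProcessError, so a failed OCR run is not silently returned as an empty string.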

Daemon

The client commands talk to a long-lived pocrd process over a UNIX socket at $XDG_RUNTIME_DIR/pocrd.sock. The daemon keeps the PaddleOCR engines warm, so consecutive snips skip the paddle bootstrap and take roughly 0.5–1 s each.

pocrd is auto-started on demand the first time you run pocr-text or pocr-math after boot — no service to manage. Latency:

call                                wall-clock
first pocr-text (daemon cold)       ~4–5 s
every subsequent pocr-text          ~500–700 ms
first pocr-math (formula engine)    ~5–6 s
every subsequent pocr-math          ~700 ms–1 s
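The connect-or-spawn behaviour described above can be sketched in Python roughly as follows. This is an illustration under assumptions: the helper names, the retry count, and the way the daemon is launched are invented for the example, and the actual request format spoken over the socket is not shown because the README does not document it.

```python
import os
import socket
import time
from typing import Callable, Mapping


def socket_path(env: Mapping[str, str] = os.environ) -> str:
    """Location of the daemon socket: $XDG_RUNTIME_DIR/pocrd.sock."""
    return os.path.join(env["XDG_RUNTIME_DIR"], "pocrd.sock")


def connect_or_spawn(path: str, spawn: Callable[[], None],
                     retries: int = 50, delay: float = 0.1) -> socket.socket:
    """Connect to the daemon, calling spawn() once if nothing is listening."""
    spawned = False
    for _ in range(retries):
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        try:
            sock.connect(path)
            return sock
        except (FileNotFoundError, ConnectionRefusedError):
            sock.close()
            if not spawned:
                # Hypothetical launcher, e.g.
                # subprocess.Popen(["pocrd"], start_new_session=True)
                spawn()
                spawned = True
            time.sleep(delay)  # give the daemon time to bind its socket
    raise TimeoutError(f"no daemon listening on {path}")
```

Retrying after the spawn covers the gap between starting the process and the socket becoming available, which is where most of the cold-start time in the table would be spent.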

The daemon shuts itself down after 2 minutes of idleness (no incoming requests), releasing PaddleOCR's ~1.6 GB resident set. The next snip respawns it automatically (~5 s cold). To change the timeout, set POCR_IDLE_TIMEOUT to a value in seconds:

export POCR_IDLE_TIMEOUT=600   # ten minutes
export POCR_IDLE_TIMEOUT=0     # disable; daemon stays warm forever
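One way an idle shutdown like this could be implemented is with a timeout on the listening socket. The sketch below is an assumption about the general technique, not the daemon's actual code; idle_timeout, serve_until_idle and the handle callback are hypothetical names.

```python
import os
import socket
from typing import Callable, Mapping, Optional

DEFAULT_IDLE_SECONDS = 120.0  # the documented 2-minute default


def idle_timeout(env: Mapping[str, str] = os.environ) -> Optional[float]:
    """Read POCR_IDLE_TIMEOUT in seconds; 0 means never shut down (None)."""
    seconds = float(env.get("POCR_IDLE_TIMEOUT", DEFAULT_IDLE_SECONDS))
    return seconds if seconds > 0 else None


def serve_until_idle(server: socket.socket,
                     handle: Callable[[socket.socket], None],
                     timeout: Optional[float]) -> None:
    """Accept requests until no client connects for `timeout` seconds."""
    server.settimeout(timeout)  # None blocks forever, so the daemon never idles out
    while True:
        try:
            conn, _ = server.accept()
        except socket.timeout:
            return  # idle for too long: return so the process can exit
        with conn:
            handle(conn)
```

Because the timeout applies to each accept() call, every incoming request effectively resets the idle clock.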

The daemon logs to ~/.cache/pocr/daemon.log. To restart it manually:

pkill pocrd      # client will spawn a fresh one on the next snip

To run the daemon in the foreground (for debugging):

pocrd            # blocks; ctrl-c to stop

Keyboard shortcuts

After install.sh, pocr-text / pocr-math are on PATH and bind like any other command.

i3

bindsym $mod+Shift+s exec --no-startup-id pocr-text
bindsym $mod+Shift+m exec --no-startup-id pocr-math

sxhkd

super + shift + s
    pocr-text

super + shift + m
    pocr-math

Hyprland

bind = SUPER SHIFT, S, exec, pocr-text
bind = SUPER SHIFT, M, exec, pocr-math

GNOME / KDE

Add a custom shortcut in Settings → Keyboard → Shortcuts pointing at pocr-text / pocr-math.

Tuning

  • Text language. Edit pocr/ocr.py, change lang="en" to e.g. "es", "ch", "fr". Full list in the PaddleOCR docs.
  • Formula model. Edit pocr/ocr.py, change model_name="PP-FormulaNet_plus-M" to "PP-FormulaNet-S" for a quarter the size and faster cold start (some accuracy lost), or "UniMERNet" for Chinese + older documents.
  • GPU. Change device="cpu" to "gpu" in pocr/ocr.py if you have CUDA-enabled PaddlePaddle installed.

Paddle-side workarounds (universal, not NixOS-specific)

A few one-liners in the code paper over known PaddlePaddle wheel bugs that hit every Linux user, not just NixOS — each is in one place with a comment:

  • pocr/ocr.py sets FLAGS_use_mkldnn=0 + enable_mkldnn=False because PaddlePaddle 3.x's PIR executor crashes on OneDNN ops in CPU inference.
  • pocr/ocr.py sets PADDLE_PDX_DISABLE_MODEL_SOURCE_CHECK=True so the first run doesn't hang for ~30 s probing model hosts we can't reach.
  • pocr/cli.py exits via os._exit because paddle's C++ thread pools aren't daemon threads and would otherwise keep the interpreter alive long after the OCR result is already on the clipboard.

Remove any of these once the corresponding upstream issue is fixed.
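For orientation, the environment-variable and exit workarounds listed above might look roughly like this in Python. It is a sketch rather than the repository's code: hard_exit is a hypothetical name, and setting the variables before paddle is imported is an assumption about when they are read. The enable_mkldnn=False argument is passed to the PaddleOCR engine separately and is not shown.

```python
import os
import sys

# Assumption: both variables are read when paddle / paddlex is first imported,
# so they are set before any such import happens.
os.environ["FLAGS_use_mkldnn"] = "0"
os.environ["PADDLE_PDX_DISABLE_MODEL_SOURCE_CHECK"] = "True"


def hard_exit(code: int = 0) -> None:
    """Flush output, then terminate without waiting for non-daemon threads."""
    sys.stdout.flush()
    sys.stderr.flush()
    os._exit(code)  # skips atexit handlers and thread joins
```

Flushing first matters because os._exit does not flush buffered output on its own.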

About

Vibe-coded screenshot -> OCR -> clipboard tool
