Pipeline and tools for HTR and HTR-OMR alignment on medieval chant manuscripts.
The primary artifact is run_pipeline.py, which runs a folio image through line segmentation,
optional Cantus text alignment, and word/syllable geometry generation. Two visualization tools
(Pipeline Inspector GUI and PAGE XML Viewer) let you inspect the output.
For comparative segmentation experiments and PyLaia baselines, see experiments/README.md.
run_pipeline.py runs a single folio image through the full pipeline:
- Kraken BLLA: baseline line segmentation
- Column clustering: auto-detect 1 vs. 2 columns; sort lines into reading order
- Co-linear segment fusion: fuse BLLA sub-segments belonging to the same physical text line (≥50% y-extent overlap) into logical lines, correcting BLLA over-segmentation on chant manuscripts with neume notation
- Kraken HTR: text recognition per logical line
- NW chant allocator (skipped in OCR-only mode): align Cantus CSV text to detected lines via Needleman-Wunsch, using volpiano break markers as alignment anchors. Supports folio-to-folio continuation via a JSON sidecar (`--folio-state-out`). In no-volpiano mode, it automatically locates where this folio's first chant begins via NW matching and assigns pre-start lines to the previous folio's continuation (see `locate_folio_start` and `pre_start_suffix_align` in steps/README.md)
- Word segmentation: distribute ground-truth Cantus words across each line's pixel extent; falls back to OCR word splitting when no Cantus text is available (OCR-only mode, or lines with no match)
- Syllable segmentation: subdivide each word node into character-proportional syllable regions using Latin syllabification from volpiano-display-utilities
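The word and syllable steps both come down to slicing a horizontal pixel extent. The sketch below illustrates a character-proportional split under simplifying assumptions: each unit's width is proportional to its character count, inter-word gaps and the line polygon's shape are ignored, and syllables are passed in already split (the pipeline obtains them from volpiano-display-utilities, whose API is not shown). `proportional_spans` is a hypothetical helper, not a function from steps/.

```python
def proportional_spans(units: list[str], x0: float, x1: float) -> list[tuple[str, float, float]]:
    """Hypothetical helper: split [x0, x1] into consecutive spans sized by character count.

    Assumes width is proportional to len(unit); ignores spacing and glyph widths.
    """
    total = sum(len(u) for u in units)
    if total == 0:
        return []
    spans = []
    cursor = x0
    for u in units:
        width = (x1 - x0) * len(u) / total
        spans.append((u, cursor, cursor + width))
        cursor += width
    return spans


# Words across a line's extent, then syllables within one word's span.
line = proportional_spans(["Ave", "Maria"], 0, 800)
word, wx0, wx1 = line[1]
syllables = proportional_spans(["Ma", "ri", "a"], wx0, wx1)
```

The same function serves both levels, which mirrors how the pipeline nests syllable regions inside word nodes.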
See steps/README.md for details on each step.
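The co-linear fusion step can be pictured as a greedy grouping of segments that share vertical extent. This is an illustration only: it assumes each BLLA segment is reduced to a hypothetical axis-aligned `Box`, that segments are already within a single column, and it reads "≥50% y-extent overlap" as vertical overlap divided by the shorter segment's height. The real step works on Kraken polygons and baselines and may define overlap differently.

```python
from dataclasses import dataclass


@dataclass
class Box:
    # Hypothetical axis-aligned stand-in for a BLLA line segment (pixel coords).
    x0: float
    y0: float
    x1: float
    y1: float


def y_overlap_ratio(a: Box, b: Box) -> float:
    """Vertical overlap divided by the shorter box's height (assumed definition)."""
    overlap = min(a.y1, b.y1) - max(a.y0, b.y0)
    shorter = min(a.y1 - a.y0, b.y1 - b.y0)
    if overlap <= 0 or shorter <= 0:
        return 0.0
    return overlap / shorter


def fuse_colinear(segments: list[Box], threshold: float = 0.5) -> list[list[Box]]:
    """Group segments into logical lines when they overlap vertically by >= threshold."""
    lines: list[list[Box]] = []
    # Visit segments top to bottom by their vertical centre.
    for seg in sorted(segments, key=lambda s: (s.y0 + s.y1) / 2):
        for group in lines:
            if any(y_overlap_ratio(seg, other) >= threshold for other in group):
                group.append(seg)
                break
        else:
            # No existing logical line overlaps enough: start a new one.
            lines.append([seg])
    # Within each logical line, order sub-segments left to right.
    return [sorted(g, key=lambda s: s.x0) for g in lines]
```

Measuring overlap against the shorter segment keeps a small neume-split fragment from being rejected just because its neighbour is taller.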
```bash
python run_pipeline.py \
  --image path/to/folio.jpg \
  --folio "006r" \
  --source-id 123672 \
  --export-json ~/Downloads/DDMAL/006r.json
```

Key flags:
| Flag | Description |
|---|---|
| `--folio STR` | Folio identifier (e.g. "006r"). Required when `--csv` or `--source-id` is given; defaults to the image filename stem in OCR-only mode. |
| `--source-id INT` | Cantus source ID (fetched from cantusdatabase.org). Omit together with `--csv` to enter OCR-only mode. |
| `--csv PATH` | Local Cantus-format CSV file. Omit together with `--source-id` to enter OCR-only mode. |
| `--segmentation-model PATH` | Custom Kraken BLLA model (`.mlmodel` or `.safetensors`); omit to use Kraken's built-in model |
| `--column-count {1,2}` | Declare the column count; skips bimodal auto-detection |
| `--recognition-model PATH` | Kraken HTR model; defaults to Tridis if installed |
| `--stub-mode` | Skip text recognition; the pipeline still runs using ground-truth text |
| `--line-offset N` | Skip the first N volpiano line-break markers (for cropped images; Cantus mode only) |
| `--prev-folio-state PATH` | JSON sidecar from the previous folio run (post-77 continuation words; Cantus mode only) |
| `--folio-state-out PATH` | Write folio state JSON for the next folio run (Cantus mode only) |
| `--export-json PATH` | Write output JSON for the Pipeline Inspector GUI |
| `--mei-json PATH` | Write MEI Text Alignment JSON (flat `syl_boxes` format) |
| `--debug-ocr` | Print per-line OCR transcripts and NW alignment detail; in OCR-only mode, also prints a startup banner and lists any ignored flags |
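The NW alignment detail printed by `--debug-ocr` comes from the Needleman-Wunsch allocator. For readers new to the algorithm, here is a minimal word-level global alignment of OCR tokens against Cantus tokens. It is a sketch, not the allocator: the scoring (a `difflib` similarity rescaled to [-1, 1] and a gap penalty of -1) is assumed, and it ignores the volpiano break markers that the real allocator uses as anchors.

```python
from __future__ import annotations

from difflib import SequenceMatcher

GAP = -1.0  # assumed penalty for leaving a token unpaired


def similarity(a: str, b: str) -> float:
    """Character-level similarity rescaled from [0, 1] to [-1, 1] (assumed scoring)."""
    return 2.0 * SequenceMatcher(None, a.lower(), b.lower()).ratio() - 1.0


def needleman_wunsch(ocr: list[str], gt: list[str]) -> list[tuple[str | None, str | None]]:
    """Globally align two token sequences; None marks a gap on that side."""
    n, m = len(ocr), len(gt)
    # score[i][j] is the best score for aligning ocr[:i] with gt[:j].
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = score[i - 1][0] + GAP
    for j in range(1, m + 1):
        score[0][j] = score[0][j - 1] + GAP
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + similarity(ocr[i - 1], gt[j - 1]),  # pair tokens
                score[i - 1][j] + GAP,  # OCR token has no GT partner
                score[i][j - 1] + GAP,  # GT token has no OCR partner
            )
    # Trace back from the bottom-right cell, preferring diagonal moves on ties.
    pairs = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + similarity(ocr[i - 1], gt[j - 1]):
            pairs.append((ocr[i - 1], gt[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + GAP:
            pairs.append((ocr[i - 1], None))
            i -= 1
        else:
            pairs.append((None, gt[j - 1]))
            j -= 1
    pairs.reverse()
    return pairs


alignment = needleman_wunsch(["Gaudeamus", "omnes", "domino"], ["Gaudeamus", "omnes", "in", "domino"])
```

Using a graded similarity rather than exact matching lets misrecognized words (common with abbreviated medieval Latin) still pair with their ground-truth counterparts.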
OCR-only mode: When neither --csv nor --source-id is given, the pipeline skips
Cantus data loading and NW alignment entirely. Steps 1–4 run normally; word boundaries come
from OCR word splitting and syllables are Latin-syllabified from the OCR text. The exported
JSON will contain "mode": "ocr_only" instead of "cantus_aligned". Flags --line-offset,
--prev-folio-state, and --folio-state-out are ignored with a warning.
```bash
# OCR-only (no Cantus data needed)
python run_pipeline.py \
  --image path/to/folio.jpg \
  --export-json ~/Downloads/DDMAL/folio.json
```

Recognition model: The Tridis model (`Tridis_Medieval_EarlyModern.mlmodel`) is used by default if installed via htrmopo. To install:

```bash
python -m htrmopo get 10.5281/zenodo.7899855
```

Use `--stub-mode` to skip recognition entirely (the pipeline still produces GT word/syllable geometry).
Multi-folio runs (manual):

```bash
# First folio
python run_pipeline.py --image 006r.jpg --folio 006r --source-id 123672 \
  --export-json ~/Downloads/DDMAL/006r.json --folio-state-out state_006r.json

# Next folio, with continuation from the previous one
python run_pipeline.py --image 007v.jpg --folio 007v --source-id 123672 \
  --prev-folio-state state_006r.json --export-json ~/Downloads/DDMAL/007v.json
```

Multi-folio runs (automated): Use run_chain.py to chain any number of consecutive folios in a single command; intermediate FolioState sidecar files are managed automatically:
```bash
python run_chain.py \
  --images 006r.jpg 007v.jpg 008r.jpg \
  --folios 006r 007v 008r \
  --source-id 123672 \
  --export-json ~/Downloads/DDMAL/006r.json \
                ~/Downloads/DDMAL/007v.json \
                ~/Downloads/DDMAL/008r.json
```

| Flag | Description |
|---|---|
| `--images PATH [...]` | Ordered folio image paths |
| `--folios STR [...]` | Folio identifiers matching the CSV (same order as `--images`) |
| `--export-json PATH [...]` | One output JSON path per folio; parent directories are created automatically |
| `--folio-states-dir PATH` | Save intermediate `state_{folio}.json` files here for debugging |
| `--debug-ocr` | Print per-line OCR and NW alignment detail for every folio |
All model and device flags from run_pipeline.py (--segmentation-model,
--recognition-model, --device, --stub-mode, --column-count,
--column-bimodal-threshold) are forwarded unchanged to every folio run.
The chain aborts on the first failure to avoid propagating corrupt state.
The manual --prev-folio-state / --folio-state-out approach above remains
available for one-off runs or non-consecutive folios.
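Conceptually, chaining amounts to calling run_pipeline.py once per folio and feeding each run's state file into the next. The sketch below reproduces that loop with `subprocess`, using only the run_pipeline.py flags documented above. It is an illustration, not run_chain.py's source: `build_commands` and `run_chain` are hypothetical names, and the `state_{folio}.json` naming inside a working directory is an assumption borrowed from `--folio-states-dir`.

```python
import subprocess
import sys
from pathlib import Path


def build_commands(images: list[str], folios: list[str], exports: list[str],
                   source_id: int, state_dir: str) -> list[list[str]]:
    """Hypothetical: one run_pipeline.py command per folio, threading FolioState files."""
    if not (len(images) == len(folios) == len(exports)):
        raise ValueError("images, folios and exports must have the same length")
    commands = []
    prev_state = None
    for image, folio, export in zip(images, folios, exports):
        state_out = Path(state_dir) / f"state_{folio}.json"  # assumed naming scheme
        cmd = [
            sys.executable, "run_pipeline.py",
            "--image", str(image),
            "--folio", folio,
            "--source-id", str(source_id),
            "--export-json", str(export),
            "--folio-state-out", str(state_out),
        ]
        if prev_state is not None:
            # Continuation words from the previous folio flow into this run.
            cmd += ["--prev-folio-state", str(prev_state)]
        commands.append(cmd)
        prev_state = state_out
    return commands


def run_chain(commands: list[list[str]]) -> None:
    for cmd in commands:
        # check=True raises CalledProcessError, stopping at the first failed folio
        # so a bad state file is never passed downstream.
        subprocess.run(cmd, check=True)
```

Separating command construction from execution makes the chaining logic easy to inspect (or dry-run) before any folio is processed.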
The Pipeline Inspector is a browser-based viewer for inspecting pipeline output: the folio image overlaid with line polygons, word bounding boxes, and syllable regions, with per-layer toggles.
Live at https://ddmal.github.io/mothra-text/ (no install required): load any folio image plus the pipeline JSON generated by `run_pipeline.py --export-json`.
Word box colors: teal = Cantus ground truth, rose = OCR fallback (no GT available).
See gui/README.md for usage and local development instructions.
The PAGE XML Viewer is a lightweight Python desktop viewer for inspecting PAGE XML annotation files overlaid on
their source manuscript images. Useful for verifying ground-truth annotations produced by
scripts/mothra_to_page.py or any other PAGE XML source without needing a browser.
Launch:

```bash
python page_viewer.py                           # open files via dialog
python page_viewer.py annotation.xml            # load XML, locate image interactively
python page_viewer.py image.jpg annotation.xml  # pre-load both on startup
```

Features:
- Renders TextRegions, TextLines, Words, Baselines, and Glyphs as colour-coded overlays
- Per-layer visibility toggles
- Scroll-wheel zoom centred on cursor; click-and-drag pan
- Click an annotation to see its ID, type, text, and attributes in the sidebar
- No extra dependencies beyond Pillow (already required by other scripts)
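For scripted checks alongside the viewer, the same PAGE XML elements can be read with the standard library. This is a minimal sketch, assuming the usual PAGE layout (a `TextLine` with `Coords/@points`, `Baseline/@points`, and `TextEquiv/Unicode` children). It matches elements by local name so it tolerates different PAGE namespace versions; it is not page_viewer.py's code, and `text_lines` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def local(tag: str) -> str:
    """Strip the '{namespace}' prefix ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def parse_points(points: str) -> list[tuple[int, int]]:
    """Parse a PAGE 'x1,y1 x2,y2 ...' points string into integer pairs."""
    pairs = []
    for token in points.split():
        x, y = token.split(",")
        pairs.append((int(float(x)), int(float(y))))
    return pairs


def text_lines(xml_text: str) -> list[dict]:
    """Hypothetical helper: collect id, polygon, baseline and text for every TextLine."""
    root = ET.fromstring(xml_text)
    lines = []
    for el in root.iter():
        if local(el.tag) != "TextLine":
            continue
        line = {"id": el.get("id"), "coords": [], "baseline": [], "text": ""}
        # Only direct children, so Word-level TextEquiv elements are not picked up.
        for child in el:
            name = local(child.tag)
            if name == "Coords":
                line["coords"] = parse_points(child.get("points", ""))
            elif name == "Baseline":
                line["baseline"] = parse_points(child.get("points", ""))
            elif name == "TextEquiv":
                for sub in child:
                    if local(sub.tag) == "Unicode":
                        line["text"] = sub.text or ""
        lines.append(line)
    return lines
```

Typical use is `text_lines(Path("annotation.xml").read_text(encoding="utf-8"))` to spot empty transcriptions or missing baselines before training.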
Utility and conversion scripts in scripts/:
| Script | Description |
|---|---|
| `mothra_to_page.py` | Convert Mothra Annotator JSON → PAGE XML (for BLLA training data) |
| `convert_to_mei_input.py` | Convert pipeline JSON → MEI Text Alignment JSON |
| `debug_column_detection.py` | Visualize the bimodal column-detection coverage profile |
See scripts/README.md for usage.
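debug_column_detection.py plots the coverage profile behind the pipeline's 1-vs-2 column decision. The sketch below shows one plausible shape of such a check, purely for intuition: the profile definition (how many line x-extents cover each horizontal bin), the "central gutter" test, and the `gap_fraction` value are all assumptions, not the pipeline's logic or the semantics of `--column-bimodal-threshold`.

```python
def coverage_profile(extents: list[tuple[float, float]], width: float, bins: int = 100) -> list[int]:
    """Count how many line x-extents cover each of `bins` equal horizontal slices."""
    profile = [0] * bins
    for x0, x1 in extents:
        start = max(0, int(x0 * bins / width))
        end = min(bins, -(-x1 * bins // width))  # ceiling division
        for b in range(start, int(end)):
            profile[b] += 1
    return profile


def detect_columns(extents: list[tuple[float, float]], width: float,
                   bins: int = 100, gap_fraction: float = 0.2) -> int:
    """Return 2 if the middle of the profile dips below gap_fraction * peak, else 1."""
    profile = coverage_profile(extents, width, bins)
    peak = max(profile, default=0)
    if peak == 0:
        return 1
    # Assumed heuristic: a two-column page leaves a near-empty gutter in the middle 20%.
    centre = profile[bins * 2 // 5: bins * 3 // 5]
    return 2 if min(centre) <= gap_fraction * peak else 1


two_col = [(50, 450)] * 10 + [(550, 950)] * 10
one_col = [(50, 950)] * 20
```

A two-column folio produces two coverage plateaus separated by an empty gutter, which is the bimodal shape the debug script is meant to make visible; `--column-count` bypasses this kind of check when the layout is already known.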
Repository layout:

```
mothra-text/
├── experiments/                 # comparative research (not part of main pipeline)
│   ├── README.md                # experiments documentation
│   ├── run_htrflow.py           # YOLO/RTMDet segmentation runner
│   ├── run_all.py               # runs all three models
│   ├── pipelines/               # htrflow YAML configs for YOLO and RTMDet
│   └── pylaia_baseline/         # zero-shot PyLaia HTR baselines
├── gui/                         # Pipeline Inspector browser app
│   └── README.md
├── scripts/                     # utility and conversion scripts
│   └── README.md
├── steps/                       # pipeline step implementations
│   ├── column_clustering.py
│   ├── ground_truth_word_segmentation.py
│   ├── gt_manifest.py
│   ├── kraken_recognition.py
│   ├── kraken_segmentation.py
│   ├── nw_chant_allocator.py
│   ├── syllable_segmentation.py
│   └── README.md
├── tests/                       # pytest suite (200+ tests)
├── page_viewer.py               # PAGE XML Viewer desktop GUI
├── run_kraken.py                # standalone Kraken BLLA runner + visualization
├── run_pipeline.py              # end-to-end pipeline (single folio)
└── run_chain.py                 # automated multi-folio chaining wrapper
```
Folio images and model outputs are stored on HuggingFace, not in this repo. Pull them locally before running:
```bash
# Pull folio images → data/folios/
ddmal-hfsync pull-groundtruth --shared --dir data

# Pull model outputs → outputs/
ddmal-hfsync pull-runs --project mothra-text --model kraken --dir outputs/kraken_blla
```

See DDMAL/ddmal_hfsync for setup instructions (`~/.hfconfig` must be configured).
```bash
conda create -n line-seg-eval python=3.10 -y
conda activate line-seg-eval
pip install htrflow kraken biopython volpiano-display-utilities
```

Python environment note: Always use the full path to the conda Python; pyenv intercepts the bare `python` command and picks the wrong interpreter. Substitute the path to your own `line-seg-eval` environment:

```bash
/Users/cassiebastress/miniconda3/envs/line-seg-eval/bin/python run_pipeline.py ...
```
For experiment-specific dependencies (OpenMMLab stack for RTMDet, PyLaia environment),
see experiments/README.md.
Run the test suite (the same full-path rule applies):

```bash
/Users/cassiebastress/miniconda3/envs/line-seg-eval/bin/python -m pytest tests/ -v
```