Cropilot AI

This repository hosts models used to predict the transformation steps needed to extract pages from scanned books and other printed media. It is split into base_model_trainer (the main codebase for the ML models), scripts (data analysis), and cropilot_api_tools (a set of scripts for creating new content in the app). We use two models to create page predictions:

Finetuned YOLO model

A fine-tuned YOLO network based on YOLO11s (see: https://docs.ultralytics.com/models/yolo11/). It predicts how many pages a scanned image contains, together with the location of each page.
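A minimal sketch of turning the detector's boxes into an ordered page list, assuming each detection is a `(class_id, confidence, x1, y1, x2, y2)` tuple in pixels and that pages read left to right. The class-id mapping, the 0.5 confidence threshold, and the helper names are hypothetical, not taken from this repository.

```python
from dataclasses import dataclass

# Hypothetical class ids; the real mapping is whatever the dataset YAML defines.
CLASS_NAMES = {0: "page", 1: "back title cover", 2: "unified doublepage"}


@dataclass
class PageBox:
    label: str
    confidence: float
    x1: float
    y1: float
    x2: float
    y2: float


def pages_from_detections(detections, min_confidence=0.5):
    """Drop low-confidence (class_id, conf, x1, y1, x2, y2) detections and sort the rest."""
    pages = [
        PageBox(CLASS_NAMES.get(int(cls), str(int(cls))), conf, x1, y1, x2, y2)
        for cls, conf, x1, y1, x2, y2 in detections
        if conf >= min_confidence
    ]
    # Sort by the left edge so pages come out in left-to-right order.
    return sorted(pages, key=lambda p: p.x1)


def detections_from_ultralytics(result):
    """Flatten one Ultralytics Results object into plain tuples."""
    boxes = result.boxes
    return [
        (int(c), float(p), *xyxy)
        for c, p, xyxy in zip(boxes.cls.tolist(), boxes.conf.tolist(), boxes.xyxy.tolist())
    ]
```

With Ultralytics installed, `result = YOLO("path/to/best.pt")("scan.jpg")[0]` followed by `pages_from_detections(detections_from_ultralytics(result))` would give the page boxes, and `len(...)` of that list the page count; the weights path is a placeholder.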

RotateNET

RotateNET is a ResNet-based model used to predict the rotation angle of each page.
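The README does not state the angle's units or sign convention. Assuming degrees, the sketch below shows what a predicted angle means geometrically: the corners of a page box rotated about its centre, plus the axis-aligned extent they occupy. The function names are illustrative only.

```python
import math


def rotate_corners(x1, y1, x2, y2, angle_deg):
    """Rotate the corners of an axis-aligned box about its centre by angle_deg.

    Positive angles rotate counter-clockwise in a y-up frame; in image
    coordinates (y pointing down) the same formula appears clockwise.
    """
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    corners = [(x1, y1), (x2, y1), (x2, y2), (x1, y2)]
    rotated = []
    for x, y in corners:
        dx, dy = x - cx, y - cy
        rotated.append((cx + dx * cos_t - dy * sin_t, cy + dx * sin_t + dy * cos_t))
    return rotated


def bounds(points):
    """Axis-aligned bounding box (x_min, y_min, x_max, y_max) of a list of points."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return min(xs), min(ys), max(xs), max(ys)
```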

Dataset creation

Input

The dataset is created based on ScanTailor metadata files. Your folders should follow this structure:

scan-id/
├─ rawdata/
│  ├─ 1/
│  │  └─ <*.tif images>
│  ├─ 2/
│  └─ ...
└─ scanTailor/
    ├─ 1.scanTailor
    ├─ 2.scanTailor
    └─ ...
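A quick pre-flight check for this layout, as a sketch: it pairs each `rawdata/<n>/` folder with `scanTailor/<n>.scanTailor` and reports batches missing either side. The function name and return shape are hypothetical; the repository's own scripts may validate input differently.

```python
from pathlib import Path


def pair_batches(scan_dir):
    """Pair rawdata/<n>/ folders with scanTailor/<n>.scanTailor files.

    Returns (pairs, problems): pairs is a list of (image_dir, project_file),
    problems lists batch names that are missing one side.
    """
    scan_dir = Path(scan_dir)
    image_dirs = {p.name: p for p in (scan_dir / "rawdata").iterdir() if p.is_dir()}
    projects = {p.stem: p for p in (scan_dir / "scanTailor").glob("*.scanTailor")}

    def batch_order(name):
        # Numeric batch names first, in numeric order ("2" before "10"), then the rest.
        return (not name.isdigit(), int(name) if name.isdigit() else 0, name)

    pairs, problems = [], []
    for name in sorted(image_dirs.keys() | projects.keys(), key=batch_order):
        if name in image_dirs and name in projects:
            pairs.append((image_dirs[name], projects[name]))
        else:
            problems.append(name)
    return pairs, problems
```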

Steps

Compress the input data
- Run:
```
base_model_trainer/create_dataset/compress_input_images.py
```
- The script converts the input images described above from TIFF to JPEG and saves them in the format expected by the subsequent scripts.
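A sketch of the conversion, assuming Pillow is available. The output layout (`<output>/<n>/<name>.jpg`), the JPEG quality of 85, and the helper names are hypothetical; the real script's output format is not described here.

```python
from pathlib import Path


def jpeg_target(tif_path, src_root, dst_root):
    """Map <src_root>/rawdata/<n>/<name>.tif to <dst_root>/<n>/<name>.jpg (hypothetical layout)."""
    rel = Path(tif_path).relative_to(Path(src_root) / "rawdata")
    return Path(dst_root) / rel.with_suffix(".jpg")


def compress_tif(tif_path, jpg_path, quality=85):
    """Convert one TIFF to JPEG with Pillow (imported lazily so jpeg_target works without it)."""
    from PIL import Image

    jpg_path = Path(jpg_path)
    jpg_path.parent.mkdir(parents=True, exist_ok=True)
    with Image.open(tif_path) as img:
        # Convert to RGB so palette or alpha images can be written as JPEG;
        # grayscale scans get expanded to three channels as a side effect.
        img.convert("RGB").save(jpg_path, "JPEG", quality=quality)
```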
Extract ScanTailor metadata
- Run:
```
base_model_trainer/create_dataset/extract_scantailor_data.py
```
- This script extracts crop coordinates and other metadata from the .scanTailor files and saves them as metadata.json files in their respective folders.
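A sketch of the extraction using the standard library's ElementTree. The XML below is a simplified, hypothetical excerpt loosely modeled on ScanTailor's deskew and select-content filters; the element and attribute names, and the resulting metadata.json schema, are assumptions to check against a real .scanTailor file.

```python
import json
import xml.etree.ElementTree as ET

# Simplified, hypothetical excerpt; real .scanTailor projects contain many more
# elements, and their exact names should be verified against an actual file.
SAMPLE = """
<project>
  <filters>
    <deskew>
      <page id="4"><params angle="0.75"/></page>
    </deskew>
    <select-content>
      <page id="4"><params><content-rect x="120" y="80" width="900" height="1400"/></params></page>
    </select-content>
  </filters>
</project>
"""


def extract_pages(xml_text):
    """Collect the per-page deskew angle and content rectangle into plain dicts."""
    root = ET.fromstring(xml_text.strip())
    pages = {}
    for page in root.iterfind("./filters/deskew/page"):
        params = page.find("params")
        if params is not None:
            pages.setdefault(page.get("id"), {})["angle"] = float(params.get("angle", 0.0))
    for page in root.iterfind("./filters/select-content/page"):
        rect = page.find("./params/content-rect")
        if rect is not None:
            pages.setdefault(page.get("id"), {})["content_rect"] = {
                k: float(rect.get(k)) for k in ("x", "y", "width", "height")
            }
    return pages


def write_metadata(pages, path):
    """Dump the collected page data as JSON (hypothetical schema)."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(pages, fh, indent=2)
```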
Create dataset structure
- Run:
```
base_model_trainer/create_dataset/create_yolo_dataset.py
```
- It consumes the metadata.json files produced by extract_scantailor_data.py and arranges the files into the train / val / test structure expected by Ultralytics YOLO. See: https://docs.ultralytics.com/datasets/detect/
- Images are padded by 10% on the left and right, so that rotation augmentation can be applied without pushing page edges out of frame.
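A sketch of the two coordinate operations this step implies: shifting page boxes after horizontal padding, and writing a label line in the Ultralytics YOLO detection format (`class x_center y_center width height`, normalized to 0-1). Whether the 10% is added to each side or in total is not stated; the code assumes each side, and the function names are hypothetical.

```python
def pad_box(x1, x2, image_width, pad_fraction=0.10):
    """Shift x-coordinates after adding pad_fraction * width of margin on each side.

    Returns the shifted (x1, x2) and the new image width; y is unaffected.
    """
    pad = image_width * pad_fraction
    return x1 + pad, x2 + pad, image_width + 2 * pad


def to_yolo_line(class_id, x1, y1, x2, y2, img_w, img_h):
    """Format a pixel-space box as a YOLO label line: class cx cy w h, normalized to [0, 1]."""
    cx = (x1 + x2) / 2 / img_w
    cy = (y1 + y2) / 2 / img_h
    w = (x2 - x1) / img_w
    h = (y2 - y1) / img_h
    return f"{class_id} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}"
```

For example, a page spanning x = 100..900 in a 1000 px wide scan ends up at x = 200..1000 in a 1200 px wide padded image.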
Assign classes and clean up
- Run:
```
base_model_trainer/create_dataset/assign_classes_and_cleanup.py
```
- This script cleans mistakes in the training data and assigns objects to classes: page, back title cover, and unified doublepage.
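Ultralytics expects a dataset YAML listing the split directories and class names. A dependency-free sketch, assuming the standard `images/<split>` layout and a hypothetical class order (the ids must match those in the label files); JSON-quoted strings are used because they are also valid YAML scalars.

```python
import json
from pathlib import Path

# Hypothetical class order; ids must match the class column of the label files.
CLASSES = ["page", "back title cover", "unified doublepage"]


def data_yaml(dataset_root, classes=CLASSES):
    """Build an Ultralytics-style dataset YAML as text, without a YAML library."""
    lines = [
        f"path: {json.dumps(Path(dataset_root).as_posix())}",
        "train: images/train",
        "val: images/val",
        "test: images/test",
        "names:",
    ]
    lines += [f"  {i}: {json.dumps(name)}" for i, name in enumerate(classes)]
    return "\n".join(lines) + "\n"
```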

Output

The resulting dataset can be used as input for fine-tuning both the rotation (RotateNET) and crop (YOLO) models.

Training

Scripts for model fine-tuning are stored in base_model_trainer.training.crop_train and base_model_trainer.training.rotate_train. Both models use the same dataset. Training reports are periodically saved to CometML; set the COMET_ML_API_KEY environment variable to enable this.
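A minimal sketch of a training entry point, assuming the ultralytics package and its `YOLO(...).train(data=..., epochs=..., imgsz=...)` API. The epoch count and image size are placeholders, and `comet_enabled` only checks for the variable named above; it does not configure Comet itself, and the actual crop_train and rotate_train modules may work differently.

```python
import os


def comet_enabled(env=None):
    """Return True when COMET_ML_API_KEY is set to a non-empty value."""
    env = os.environ if env is None else env
    return bool(env.get("COMET_ML_API_KEY", "").strip())


def train_crop_model(data_yaml, epochs=100, imgsz=640):
    """Fine-tune YOLO11s on the generated dataset (hypothetical hyperparameters)."""
    from ultralytics import YOLO  # imported lazily; requires the ultralytics package

    if not comet_enabled():
        print("COMET_ML_API_KEY not set; training reports will not be sent to CometML.")
    model = YOLO("yolo11s.pt")  # pretrained YOLO11s weights
    return model.train(data=data_yaml, epochs=epochs, imgsz=imgsz)
```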
