📄 Image2FILE

A GTK4 application that converts document images into organized PDFs using OCR. It detects page numbers automatically, names and sorts the resulting PDFs by page number, and prompts for manual correction when OCR fails.

Features

Parallel OCR Processing: Uses multiple CPU cores for faster image processing
Automatic Page Detection: Extracts page numbers using Tesseract OCR
Manual Correction: Interactive dialog for correcting OCR failures
Smart Organization: Automatically organizes PDFs by page numbers
Cache System: Skips images that have already been processed
Modern UI: Built with GTK4 and Libadwaita for a native Linux experience
Real-time Logs: Live monitoring of processing status and errors
Configurable Settings: Adjustable maximum pages and processing threads

Prerequisites

System Requirements

Linux operating system
Python 3.8 or higher
GTK4 and Libadwaita development libraries
Tesseract OCR engine

Installing System Dependencies

Ubuntu/Debian

sudo apt update
sudo apt install python3 python3-pip tesseract-ocr tesseract-ocr-por libgtk-4-dev libadwaita-1-dev

Fedora

sudo dnf install python3 python3-pip tesseract tesseract-langpack-por gtk4-devel libadwaita-devel

Arch Linux

sudo pacman -S python python-pip tesseract tesseract-data-por gtk4 libadwaita

Installation

Clone the repository:

git clone https://github.com/EmanuProds/ncx-book-organizer.git
cd ncx-book-organizer

Create a virtual environment (recommended):

python3 -m venv venv
source venv/bin/activate

Install Python dependencies:

pip install pytesseract pillow pygobject
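
Before launching the app, it can help to confirm that the Python packages and the Tesseract binary are visible from the active environment. The script below is a minimal sketch and not part of the repository; the function name check_environment is made up for illustration, and it only tests whether each dependency can be found, not whether its version is suitable.

```python
import importlib.util
import shutil


def check_environment():
    """Return a dict mapping each requirement to True (found) or False (missing)."""
    # Package name as installed with pip -> module name used for import.
    modules = {"pytesseract": "pytesseract", "pillow": "PIL", "pygobject": "gi"}
    status = {
        package: importlib.util.find_spec(module) is not None
        for package, module in modules.items()
    }
    # pytesseract only wraps the command-line tool, so the binary must be on PATH.
    status["tesseract binary"] = shutil.which("tesseract") is not None
    return status


if __name__ == "__main__":
    for name, found in check_environment().items():
        print(f"{name}: {'ok' if found else 'MISSING'}")
```

Any entry reported as MISSING points back to the matching install step above.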

Usage

Activate the virtual environment (if created):

source venv/bin/activate

Run the application:

python main.py

How to Use

Select Input Directory: Choose the folder containing your document images (JPG/JPEG)
Select Output Directory: Choose where the organized PDFs will be saved
Configure Settings (optional):
- Maximum pages: Set the total number of pages in your document
- Number of processes: Adjust parallel processing (0 = auto-detect)
Start Processing: Click "Start Processing" and monitor progress in the Logs tab
Manual Corrections: If OCR fails, the app will prompt for manual page number input

Output Structure

The application creates organized PDFs with the following naming convention:

FL. 001.pdf, FL. 002.pdf, etc. - Regular pages
FL. 001-verso.pdf - Back sides of pages
TERMO DE ABERTURA.pdf - Opening terms
TERMO DE ENCERRAMENTO.pdf - Closing terms
ERRO_OCR_filename.pdf - Files that couldn't be processed
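
The naming convention above can be expressed as two small helpers. This is a sketch, not the project's actual code: the function names page_name and error_name are hypothetical, and it assumes that "filename" in the error pattern means the source file name without its extension.

```python
from pathlib import Path


def page_name(page: int, verso: bool = False) -> str:
    """Build the output name for a regular page or its back side."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    suffix = "-verso" if verso else ""
    # Zero-pad to three digits, matching "FL. 001.pdf".
    return f"FL. {page:03d}{suffix}.pdf"


def error_name(source: str) -> str:
    """Build the output name for an image whose page number could not be read."""
    # Assumption: the original extension is dropped before ".pdf" is added.
    return f"ERRO_OCR_{Path(source).stem}.pdf"
```

For example, page_name(12, verso=True) gives "FL. 012-verso.pdf".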

Configuration

OCR Settings

Language: Portuguese (por)
PSM Mode: 6 (Uniform block of text)
ROI: Configurable region of interest for page number detection
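
To illustrate how these three settings fit together, the sketch below crops a region of interest, runs Tesseract on it with lang="por" and --psm 6, and picks a plausible page number from the recognized text. It is an assumption-laden example rather than the project's implementation: the function names, the fractional ROI format, the default ROI values, and the "first number within range" rule are all hypothetical. Only pytesseract.image_to_string, Image.open, and Image.crop are real library calls.

```python
import re


def roi_box(width, height, roi):
    """Convert a fractional (left, top, right, bottom) ROI into a pixel box."""
    left, top, right, bottom = roi
    return (round(left * width), round(top * height),
            round(right * width), round(bottom * height))


def parse_page_number(text, max_pages=300):
    """Return the first integer in text that lies in 1..max_pages, else None."""
    for match in re.finditer(r"\d+", text):
        number = int(match.group())
        if 1 <= number <= max_pages:
            return number
    return None


def read_page_number(image_path, roi=(0.6, 0.0, 1.0, 0.15), max_pages=300):
    """OCR the ROI of one image and return its page number, or None on failure."""
    # Imported here so the pure helpers above work without the OCR stack.
    from PIL import Image
    import pytesseract

    with Image.open(image_path) as image:
        crop = image.crop(roi_box(*image.size, roi))
        # Portuguese language data, page segmentation mode 6 (uniform block of text).
        text = pytesseract.image_to_string(crop, lang="por", config="--psm 6")
    return parse_page_number(text, max_pages)
```

A return value of None corresponds to the case where the app would ask for manual input.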

Processing Settings

Maximum Pages: Default 300 pages
Parallel Processes: Default 4 workers
Cache System: Automatically detects and skips already processed files
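
The worker setting and the cache can be pictured with the following sketch. It is not taken from the repository: the README does not say how the cache is stored, so this example assumes a JSON file that maps each image name to its modification time, and all function names are hypothetical. The only behavior drawn from the text is that 0 means auto-detect and that processed files are skipped.

```python
import json
import os
from pathlib import Path


def resolve_workers(requested: int) -> int:
    """Return the worker count to use; 0 (or less) means auto-detect."""
    return requested if requested > 0 else (os.cpu_count() or 1)


def load_cache(cache_file: Path) -> dict:
    """Load the hypothetical cache file, or start empty if it does not exist."""
    if cache_file.exists():
        return json.loads(cache_file.read_text(encoding="utf-8"))
    return {}


def pending_images(input_dir: Path, cache: dict) -> list:
    """List JPG/JPEG files that are new or changed since they were cached."""
    pending = []
    for image in sorted(input_dir.iterdir()):
        if image.suffix.lower() not in {".jpg", ".jpeg"}:
            continue
        # Skip files whose recorded modification time is unchanged.
        if cache.get(image.name) == image.stat().st_mtime_ns:
            continue
        pending.append(image)
    return pending
```

Keying the cache on modification time means an image that is rescanned under the same name is processed again.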

Architecture

The code is split into a service layer, a GTK4 interface layer, and legacy modules kept for backward compatibility:

src/
├── models.py           # Data models and domain entities (dataclasses & enums)
├── exceptions.py       # Custom exception hierarchy
├── config.py           # Application configuration
├── core.py             # Legacy processing logic (backward compatibility)
├── services/           # Modern service layer
│   ├── file_service.py     # File operations and caching
│   ├── ocr_service.py      # OCR processing and image manipulation
│   └── processing_service.py # Main processing coordination
├── interface/          # GTK4 UI layer
│   ├── entrypoint.py       # Application initialization
│   ├── gui.py              # Main window and navigation
│   ├── home.py             # Processing interface
│   ├── pref.py             # Preferences/settings page
│   ├── logs.py             # Logging interface
│   └── about.py            # About dialog
├── ocr.py              # Legacy OCR functions (deprecated)
└── __init__.py         # Package initialization
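
As a rough picture of how the layers in this tree could interact, the sketch below pairs a small exception hierarchy and a dataclass model with a service function that receives its OCR routine as a parameter. Every name in it (Img2FileError, OcrError, PageStatus, PageResult, process_image) is invented for illustration and may not match what models.py, exceptions.py, or the services actually define.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class Img2FileError(Exception):
    """Hypothetical base class of the custom exception hierarchy."""


class OcrError(Img2FileError):
    """Raised when no page number can be read from an image."""


class PageStatus(Enum):
    OK = "ok"
    NEEDS_MANUAL_INPUT = "needs_manual_input"


@dataclass(frozen=True)
class PageResult:
    """Domain model describing the outcome for one source image."""
    source: str
    page_number: Optional[int]
    status: PageStatus


def process_image(source: str, read_number: Callable[[str], int]) -> PageResult:
    """Service-layer step: run OCR and translate failures into a status."""
    try:
        number = read_number(source)
    except OcrError:
        # The UI layer can react to this status by asking the user for input.
        return PageResult(source, None, PageStatus.NEEDS_MANUAL_INPUT)
    return PageResult(source, number, PageStatus.OK)
```

Passing the OCR routine in as an argument keeps the coordination logic testable without Tesseract installed.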

Performance Tips

Use SSD storage for faster I/O
Increase parallel processes for multi-core systems
Process images in batches for better cache utilization
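
The last two tips can be combined in a pattern like the one below, which feeds fixed-size batches to a pool of worker processes. It is a generic sketch using the standard library's ProcessPoolExecutor, not the application's own scheduler; the function names and the worker callable are placeholders, and the worker must be a top-level function so it can be sent to other processes.

```python
from concurrent.futures import ProcessPoolExecutor


def batches(items, size):
    """Yield consecutive slices of items, each with at most size elements."""
    if size < 1:
        raise ValueError("batch size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def process_in_batches(paths, worker, workers, batch_size):
    """Apply worker to every path, one batch at a time, using a process pool."""
    results = []
    with ProcessPoolExecutor(max_workers=workers) as pool:
        for batch in batches(paths, batch_size):
            # pool.map preserves input order within each batch.
            results.extend(pool.map(worker, batch))
    return results
```

Finishing one batch before starting the next bounds how much work is lost if a run is interrupted, at the cost of some idle time at the end of each batch.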

License

This project is licensed under the MIT License.
