EmanuProds/image2file

📄 Image2FILE

Python GTK Tesseract License: MIT

A GTK4 application that converts document images to organized PDFs using OCR technology. It automatically detects page numbers, organizes documents, and allows manual corrections when OCR fails.

Features

  • Parallel OCR Processing: Uses multiple CPU cores for faster image processing
  • Automatic Page Detection: Extracts page numbers using Tesseract OCR
  • Manual Correction: Interactive dialog for correcting OCR failures
  • Smart Organization: Automatically organizes PDFs by page numbers
  • Cache System: Skips already processed images to avoid reprocessing
  • Modern UI: Built with GTK4 and Libadwaita for a native Linux experience
  • Real-time Logs: Live monitoring of processing status and errors
  • Configurable Settings: Adjustable maximum pages and processing threads

Prerequisites

System Requirements

  • Linux operating system
  • Python 3.8 or higher
  • GTK4 development libraries
  • Tesseract OCR engine

Installing System Dependencies

Ubuntu/Debian

sudo apt update
sudo apt install python3 python3-pip tesseract-ocr tesseract-ocr-por libgtk-4-dev libadwaita-1-dev

Fedora

sudo dnf install python3 python3-pip tesseract tesseract-langpack-por gtk4-devel libadwaita-devel

Arch Linux

sudo pacman -S python python-pip tesseract tesseract-data-por gtk4 libadwaita

Installation

  1. Clone the repository and enter the directory that git creates:
git clone https://github.com/EmanuProds/ncx-book-organizer.git
cd ncx-book-organizer
  2. Create a virtual environment (recommended):
python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
  3. Install Python dependencies:
pip install pytesseract pillow pygobject
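
Before launching the app, it can help to confirm that the pieces installed above are visible from the active environment. The following is a minimal sketch, not part of the project: the helper names `missing_modules` and `tesseract_available` are hypothetical, and the import names (`pytesseract`, `PIL`, `gi`) are the usual ones for the three pip packages.

```python
import importlib.util
import shutil

# Import names for the pip packages listed above:
# pytesseract -> pytesseract, pillow -> PIL, pygobject -> gi.
REQUIRED_MODULES = ["pytesseract", "PIL", "gi"]


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def tesseract_available(binary="tesseract"):
    """Return True if the Tesseract executable is on PATH."""
    return shutil.which(binary) is not None


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    print("Missing Python modules:", ", ".join(missing) or "none")
    print("Tesseract on PATH:", tesseract_available())
```

If a module is reported missing, re-run the `pip install` step inside the activated virtual environment; if Tesseract is not found, revisit the system dependency commands for your distribution.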

Usage

  1. Activate the virtual environment (if created):
source venv/bin/activate
  2. Run the application:
python main.py

How to Use

  1. Select Input Directory: Choose the folder containing your document images (JPG/JPEG)
  2. Select Output Directory: Choose where the organized PDFs will be saved
  3. Configure Settings (optional):
    • Maximum pages: Set the total number of pages in your document
    • Number of processes: Adjust parallel processing (0 = auto-detect)
  4. Start Processing: Click "Start Processing" and monitor progress in the Logs tab
  5. Manual Corrections: If OCR fails, the app will prompt for manual page number input

Output Structure

The application creates organized PDFs with the following naming convention:

  • FL. 001.pdf, FL. 002.pdf, etc. - Regular pages
  • FL. 001-verso.pdf - Back sides of pages
  • TERMO DE ABERTURA.pdf - Opening terms
  • TERMO DE ENCERRAMENTO.pdf - Closing terms
  • ERRO_OCR_filename.pdf - Files that couldn't be processed
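
The naming convention above can be sketched as two small helpers. This is an illustration only: `page_filename` and `error_filename` are hypothetical names, the three-digit zero padding is inferred from the examples, and using the image's stem (name without extension) in the error file name is an assumption.

```python
from pathlib import Path


def page_filename(page, verso=False):
    """Build a name such as 'FL. 001.pdf' or 'FL. 001-verso.pdf'."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    suffix = "-verso" if verso else ""
    return f"FL. {page:03d}{suffix}.pdf"


def error_filename(original_name):
    """Build the name for an image whose page number could not be read.

    Assumption: only the stem of the original file name is kept.
    """
    stem = Path(original_name).stem
    return f"ERRO_OCR_{stem}.pdf"
```

For example, `page_filename(12, verso=True)` gives `FL. 012-verso.pdf`, and `error_filename("scan_0042.jpg")` gives `ERRO_OCR_scan_0042.pdf`.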

Configuration

OCR Settings

  • Language: Portuguese (por)
  • PSM Mode: 6 (Uniform block of text)
  • ROI: Configurable region of interest for page number detection
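
As a rough sketch of how these settings might combine, the code below crops a region of interest and passes it to `pytesseract.image_to_string` with the Portuguese language data and PSM 6. The fractional ROI format, its default value (top-right corner), and the helper names are assumptions for illustration; the project's actual ROI representation is not documented here.

```python
import re

OCR_LANG = "por"        # Portuguese language data
OCR_CONFIG = "--psm 6"  # treat the region as a single uniform block of text


def roi_box(width, height, roi):
    """Convert a fractional ROI (left, top, right, bottom in 0..1) to a pixel box."""
    left, top, right, bottom = roi
    return (round(left * width), round(top * height),
            round(right * width), round(bottom * height))


def first_number(text):
    """Return the first run of digits in OCR output as an int, or None."""
    match = re.search(r"\d+", text)
    return int(match.group()) if match else None


def read_page_number(image_path, roi=(0.5, 0.0, 1.0, 0.25)):
    """OCR the ROI of one image and return the detected page number, or None.

    The default ROI is a hypothetical value, not the project's setting.
    """
    # Imported here so the pure helpers above work without the OCR stack installed.
    from PIL import Image
    import pytesseract

    with Image.open(image_path) as img:
        region = img.crop(roi_box(img.width, img.height, roi))
        text = pytesseract.image_to_string(region, lang=OCR_LANG, config=OCR_CONFIG)
    return first_number(text)
```

A `None` result corresponds to the case where the app would fall back to manual page number input.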

Processing Settings

  • Maximum Pages: Default 300 pages
  • Parallel Processes: Default 4 workers
  • Cache System: Automatically detects and skips already processed files
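
The worker setting and the cache check could look roughly like the sketch below. It assumes that `0` resolves to the machine's CPU count (as the "auto-detect" option suggests) and that the cache is a set of already processed file names; both the cache format and the function names are hypothetical.

```python
import os
from pathlib import Path

DEFAULT_WORKERS = 4  # default stated in the settings above


def resolve_workers(requested=DEFAULT_WORKERS):
    """Map the 'number of processes' setting to a worker count; 0 means auto-detect."""
    if requested < 0:
        raise ValueError("number of processes cannot be negative")
    if requested == 0:
        return os.cpu_count() or 1
    return requested


def pending_images(input_dir, processed_names):
    """Return the JPG/JPEG files in input_dir whose names are not in the cache set."""
    candidates = sorted(
        p for p in Path(input_dir).iterdir()
        if p.suffix.lower() in {".jpg", ".jpeg"}
    )
    return [p for p in candidates if p.name not in processed_names]
```

Only the paths returned by `pending_images` would need to go through OCR on a re-run.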

Architecture

The application follows a modern, service-oriented architecture with clear separation of concerns:

src/
├── models.py           # Data models and domain entities (dataclasses & enums)
├── exceptions.py       # Custom exception hierarchy
├── config.py           # Application configuration
├── core.py             # Legacy processing logic (backward compatibility)
├── services/           # Modern service layer
│   ├── file_service.py     # File operations and caching
│   ├── ocr_service.py      # OCR processing and image manipulation
│   └── processing_service.py # Main processing coordination
├── interface/          # GTK4 UI layer
│   ├── entrypoint.py       # Application initialization
│   ├── gui.py              # Main window and navigation
│   ├── home.py             # Processing interface
│   ├── pref.py             # Preferences/settings page
│   ├── logs.py             # Logging interface
│   └── about.py            # About dialog
├── ocr.py              # Legacy OCR functions (deprecated)
└── __init__.py         # Package initialization

Performance Tips

  • Use SSD storage for faster I/O
  • Increase parallel processes for multi-core systems
  • Process images in batches for better cache utilization

License

This project is licensed under the MIT License.
