PDF to Text Converter

A Python script that converts PDF files to text using the docling library. It batch-processes PDF files, so you can extract the text of many documents in a single run.

Features

- Automatically processes all PDF files in the assets directory
- Creates text output files in the output directory
- Maintains the original filename (with a .txt extension)
- Handles conversion errors gracefully
- Reports conversion progress and status

Requirements

- Python 3.x
- docling library
- Virtual environment (included in the repository)

Project Structure

docling_converter/
├── assets/       # Place PDF files here
├── output/       # Converted text files appear here
├── env/          # Virtual environment
├── main.py       # Main conversion script
└── README.md     # This file
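The layout above implies a simple mapping from each input PDF to its output file. The sketch below shows one way to express it in Python. The helper names project_dirs, find_pdfs and output_path_for are hypothetical (they are not taken from main.py), and matching ".pdf" case-insensitively is an assumption.

```python
from __future__ import annotations

from pathlib import Path


def project_dirs(base_dir: Path) -> tuple[Path, Path]:
    """Return the (assets, output) folders under base_dir.

    Hypothetical helper: assumes both folders sit next to main.py,
    as in the layout above.
    """
    return base_dir / "assets", base_dir / "output"


def find_pdfs(assets_dir: Path) -> list[Path]:
    """List the PDF files directly inside assets_dir, sorted by name.

    The suffix check is case-insensitive so "Report.PDF" is also picked up
    (an assumption; the original script may only match ".pdf").
    """
    return sorted(
        p for p in assets_dir.iterdir()
        if p.is_file() and p.suffix.lower() == ".pdf"
    )


def output_path_for(pdf_path: Path, output_dir: Path) -> Path:
    """Map assets/report.pdf to output/report.txt (same name, .txt extension)."""
    return output_dir / pdf_path.with_suffix(".txt").name
```

Inside a script, project_dirs(Path(__file__).resolve().parent) would resolve the folders relative to the script itself rather than to the current working directory.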

Setup

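- Ensure Python 3.x is installed on your system.
- Activate the virtual environment:
  - Windows: .\env\Scripts\activate
  - Unix/macOS: source env/bin/activate
- If the environment does not already include docling, install the dependencies: pip install -r requirements.txt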
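To confirm that the environment is actually active before running main.py, a quick check like the one below can help. It is a generic Python sketch, not part of the repository, and it assumes env/ was created with the standard venv module (the README does not say how it was created).

```python
import sys


def in_virtualenv() -> bool:
    """Return True when running inside a venv-style virtual environment.

    Inside a venv, sys.prefix points at the environment directory while
    sys.base_prefix still points at the base interpreter, so they differ.
    """
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
```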

Usage

- Place your PDF files in the assets directory.
- Run the script:
```
python main.py
```
- Find the converted text files in the output directory.

How It Works

The script uses the docling library's DocumentConverter to process PDF files. For each PDF file in the assets directory, it:

- Creates the output directory if it doesn't exist
- Converts the PDF content to text
- Saves the text in the output directory under the same filename, with a .txt extension
- Reports success or any errors encountered
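The steps above can be sketched as follows. This is an illustration, not the repository's main.py: the function names are hypothetical, and it assumes a docling release in which DocumentConverter.convert() returns a result whose document has an export_to_text() method. Passing the converter in as a plain callable keeps docling out of the loop, so the loop itself can be exercised without docling installed.

```python
from __future__ import annotations

from pathlib import Path
from typing import Callable


def make_docling_converter() -> Callable[[Path], str]:
    """Build a PDF -> text function backed by docling's DocumentConverter.

    Assumes .document.export_to_text() is available on the conversion
    result; docling is imported here so the rest of the sketch runs without it.
    """
    from docling.document_converter import DocumentConverter

    converter = DocumentConverter()  # created once, reused for every PDF

    def to_text(pdf_path: Path) -> str:
        return converter.convert(pdf_path).document.export_to_text()

    return to_text


def convert_directory(
    assets_dir: Path, output_dir: Path, to_text: Callable[[Path], str]
) -> int:
    """Convert each assets/*.pdf to output/<same name>.txt; return the success count."""
    output_dir.mkdir(parents=True, exist_ok=True)  # create output/ if missing
    converted = 0
    for pdf in sorted(assets_dir.glob("*.pdf")):
        try:
            text = to_text(pdf)  # PDF -> plain text
            out_path = output_dir / f"{pdf.stem}.txt"  # same name, .txt extension
            out_path.write_text(text, encoding="utf-8")
            print(f"Converted {pdf.name} -> {out_path.name}")  # report success
            converted += 1
        except Exception as exc:  # report the error and move on to the next file
            print(f"Error converting {pdf.name}: {exc}")
    return converted
```

In a script, the call would look like convert_directory(Path("assets"), Path("output"), make_docling_converter()). Building the DocumentConverter once rather than per file likely avoids repeating its setup work for every PDF.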

Error Handling

The script includes error handling for individual file conversions. If a file fails to convert, the script:

- Prints an error message with the specific file name
- Continues processing the remaining files
- Maintains a log of any conversion failures
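The README does not say where or in what format failures are logged. A minimal sketch of the behaviour described above, assuming a tab-separated log file at a caller-chosen path (the function name, log path and log format are all hypothetical), could look like this:

```python
from __future__ import annotations

from pathlib import Path
from typing import Callable, Iterable


def convert_with_failure_log(
    pdfs: Iterable[Path],
    to_text: Callable[[Path], str],
    output_dir: Path,
    log_path: Path,
) -> list[tuple[str, str]]:
    """Convert each PDF independently, recording failures instead of stopping.

    Returns (file name, error message) pairs for the files that failed; if any
    failed, they are also written to log_path, one tab-separated line each.
    """
    output_dir.mkdir(parents=True, exist_ok=True)
    failures: list[tuple[str, str]] = []
    for pdf in pdfs:
        try:
            text = to_text(pdf)
            (output_dir / f"{pdf.stem}.txt").write_text(text, encoding="utf-8")
        except Exception as exc:  # one bad PDF must not abort the whole batch
            message = f"{type(exc).__name__}: {exc}"
            print(f"Error converting {pdf.name}: {message}")
            failures.append((pdf.name, message))
    if failures:
        log_path.write_text(
            "".join(f"{name}\t{msg}\n" for name, msg in failures),
            encoding="utf-8",
        )
    return failures
```

Writing the log only when something failed keeps a clean run from leaving an empty file behind; appending to an existing log instead would be an equally reasonable choice.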
