PDFwithText

This project supports positional text within a PDF file that displays an image. Our use case is newspapers, but the same approach could be used for book pages and other types of materials. PDF handling is done with the iText library, and the project is built with Maven. The pieces can be pulled together with:

mvn assembly:assembly

The jar, with all of the needed libraries, should end up in the target directory; everything is brought together in PDFwithText-exe.jar. ODW (OurDigitalWorld) uses a simple XML format for OCR text that provides coordinates for individual terms on an image:

<word x1="1973" y1="725">november<ends x2="2453" y2="777"/></word>
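To illustrate how a word element like the one above might be read, here is a minimal sketch that uses the XML parser shipped with the JDK. The class and method names (WordBoxReader, WordBox, read) are hypothetical and are not taken from the PDFwithText source. The sketch assumes that x1/y1 and x2/y2 are opposite corners of the term's bounding box in image pixels.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper (not part of PDFwithText): reads <word> elements into boxes.
class WordBoxReader {

    // One OCR term with its bounding box, assumed to be in image pixels.
    static final class WordBox {
        final String text;
        final int x1, y1, x2, y2;

        WordBox(String text, int x1, int y1, int x2, int y2) {
            this.text = text;
            this.x1 = x1;
            this.y1 = y1;
            this.x2 = x2;
            this.y2 = y2;
        }
    }

    // Parses every <word> element found in the given XML string.
    static List<WordBox> read(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList words = doc.getElementsByTagName("word");
            List<WordBox> boxes = new ArrayList<>();
            for (int i = 0; i < words.getLength(); i++) {
                Element word = (Element) words.item(i);
                Element ends = (Element) word.getElementsByTagName("ends").item(0);
                if (ends == null) {
                    continue; // skip words that lack the closing corner
                }
                boxes.add(new WordBox(
                        word.getTextContent().trim(),
                        Integer.parseInt(word.getAttribute("x1")),
                        Integer.parseInt(word.getAttribute("y1")),
                        Integer.parseInt(ends.getAttribute("x2")),
                        Integer.parseInt(ends.getAttribute("y2"))));
            }
            return boxes;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse OCR XML", e);
        }
    }

    public static void main(String[] args) {
        String sample = "<word x1=\"1973\" y1=\"725\">november"
                + "<ends x2=\"2453\" y2=\"777\"/></word>";
        for (WordBox b : read(sample)) {
            System.out.println(b.text + " " + b.x1 + "," + b.y1 + " " + b.x2 + "," + b.y2);
        }
    }
}
```

The sketch looks up word elements by tag name, so it does not depend on what the enclosing root element of a full OCR file is called.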

The command line options are:

usage: PDFwithText
-b,--black            set page background colour to black.
-h,--help             show help.
-i,--input <arg>      input image (required).
-o,--output <arg>     output PDF file (default name from image).
-p,--pagesize <arg>   L - LETTER (default), T - TABLOID, A - A4.
-v,--verbose          show underlying text on image.
-x,--xmlfile <arg>    specify XML file (default name from image).
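The page size matters because PDF coordinates are measured in points (1/72 inch) from the bottom-left corner of the page, while OCR coordinates are usually measured in pixels from the top-left corner of the image. The sketch below shows one way a word box could be mapped onto a page. It is not taken from the PDFwithText source: the class PageBoxMapper, its methods, the top-left pixel origin, and the assumption that the image is stretched to fill the whole page are all illustrative.

```java
// Hypothetical helper (not part of PDFwithText): maps pixel boxes to page points.
class PageBoxMapper {

    // Portrait page sizes in PDF points {width, height} for the -p codes.
    static float[] pageSize(String code) {
        switch (code) {
            case "T":
                return new float[] {792f, 1224f}; // TABLOID, 11 x 17 in
            case "A":
                return new float[] {595f, 842f};  // A4, rounded to whole points
            default:
                return new float[] {612f, 792f};  // LETTER, 8.5 x 11 in
        }
    }

    // Converts a pixel box (assumed top-left origin) into a PDF rectangle
    // {llx, lly, urx, ury}, assuming the image fills the whole page.
    static float[] toPdfRect(int x1, int y1, int x2, int y2,
                             int imageWidth, int imageHeight,
                             float pageWidth, float pageHeight) {
        float scaleX = pageWidth / imageWidth;
        float scaleY = pageHeight / imageHeight;
        return new float[] {
            x1 * scaleX,
            pageHeight - y2 * scaleY, // flip the y axis: PDF counts up from the bottom
            x2 * scaleX,
            pageHeight - y1 * scaleY
        };
    }

    public static void main(String[] args) {
        float[] page = pageSize("L");
        // The 6120 x 7920 pixel image size is made up for this illustration.
        float[] rect = toPdfRect(1973, 725, 2453, 777, 6120, 7920, page[0], page[1]);
        System.out.printf("%.1f %.1f %.1f %.1f%n", rect[0], rect[1], rect[2], rect[3]);
    }
}
```

If the image keeps its aspect ratio on the page instead of being stretched, a single scale factor plus an offset would be needed in place of the two independent factors used here.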

For example:

java -jar PDFwithText-exe.jar -b -i 1935-01-03-0001.jpg

This sets the page background to black and specifies only an input file. In this scenario, the XML file is expected to share the image's base name (1935-01-03-0001.xml), and the resulting PDF file follows the same pattern (1935-01-03-0001.pdf). The XML file and output PDF file can be specified directly if different naming conventions are used.
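The default naming described above amounts to swapping the image's file extension. A minimal sketch of that rule follows. The class DefaultNames and its method withExtension are hypothetical names used for illustration, and the actual tool may derive the names differently.

```java
// Hypothetical helper (not part of PDFwithText): derives default file names.
class DefaultNames {

    // Replaces the extension of imageName with ext, or appends ext if there is none.
    static String withExtension(String imageName, String ext) {
        int dot = imageName.lastIndexOf('.');
        int sep = Math.max(imageName.lastIndexOf('/'), imageName.lastIndexOf('\\'));
        // Only treat the dot as an extension separator if it falls inside the
        // file name itself and is not the leading dot of a hidden file.
        String base = (dot > sep + 1) ? imageName.substring(0, dot) : imageName;
        return base + "." + ext;
    }

    public static void main(String[] args) {
        String image = "1935-01-03-0001.jpg";
        System.out.println(withExtension(image, "xml"));
        System.out.println(withExtension(image, "pdf"));
    }
}
```

With the example image, this yields 1935-01-03-0001.xml for the OCR text and 1935-01-03-0001.pdf for the output.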

art rhyno ourdigitalworld/cdigs
