Commit 550178f

Improve orientation predictor results for long words/lines
While trying to get the EasyOCR and PaddleOCR text detection models to work, a problem popped up with the OnnxTR orientation predictor model. It works fine for words, and since the OnnxTR text detection step outputs words, the problem never showed up there. But in EasyOCR and PaddleOCR the output is lines, which by their nature can be very long. When a long line gets squeezed to fit a 256x256 square with a ratio-preserving resize, too much information is lost in one of the dimensions, which makes the output pretty much random.

As a workaround, we now truncate input images when the aspect ratio is too extreme (more than 4:1 in either direction). On the surface, this shouldn't cause any issues: even if you truncate a perfectly fine word, it still looks like text, so there is still enough information to work with.

The truncation is done symmetrically on both sides, which is pretty important. This way we have a higher chance of catching text rather than the margins/padding. Truncating on only one side gives worse results.

Somewhat surprisingly, judging by the existing tests that required updating, this also improved results with the docTR model, which is a positive sign.
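To make the squeezing problem concrete, here is a minimal arithmetic sketch. It is not code from the commit: the class name `ResizeSketch`, the 2000x20 line size, and the round-to-nearest behaviour of the resize are all assumptions chosen for illustration. The truncation arithmetic follows the `truncateToRatio` method added in this commit, with the predictor's limit of 4.

```java
public final class ResizeSketch {
    /**
     * Size of an image after a ratio-preserving fit into a side x side square.
     * Rounding to the nearest pixel is an assumption; the real resize may differ.
     */
    static int[] fitInto(int width, int height, int side) {
        final double scale = Math.min((double) side / width, (double) side / height);
        final int w = Math.max(1, (int) Math.round(width * scale));
        final int h = Math.max(1, (int) Math.round(height * scale));
        return new int[] {w, h};
    }

    /** Size after truncation to the ratio limit (same arithmetic as truncateToRatio). */
    static int[] truncate(int width, int height, double ratioLimit) {
        if ((double) width / height > ratioLimit) {
            return new int[] {Math.max(1, (int) (ratioLimit * height)), height};
        }
        if ((double) height / width > ratioLimit) {
            return new int[] {width, Math.max(1, (int) (ratioLimit * width))};
        }
        return new int[] {width, height};
    }

    public static void main(String[] args) {
        // A hypothetical 2000x20 text line squeezed directly into 256x256
        final int[] raw = fitInto(2000, 20, 256);
        System.out.println(raw[0] + "x" + raw[1]); // prints "256x3"

        // The same line truncated to 4:1 first (80x20), then resized
        final int[] cut = truncate(2000, 20, 4);
        final int[] fitted = fitInto(cut[0], cut[1], 256);
        System.out.println(fitted[0] + "x" + fitted[1]); // prints "256x64"
    }
}
```

Under these assumptions the untruncated line keeps only about 3 rows of pixels, while the truncated one keeps 64, which is the difference the commit message describes.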
1 parent dea212f commit 550178f

15 files changed

Lines changed: 244 additions & 14 deletions

File tree

pdfocr-onnxtr/src/main/java/com/itextpdf/pdfocr/onnxtr/orientation/OnnxOrientationPredictor.java

Lines changed: 29 additions & 2 deletions
@@ -39,6 +39,12 @@ This file is part of the iText (R) project.
 public class OnnxOrientationPredictor
         extends AbstractOnnxPredictor<BufferedImage, TextOrientation>
         implements IOrientationPredictor {
+    /**
+     * If an input image ratio exceeds this value, then the image will be
+     * truncated.
+     */
+    private static final double IMAGE_RATIO_LIMIT = 4;
+
     /**
      * Configuration properties of the predictor.
      */
@@ -94,8 +100,29 @@ public OnnxOrientationPredictorProperties getProperties() {
      */
     @Override
     protected FloatBufferMdArray toInputBuffer(List<BufferedImage> batch) {
-        // Just your regular BCHW input
-        return BufferedImageUtil.toBchwInput(batch, properties.getInputProperties());
+        /*
+         * This orientation predictor was initially made based on the OnnxTR
+         * one. There the output of the text detection model is words, which
+         * are pretty narrow. So when they were resized to fit a square input
+         * buffer, there were no issues.
+         *
+         * But in the other tools, like EasyOCR and PaddleOCR, the text
+         * detection step outputs lines. And since they could be really wide,
+         * when they are resized for the square input buffer, there will be,
+         * like, 1 or 2 pixels in one of the dimensions, which is not enough
+         * for the orientation model to work.
+         *
+         * To counteract that we will just truncate the images. This shouldn't
+         * make the output worse, as you don't need the whole line or word to
+         * figure out the orientation. On the other hand, not having one of the
+         * dimensions getting degraded to nothing is much more useful.
+         */
+        final List<BufferedImage> truncatedBatch = new ArrayList<>(batch.size());
+        for (int i = 0; i < batch.size(); ++i) {
+            truncatedBatch.add(BufferedImageUtil.truncateToRatio(batch.get(i), IMAGE_RATIO_LIMIT));
+        }
+        // After that it is just a regular BCHW input conversion
+        return BufferedImageUtil.toBchwInput(truncatedBatch, properties.getInputProperties());
     }
 
     /**

pdfocr-onnxtr/src/main/java/com/itextpdf/pdfocr/onnxtr/util/BufferedImageUtil.java

Lines changed: 41 additions & 0 deletions
@@ -289,6 +289,47 @@ private static BufferedImage resize(BufferedImage image, int width, int height,
         return result;
     }
 
+    /**
+     * Truncates the input image, so that neither width/height, nor
+     * height/width ratios exceed the limit.
+     *
+     * <p>
+     * If width/height ratio exceeds the limit, the image will be truncated
+     * on left and right equally.
+     *
+     * <p>
+     * If height/width ratio exceeds the limit, the image will be truncated
+     * on top and bottom equally.
+     *
+     * @param image input image to truncate
+     * @param ratioLimit target ratio limit
+     *
+     * @return the truncated image
+     */
+    public static BufferedImage truncateToRatio(BufferedImage image, double ratioLimit) {
+        final int width = image.getWidth();
+        final int height = image.getHeight();
+
+        // If w/h ratio is too big, truncating by width
+        final double imageRatio = (double) width / height;
+        if (imageRatio > ratioLimit) {
+            final int newWidth = Math.max(1, (int) (ratioLimit * height));
+            final int newX = (width - newWidth) / 2;
+            return image.getSubimage(newX, 0, newWidth, height);
+        }
+
+        // If h/w ratio is too big, truncating by height
+        final double imageRatioInv = 1. / imageRatio;
+        if (imageRatioInv > ratioLimit) {
+            final int newHeight = Math.max(1, (int) (ratioLimit * width));
+            final int newY = (height - newHeight) / 2;
+            return image.getSubimage(0, newY, width, newHeight);
+        }
+
+        // Otherwise leaving as-is
+        return image;
+    }
+
     private static void putRgbImageWithNormalization(
             FloatBuffer outputBuffer,
             BufferedImage image,
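Two properties of the centered crop in `truncateToRatio` are easy to miss, and the standalone sketch below illustrates them. The class `CenterCropSketch`, its helper names, and the 1001x20 input are hypothetical and chosen for illustration; only `BufferedImage.getSubimage` and the integer arithmetic are taken from the diff. First, when the number of trimmed columns is odd, integer division puts the extra column on the right. Second, `getSubimage` returns a view that shares pixel data with the original image, so the truncation itself copies nothing, but writes to the result would show up in the source.

```java
import java.awt.image.BufferedImage;

public final class CenterCropSketch {
    /** Left offset of a centered horizontal crop (same integer arithmetic as the diff). */
    static int leftOffset(int width, int newWidth) {
        return (width - newWidth) / 2;
    }

    /** Number of columns trimmed on the right for a given left offset. */
    static int rightTrim(int width, int newWidth) {
        return width - leftOffset(width, newWidth) - newWidth;
    }

    /** Checks that a sub-image created with getSubimage shares pixels with its parent. */
    static boolean sharesPixels() {
        final BufferedImage line = new BufferedImage(1001, 20, BufferedImage.TYPE_INT_RGB);
        final int newWidth = (int) (8.0 * line.getHeight());
        final int x = leftOffset(line.getWidth(), newWidth);
        final BufferedImage cropped = line.getSubimage(x, 0, newWidth, line.getHeight());
        // Writing a white pixel through the cropped view...
        cropped.setRGB(0, 0, 0xFFFFFF);
        // ...is visible at the corresponding position of the original image
        return (line.getRGB(x, 0) & 0xFFFFFF) == 0xFFFFFF;
    }

    public static void main(String[] args) {
        System.out.println(leftOffset(1001, 160)); // prints "420"
        System.out.println(rightTrim(1001, 160));  // prints "421"
        System.out.println(sharesPixels());        // prints "true"
    }
}
```

In the predictor this sharing looks harmless, since the truncated images are only read when building the input buffer, but a caller that mutates the result of `truncateToRatio` may want to copy it first.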
Lines changed: 65 additions & 0 deletions
@@ -0,0 +1,65 @@
+/*
+    This file is part of the iText (R) project.
+    Copyright (c) 1998-2025 Apryse Group NV
+    Authors: Apryse Software.
+
+    This program is offered under a commercial and under the AGPL license.
+    For commercial licensing, contact us at https://itextpdf.com/sales. For AGPL licensing, see below.
+
+    AGPL licensing:
+    This program is free software: you can redistribute it and/or modify
+    it under the terms of the GNU Affero General Public License as published by
+    the Free Software Foundation, either version 3 of the License, or
+    (at your option) any later version.
+
+    This program is distributed in the hope that it will be useful,
+    but WITHOUT ANY WARRANTY; without even the implied warranty of
+    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+    GNU Affero General Public License for more details.
+
+    You should have received a copy of the GNU Affero General Public License
+    along with this program. If not, see <https://www.gnu.org/licenses/>.
+ */
+package com.itextpdf.pdfocr.onnxtr.util;
+
+import java.util.Objects;
+
+/**
+ * A basic 2-element tuple with a width and a height.
+ */
+public class Dimensions2D {
+    private final int width;
+    private final int height;
+
+    public Dimensions2D(int width, int height) {
+        this.width = width;
+        this.height = height;
+    }
+
+    public int getWidth() {
+        return width;
+    }
+
+    public int getHeight() {
+        return height;
+    }
+
+    @Override
+    public boolean equals(Object o) {
+        if (o == null || this.getClass() != o.getClass()) {
+            return false;
+        }
+        final Dimensions2D that = (Dimensions2D) o;
+        return width == that.width && height == that.height;
+    }
+
+    @Override
+    public int hashCode() {
+        return Objects.hash(width, height);
+    }
+
+    @Override
+    public String toString() {
+        return width + "x" + height;
+    }
+}
Lines changed: 78 additions & 0 deletions
@@ -0,0 +1,78 @@
+/*
+    This file is part of the iText (R) project.
+    Copyright (c) 1998-2025 Apryse Group NV
+    Authors: Apryse Software.
+
+    This program is offered under a commercial and under the AGPL license.
+    For commercial licensing, contact us at https://itextpdf.com/sales. For AGPL licensing, see below.
+
+    AGPL licensing:
+    This program is free software: you can redistribute it and/or modify
+    it under the terms of the GNU Affero General Public License as published by
+    the Free Software Foundation, either version 3 of the License, or
+    (at your option) any later version.
+
+    This program is distributed in the hope that it will be useful,
+    but WITHOUT ANY WARRANTY; without even the implied warranty of
+    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+    GNU Affero General Public License for more details.
+
+    You should have received a copy of the GNU Affero General Public License
+    along with this program. If not, see <https://www.gnu.org/licenses/>.
+ */
+package com.itextpdf.pdfocr.onnxtr.orientation;
+
+import com.itextpdf.pdfocr.TextOrientation;
+import com.itextpdf.test.ExtendedITextTest;
+
+import java.awt.image.BufferedImage;
+import java.io.File;
+import java.io.IOException;
+import java.util.Arrays;
+import java.util.Collections;
+import javax.imageio.ImageIO;
+import org.junit.jupiter.api.AfterAll;
+import org.junit.jupiter.api.Assertions;
+import org.junit.jupiter.api.BeforeAll;
+import org.junit.jupiter.api.Tag;
+import org.junit.jupiter.params.ParameterizedTest;
+import org.junit.jupiter.params.provider.MethodSource;
+
+@Tag("IntegrationTest")
+public class OnnxOrientationPredictorTest extends ExtendedITextTest {
+    private static final String TEST_DIRECTORY =
+            "./src/test/resources/com/itextpdf/pdfocr/onnxtr/orientation/OnnxOrientationPredictorTest/";
+    private static final String TARGET_DIRECTORY =
+            "./target/test/resources/com/itextpdf/pdfocr/onnxtr/orientation/OnnxOrientationPredictorTest/";
+    private static final String MOBILENETV3 =
+            "./src/test/resources/com/itextpdf/pdfocr/models/mobilenet_v3_small_crop_orientation-5620cf7e.onnx";
+    private static IOrientationPredictor PREDICTOR;
+
+    @BeforeAll
+    public static void beforeClass() {
+        createOrClearDestinationFolder(TARGET_DIRECTORY);
+        PREDICTOR = OnnxOrientationPredictor.mobileNetV3(MOBILENETV3);
+    }
+
+    @AfterAll
+    public static void afterClass() throws Exception {
+        PREDICTOR.close();
+    }
+
+    public static Iterable<Object[]> predictWithLongLinesParams() {
+        return Arrays.asList(new Object[][] {
+                {TextOrientation.HORIZONTAL, "line_0.png"},
+                {TextOrientation.HORIZONTAL_ROTATED_90, "line_90.png"},
+                {TextOrientation.HORIZONTAL_ROTATED_180, "line_180.png"},
+                {TextOrientation.HORIZONTAL_ROTATED_270, "line_270.png"},
+        });
+    }
+
+    @ParameterizedTest(name = "predictWithLongLines: {1}")
+    @MethodSource("predictWithLongLinesParams")
+    public void predictWithLongLines(TextOrientation expectedResult, String inputFileName) throws IOException {
+        final BufferedImage inputImage = ImageIO.read(new File(TEST_DIRECTORY + inputFileName));
+        final TextOrientation actualResult = PREDICTOR.predict(Collections.singleton(inputImage)).next();
+        Assertions.assertEquals(expectedResult, actualResult);
+    }
+}

pdfocr-onnxtr/src/test/java/com/itextpdf/pdfocr/onnxtr/util/BufferedImageUtilTest.java

Lines changed: 24 additions & 1 deletion
@@ -34,6 +34,8 @@ This file is part of the iText (R) project.
 import org.junit.jupiter.api.Assertions;
 import org.junit.jupiter.api.Tag;
 import org.junit.jupiter.api.Test;
+import org.junit.jupiter.params.ParameterizedTest;
+import org.junit.jupiter.params.provider.MethodSource;
 
 @Tag("UnitTest")
 public class BufferedImageUtilTest extends ExtendedITextTest {
@@ -66,6 +68,23 @@ public void toBchwInputRgbBasicTest() {
         toBchwInputBasicTest(expectedShape, expectedData, images, props);
     }
 
+    public static Iterable<Object[]> truncateToRatioTestParams() {
+        return Arrays.asList(new Object[][] {
+                {new Dimensions2D(100, 20), new Dimensions2D(100, 20), 8.},
+                {new Dimensions2D(160, 20), new Dimensions2D(1000, 20), 8.},
+                {new Dimensions2D(100, 800), new Dimensions2D(100, 2000), 8.},
+        });
+    }
+
+    @ParameterizedTest(name = "truncateToRatioTest: {1}")
+    @MethodSource("truncateToRatioTestParams")
+    public void truncateToRatioTest(Dimensions2D expectedSize, Dimensions2D inputSize, double ratioLimit) {
+        final BufferedImage img = newBlankInputImage(inputSize);
+        final BufferedImage truncated = BufferedImageUtil.truncateToRatio(img, ratioLimit);
+        Assertions.assertEquals(expectedSize.getWidth(), truncated.getWidth());
+        Assertions.assertEquals(expectedSize.getHeight(), truncated.getHeight());
+    }
+
     private static void toBchwInputBasicTest(
             long[] expectedShape,
             float[] expectedData,
@@ -79,10 +98,14 @@ private static void toBchwInputBasicTest(
         Assertions.assertArrayEquals(expectedData, actualData, 1E-6F);
     }
 
+    private static BufferedImage newBlankInputImage(Dimensions2D dims) {
+        return new BufferedImage(dims.getWidth(), dims.getHeight(), BufferedImage.TYPE_3BYTE_BGR);
+    }
+
     private static BufferedImage newRgbImage(int width, int height, int[] pixels) {
         final BufferedImage img = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
         final WritableRaster raster = img.getRaster();
         raster.setDataElements(0, 0, width, height, pixels);
         return img;
     }
-}
+}
Binary file not shown.

pdfocr-onnxtr/src/test/resources/com/itextpdf/pdfocr/OnnxCreateTxtFileTest/cmp_createTxtFile.txt

Lines changed: 1 addition & 2 deletions
@@ -33,15 +33,14 @@ X?
 a
 what is Lovew
 Lorem Ipsum simply dummy text of the printing typesetting
-industry. Lorem Ipsum has been the standard dummy
+industry. Lorem Ipsum has been the industry's standard dummy
 text ever since the 1500s, when an unknown printer took galley
 of type and scrambled it to make type specimen book. It has
 survived not only five centuries, but also the leap into electronic
 typesetting remaining essentially unchanged. It was popularised in
 the 1960s with the release of Letraset sheets containing Lorem
 Ipsum passages, and more recently with desktop publishing
 software like Aldus PageMaker including versions of Lorem Ipsum.
-sAusnpur
 pue s!
 jwMsdI
 a

pdfocr-onnxtr/src/test/resources/com/itextpdf/pdfocr/OnnxTRCmykIntegrationTest/cmp_rainbowCmykNoProfileTest.txt

Lines changed: 5 additions & 9 deletions
@@ -6,27 +6,23 @@ cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non pr
 qui officia deserunt mollit anim id est laborum.
 Sed ut perspiciatis unde omnis iste natus error sit voluptatem accusantium doloremque laudantium,
 totam rem aperiam. eaque ipsa quae ab illo inventore veritatis et quasi architecto beatae vitae dicta
-sunt explicabo, Nemo enim ipsam quia voluptas sit aut odit aut fugit, sed quia
+sunt explicabo, Nemo enim ipsam voluptatem quia voluptas sit aspernatur aut odit aut fugit, sed quia
 consequuntur magni dolores eos ratione voluptatem sequi nesciunt. Neque porro quisquam est,
 qui dolorem ipsum quia dolor sit amet, consectetur, adipisci velit, sed quia non numquam eius modi
 tempora incidunt ut labore et dolore magnam aliquam quaerat voluptatem, Ut enim ad minima
-veniam, quis nostrum ullam corporis suscipit laboriosam, nisi ut aliquid ex ea
+veniam, quis nostrum exercitatationem ullam corporis suscipit laboriosam, nisi ut aliquid ex ea
 commodi consequatur? Quis autem vel eum iure reprehenderit qui ea voluptate velit esse quam
 nihil molestiae consequatur, vel illum qui dolorem eum fugiat volupt-as nulla pariatur?
-At vero eos et accusamus et justo odio dignissimos ducimus qui blanditiis voluptatum
+At vero eos et accusamus et justo odio dignissimos ducimus qui blanditiis praesentium voluptatum
 deleniti atque corrupti dolores et quas molestias excepturi sint occaecati cupiditate non
 provident, similique sunt in culpa qui officia deserunt mollitia animi, id est laborum et dolorum fuga.
 Et harum quidem rerum facilis est et expedita distinctio, Nam libero tempore, cum soluta nobis est
 eligendi optio cumque nihil impedit minus id maxime placeat facere possimus, omnis
 voluptas assumenda est, omnis dolor repellendus. Temporibus autem quibusdam et aut officiis
-debitis aut rerum saepe eveniet ut et voluptates repudiandae sint et molestiae non
+debitis aut rerum necessitatibus saepe eveniet ut et voluptates repudiandae sint et molestiae non
 recusandae. Itaque earum rerum hic tenetur a sapiente delectus, ut aut reiciendis voluptatibus
 maiores alias consequatur aut perferendis doloribus asperiores repellat.
-singessepou
 ponb onb
 sonb
-unguaseeid
 onb
-weuopepexe
-Inb
-ngeuadse wajeidnjoA
+Inb
