
Commit 3eeec4f

Upgrade PDFBox 2.0.36 -> 3.0.7 (#345)
1 parent b806893 commit 3eeec4f

9 files changed

Lines changed: 1086 additions & 648 deletions


design-doc/3.0.0-openpdf3-vs-pdfbox3.md

Lines changed: 204 additions & 0 deletions

design-doc/3.0.0-pdfbox-upgrade.md

Lines changed: 196 additions & 0 deletions
# Upgrade PDFBox 2.0.27 → 3.0.x

## Context

The project currently pins `pdfbox.version=2.0.27` in `pom.xml`. Signing itself
is done by OpenPDF (a fork of iText 2.1), not PDFBox, so the PDFBox dependency
has a very narrow footprint:

- **Main code (runtime):** one site in `jsignpdf/src/main/java/net/sf/jsignpdf/preview/Pdf2Image.java`:
  `getImageUsingPdfBox()` uses `PDDocument.load(File, String)` +
  `PDFRenderer.renderImageWithDPI(...)`. PDFBox is one of three rendering
  strategies (`jpedal,pdfbox,pdfrenderer`) declared in `Constants.PDF2IMAGE_LIBRARIES_DEFAULT`
  and `distribution/conf/conf.properties`.
- **Test code only:**
  - `jsignpdf/src/test/java/net/sf/jsignpdf/signing/SigningTestBase.java`
    builds a minimal unsigned PDF (`PDDocument`, `PDPage`, `PDPageContentStream`,
    `PDType1Font.HELVETICA`).
  - `jsignpdf/src/test/java/net/sf/jsignpdf/PdfExtraInfoTest.java` builds
    unprotected, owner-only and both-password PDFs (`StandardProtectionPolicy`,
    `AccessPermission`, `PDType1Font.HELVETICA`).
  - `jsignpdf/src/test/java/net/sf/jsignpdf/signing/validation/PdfSignatureValidator.java`
    validates signed output; it uses `PDDocument.load(byte[])`/`load(File)`,
    `getSignatureDictionaries()`, `PDAcroForm`, `PDSignatureField`,
    `PDAnnotationWidget`, `PDFStreamParser`, COS traversal, and `PDFont`.

Nothing in the signing pipeline touches PDFBox. Migration risk is therefore
scoped to **PDF page preview rendering** and the **test fixtures / test
validator**. This is a favorable shape for an upgrade.

Upstream state (Apr 2026): PDFBox **3.0.7** is the current feature release;
PDFBox **2.0.36** is still receiving maintenance patches. Both branches are
active, so the upgrade is not time-critical, but 2.x will eventually reach end of life.

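Because PDFBox is only one entry in the `jpedal,pdfbox,pdfrenderer` preview list, one back end failing or being absent does not have to break previews. The following is a minimal sketch of such an ordered fallback, assuming each back end can be wrapped behind a common interface; `PageRenderer` and `RendererChain` are hypothetical names, not JSignPdf's actual `Pdf2Image` code.

```java
import java.awt.image.BufferedImage;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for one preview back end (jpedal, pdfbox, pdfrenderer). */
interface PageRenderer {
    /** Returns the rendered page, or null if this back end cannot render it. */
    BufferedImage render(String pdfPath, int pageIndex) throws Exception;
}

/** Hypothetical ordered fallback over the configured back ends. */
final class RendererChain {
    private final Map<String, PageRenderer> available;

    RendererChain(Map<String, PageRenderer> available) {
        this.available = available;
    }

    /** Splits a "jpedal,pdfbox,pdfrenderer"-style setting into trimmed, non-empty names. */
    static List<String> parseOrder(String setting) {
        List<String> names = new ArrayList<>();
        for (String part : setting.split(",")) {
            String name = part.trim();
            if (!name.isEmpty()) {
                names.add(name);
            }
        }
        return names;
    }

    /** Tries each configured back end in order; the first non-null image wins. */
    BufferedImage render(String setting, String pdfPath, int pageIndex) {
        for (String name : parseOrder(setting)) {
            PageRenderer renderer = available.get(name);
            if (renderer == null) {
                continue; // unknown name, or that library is not on the classpath
            }
            try {
                BufferedImage image = renderer.render(pdfPath, pageIndex);
                if (image != null) {
                    return image;
                }
            } catch (Exception e) {
                // a failing back end should not block the next one in the list
            }
        }
        return null; // no configured back end could render the page
    }
}
```

With the setting `jpedal,pdfbox,pdfrenderer`, a back end that throws or returns `null` simply hands over to the next one; action item 10 below relies on this kind of ordering when it proposes dropping `pdfbox` from the default.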
## Pros

- **Active main branch.** New fixes (signature field handling, rendering
  correctness, encryption parsers) land on 3.x first; 2.x gets a shrinking
  subset of backports.
- **Lower memory footprint for previews.** 3.0 parses PDFs on demand
  (incremental parsing), so opening a large PDF just to render page 1 no
  longer loads the entire object tree. This directly benefits the
  JavaFX preview path.
- **Cleaner IO boundary.** The new `pdfbox-io` module (`RandomAccessRead`,
  `RandomAccessReadBufferedFile`, `RandomAccessReadMemoryMappedFile`) and
  `StreamCacheCreateFunction` replace the `MemoryUsageSetting` parameter of the
  2.x load methods, which is easier to configure correctly for a desktop app that
  opens one PDF at a time.
- **Dependency freshness.** 3.x tracks modern Bouncy Castle (1.75+ transitively,
  `jdk18on` artifacts, the same flavour JSignPdf already uses; see
  `pom.xml:51-59`). `bcprov-jdk15on` is no longer pulled in transitively alongside
  our `bcprov-jdk18on`, which removes a long-standing enforcer-convergence
  irritant.
- **Deprecations gone.** 2.x carried about 8 years of deprecations. The 3.x API
  surface is smaller and more predictable for future maintainers.
- **Still a Java 8 baseline.** PDFBox 3.x requires only Java 8, so there is no
  Java-floor impact; the project already targets Java 11.

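One way to confirm the "Dependency freshness" point after the bump is to scan the resolved dependency list for Bouncy Castle provider flavours. A minimal sketch, assuming the input lines come from `mvn -pl jsignpdf dependency:list` in its usual `groupId:artifactId:type[:classifier]:version:scope` form (newer plugin versions may append a ` -- module ...` suffix, which the parser drops); `BcFlavourCheck` is a hypothetical helper.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper: finds Bouncy Castle artifacts in `mvn dependency:list`-style output. */
final class BcFlavourCheck {
    private static final String BC_PREFIX = "org.bouncycastle:";

    /** Returns artifactId -> version for every org.bouncycastle artifact in the listing. */
    static Map<String, String> bouncyCastleArtifacts(List<String> listing) {
        Map<String, String> found = new TreeMap<>();
        for (String raw : listing) {
            String line = raw.replace("[INFO]", "").trim();
            if (!line.startsWith(BC_PREFIX)) {
                continue;
            }
            // keep only the coordinate token; drops " -- module ..." or "(optional)" suffixes
            String coords = line.split("\\s+")[0];
            String[] parts = coords.split(":");
            if (parts.length < 5) {
                continue; // not a full groupId:artifactId:type:version:scope coordinate
            }
            // the version is always second to last, with or without a classifier
            found.put(parts[1], parts[parts.length - 2]);
        }
        return found;
    }

    /** True when both the legacy jdk15on and the current jdk18on flavour are present. */
    static boolean mixesFlavours(Map<String, String> bcArtifacts) {
        boolean legacy = false;
        boolean current = false;
        for (String artifactId : bcArtifacts.keySet()) {
            if (artifactId.endsWith("-jdk15on")) {
                legacy = true;
            }
            if (artifactId.endsWith("-jdk18on")) {
                current = true;
            }
        }
        return legacy && current;
    }
}
```

After the upgrade, `mixesFlavours(...)` on the listing would be expected to return `false`; a `true` result names the artifacts that still drag in the old flavour.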
## Cons / risks

- **Breaking API changes touch every call site.** All three `PDDocument.load(...)`
  forms used here are removed; callers use `Loader.loadPDF(...)` instead. `PDType1Font.HELVETICA`
  (and the other 13 standard fonts) are no longer static singletons;
  callers construct `new PDType1Font(Standard14Fonts.FontName.HELVETICA)`
  instead. Impact here is small (6 files, ~8 lines) but non-zero.
- **Compression default flip.** 3.0 saves compressed by default. Our test
  fixtures save small PDFs and then re-sign them; this should be harmless,
  but the output will differ byte-for-byte from 2.x if anyone has baselined it.
- **2.x is not dead yet.** 2.0.36 shipped in March 2026. Staying on 2.x
  a bit longer is a legitimate option; the upgrade is "should", not "must".
- **Distribution side-effects:**
  - `distribution/linux/flatpak/maven-dependencies.json` must be regenerated
    (see recent `chore(flatpak): regenerate maven-dependencies.json` commits).
    The new `pdfbox-io` artifact and any transitive shifts must be captured.
  - Shaded fat-jar contents change. Fat-jar size is similar (pdfbox 3 is
    comparable to 2.x + fontbox), but the shade plugin's
    `ServicesResourceTransformer` and manifest filters need re-verification.
- **Preview rendering parity.** `PDFRenderer.renderImageWithDPI(pageIndex, dpi)`
  is preserved, but subtle differences in font substitution, transparency,
  or edge cases (e.g. form XObjects inside signature appearances) are
  possible. This needs a visual smoke test against the test corpus.
- **Encrypted PDF handling.** `Loader.loadPDF(...)` with a password argument
  (the `File` and `RandomAccessRead` overloads both accept one) replaces
  `PDDocument.load(File, String)`. Behaviour for owner-only,
  user-only, both-password, and wrong-password cases must match what
  `PdfExtraInfoTest` asserts today; specifically,
  `BadPasswordException` must still be thrown from OpenPDF (which is what the
  test expects), not from PDFBox, since PDFBox is only used in that test for
  fixture creation. That should be unchanged, but is worth re-running.
- **Dependency convergence.** The `maven-enforcer-plugin` convergence rule
  will flag any transitive BC/logging version drift introduced by pdfbox 3.
  Existing overrides (`bcprov-jdk18on` 1.84, `commons-io` 2.21,
  etc.) should cover this, but a clean `mvn -Prelease verify` is required.

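For the rendering-parity risk, the visual smoke test can be backed by a simple pixel comparison. A minimal sketch, assuming the two `BufferedImage`s come from rendering the same page at the same DPI with PDFBox 2.x and 3.x (for example via `PDFRenderer.renderImageWithDPI(pageIndex, dpi)`, not shown here); `RenderDiff` and the tolerance value are illustrative assumptions, not calibrated thresholds.

```java
import java.awt.image.BufferedImage;

/** Hypothetical helper: compares two page renders pixel by pixel. */
final class RenderDiff {

    /**
     * Fraction of pixels whose R, G or B channel differs by more than channelTolerance.
     * Renders of different sizes count as a total mismatch (1.0).
     */
    static double mismatchRatio(BufferedImage a, BufferedImage b, int channelTolerance) {
        if (a.getWidth() != b.getWidth() || a.getHeight() != b.getHeight()) {
            return 1.0;
        }
        long differing = 0;
        for (int y = 0; y < a.getHeight(); y++) {
            for (int x = 0; x < a.getWidth(); x++) {
                int p = a.getRGB(x, y);
                int q = b.getRGB(x, y);
                // red is bits 16-23, green 8-15, blue 0-7; alpha is ignored
                if (channelDelta(p, q, 16) > channelTolerance
                        || channelDelta(p, q, 8) > channelTolerance
                        || channelDelta(p, q, 0) > channelTolerance) {
                    differing++;
                }
            }
        }
        return (double) differing / ((long) a.getWidth() * a.getHeight());
    }

    /** Absolute difference of one 8-bit colour channel selected by shift. */
    private static int channelDelta(int p, int q, int shift) {
        return Math.abs(((p >> shift) & 0xFF) - ((q >> shift) & 0xFF));
    }
}
```

A small non-zero ratio is expected from anti-aliasing differences; a large one (or a size mismatch) points at a page worth inspecting by eye.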
## Recommended actions (in order)

1. **Bump the version property.**
   In `pom.xml:27`, set `<pdfbox.version>3.0.7</pdfbox.version>` (the latest stable
   release at the time of writing; pin to a specific patch version, not a range).

2. **Add `pdfbox-io` to `dependencyManagement`.**
   In the root `pom.xml`, next to the existing `pdfbox` entry:
   ```xml
   <dependency>
     <groupId>org.apache.pdfbox</groupId>
     <artifactId>pdfbox-io</artifactId>
     <version>${pdfbox.version}</version>
   </dependency>
   ```
   Add it as a compile dependency in `jsignpdf/pom.xml` if it isn't pulled in
   transitively (it should be, but declaring it keeps the convergence rule
   happy).

3. **Rewrite the single main-code call site** in
   `jsignpdf/src/main/java/net/sf/jsignpdf/preview/Pdf2Image.java`:
   ```java
   // needs: import org.apache.pdfbox.Loader;
   // was: tmpDoc = PDDocument.load(tmpFile, options.getPdfOwnerPwdStrX());
   tmpDoc = Loader.loadPDF(tmpFile, options.getPdfOwnerPwdStrX());
   ```
   Keep the `File` overload so PDFBox keeps reading the document from disk with
   random access instead of first copying it into memory. No changes are needed
   to `PDFRenderer.renderImageWithDPI`.

4. **Update test fixtures** in `SigningTestBase.java` and
   `PdfExtraInfoTest.java`:
   ```java
   // needs: import org.apache.pdfbox.pdmodel.font.Standard14Fonts;
   // was: cs.setFont(PDType1Font.HELVETICA, 12);
   cs.setFont(new PDType1Font(Standard14Fonts.FontName.HELVETICA), 12);
   ```
   (Hoist the font instance to a `@BeforeClass` field if it starts to
   matter; the standard 14 fonts are cheap to construct but no longer
   free.)

5. **Update the validator** in
   `jsignpdf/src/test/java/net/sf/jsignpdf/signing/validation/PdfSignatureValidator.java`.
   Replace both `PDDocument.load(fileBytes)` and `PDDocument.load(signedPdf)`
   with `Loader.loadPDF(...)`. Everything downstream (`getSignatureDictionaries`,
   `PDAcroForm`, `PDSignatureField`, `PDAnnotationWidget`, `PDFStreamParser`,
   `PDFont.readCode`, `PDSimpleFont`) carries over unchanged in 3.x.

6. **Regenerate the flatpak manifest.**
   Run the existing regeneration script (or `mvn` with the offline-prep
   profile) that produces
   `distribution/linux/flatpak/maven-dependencies.json`. Replace all
   `pdfbox/2.0.27/*` and `fontbox/2.0.27/*` entries with `3.0.7` equivalents,
   and add `pdfbox-io/3.0.7/*` entries. Verify the SHA-256 sums against Maven
   Central.

7. **Re-verify the shaded jar.**
   Run `mvn -pl jsignpdf package` and inspect the resulting
   `jsignpdf-*-jar-with-dependencies.jar` for:
   - no split-package or service-loader regression (the `ServicesResourceTransformer`
     should fold `org.apache.pdfbox.io.spi`-style service descriptors cleanly),
   - the expected size delta (less than ~10% change either way),
   - `META-INF/MANIFEST.MF` still containing `Main-Class: net.sf.jsignpdf.Signer`.

8. **Run the full test suite.**
   `mvn verify`. Pay attention to `PdfExtraInfoTest` (password handling)
   and any `signing/` tests that sign fixture PDFs generated by
   `SigningTestBase`.

9. **Manual smoke tests** (these can't be covered by CI):
   - Open a 200+ page PDF in the JavaFX UI; confirm the preview still renders
     (memory usage should be lower).
   - Open a password-protected PDF, enter the owner password, and render the preview.
   - Open a PDF containing fonts that rely on font substitution; visually
     compare a page against the 2.x render.
   - Sign a document and reopen the signed output; the JavaFX preview of
     the signed page should match the preview of the unsigned page (i.e.
     no rendering regression at the signature widget).

10. **Decide on a follow-up (not part of this upgrade):** drop PDFBox from
    the preview strategy list entirely. `Constants.PDF2IMAGE_LIBRARIES_DEFAULT`
    already falls back through `jpedal` and `pdfrenderer`, so removing
    `pdfbox` from the default would let us demote the runtime dependency to
    `test` scope and shrink the fat jar by ~8 MB. This is attractive but
    orthogonal; track it separately once the 3.x bump is stable. (The test
    validator still needs PDFBox at test scope regardless.)

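Parts of step 7 can be scripted. A minimal sketch using only `java.util.jar`, assuming it runs against the jar produced by `mvn -pl jsignpdf package`; `FatJarCheck` is a hypothetical helper, and the 10% bound simply mirrors the "< ~10%" expectation in that step.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.jar.Manifest;

/** Hypothetical post-package checks for the fat jar. */
final class FatJarCheck {

    /** Main-Class from META-INF/MANIFEST.MF, or null if the manifest or attribute is missing. */
    static String mainClass(File jar) throws IOException {
        try (JarFile jf = new JarFile(jar)) {
            Manifest mf = jf.getManifest();
            return mf == null ? null : mf.getMainAttributes().getValue("Main-Class");
        }
    }

    /** Names of META-INF/services descriptors, for spotting lost or duplicated SPI files. */
    static List<String> serviceDescriptors(File jar) throws IOException {
        List<String> names = new ArrayList<>();
        try (JarFile jf = new JarFile(jar)) {
            Enumeration<JarEntry> entries = jf.entries();
            while (entries.hasMoreElements()) {
                String name = entries.nextElement().getName();
                if (name.startsWith("META-INF/services/") && !name.endsWith("/")) {
                    names.add(name);
                }
            }
        }
        return names;
    }

    /** True when the new size is within maxRelativeChange (e.g. 0.10) of the old size. */
    static boolean sizeWithin(long oldBytes, long newBytes, double maxRelativeChange) {
        return Math.abs(newBytes - oldBytes) <= maxRelativeChange * oldBytes;
    }
}
```

Comparing `serviceDescriptors(...)` between the 2.x and 3.x builds shows whether the shade step lost or newly added any `META-INF/services` entries; `mainClass(...)` should still return `net.sf.jsignpdf.Signer`.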
## Scope explicitly NOT in this upgrade

- Replacing OpenPDF's signing pipeline with PDFBox's `addSignature(...)` /
  `saveIncrementalForExternalSigning(...)`. That would be a much larger
  rewrite (CMS/PKCS#7 assembly, TSA integration, visible-signature layer
  composition, all currently owned by OpenPDF). It is worth considering
  on its own merits (OpenPDF is a fork of iText 2.1 from 2009, and PDFBox
  is the more active signing implementation today), but belongs in a separate
  design doc.
- Swapping the remaining renderers (`jpedal`, `pdfrenderer`) for PDFBox-only
  rendering. See action item 10.
- Any change to the `pdf2image.libraries` default order or configuration
  surface in `conf/conf.properties`.

## Sources

- [PDFBox 3.0 Migration Guide](https://pdfbox.apache.org/3.0/migration.html)
- [PDFBox downloads / release status](https://pdfbox.apache.org/download.html)
- [`Loader` javadoc (3.0.x)](https://javadoc.io/doc/org.apache.pdfbox/pdfbox/latest/)
- [DeepWiki summary of the 3.0 migration](https://deepwiki.com/apache/pdfbox-docs/3.2-3.0-migration)

distribution/demo/jsmith-sig.png
