| 1 | +# Upgrade PDFBox 2.0.27 → 3.0.x |
| 2 | + |
| 3 | +## Context |
| 4 | + |
| 5 | +The project currently pins `pdfbox.version=2.0.27` in `pom.xml`. Signing itself |
| 6 | +is done by OpenPDF (fork of iText 2.1), not PDFBox — so the PDFBox dependency |
| 7 | +has a very narrow footprint: |
| 8 | + |
| 9 | +- **Main code (runtime):** one site in `jsignpdf/src/main/java/net/sf/jsignpdf/preview/Pdf2Image.java` |
| 10 | + — `getImageUsingPdfBox()` uses `PDDocument.load(File, String)` + |
| 11 | + `PDFRenderer.renderImageWithDPI(...)`. PDFBox is one of three rendering |
| 12 | + strategies (`jpedal,pdfbox,pdfrenderer`) declared in `Constants.PDF2IMAGE_LIBRARIES_DEFAULT` |
| 13 | + and `distribution/conf/conf.properties`. |
| 14 | +- **Test code only:** |
| 15 | + - `jsignpdf/src/test/java/net/sf/jsignpdf/signing/SigningTestBase.java` — |
| 16 | + builds a minimal unsigned PDF (`PDDocument`, `PDPage`, `PDPageContentStream`, |
| 17 | + `PDType1Font.HELVETICA`). |
| 18 | + - `jsignpdf/src/test/java/net/sf/jsignpdf/PdfExtraInfoTest.java` — builds |
| 19 | + unprotected / owner-only / both-password PDFs (`StandardProtectionPolicy`, |
| 20 | + `AccessPermission`, `PDType1Font.HELVETICA`). |
| 21 | + - `jsignpdf/src/test/java/net/sf/jsignpdf/signing/validation/PdfSignatureValidator.java` |
| 22 | + — validates signed output; uses `PDDocument.load(byte[])`/`load(File)`, |
| 23 | + `getSignatureDictionaries()`, `PDAcroForm`, `PDSignatureField`, |
| 24 | + `PDAnnotationWidget`, `PDFStreamParser`, COS traversal, `PDFont`. |
| 25 | + |
| 26 | +Nothing in the signing pipeline touches PDFBox. Migration risk is therefore |
| 27 | +scoped to **PDF page preview rendering** and **test fixtures / test |
| 28 | +validator**. This is a favorable shape for an upgrade. |
| 29 | + |
| 30 | +Upstream state (Apr 2026): PDFBox **3.0.7** is the current feature release; |
| 31 | +PDFBox **2.0.36** is still receiving maintenance patches. Both branches are |
| 32 | +active, so the upgrade is not time-critical — but 2.x will eventually be EOL. |
| 33 | + |
| 34 | +## Pros |
| 35 | + |
| 36 | +- **Active main branch.** New fixes (signature field handling, rendering |
| 37 | + correctness, encryption parsers) land on 3.x first; 2.x gets a shrinking |
| 38 | + subset of backports. |
| 39 | +- **Lower memory footprint for previews.** 3.0 parses PDFs on demand |
| 40 | + (incremental parsing), so opening a large PDF just to render page 1 no |
| 41 | + longer loads the entire object tree. This directly benefits the |
| 42 | + JavaFX preview path. |
| 43 | +- **Cleaner IO boundary.** The new `pdfbox-io` module (`RandomAccessRead`, |
| 44 | + `RandomAccessReadBufferedFile`, `RandomAccessReadMemoryMappedFile`) and |
| 45 | + `StreamCacheCreateFunction` replace the old `MemoryUsageSetting` knob — |
| 46 | + easier to configure correctly for a desktop app that opens one PDF at a |
| 47 | + time. |
| 48 | +- **Dependency freshness.** 3.x tracks modern Bouncy Castle (1.75+ transitively, |
| 49 | + `jdk18on` artifacts — the same flavour JSignPdf already uses; see |
| 50 | + `pom.xml:51-59`). No more `bcprov-jdk15on` pulled in transitively alongside |
| 51 | + our `bcprov-jdk18on`, which removes a long-standing enforcer-convergence |
| 52 | + irritant. |
| 53 | +- **Deprecations gone.** 2.x carried ~8 years of deprecations. The 3.x API |
| 54 | + surface is smaller and more predictable for future maintainers. |
| 55 | +- **Still Java 8 baseline.** No Java-floor impact — the project already |
| 56 | + targets Java 11. |
| 57 | + |
| 58 | +## Cons / risks |
| 59 | + |
| 60 | +- **Breaking API changes touch every call site.** All three `PDDocument.load(...)` |
| 61 | + forms are removed and become `Loader.loadPDF(...)`. `PDType1Font.HELVETICA` |
| 62 | + (and the other 13 standard fonts) are no longer static singletons — |
| 63 | + callers construct `new PDType1Font(Standard14Fonts.FontName.HELVETICA)` |
| 64 | + instead. Impact here is small (6 files, ~8 lines) but non-zero. |
| 65 | +- **Compression default flip.** 3.0 saves compressed by default. Our test |
| 66 | + fixtures save small PDFs that are then signed; this should be harmless, |
| 67 | + but output will differ byte-for-byte from 2.x if anyone has baselined it. |
| 68 | +- **2.x is not dead yet.** 2.0.36 shipped in March 2026. Staying on 2.x |
| 69 | + a bit longer is a legitimate option; the upgrade is "should", not "must". |
| 70 | +- **Distribution side-effects:** |
| 71 | + - `distribution/linux/flatpak/maven-dependencies.json` must be regenerated |
| 72 | + (see recent `chore(flatpak): regenerate maven-dependencies.json` commits). |
| 73 | + The new `pdfbox-io` artifact and any transitive shifts must be captured. |
| 74 | + - Shaded fat-jar contents change. Fat-jar size is similar (pdfbox 3 is |
| 75 | + comparable to 2.x + fontbox), but the shade plugin's |
| 76 | + `ServicesResourceTransformer` and manifest filters need re-verification. |
| 77 | +- **Preview rendering parity.** `PDFRenderer.renderImageWithDPI(pageIndex, dpi)` |
| 78 | + is preserved, but subtle differences in font substitution, transparency, |
| 79 | + or edge cases (e.g. forms XObjects inside signature appearances) are |
| 80 | + possible. Needs a visual smoke test against the test corpus. |
| 81 | +- **Encrypted PDF handling.** `Loader.loadPDF(RandomAccessRead, String password)` |
| 82 | + replaces `PDDocument.load(File, String)`. Behaviour for owner-only, |
| 83 | + user-only, both-password, and wrong-password cases must match what |
| 84 | + `PdfExtraInfoTest` asserts today. In particular, `BadPasswordException` |
| 85 | + must still come from OpenPDF (which is what the test expects), not from |
| 86 | + PDFBox: in that test PDFBox only creates the fixtures. That should be |
| 87 | + unchanged, but is worth re-running. |
| 88 | +- **Dependency convergence.** `maven-enforcer-plugin`'s convergence rule |
| 89 | + will flag any transitive BC/logging version drift introduced by pdfbox 3. |
| 90 | + Already-present overrides (`bcprov-jdk18on` 1.84, `commons-io` 2.21, |
| 91 | + etc.) should cover this, but a clean `mvn -Prelease verify` is required. |
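The visual smoke test mentioned under "Preview rendering parity" can be made less subjective by diffing two rendered page images numerically. The sketch below is a minimal illustration using only `java.awt.image`; the class name `PreviewDiff`, the method `differingFraction`, and the per-channel tolerance are assumptions for illustration, not part of JSignPdf or PDFBox. It assumes the 2.x and 3.x renders of the same page have already been produced at the same DPI.

```java
import java.awt.image.BufferedImage;

/** Hypothetical helper: compares two rendered previews pixel by pixel. */
final class PreviewDiff {

    private PreviewDiff() {
    }

    /**
     * Returns the fraction (0.0 to 1.0) of pixels whose red, green, or blue
     * channel differs by more than {@code tolerance} between the two images.
     * Images of different dimensions are treated as fully different.
     */
    static double differingFraction(BufferedImage a, BufferedImage b, int tolerance) {
        if (a.getWidth() != b.getWidth() || a.getHeight() != b.getHeight()) {
            return 1.0;
        }
        long differing = 0;
        for (int y = 0; y < a.getHeight(); y++) {
            for (int x = 0; x < a.getWidth(); x++) {
                if (channelsDiffer(a.getRGB(x, y), b.getRGB(x, y), tolerance)) {
                    differing++;
                }
            }
        }
        return (double) differing / ((long) a.getWidth() * a.getHeight());
    }

    /** Compares the blue, green, and red channels of two packed ARGB values. */
    private static boolean channelsDiffer(int rgb1, int rgb2, int tolerance) {
        for (int shift = 0; shift <= 16; shift += 8) {
            int c1 = (rgb1 >> shift) & 0xFF;
            int c2 = (rgb2 >> shift) & 0xFF;
            if (Math.abs(c1 - c2) > tolerance) {
                return true;
            }
        }
        return false;
    }
}
```

Any pass/fail threshold is a judgment call: anti-aliasing changes between versions can produce small diffs that are not regressions, so a non-zero tolerance and a small allowed fraction are probably more useful than exact equality.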
| 92 | + |
| 93 | +## Recommended actions (in order) |
| 94 | + |
| 95 | +1. **Bump the version property.** |
| 96 | + `pom.xml:27` → `<pdfbox.version>3.0.7</pdfbox.version>` (latest stable at |
| 97 | + time of writing; pin to a specific patch version, not a range). |
| 98 | + |
| 99 | +2. **Add `pdfbox-io` to `dependencyManagement`.** |
| 100 | + In the root `pom.xml`, next to the existing `pdfbox` entry: |
| 101 | + ```xml |
| 102 | + <dependency> |
| 103 | + <groupId>org.apache.pdfbox</groupId> |
| 104 | + <artifactId>pdfbox-io</artifactId> |
| 105 | + <version>${pdfbox.version}</version> |
| 106 | + </dependency> |
| 107 | + ``` |
| 108 | + Add it as a compile dependency in `jsignpdf/pom.xml` if it isn't pulled |
| 109 | + transitively (it should be, but declaring it keeps the convergence rule |
| 110 | + happy). |
| 111 | + |
| 112 | +3. **Rewrite the single main-code call site** in |
| 113 | + `jsignpdf/src/main/java/net/sf/jsignpdf/preview/Pdf2Image.java`: |
| 114 | + ```java |
| 115 | + // was: tmpDoc = PDDocument.load(tmpFile, options.getPdfOwnerPwdStrX()); |
| 116 | + tmpDoc = Loader.loadPDF(tmpFile, options.getPdfOwnerPwdStrX()); |
| 117 | + ``` |
| 118 | + Keep the file-based overload so the PDF is read via buffered file |
| 119 | + access, not loaded whole. No changes needed to `PDFRenderer.renderImageWithDPI`. |
| 120 | + |
| 121 | +4. **Update test fixtures** in `SigningTestBase.java` and |
| 122 | + `PdfExtraInfoTest.java`: |
| 123 | + ```java |
| 124 | + // was: cs.setFont(PDType1Font.HELVETICA, 12); |
| 125 | + cs.setFont(new PDType1Font(Standard14Fonts.FontName.HELVETICA), 12); |
| 126 | + ``` |
| 127 | + (Hoist the font instance to a `@BeforeClass` field if it starts to |
| 128 | + matter — the standard 14 fonts are cheap to construct but no longer |
| 129 | + free.) |
| 130 | + |
| 131 | +5. **Update the validator** in |
| 132 | + `jsignpdf/src/test/java/net/sf/jsignpdf/signing/validation/PdfSignatureValidator.java`: |
| 133 | + Replace both `PDDocument.load(fileBytes)` and `PDDocument.load(signedPdf)` |
| 134 | + with `Loader.loadPDF(...)`. Everything downstream (`getSignatureDictionaries`, |
| 135 | + `PDAcroForm`, `PDSignatureField`, `PDAnnotationWidget`, `PDFStreamParser`, |
| 136 | + `PDFont.readCode`, `PDSimpleFont`) is carried over unchanged in 3.x. |
| 137 | + |
| 138 | +6. **Regenerate the flatpak manifest.** |
| 139 | + Run the existing regeneration script (or `mvn` with the offline-prep |
| 140 | + profile) that produces |
| 141 | + `distribution/linux/flatpak/maven-dependencies.json`. Replace all |
| 142 | + `pdfbox/2.0.27/*` and `fontbox/2.0.27/*` entries with `3.0.7` equivalents; |
| 143 | + add `pdfbox-io/3.0.7/*` entries. Verify SHA-256 sums against Maven |
| 144 | + Central. |
| 145 | + |
| 146 | +7. **Re-verify the shaded jar.** |
| 147 | + `mvn -pl jsignpdf package` and inspect the resulting |
| 148 | + `jsignpdf-*-jar-with-dependencies.jar` for: |
| 149 | + - no split-package / service-loader regression (the `ServicesResourceTransformer` |
| 150 | + should fold `org.apache.pdfbox.io.spi`-style service descriptors cleanly), |
| 151 | + - expected size delta (< ~10% change either way), |
| 152 | + - `META-INF/MANIFEST.MF` still has `Main-Class: net.sf.jsignpdf.Signer`. |
| 153 | + |
| 154 | +8. **Run the full test suite.** |
| 155 | + `mvn verify`. Pay attention to `PdfExtraInfoTest` (password handling) |
| 156 | + and any `signing/` tests that sign fixture PDFs generated by |
| 157 | + `SigningTestBase`. |
| 158 | + |
| 159 | +9. **Manual smoke tests** (can't be covered by CI): |
| 160 | + - Open a 200+ page PDF in the JavaFX UI; confirm preview still renders |
| 161 | + (memory should be lower). |
| 162 | + - Open a password-protected PDF, enter owner password, render preview. |
| 163 | + - Open a PDF containing fonts that lean on font-substitution; visually |
| 164 | + compare a page against the 2.x render. |
| 165 | + - Sign a document and reopen the signed output — the JavaFX preview of |
| 166 | + the signed page should match the preview of the unsigned page (i.e. |
| 167 | + no rendering regression at the signature widget). |
| 168 | + |
| 169 | +10. **Decide on a follow-up (not part of this upgrade):** drop PDFBox from |
| 170 | + the preview strategy list entirely. `Constants.PDF2IMAGE_LIBRARIES_DEFAULT` |
| 171 | + already falls back through `jpedal` and `pdfrenderer`, so removing |
| 172 | + `pdfbox` from the default would let us demote the runtime dependency to |
| 173 | + `test`-scope and shrink the fat jar by ~8 MB. This is attractive but |
| 174 | + orthogonal — track it separately once the 3.x bump is stable. (The test |
| 175 | + validator still needs PDFBox at test-scope regardless.) |
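For step 6, checksums in the regenerated manifest can be spot-checked locally against a downloaded artifact. The following is a minimal sketch using only the JDK; the class name `ArtifactChecksum` and its methods are hypothetical, and it assumes the expected digest is a hex SHA-256 string copied from the manifest entry.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper: checks artifact bytes against an expected SHA-256. */
final class ArtifactChecksum {

    private ArtifactChecksum() {
    }

    /** Returns the lowercase hex SHA-256 digest of the given bytes. */
    static String sha256Hex(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(data);
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every compliant Java platform.
            throw new IllegalStateException(e);
        }
    }

    /** True if the bytes hash to the expected digest (case-insensitive, trimmed). */
    static boolean matches(byte[] data, String expectedHex) {
        return sha256Hex(data).equalsIgnoreCase(expectedHex.trim());
    }

    // Usage: java ArtifactChecksum.java <artifact-file> <expected-sha256-hex>
    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: ArtifactChecksum <file> <expected-sha256-hex>");
            return;
        }
        Path artifact = Paths.get(args[0]);
        boolean ok = matches(Files.readAllBytes(artifact), args[1]);
        System.out.println(artifact + ": " + (ok ? "OK" : "MISMATCH"));
    }
}
```

This only confirms that a local file matches the recorded digest; comparing against Maven Central still requires fetching the artifact (or its published checksum) from there.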
| 176 | + |
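The jar checks in step 7 can be scripted in a similar way. This sketch, again JDK-only, reads `Main-Class` from a jar manifest and lists `META-INF/services/` descriptors so the output can be compared before and after the bump. `ShadedJarCheck` and its method names are assumptions for illustration; the expected `Main-Class` value is the one named in step 7.

```java
import java.io.IOException;
import java.util.Collection;
import java.util.List;
import java.util.jar.Attributes;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.jar.Manifest;
import java.util.stream.Collectors;

/** Hypothetical helper: inspects a fat jar's manifest and service descriptors. */
final class ShadedJarCheck {

    private static final String SERVICES_PREFIX = "META-INF/services/";

    private ShadedJarCheck() {
    }

    /** Returns the Main-Class attribute, or null if the manifest lacks one. */
    static String mainClassOf(Manifest manifest) {
        return manifest.getMainAttributes().getValue(Attributes.Name.MAIN_CLASS);
    }

    /** Filters entry names down to service descriptor files, sorted by name. */
    static List<String> serviceDescriptors(Collection<String> entryNames) {
        return entryNames.stream()
                // Skip the bare directory entry itself.
                .filter(n -> n.startsWith(SERVICES_PREFIX) && n.length() > SERVICES_PREFIX.length())
                .sorted()
                .collect(Collectors.toList());
    }

    // Usage: java ShadedJarCheck.java <path-to-fat-jar>
    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: ShadedJarCheck <jar>");
            return;
        }
        try (JarFile jar = new JarFile(args[0])) {
            Manifest manifest = jar.getManifest();
            System.out.println("Main-Class: " + (manifest == null ? null : mainClassOf(manifest)));
            List<String> names = jar.stream().map(JarEntry::getName).collect(Collectors.toList());
            serviceDescriptors(names).forEach(System.out::println);
        }
    }
}
```

Running it against the 2.0.27 and 3.0.7 builds and diffing the two outputs should surface a missing or duplicated service descriptor; it does not check the descriptor contents.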
| 177 | +## Scope explicitly NOT in this upgrade |
| 178 | + |
| 179 | +- Replacing OpenPDF's signing pipeline with PDFBox's `addSignature(...)` / |
| 180 | + `saveIncrementalForExternalSigning(...)`. That would be a much larger |
| 181 | + rewrite (CMS/PKCS#7 assembly, TSA integration, visible-signature layer |
| 182 | + composition — all currently owned by OpenPDF). It is worth considering |
| 183 | + on its own merits (OpenPDF is a fork of iText 2.1 from 2009, and PDFBox |
| 184 | + is the more active signing implementation today), but is a separate |
| 185 | + design doc. |
| 186 | +- Swapping the remaining renderers (`jpedal`, `pdfrenderer`) for PDFBox-only. |
| 187 | + See action item 10. |
| 188 | +- Any change to the `pdf2image.libraries` default order or configuration |
| 189 | + surface in `conf/conf.properties`. |
| 190 | + |
| 191 | +## Sources |
| 192 | + |
| 193 | +- [PDFBox 3.0 Migration Guide](https://pdfbox.apache.org/3.0/migration.html) |
| 194 | +- [PDFBox downloads / release status](https://pdfbox.apache.org/download.html) |
| 195 | +- [`Loader` javadoc (3.0.x)](https://javadoc.io/doc/org.apache.pdfbox/pdfbox/latest/) |
| 196 | +- [DeepWiki summary of the 3.0 migration](https://deepwiki.com/apache/pdfbox-docs/3.2-3.0-migration) |