fix: allow startxref offset to include leading whitespace #797
vitormattos wants to merge 4 commits into
Some PDFs set startxref to the whitespace immediately before the xref keyword instead of the first letter of xref. The parser required an exact match and incorrectly switched to xref stream decoding, which then failed with Invalid object reference.

Changes:
- Skip PDF whitespace before checking startxref position
- Use adjusted offset when decoding classic xref
- Apply same whitespace tolerance for Unix line-ending detection
- Tighten trailer key regexes to match /Size /Root /Encrypt /Info /Prev
- Add regression fixture and integration test

Regression fixture:
- samples/bugs/PullRequestXrefWhitespaceStart.pdf

Test:
- DocumentIssueFocusTest::testParseFileWhenStartxrefPointsToLeadingWhitespace

Signed-off-by: Vitor Mattos <1079143+vitormattos@users.noreply.github.com>
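The whitespace-skipping step described above can be sketched roughly as follows. This is a minimal Python illustration of the idea, not the parser's actual PHP code: the names `PDF_WHITESPACE`, `resolve_xref_offset`, and `is_classic_xref` are hypothetical, and the sketch assumes the six whitespace bytes defined by the PDF specification (NUL, TAB, LF, FF, CR, SPACE).

```python
# Hypothetical sketch: tolerate a startxref offset that lands on whitespace
# just before the "xref" keyword. Names here are illustrative assumptions.

# The six whitespace bytes defined by the PDF specification.
PDF_WHITESPACE = b"\x00\t\n\x0c\r "


def resolve_xref_offset(data: bytes, startxref: int) -> int:
    """Return the offset of the first non-whitespace byte at or after startxref."""
    offset = startxref
    while offset < len(data) and data[offset] in PDF_WHITESPACE:
        offset += 1
    return offset


def is_classic_xref(data: bytes, startxref: int) -> bool:
    """True if a classic xref table starts at startxref, ignoring leading whitespace."""
    offset = resolve_xref_offset(data, startxref)
    return data.startswith(b"xref", offset)


if __name__ == "__main__":
    sample = (
        b"%PDF-1.4\n1 0 obj\n<<>>\nendobj\n \n"
        b"xref\n0 1\n0000000000 65535 f \n"
        b"trailer\n<< /Size 1 >>\n"
    )
    keyword = sample.index(b"xref")
    # An exact offset and an offset two bytes early (on whitespace)
    # should both be recognised as a classic xref table.
    print(is_classic_xref(sample, keyword))
    print(is_classic_xref(sample, keyword - 2))
```

The adjusted offset returned by `resolve_xref_offset` is the one a caller would then use when decoding the classic table, so the exact-match check and the decoder agree on where the keyword starts.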
Added one more regression sample from the pdf.js corpus: pdfkit_compressed.pdf (copied as samples/bugs/PullRequest797.pdf). It exercises the same startxref/xref recovery path and passes with the current fix.
Renamed the two regression fixtures to make the source corpus explicit: PullRequest797-vera.pdf and PullRequest797-pdf.js.pdf.
This PR has been restacked into a per-file consolidation flow for RawDataParser changes.

Superseded-by chain:
- base (upstream): #796
- stacked continuation (fork): https://github.com/vitormattos/pdfparser/pull/26

The stacked branch preserves the intent of the #797 fix while reducing cross-PR conflicts in shared test files. Closing this standalone PR to avoid duplicate merge paths.