Fix parsing when startxref points near xref keyword #803

Closed
vitormattos wants to merge 2 commits into smalot:master from vitormattos:fix/metadata-stream-invalid-object-reference

@vitormattos

Bug

A valid PDF from the veraPDF corpus failed with:

Exception: Invalid object reference for $obj.

pdfinfo reports Pages: 1, so the document should be parsed successfully.

Root Cause

In this file, startxref points to a byte adjacent to the xref keyword (off by one).
The parser required an exact xref match at the startxref offset and, when that match failed, incorrectly fell back to cross-reference stream parsing.
That path tried to resolve an invalid object reference and raised the exception.
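
The failure mode can be reproduced in miniature. The Python sketch below is not the library's PHP code: `classify_strict` is a hypothetical helper that mimics an exact-match check, and the byte string is a made-up fragment rather than the regression fixture.

```python
# Hypothetical, simplified model of an exact-match check at the startxref offset.
def classify_strict(data: bytes, offset: int) -> str:
    """Return "table" only if `xref` starts exactly at `offset`, else "stream"."""
    return "table" if data[offset:offset + 4] == b"xref" else "stream"


# Made-up PDF tail: "endobj" (6 bytes) + CR LF (2 bytes), so xref starts at byte 8.
data = b"endobj\r\nxref\n0 1\n0000000000 65535 f \n"
true_offset = data.index(b"xref")

print(classify_strict(data, true_offset))      # exact offset
print(classify_strict(data, true_offset - 1))  # one byte early, on the LF
print(classify_strict(data, true_offset + 1))  # one byte late, on "ref"
```

Under these assumptions only the exact offset is classified as a table; both off-by-one offsets are sent down the stream path even though a classic xref table is right there.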

Fix

  • Normalize the startxref probe offset by skipping leading PDF whitespace.
  • Tolerate the case where startxref points one byte inside the xref keyword (that is, at ref) by probing one byte back.
  • Use the normalized offset consistently for deciding between xref table and xref stream parsing.
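
The three steps above could look roughly like the following. This is a Python sketch of the idea as described in this PR, not the actual PHP patch; `normalize_startxref` and `is_xref_table` are hypothetical names, and the real change may differ in detail.

```python
# The six whitespace bytes defined by the PDF specification: NUL, HT, LF, FF, CR, SP.
PDF_WHITESPACE = b"\x00\t\n\x0c\r "


def normalize_startxref(data: bytes, offset: int) -> int:
    """Return the start of a nearby `xref` keyword, if there is one.

    Falls back to the whitespace-skipped offset when no keyword is found.
    """
    # Step 1: skip any leading whitespace at the probe position.
    pos = offset
    while pos < len(data) and data[pos] in PDF_WHITESPACE:
        pos += 1
    # Step 2: accept an exact match, or a match one byte back (the "ref" case).
    for candidate in (pos, pos - 1):
        if candidate >= 0 and data[candidate:candidate + 4] == b"xref":
            return candidate
    return pos


def is_xref_table(data: bytes, offset: int) -> bool:
    """Step 3: decide table vs. stream using the normalized offset."""
    pos = normalize_startxref(data, offset)
    return data[pos:pos + 4] == b"xref"
```

With a made-up tail such as `b"endobj\r\nxref\n0 1\n"`, offsets 6 through 9 all normalize to byte 8, while an offset that points at an object header (an xref stream) is left unchanged and still takes the stream path.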

Tests

  • Added regression fixture: samples/bugs/PullRequest794.pdf
  • Added integration test: DocumentIssueFocusTest::testParseFileWhenStartxrefPointsNearXrefKeyword

The new test fails on master and passes with this patch.

Source PDF

Signed-off-by: Vitor Mattos <1079143+vitormattos@users.noreply.github.com>
Signed-off-by: Vitor Mattos <1079143+vitormattos@users.noreply.github.com>
@vitormattos
Author

Closing this as duplicate of #798 to keep discussion/history in a single PR.

@vitormattos vitormattos deleted the fix/metadata-stream-invalid-object-reference branch April 27, 2026 17:10