fix(rawdata): recover malformed xref_command_missing startxref path #815

Closed
vitormattos wants to merge 1 commit into smalot:master from vitormattos:fix/rawdata-xref-fallback-next

@vitormattos

Bug fixed in this PR

Some malformed PDFs have a startxref value that points to an unusable location, lack a usable xref keyword near the advertised offset, or both.
In this scenario, parsing failed with "Unable to find xref".
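
To make the failure mode concrete, here is a minimal sketch of the lookup that goes wrong. It is written in Python for illustration only and is not the RawDataParser code; the function names `read_startxref` and `points_at_xref` are hypothetical. It reads the offset advertised after the last startxref keyword and checks whether a classic xref table actually begins there.

```python
import re


def read_startxref(data: bytes):
    """Return the offset advertised after the last startxref keyword, or None."""
    pos = data.rfind(b"startxref")
    if pos == -1:
        return None
    match = re.match(rb"startxref\s+(\d+)", data[pos:])
    return int(match.group(1)) if match else None


def points_at_xref(data: bytes, offset) -> bool:
    """True if a classic xref table starts exactly at the given offset.

    Assumption: only the plain "xref" keyword is checked here; xref streams
    and subsection-start offsets are deliberately out of scope for this sketch.
    """
    if offset is None or offset < 0 or offset >= len(data):
        return False
    return data.startswith(b"xref", offset)
```

A strict parser gives up as soon as `points_at_xref` returns False; the recovery steps in this PR are about what to try next instead of failing.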

This PR improves RawDataParser recovery by:

  • accepting subsection-start offsets in decodeXref (not only exact xref keyword offsets),
  • probing for a nearby xref anchor before choosing the decode path,
  • recovering xref/trailer data from nearby trailer/object headers when startxref is missing or unusable,
  • backfilling missing xref offsets from object headers when trailer references are incomplete.
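
The last bullet can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the code from this PR: `scan_object_headers` and `backfill_xref` are hypothetical names, and the xref table is modelled as a plain dict keyed by (object number, generation). The idea is to scan the raw bytes for object headers of the form "N G obj" and use their positions only for entries the existing table does not already contain.

```python
import re

# Matches object headers such as "12 0 obj"; the lookbehind stops a match
# from starting in the middle of a longer number.
OBJ_HEADER = re.compile(rb"(?<![0-9])(\d+)\s+(\d+)\s+obj\b")


def scan_object_headers(data: bytes) -> dict:
    """Map (object number, generation) to the byte offset of its header."""
    found = {}
    for match in OBJ_HEADER.finditer(data):
        key = (int(match.group(1)), int(match.group(2)))
        # A later header for the same key overrides an earlier one, on the
        # assumption that incremental updates append newer definitions.
        found[key] = match.start()
    return found


def backfill_xref(xref: dict, data: bytes) -> dict:
    """Return a copy of xref with missing entries filled from object headers."""
    merged = dict(xref)
    for key, offset in scan_object_headers(data).items():
        merged.setdefault(key, offset)  # never overwrite an existing entry
    return merged
```

Existing entries are left untouched, so the scan only supplements the table. One caveat of a byte-level scan like this is that text resembling an object header inside stream data can produce false positives, which is a reason to treat it as a fallback.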

Fixture(s) and source

Tests

Focused gate executed with filter:

  • RawDataParserTest::testParseRawDataIssuePullRequest815XrefCommandMissing

Result:

  • OK (1 test, 1 assertion)

Note to maintainer

You can merge this PR directly, or use it as a reviewable base fix and merge through the integration branch/aggregator flow.

@vitormattos
Author

Closing as duplicate/superseded: this fix was incorporated into PR #796 (branch fix/invalid-object-reference-tolerant-parser), including regression test and fixture, to avoid parallel PRs touching RawDataParser.

@vitormattos vitormattos deleted the fix/rawdata-xref-fallback-next branch April 27, 2026 17:10