Fix parsing when startxref points near xref keyword #803
Closed
vitormattos wants to merge 2 commits into
Signed-off-by: Vitor Mattos <1079143+vitormattos@users.noreply.github.com>
Closing this as duplicate of #798 to keep discussion/history in a single PR.
Bug
A valid PDF from the veraPDF corpus failed with:
Exception: Invalid object reference for $obj.
`pdfinfo` reports `Pages: 1`, so the document should be parsed successfully.
Root Cause
In this file, `startxref` points to a byte adjacent to the `xref` keyword (off by one).
The parser required an exact `xref` match at `startxref`, then incorrectly fell back to cross-reference stream parsing.
That path tried to resolve an invalid object reference and raised the exception.
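To make the off-by-one concrete, here is a minimal sketch in Python. It is an illustration only, not the parser's actual code: the buffer is synthetic and `strict_is_xref_table` is a hypothetical name. It shows how an exact-match check at the `startxref` offset rejects a table that sits one byte away.

```python
# Illustration only: a synthetic buffer, not the real sample file.
data = b"%PDF-1.4\n1 0 obj\n<<>>\nendobj\nxref\n0 2\n"
xref_pos = data.index(b"xref")  # true position of the keyword


def strict_is_xref_table(buf: bytes, offset: int) -> bool:
    """Hypothetical strict check: `xref` must start exactly at `offset`."""
    return buf[offset:offset + 4] == b"xref"


# An offset one byte before or after the keyword fails the strict check,
# so a parser relying on it would wrongly assume a cross-reference stream.
for delta in (-1, 0, 1):
    print(delta, strict_is_xref_table(data, xref_pos + delta))
```

Only the exact offset passes. Either neighbouring offset would send a strict parser down the cross-reference stream path described above.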
Fix
- `startxref` probe offset: skip leading PDF whitespace.
- `startxref` points one byte inside `xref` (ref): probe one byte back.
Tests
- `samples/bugs/PullRequest794.pdf`
- `DocumentIssueFocusTest::testParseFileWhenStartxrefPointsNearXrefKeyword`
The new test fails on `master` and passes with this patch.
Source PDF
https://raw.githubusercontent.com/veraPDF/veraPDF-corpus/master/PDF_A-2b/6.6%20Metadata/6.6.2%20Metadata%20streams/6.6.2.3%20Schemas/6.6.2.3.2%20Extension%20schemas/veraPDF%20test%20suite%206-6-2-3-2-t01-pass-c.pdf
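The two adjustments listed under Fix can be sketched roughly as follows. This is a Python illustration under stated assumptions, not the code from this patch: `locate_xref_keyword` and `PDF_WHITESPACE` are hypothetical names, and the buffer is synthetic.

```python
# The six whitespace bytes defined by the PDF specification:
# NUL, TAB, LF, FF, CR, SPACE.
PDF_WHITESPACE = b"\x00\t\n\x0c\r "


def locate_xref_keyword(buf: bytes, offset: int):
    """Return the offset of `xref` at or near `offset`, else None (hypothetical helper)."""
    # Case 1: the offset lands on whitespace before the keyword; skip forward.
    pos = offset
    while pos < len(buf) and buf[pos] in PDF_WHITESPACE:
        pos += 1
    if buf[pos:pos + 4] == b"xref":
        return pos
    # Case 2: the offset lands one byte inside the keyword; probe one byte back.
    if offset >= 1 and buf[offset - 1:offset + 3] == b"xref":
        return offset - 1
    # Otherwise this is not a classic table; the caller may try a xref stream.
    return None


# Synthetic buffer for demonstration, not the real sample file.
sample = b"%PDF-1.4\n1 0 obj\n<<>>\nendobj\nxref\n0 2\n"
keyword_at = sample.index(b"xref")
for delta in (-1, 0, 1):
    print(delta, locate_xref_keyword(sample, keyword_at + delta) == keyword_at)
```

Keeping the probe to a single byte of tolerance is a deliberate choice in this sketch: it covers the off-by-one case without scanning for `xref` at arbitrary positions in the file.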