fix(rawdata): consolidate invalid xref/object-reference recovery#812

Closed
vitormattos wants to merge 0 commits into smalot:master from vitormattos:fix/recover-invalid-xref-offsets

@vitormattos vitormattos commented Apr 25, 2026

Summary

Improve raw-data recovery for malformed cross-reference data so that the parser can still locate valid objects and pages when xref offsets are missing, invalid, or inconsistent.

Scope

  • Strengthen cross-reference recovery in RawDataParser.
  • Improve resilience when startxref points to malformed or shifted regions.
  • Keep parsing stable when object references are partially corrupted.
  • Consolidate regression coverage around raw-data recovery behavior.
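
For readers unfamiliar with the failure mode, the general idea behind this kind of recovery can be sketched as follows. This is a language-neutral illustration in Python, not the code in RawDataParser.php: the function names `startxref_is_valid` and `rebuild_offsets` are hypothetical, and the real parser may use a different strategy. The sketch assumes that when `startxref` does not point at an `xref` keyword or an object header, a fallback scan for `N G obj` markers can rebuild an offset table.

```python
import re

# Matches an indirect-object header such as "12 0 obj" at a token boundary.
# Hypothetical helper pattern; it is not taken from the pdfparser sources.
OBJ_HEADER = re.compile(rb"(?<![0-9])(\d+)\s+(\d+)\s+obj\b")
STARTXREF = re.compile(rb"startxref\s+(\d+)")


def startxref_is_valid(data: bytes) -> bool:
    """Check whether the last startxref offset lands on something plausible.

    Plausible means either the "xref" keyword (classic table) or an
    object header (cross-reference stream).
    """
    matches = STARTXREF.findall(data)
    if not matches:
        return False
    offset = int(matches[-1])
    if offset >= len(data):
        return False
    tail = data[offset:offset + 32]
    return tail.startswith(b"xref") or OBJ_HEADER.match(tail) is not None


def rebuild_offsets(data: bytes) -> dict:
    """Scan the whole file for object headers.

    Returns a mapping of (object number, generation) to byte offset.
    Later definitions overwrite earlier ones, since incremental updates
    append newer object versions at the end of the file.
    """
    offsets = {}
    for match in OBJ_HEADER.finditer(data):
        key = (int(match.group(1)), int(match.group(2)))
        offsets[key] = match.start()
    return offsets


if __name__ == "__main__":
    sample = (
        b"%PDF-1.4\n"
        b"1 0 obj\n<< /Type /Catalog /Pages 2 0 R >>\nendobj\n"
        b"2 0 obj\n<< /Type /Pages /Kids [] /Count 0 >>\nendobj\n"
        b"startxref\n99999\n%%EOF\n"  # offset points past end of file
    )
    if not startxref_is_valid(sample):
        print(rebuild_offsets(sample))
```

A plain scan like this can produce false positives when binary stream data happens to contain text that looks like an object header, so a production parser would need extra validation of each candidate.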

Files Updated

  • src/Smalot/PdfParser/RawData/RawDataParser.php
  • tests/PHPUnit/Integration/RawData/RawDataParserTest.php
  • tests/PHPUnit/Integration/DocumentIssueFocusTest.php
  • samples/bugs/PullRequest797-pdf.js.pdf
  • samples/bugs/PullRequest797-vera.pdf
  • samples/bugs/PullRequest813-pdf.js.pdf
  • samples/bugs/PullRequest814-pdf.js.pdf
  • samples/bugs/PullRequest815-xref-command-missing.pdf
  • samples/bugs/PullRequestInvalidObjectReference.pdf

Validation

  • Focused integration coverage in RawDataParserTest for malformed xref and object-reference recovery.
  • Regression assertions for fixture-driven parse stability and page recovery.

History Note

This pull request remains open on smalot/pdfparser. Earlier close and reopen events happened during branch consolidation and do not indicate a merged upstream pull request.

@vitormattos vitormattos reopened this Apr 26, 2026
@vitormattos vitormattos changed the title from "fix: recover missing or invalid xref offsets" to "fix(rawdata): consolidate invalid xref/object-reference recovery" Apr 27, 2026
vitormattos added a commit to vitormattos/pdfparser that referenced this pull request Apr 27, 2026
# Conflicts:
#	tests/PHPUnit/Integration/DocumentIssueFocusTest.php
@vitormattos vitormattos force-pushed the fix/recover-invalid-xref-offsets branch 2 times, most recently from 2412eb7 to b9504ed April 28, 2026 18:09
@vitormattos vitormattos force-pushed the fix/recover-invalid-xref-offsets branch from b9504ed to 2cfa0d9 April 28, 2026 18:15
vitormattos added a commit to vitormattos/pdfparser that referenced this pull request Apr 29, 2026