
fix: support multi-space object headers #800

Closed
vitormattos wants to merge 1 commit into smalot:master from vitormattos:fix/nearby-object-header-fallback


vitormattos commented Apr 24, 2026

Summary

  • Accept indirect object headers that use one or more whitespace characters between the object number and the generation number (for example, 1  0 obj with two spaces after the object number).
  • Keep a robust fallback for slightly inaccurate xref offsets by searching for the expected object header near the recorded offset.
  • Add a regression fixture and an integration test.

Problem

  • Some malformed PDFs contain object headers with extra spaces.
  • The parser matched only a single whitespace character in object headers, so these objects resolved to null.
  • The failure propagated upward, so the catalog and pages could not be detected.
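The failure described above can be illustrated with a minimal Python sketch. smalot/pdfparser itself is written in PHP, and this is not its code: STRICT_HEADER and parse_header_strict are hypothetical names, and the pattern is an assumption modeled on the PR's description of single-whitespace matching, not the library's actual regular expression.

```python
import re

# Hypothetical "before" pattern: exactly one space between tokens, modeled on
# the PR's description of the old behaviour (not the library's real regex).
STRICT_HEADER = re.compile(rb"(\d+) (\d+) obj")


def parse_header_strict(data: bytes, offset: int):
    """Return (object_number, generation) if a header starts at offset, else None."""
    m = STRICT_HEADER.match(data, offset)
    if m is None:
        return None  # the object effectively resolves to null
    return int(m.group(1)), int(m.group(2))


well_formed = b"1 0 obj\n<< /Type /Catalog >>\nendobj\n"
malformed = b"1  0 obj\n<< /Type /Catalog >>\nendobj\n"

print(parse_header_strict(well_formed, 0))  # (1, 0)
print(parse_header_strict(malformed, 0))    # None
```

Because the catalog is itself an indirect object, a None at this step is enough to lose the catalog and everything reached through it, including the page tree.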

Fix

  • Relax the object header pattern to accept one or more PDF whitespace characters.
  • Advance the parsing offset by the length of the object-header token actually matched, since that length now varies with the amount of whitespace.
  • Preserve nearby-header recovery for cases where the xref offset is close to, but not exactly at, the real object header.
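The three fix steps can be sketched together in Python (again, not the library's PHP code). WS follows the PDF whitespace set from ISO 32000-1 (NUL, HT, LF, FF, CR, SP). HEADER, match_header, locate_object, the closest-first search order, and the 16-byte window are all hypothetical, since the PR does not say how far the nearby-header search looks or in what order it tries candidates.

```python
import re

# PDF whitespace characters (ISO 32000-1, Table 1): NUL, HT, LF, FF, CR, SP.
WS = rb"[\x00\t\n\x0c\r ]"

# Hypothetical relaxed header pattern: one or more whitespace characters
# between the object number, the generation number, and "obj".
HEADER = re.compile(rb"(\d+)" + WS + rb"+(\d+)" + WS + rb"+obj")


def match_header(data: bytes, offset: int, obj_num: int, gen: int):
    """Return the offset just past the header if the expected header starts at offset."""
    if offset < 0 or offset >= len(data):
        return None
    if offset > 0 and data[offset - 1:offset].isdigit():
        return None  # offset points into the middle of a number such as "11"
    m = HEADER.match(data, offset)
    if m is None or int(m.group(1)) != obj_num or int(m.group(2)) != gen:
        return None
    # Advance by the length of the token actually matched (it varies with
    # the amount of whitespace) instead of assuming a fixed-size header.
    return offset + len(m.group(0))


def locate_object(data: bytes, xref_offset: int, obj_num: int, gen: int, window: int = 16):
    """Try the xref offset first, then nearby offsets, closest first.

    Returns (header_start, body_start), or None if no matching header is found.
    """
    for delta in range(window + 1):
        if delta == 0:
            candidates = (xref_offset,)
        else:
            candidates = (xref_offset - delta, xref_offset + delta)
        for start in candidates:
            end = match_header(data, start, obj_num, gen)
            if end is not None:
                return start, end
    return None


data = b"%PDF-1.4\n1  0 obj\n<< /Type /Catalog >>\nendobj\n"
print(locate_object(data, 9, 1, 0))   # (9, 17): exact xref offset
print(locate_object(data, 12, 1, 0))  # (9, 17): offset off by 3, recovered
print(locate_object(data, 9, 2, 0))   # None: no header for object 2
```

Trying candidates closest-first means an exact xref offset always wins. The digit check before the start position keeps a search for object 1 from matching the tail of a header such as 11 0 obj.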

Regression coverage

  • Added fixture: samples/bugs/PullRequestNearbyObjectHeaderOffset.pdf
  • Added test: DocumentIssueFocusTest::testParseFileWhenObjectHeaderIsNearXrefOffset

Validation

  • make run-phpunit ARGS="tests/PHPUnit/Integration/DocumentIssueFocusTest.php --filter testParseFileWhenObjectHeaderIsNearXrefOffset"
  • make run-phpunit ARGS="tests/PHPUnit/Integration/DocumentIssueFocusTest.php"

PDF Source

Signed-off-by: Vitor Mattos <1079143+vitormattos@users.noreply.github.com>
vitormattos (Author)

Superseded by the RawDataParser consolidation chain in the fork.

This fix is included in vitormattos#30 (stacked on PR796 base via PR797→798→799→800).

The fixture was consolidated into samples/bugs/rawdata/, and the test was migrated to the RawDataParserTest data provider.

vitormattos deleted the fix/nearby-object-header-fallback branch on April 27, 2026, 17:10
