Guard low-memory parsing by bounding Flate decode and image payload retention#801

Closed
vitormattos wants to merge 14 commits into smalot:master from vitormattos:fix/flate-decode-memory-guard

@vitormattos vitormattos commented Apr 24, 2026

Summary

  • derive a conservative effective Flate decode cap when decodeMemoryLimit is not explicitly configured
  • apply the same cap to the compress.zlib fallback path
  • skip retaining image stream payloads when PHP memory is constrained to avoid cumulative OOM during parsing
  • add focused regression coverage in DocumentIssueFocusTest for PullRequest457.pdf
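
The first three bullets can be illustrated with a minimal sketch. It is not the code from this PR: the function names, the quarter-of-headroom policy, the 1 MiB floor, the 256 MiB fallback, and the 75% retention threshold are all assumptions chosen for illustration. It relies only on PHP built-ins, in particular the documented `$max_length` argument of `gzuncompress()`.

```php
<?php
declare(strict_types=1);

/**
 * Convert a php.ini size string ("128M", "1G", "-1") to bytes.
 * Returns null when no limit is configured ("-1").
 */
function iniSizeToBytes(string $value): ?int
{
    $value = trim($value);
    if ($value === '-1') {
        return null;
    }
    $number = (int) $value; // leading digits, e.g. 128 from "128M"
    $suffix = strtoupper(substr($value, -1));
    $shift = ['K' => 10, 'M' => 20, 'G' => 30][$suffix] ?? 0;

    return $number << $shift;
}

/**
 * Hypothetical policy: one decoded stream may use at most a quarter of the
 * remaining headroom, with a 1 MiB floor. Unlimited memory gets a fixed cap.
 */
function deriveDecodeCap(?int $limitBytes, int $usedBytes, int $unlimitedCap = 256 << 20): int
{
    if ($limitBytes === null) {
        return $unlimitedCap;
    }
    $headroom = max(0, $limitBytes - $usedBytes);

    return max(1 << 20, intdiv($headroom, 4));
}

/**
 * Inflate zlib-wrapped (RFC 1950) data. Returns null if the stream is invalid
 * or its decoded size would exceed $cap; gzuncompress() returns false then.
 */
function boundedInflate(string $compressed, int $cap): ?string
{
    $decoded = @gzuncompress($compressed, $cap);

    return $decoded === false ? null : $decoded;
}

/**
 * Hypothetical retention rule: keep an image payload only while doing so
 * leaves total usage at or below 75% of the memory limit.
 */
function shouldRetainImagePayload(int $payloadBytes, ?int $limitBytes, int $usedBytes): bool
{
    if ($limitBytes === null) {
        return true;
    }

    return $usedBytes + $payloadBytes <= intdiv($limitBytes * 3, 4);
}

// Usage: derive a cap from the running process, then decode under it.
$limit = iniSizeToBytes((string) ini_get('memory_limit'));
$cap = deriveDecodeCap($limit, memory_get_usage());
$decoded = boundedInflate(gzcompress('example stream'), $cap);
```

Returning null instead of raising lets a caller skip an oversized stream and keep parsing the rest of the document. Whether the actual patch does this is not stated above.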

Reproduction

Before this patch, the command below fails with a fatal out-of-memory error under memory_limit=128M:

docker run --rm -v "$PWD":/work -w /work smalot-pdf-bench:local \
  php -d memory_limit=128M -r 'require "vendor/autoload.php"; $parser = new Smalot\PdfParser\Parser(); $doc = $parser->parseFile("samples/bugs/PullRequest457.pdf"); echo count($doc->getPages()), PHP_EOL;'

After this patch it prints 28.

Tests

  • make run-phpunit ARGS="tests/PHPUnit/Integration/DocumentIssueFocusTest.php --filter testParseFileWithLargeFlateStreams"
  • make run-phpunit ARGS="tests/PHPUnit/Integration/PageTest.php --filter testGetTextPullRequest457"

PDF Source

  • File: samples/bugs/PullRequest457.pdf
  • Source URL: N/A (existing repository fixture reused for this fix)

Signed-off-by: Vitor Mattos <1079143+vitormattos@users.noreply.github.com>
@vitormattos vitormattos force-pushed the fix/flate-decode-memory-guard branch from 401b916 to 10c7ffa Compare April 24, 2026 04:06
@vitormattos (Author) commented

Superseded by the RawDataParser consolidation chain in the fork.

This fix (flate decode memory guard, MemoryLimit helper, test stabilization) is included in vitormattos#31, stacked directly on the fix/invalid-object-reference-tolerant-parser consolidation branch.

Supersedes #801

@vitormattos vitormattos deleted the fix/flate-decode-memory-guard branch April 27, 2026 17:10