fix: validate input is a real PDF before encryption (closes #24) by ambicuity · Pull Request #53 · SUPAIDEAS/passifypdf

ambicuity · 2026-03-27T17:32:01Z

Summary

This PR fixes issue #24 by validating that input files are real PDFs before encryption.

Reproduction (before)

Create a fake file with .pdf extension containing plain text.
Run passifypdf against it.
The CLI failed with an unclear error that exposed parser internals (for example, "Stream has ended unexpectedly").

Root cause

The implementation relied on PdfReader initialization to validate the file format and wrapped its errors generically, so invalid or non-PDF content produced unclear user-facing output.

Fix

Added a magic-byte validation check for %PDF- before reading with PdfReader.
Added targeted PdfReadError handling to normalize parser failures into one friendly error.
Standardized invalid-input message to:
- Input file '<path>' is not a valid PDF file.
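A minimal sketch of the two checks listed above, assuming a stdlib-only setup. The real code calls `PdfReader` and catches `PdfReadError` from the project's PDF library. To keep this sketch runnable on its own, those are replaced by a caller-supplied `parse` function and exception tuple. The names `open_validated_pdf`, `parse` and `parse_errors` are hypothetical.

```python
from pathlib import Path
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")

PDF_MAGIC = b"%PDF-"


def open_validated_pdf(
    input_pdf: str,
    parse: Callable[[Path], T],
    parse_errors: Tuple[Type[BaseException], ...],
) -> T:
    """Hypothetical helper: reject non-PDF input with one friendly message.

    `parse` stands in for the PDF library's reader (e.g. PdfReader) and
    `parse_errors` for its parse exception(s) (e.g. PdfReadError).
    """
    input_path = Path(input_pdf)
    invalid_pdf_error = ValueError(f"Input file '{input_pdf}' is not a valid PDF file.")

    # Step 1: cheap magic-byte check before the parser ever sees the file.
    with input_path.open("rb") as source_file:
        if source_file.read(len(PDF_MAGIC)) != PDF_MAGIC:
            raise invalid_pdf_error

    # Step 2: the header looked right, but the body may still be corrupt.
    # Map parser failures onto the same message instead of leaking internals.
    try:
        return parse(input_path)
    except parse_errors as exc:
        raise invalid_pdf_error from exc
```

With pypdf installed, the call might look like `open_validated_pdf(path, PdfReader, (PdfReadError,))`, assuming `from pypdf import PdfReader` and `from pypdf.errors import PdfReadError`. The PR does not say which PDF library the project imports. Chaining with `from exc` keeps the original parser error available for debugging without printing it to the user.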

Behavior change

Invalid/non-PDF input now returns a deterministic, friendly error message without parser internals.
Existing file-not-found / directory checks and successful encryption flow are preserved.
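The order of the checks matters. The existence and directory checks have to run before the magic-byte read, because opening a missing path or a directory raises its own OSError before any bytes can be compared. A sketch of that order is below. The function name and the not-found and directory messages are hypothetical placeholders, since the PR does not quote the existing ones.

```python
from pathlib import Path

PDF_MAGIC = b"%PDF-"


def check_input_pdf(input_pdf: str) -> Path:
    """Hypothetical pre-flight check; the first two messages are placeholders."""
    input_path = Path(input_pdf)
    # Existing checks (preserved by the PR) run first, so open() below cannot trip on them.
    if not input_path.exists():
        raise FileNotFoundError(f"Input file '{input_pdf}' does not exist.")
    if input_path.is_dir():
        raise IsADirectoryError(f"Input path '{input_pdf}' is a directory.")
    # New check: the content must start with the PDF magic bytes.
    with input_path.open("rb") as source_file:
        if source_file.read(len(PDF_MAGIC)) != PDF_MAGIC:
            raise ValueError(f"Input file '{input_pdf}' is not a valid PDF file.")
    return input_path
```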

Tests

Unit

Updated the happy-path test to mock Path.open("rb") for the magic-byte flow.
Added invalid magic-byte rejection test.
Added PdfReadError fallback rejection test.
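The happy-path unit test mocks `Path.open("rb")` so no real file is needed. The sketch below shows one way to do that with `unittest.mock.mock_open`. It assumes a stripped-down, hypothetical `has_pdf_header` helper in place of the real encryption function, because the project's actual test names and fixtures are not shown in this PR.

```python
import unittest
from pathlib import Path
from unittest import mock

PDF_MAGIC = b"%PDF-"


def has_pdf_header(input_pdf: str) -> bool:
    """Hypothetical helper mirroring the PR's magic-byte check."""
    with Path(input_pdf).open("rb") as source_file:
        return source_file.read(len(PDF_MAGIC)) == PDF_MAGIC


class MagicByteTests(unittest.TestCase):
    def test_accepts_pdf_header(self):
        # mock_open serves these bytes from read(); patching the class attribute
        # means the mock is called without `self`, i.e. just ("rb").
        fake_open = mock.mock_open(read_data=b"%PDF-1.7\n")
        with mock.patch.object(Path, "open", fake_open) as opened:
            self.assertTrue(has_pdf_header("doc.pdf"))
        opened.assert_called_once_with("rb")

    def test_rejects_plain_text(self):
        fake_open = mock.mock_open(read_data=b"hello world")
        with mock.patch.object(Path, "open", fake_open):
            self.assertFalse(has_pdf_header("fake.pdf"))
```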

Integration

Added CLI test rejecting fake .pdf content.
Added CLI test rejecting fake extensionless content.
Existing encryption integration test remains passing.

Local verification

uv sync --all-extras
uv run pytest tests/unittests/ -v --cov=passifypdf --cov-report=xml --cov-report=term
uv run pytest tests/integrationtests/ -v
Manual CLI smoke checks for fake .pdf and extensionless files both return exit code 1 with the friendly invalid-PDF error.
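The smoke checks above expect exit code 1 and the friendly message. The sketch below shows a CLI entry point that produces that behavior, assuming the validator raises ValueError as in this PR. `validate_pdf`, `main`, and the argument handling are hypothetical and do not reflect passifypdf's actual CLI setup.

```python
import sys
from pathlib import Path
from typing import List, Optional

PDF_MAGIC = b"%PDF-"


def validate_pdf(input_pdf: str) -> None:
    """Hypothetical validator: raise ValueError for non-PDF content."""
    with Path(input_pdf).open("rb") as source_file:
        if source_file.read(len(PDF_MAGIC)) != PDF_MAGIC:
            raise ValueError(f"Input file '{input_pdf}' is not a valid PDF file.")


def main(argv: Optional[List[str]] = None) -> int:
    """Return a process exit code instead of calling sys.exit, so it is testable."""
    args = sys.argv[1:] if argv is None else argv
    if len(args) != 1:
        print("usage: main INPUT_PDF", file=sys.stderr)
        return 2
    try:
        validate_pdf(args[0])
    except ValueError as exc:
        # One stable line on stderr, no parser traceback, non-zero exit code.
        print(exc, file=sys.stderr)
        return 1
    # Encryption would happen here in the real tool.
    return 0
```

A script would wire this up with `sys.exit(main())`. Because the file extension is never consulted, fake `.pdf` and extensionless inputs are rejected the same way.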

Copilot

Pull request overview

Adds early validation to ensure inputs are real PDFs before attempting encryption, improving user-facing errors for invalid/non-PDF content (per #24) and extending test coverage to match the new behavior.

Changes:

Added a PDF “magic bytes” validation step and normalized PdfReader parse failures into a friendly ValueError.
Updated unit tests to cover magic-byte rejection and PdfReader parse-failure handling.
Added integration tests to ensure the CLI rejects fake PDF content (with and without .pdf extension).

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

File	Description
`passifypdf/encryptpdf.py`	Adds magic-byte validation and targeted `PdfReadError` handling to produce consistent invalid-PDF errors.
`tests/unittests/test_encryptpdf.py`	Updates mocks for new validation flow; adds unit tests for invalid inputs.
`tests/integrationtests/test_encryptpdf_integration.py`	Adds CLI subprocess tests that assert friendly error output and non-creation of outputs for fake inputs.


Copilot · 2026-03-27T17:34:33Z


+    invalid_pdf_error = ValueError(f"Input file '{input_pdf}' is not a valid PDF file.")
+    with input_path.open("rb") as source_file:
+        if source_file.read(5) != b"%PDF-":


The magic-byte check is overly strict: some valid PDFs can include leading whitespace/bytes before the "%PDF-" header (the header is typically expected near the start, not necessarily at byte 0). Reading and comparing only the first 5 bytes may incorrectly reject such files. Consider scanning the first ~1KB for the "%PDF-" marker (or skipping leading whitespace) before failing, and keep the error message the same.

Suggested change

-        if source_file.read(5) != b"%PDF-":
+        # Read the first chunk of the file and look for the PDF header anywhere within it,
+        # since some valid PDFs may contain leading bytes before the "%PDF-" marker.
+        header_chunk = source_file.read(1024)
+        if b"%PDF-" not in header_chunk:
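The sketch below makes the trade-off in this suggestion concrete by running the strict 5-byte check and the suggested 1 KB scan on in-memory samples. The function names and sample bytes are hypothetical. The 1024-byte window comes from the suggestion itself.

```python
import io
from typing import BinaryIO

PDF_MAGIC = b"%PDF-"
HEADER_WINDOW = 1024  # window size taken from the review suggestion


def strict_header(source_file: BinaryIO) -> bool:
    # Original PR: the header must sit exactly at byte 0.
    return source_file.read(len(PDF_MAGIC)) == PDF_MAGIC


def lenient_header(source_file: BinaryIO) -> bool:
    # Review suggestion: the header may appear anywhere in the first 1 KB.
    return PDF_MAGIC in source_file.read(HEADER_WINDOW)


samples = {
    "clean": b"%PDF-1.7\n%\xe2\xe3\xcf\xd3\n",
    "leading junk": b"\r\n\xef\xbb\xbf%PDF-1.4\n",
    "plain text": b"just some text, not a pdf\n",
}
for name, data in samples.items():
    strict = strict_header(io.BytesIO(data))
    lenient = lenient_header(io.BytesIO(data))
    print(f"{name:13} strict={strict} lenient={lenient}")
```

Only the "leading junk" sample gets different results from the two checks. The lenient scan also accepts any file whose first kilobyte merely contains the text `%PDF-`. It therefore trades some false rejections for some false acceptances, and PdfReader remains the final check.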

fix: validate real pdf input before encryption (closes SUPAIDEAS#24)

3897a30

ambicuity wants to merge 1 commit into SUPAIDEAS:main from ambicuity:fix/issue-24-validate-real-pdf-input
