
fix: validate input is a real PDF before encryption (closes #24)#53

Open
ambicuity wants to merge 1 commit into SUPAIDEAS:main from ambicuity:fix/issue-24-validate-real-pdf-input


@ambicuity (Contributor) commented Mar 27, 2026

Summary

This PR fixes issue #24 by validating that input files are real PDFs before encryption.

Reproduction (before)

  1. Create a fake file with a .pdf extension that contains plain text.
  2. Run passifypdf against it.
  3. The CLI failed with an unclear error exposing parser internals (for example, "Stream has ended unexpectedly").

Root cause

The implementation relied on PdfReader initialization to validate the file format and wrapped any resulting errors generically, so invalid or non-PDF content produced unclear user-facing output.

Fix

  • Added a magic-byte check that the file begins with %PDF- before it is read with PdfReader.
  • Added targeted PdfReadError handling to normalize parser failures into one friendly error.
  • Standardized invalid-input message to:
    • Input file '<path>' is not a valid PDF file.
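The validation order described above (magic bytes first, then PdfReader with parser failures folded into the same message) can be sketched as follows. This is a minimal sketch, not the PR's code: the helper names ensure_pdf_header and open_pdf_reader are hypothetical, and the PdfReader step assumes the pypdf package, imported lazily so the header check works without it installed.

```python
from pathlib import Path

# Every PDF header line starts with these five bytes.
PDF_MAGIC = b"%PDF-"


def ensure_pdf_header(input_pdf: str) -> Path:
    """Hypothetical helper: raise ValueError unless the file begins with %PDF-."""
    input_path = Path(input_pdf)
    invalid_pdf_error = ValueError(f"Input file '{input_pdf}' is not a valid PDF file.")
    with input_path.open("rb") as source_file:
        if source_file.read(len(PDF_MAGIC)) != PDF_MAGIC:
            raise invalid_pdf_error
    return input_path


def open_pdf_reader(input_pdf: str):
    """Hypothetical wrapper: header check first, then PdfReader, with any
    parser failure normalized into the same friendly ValueError.

    Assumes the pypdf package; the imports are local so that the header
    check above can run (and be tested) without pypdf installed.
    """
    from pypdf import PdfReader
    from pypdf.errors import PdfReadError

    input_path = ensure_pdf_header(input_pdf)
    try:
        return PdfReader(input_path)
    except PdfReadError as exc:
        # Parser internals (e.g. a stream that ends unexpectedly) stay in the
        # exception chain for debugging but never reach the user-facing message.
        raise ValueError(f"Input file '{input_pdf}' is not a valid PDF file.") from exc
```

Keeping the message identical in both branches is what makes the error deterministic: the user sees the same text whether the file fails the cheap header check or the full parse.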

Behavior change

  • Invalid/non-PDF input now produces a deterministic, friendly error message without parser internals.
  • Existing file-not-found / directory checks and successful encryption flow are preserved.
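On the CLI side, the behavior change amounts to catching the friendly ValueError, printing only its message, and exiting with a non-zero code (exit code 1 per the local verification notes). A hedged sketch under that assumption; run and ensure_pdf_header are hypothetical names, not passifypdf's actual entry point.

```python
import sys
from pathlib import Path


def ensure_pdf_header(input_pdf: str) -> None:
    """Hypothetical stand-in for the PR's magic-byte validation."""
    with Path(input_pdf).open("rb") as source_file:
        if source_file.read(5) != b"%PDF-":
            raise ValueError(f"Input file '{input_pdf}' is not a valid PDF file.")


def run(input_pdf: str) -> int:
    """Hypothetical CLI body that returns the process exit code."""
    try:
        ensure_pdf_header(input_pdf)
    except ValueError as exc:
        # Print only the friendly message: no traceback, no parser internals.
        print(exc, file=sys.stderr)
        return 1
    # Encryption of the validated input would happen here.
    return 0
```

A real entry point would wire this up as sys.exit(run(path)); the file extension plays no part, so a fake .pdf and an extensionless fake file are rejected the same way.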

Tests

Unit

  • Updated the happy-path test to mock the Path.open("rb") call used by the magic-byte check.
  • Added invalid magic-byte rejection test.
  • Added PdfReadError fallback rejection test.
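The magic-byte unit tests can patch Path.open with unittest.mock.mock_open so no file touches disk. A minimal sketch, assuming a hypothetical ensure_pdf_header function under test (the PR's real test names and targets may differ); the PdfReadError case is omitted because it needs pypdf.

```python
import unittest
from pathlib import Path
from unittest.mock import mock_open, patch


def ensure_pdf_header(input_pdf: str) -> None:
    """Hypothetical stand-in for the validation under test."""
    with Path(input_pdf).open("rb") as source_file:
        if source_file.read(5) != b"%PDF-":
            raise ValueError(f"Input file '{input_pdf}' is not a valid PDF file.")


class EnsurePdfHeaderTests(unittest.TestCase):
    def test_accepts_pdf_magic_bytes(self):
        # mock_open serves bytes when read_data is bytes, and honors read(5).
        fake_open = mock_open(read_data=b"%PDF-1.7\n")
        with patch.object(Path, "open", fake_open):
            ensure_pdf_header("doc.pdf")  # must not raise
        # The patched attribute is not bound to the instance, so only "rb" is passed.
        fake_open.assert_called_once_with("rb")

    def test_rejects_wrong_magic_bytes(self):
        with patch.object(Path, "open", mock_open(read_data=b"hello world")):
            with self.assertRaisesRegex(ValueError, "is not a valid PDF file"):
                ensure_pdf_header("fake.pdf")
```

Both pytest and python -m unittest will collect this TestCase.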

Integration

  • Added a CLI test that rejects a fake file with a .pdf extension.
  • Added a CLI test that rejects a fake file with no extension.
  • Existing encryption integration test remains passing.

Local verification

  • uv sync --all-extras
  • uv run pytest tests/unittests/ -v --cov=passifypdf --cov-report=xml --cov-report=term
  • uv run pytest tests/integrationtests/ -v
  • Manual CLI smoke checks for fake .pdf and extensionless files both return exit code 1 with the friendly invalid-PDF error.

Copilot AI review requested due to automatic review settings March 27, 2026 17:32

Copilot AI left a comment


Pull request overview

Adds early validation to ensure inputs are real PDFs before attempting encryption, improving user-facing errors for invalid/non-PDF content (per #24) and extending test coverage to match the new behavior.

Changes:

  • Added a PDF “magic bytes” validation step and normalized PdfReader parse failures into a friendly ValueError.
  • Updated unit tests to cover magic-byte rejection and PdfReader parse-failure handling.
  • Added integration tests to ensure the CLI rejects fake PDF content (with and without .pdf extension).

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| passifypdf/encryptpdf.py | Adds magic-byte validation and targeted PdfReadError handling to produce consistent invalid-PDF errors. |
| tests/unittests/test_encryptpdf.py | Updates mocks for the new validation flow; adds unit tests for invalid inputs. |
| tests/integrationtests/test_encryptpdf_integration.py | Adds CLI subprocess tests that assert friendly error output and that no output file is created for fake inputs. |


Comment thread passifypdf/encryptpdf.py

```python
invalid_pdf_error = ValueError(f"Input file '{input_pdf}' is not a valid PDF file.")
with input_path.open("rb") as source_file:
    if source_file.read(5) != b"%PDF-":
```
Copilot AI Mar 27, 2026

The magic-byte check is overly strict: some valid PDFs can include leading whitespace/bytes before the "%PDF-" header (the header is typically expected near the start, not necessarily at byte 0). Reading and comparing only the first 5 bytes may incorrectly reject such files. Consider scanning the first ~1KB for the "%PDF-" marker (or skipping leading whitespace) before failing, and keep the error message the same.

Suggested change
```diff
-    if source_file.read(5) != b"%PDF-":
+    # Read the first chunk of the file and look for the PDF header anywhere within it,
+    # since some valid PDFs may contain leading bytes before the "%PDF-" marker.
+    header_chunk = source_file.read(1024)
+    if b"%PDF-" not in header_chunk:
```
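To compare the strict byte-0 check with the lenient 1 KB scan suggested in this comment, here is a standalone sketch; has_pdf_header and its strict flag are hypothetical names used only for illustration.

```python
from pathlib import Path

# Scan window from the review suggestion.
HEADER_SCAN_BYTES = 1024


def has_pdf_header(path: Path, strict: bool = False) -> bool:
    """Hypothetical check: strict requires %PDF- at byte 0,
    lenient accepts it anywhere in the first HEADER_SCAN_BYTES bytes."""
    with path.open("rb") as source_file:
        header_chunk = source_file.read(HEADER_SCAN_BYTES)
    if strict:
        return header_chunk.startswith(b"%PDF-")
    return b"%PDF-" in header_chunk
```

The trade-off: the lenient scan accepts files with leading whitespace or junk bytes, but it also accepts a text file that merely mentions "%PDF-" in its first kilobyte, so the PdfReadError fallback is still needed to catch those cases.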
