Fix : replace binary PPTX fixture with programmatic generator #2081

Open
PratikWayase wants to merge 1 commit into microsoft:main from PratikWayase:fix/pptx-fixture-generator

Conversation

@PratikWayase

Contributor

Pull Request

Description

This PR resolves the security and auditability concerns raised in #1135 regarding the committed binary minimal_test_fixture.pptx.

Changes Made:

  1. Removed Opaque Binary: Deleted tests/fixtures/minimal_test_fixture.pptx to eliminate the risk of hidden macros, OLE objects, or malicious XML embedded in an unreviewable binary blob.
  2. Programmatic Fixture Generation: Added logic in conftest.py to dynamically generate the minimal PPTX fixture at test runtime using python-pptx. Used lxml to manipulate the theme XML directly to ensure deterministic theme color resolution (dk1, accent1).
  3. Roundtrip Validation: Added a new integration test test_generated_fixture_passes_validate_deck in test_extract_content_integration.py. This ensures the programmatically generated PPTX passes structural validation via validate_deck.py before extraction tests run, creating a closed-loop guarantee.

Benefits:

  • Security: No more opaque binary blobs in the repository.
  • Auditability: The exact contents of the test fixture are now fully visible and reviewable as standard Python code.
  • Maintainability: The fixture can be easily updated in the future without needing external tools to regenerate a .pptx file.
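
The auditability point can be made concrete. A .pptx file is a ZIP package, so the parts of a generated fixture can be listed and checked for macro or OLE payloads. The following is a minimal sketch using only the standard library; the function name `find_suspicious_parts` and the name fragments it checks are illustrative assumptions and are not part of this PR or of validate_deck.py.

```python
import io
import zipfile

# Part-name fragments that commonly indicate macros or embedded objects in
# Office Open XML packages. Illustrative only; not an exhaustive list.
SUSPICIOUS_FRAGMENTS = ("vbaproject.bin", "/embeddings/", "oleobject", "activex")


def find_suspicious_parts(pptx_bytes: bytes) -> list[str]:
    """Return names of package parts that look like macro or OLE payloads."""
    with zipfile.ZipFile(io.BytesIO(pptx_bytes)) as package:
        return sorted(
            name
            for name in package.namelist()
            if any(fragment in name.lower() for fragment in SUSPICIOUS_FRAGMENTS)
        )
```

A check like this only looks at part names, so it complements rather than replaces reviewing the generator code itself.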

Related Issue(s)

Fixes #1135

Type of Change

Select all that apply:

Code & Documentation:

  • Bug fix (non-breaking change fixing an issue)
  • New feature (non-breaking change adding functionality)
  • Breaking change (fix or feature causing existing functionality to change)
  • Documentation update

Infrastructure & Configuration:

  • GitHub Actions workflow
  • Linting configuration (markdown, PowerShell, etc.)
  • Security configuration (Removes opaque binary blob from repository)
  • DevContainer configuration
  • Dependency update

AI Artifacts:

  • Reviewed contribution with prompt-builder agent and addressed all feedback
  • Copilot instructions (.github/instructions/*.instructions.md)
  • Copilot prompt (.github/prompts/*.prompt.md)
  • Copilot agent (.github/agents/*.agent.md)
  • Copilot skill (.github/skills/*/SKILL.md)

Other:

  • Script/automation (.ps1, .sh, .py)
  • Other (please describe): Test infrastructure / Pytest fixtures

Testing

  • Ran the full Python test suite locally using npm run test:py.
  • Verified that all 758 PowerPoint tests pass, including the new test_generated_fixture_passes_validate_deck roundtrip validation test.
  • Confirmed that the extraction tests correctly resolve the dynamically generated theme colors and metadata.
  • (Note: The tts-voiceover test failures observed in the CI logs are pre-existing argparse issues in a completely separate skill and are unrelated to this PR).

Checklist

Required Checks

  • Documentation is updated (if applicable)
  • Files follow existing naming conventions
  • Changes are backwards compatible (if applicable)
  • Tests added for new functionality (if applicable)

Required Automated Checks

The following validation commands must pass before merging:

  • Markdown linting: npm run lint:md
  • Spell checking: npm run spell-check
  • Frontmatter validation: npm run lint:frontmatter
  • Skill structure validation: npm run validate:skills
  • Link validation: npm run lint:md-links
  • PowerShell analysis: npm run lint:ps
  • Plugin freshness: npm run plugin:generate
  • Docusaurus tests: npm run docs:test

Security Considerations

  • This PR does not contain any sensitive or NDA information
  • Any new dependencies have been reviewed for security issues (No new dependencies added; utilizes existing python-pptx and lxml)
  • Security-related scripts follow the principle of least privilege

@PratikWayase PratikWayase requested a review from a team as a code owner June 19, 2026 14:49
Comment thread .github/skills/experimental/powerpoint/tests/test_extract_content_integration.py Outdated
Comment thread .github/skills/experimental/powerpoint/tests/test_extract_content_integration.py Outdated
Comment thread .github/skills/experimental/powerpoint/tests/conftest.py Outdated
@codecov-commenter

codecov-commenter commented Jun 23, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 85.95%. Comparing base (a847cfa) to head (bd10ae8).
⚠️ Report is 20 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #2081      +/-   ##
==========================================
+ Coverage   80.82%   85.95%   +5.13%     
==========================================
  Files         117       84      -33     
  Lines       19095    11686    -7409     
==========================================
- Hits        15433    10045    -5388     
+ Misses       3662     1641    -2021     
Flag     Coverage Δ
pester   84.23% <ø> (-0.42%) ⬇️
pytest   90.34% <ø> (+12.51%) ⬆️

see 35 files with indirect coverage changes

@katriendg katriendg left a comment

Contributor

Thanks for tackling this — replacing the opaque binary .pptx with a transparent, code-reviewable generator is a real security and auditability win, and the roundtrip validate_deck test is a nice closed-loop guarantee.
Note: I reviewed the current version of the branch, but because of a merge conflict some of the comments may be stale. Please rebase and re-run the CI checks before merging.

Before merge, a few items: the two modified Python files currently fail the repo's ruff lint gate (npm run lint:py) with 13 errors — 9 trailing-whitespace blank lines, 2 lines over 88 chars, and 2 unsorted import blocks. Most are auto-fixable with ruff check --fix / ruff format. Beyond lint, please add type hints to the new helper functions, mark the new roundtrip test with @pytest.mark.integration (matching its sibling tests) and give it AAA structure, and reconsider the direct lxml import (it's only a transitive dep of python-pptx) and the author = "ChatGPT" fixture value. Details are in the inline review.

@katriendg katriendg left a comment

Contributor

Inline findings from the code review (the overall verdict remains Request changes from my earlier review). Each comment maps to a finding in the review summary.

Comment thread .github/skills/experimental/powerpoint/tests/conftest.py
"""Shared fixtures for PowerPoint skill tests."""

import io
from lxml import etree

Contributor

[Standards] Undeclared direct dependency. lxml is imported directly but is not listed in pyproject.toml dependencies — it only works because python-pptx pulls it in transitively. Either add lxml to dependencies, or manipulate the theme XML through python-pptx's own element API (theme_part.element) instead of re-parsing theme_part.blob with lxml.etree.

+ _chunk(b"IEND", b"")
)

def _set_theme_colors(prs):

Contributor

[Standards] Missing type hints. Repo Python conventions require annotations, and the sibling helpers in this file are annotated. Suggest def _set_theme_colors(prs: Presentation) -> None:, and annotate the nested set_color(color_name: str, hex_val: str) -> None:.
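
For illustration, a fully annotated variant of this kind of helper might look like the sketch below. To stay self-contained it works on raw theme XML bytes with the standard library's xml.etree.ElementTree rather than on a Presentation object or lxml, so the outer signature `set_theme_colors(theme_xml, colors)` is an assumption and differs from the PR's `_set_theme_colors(prs)`. Only the nested `set_color(color_name, hex_val)` shape is taken from the review comment.

```python
import xml.etree.ElementTree as ET

# DrawingML namespace used by theme parts in Office Open XML packages.
A_NS = "http://schemas.openxmlformats.org/drawingml/2006/main"
ET.register_namespace("a", A_NS)  # keep the conventional "a:" prefix on output


def set_theme_colors(theme_xml: bytes, colors: dict[str, str]) -> bytes:
    """Return theme XML with the named color slots set to fixed sRGB values.

    Hypothetical helper: takes and returns bytes instead of a Presentation.
    """
    root = ET.fromstring(theme_xml)
    scheme = root.find(f".//{{{A_NS}}}clrScheme")
    if scheme is None:
        raise ValueError("theme XML has no <a:clrScheme> element")

    def set_color(color_name: str, hex_val: str) -> None:
        slot = scheme.find(f"{{{A_NS}}}{color_name}")
        if slot is None:
            raise KeyError(color_name)
        # Drop whatever color definition is there (for example a:sysClr)
        # and replace it with a single explicit a:srgbClr.
        for child in list(slot):
            slot.remove(child)
        ET.SubElement(slot, f"{{{A_NS}}}srgbClr", {"val": hex_val})

    for name, hex_val in colors.items():
        set_color(name, hex_val)

    return ET.tostring(root, encoding="UTF-8", xml_declaration=True)
```

Replacing a system color with an explicit sRGB value is what makes color resolution independent of the machine running the tests.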

)

def _set_theme_colors(prs):
"""Sets specific theme colors by modifying the theme part's blob via its public setter."""

Contributor

[Standards] E501 line too long (94 > 88). Wrap this docstring to <=88 chars to satisfy the skill's ruff line-length = 88.

set_color('dk1', '000000')
set_color('accent1', '4F81BD')

theme_part.blob = etree.tostring(theme_element, xml_declaration=True, encoding='UTF-8', standalone=True)

Contributor

[Standards] E501 line too long (108 > 88). Break this etree.tostring(...) call across multiple lines to satisfy line-length = 88.


theme_part.blob = etree.tostring(theme_element, xml_declaration=True, encoding='UTF-8', standalone=True)

def generate_minimal_fixture(output_path: Path):

Contributor

[Standards] Missing return annotation. generate_minimal_fixture returns nothing; annotate as def generate_minimal_fixture(output_path: Path) -> None:.

prs = make_blank_presentation()

prs.core_properties.title = "Minimal Test Fixture"
prs.core_properties.author = "ChatGPT"

Contributor

[Standards] Fixture author hardcoded to "ChatGPT". Setting an external AI product name as the fixture author is inappropriate for a repo fixture. Use a neutral value (e.g. "HVE Core Test Fixture") and update the matching EXPECTED_FIXTURE["metadata"]["author"] assertion in the integration test.

slide2 = prs.slides.add_slide(slide_layout_2)
slide2.placeholders[0].text = "Slide with Image"
slide2.placeholders[1].text = "Below is an embedded image."
slide2.notes_slide

Contributor

[Standards] Bare expression reads as dead code. slide2.notes_slide on its own line only has the side effect of lazily creating an empty notes slide; it looks like a leftover and would trip ruff B018 if B rules were on. Either remove it, or make the intent explicit: _ = slide2.notes_slide # ensure notes part exists.
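
To show why the bare attribute access has an effect at all, here is a toy sketch of a lazily created property. `ToySlide` is a hypothetical stand-in written for this example and is not python-pptx's Slide class; it only mimics the documented behavior that reading notes_slide creates the notes slide when none exists.

```python
from typing import Optional


class ToySlide:
    """Toy stand-in for a slide whose notes part is created on first access."""

    def __init__(self) -> None:
        self._notes: Optional[list[str]] = None

    @property
    def has_notes_slide(self) -> bool:
        return self._notes is not None

    @property
    def notes_slide(self) -> list[str]:
        if self._notes is None:
            self._notes = []  # side effect: the notes container now exists
        return self._notes


slide = ToySlide()
_ = slide.notes_slide  # ensure notes part exists
```

Assigning to `_` with a comment keeps the side effect while telling readers and linters that the access is intentional.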

import yaml
from pathlib import Path
from extract_content import main
from validate_deck import validate_deck, max_severity

Contributor

[Standards] Import block I001 (unsorted). from pathlib import Path (stdlib) is placed after the third-party pytest/yaml imports. Group stdlib first, then third-party, then local (extract_content, validate_deck). ruff check --fix resolves this.

slide_1["elements"][0]["font_color"] == EXPECTED_FIXTURE["slide_1_font_color"]
)

def test_generated_fixture_passes_validate_deck(minimal_test_fixture_path: Path):

Contributor

[Functional] Missing @pytest.mark.integration + AAA structure. This builds a full deck and runs structural validation (a roundtrip integration test) but, unlike the two sibling tests in this module, is not decorated with @pytest.mark.integration (a marker registered in pyproject.toml). Add the marker and structure the body with explicit Arrange/Act/Assert sections per the repo test conventions.

@PratikWayase PratikWayase force-pushed the fix/pptx-fixture-generator branch from b9ab10b to e23cca5 Compare June 27, 2026 14:49
@PratikWayase PratikWayase force-pushed the fix/pptx-fixture-generator branch from e23cca5 to 2fad6cf Compare June 28, 2026 03:59

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

test(skills): replace committed PPTX binary fixture with programmatic generation

4 participants