Skip to content

feat(docx-core): add AI revision validator with MCP transactional enforcement#454

Open
stevenobiajulu wants to merge 5 commits into
mainfrom
add-ai-revision-validator
Open

feat(docx-core): add AI revision validator with MCP transactional enforcement#454
stevenobiajulu wants to merge 5 commits into
mainfrom
add-ai-revision-validator

Conversation

@stevenobiajulu

Copy link
Copy Markdown
Member

Summary

  • Adds the add-ai-revision-validator OpenSpec proposal for issue Add validator for AI-emitted tracked-change OOXML #121.
  • Defines docx-core validator behavior for AI-scoped revision errors, foreign revision warnings, range/field/package checks, and shared tracked-change vocabulary.
  • Defines MCP server transactional enforcement requirements for clone-preflight write validation and save-time hard failures on invalid AI revisions.

Validation

  • openspec validate add-ai-revision-validator --strict

Ref: #121

Add the OpenSpec proposal, design notes, task list, and spec deltas for the AI-scoped revision validator work. This captures the validation and transactionality contract before implementation per the repository OpenSpec workflow.

Ref: #121
@vercel

vercel Bot commented Jun 11, 2026

Copy link
Copy Markdown

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
site Ready Ready Preview, Comment Jun 12, 2026 1:54am

Request Review

@stevenobiajulu stevenobiajulu changed the title [codex] Propose AI revision validator feat(docx-core): add AI revision validator with MCP transactional enforcement Jun 11, 2026
@github-actions github-actions Bot added the feat label Jun 11, 2026
…orcement

Implements the add-ai-revision-validator OpenSpec change (issue #121):

- New validateAiRevisions() in docx-core: AI-scoped hard errors vs
  foreign-revision warnings across all story parts; integer w:id, author,
  date checks; package-wide AI id uniqueness; id-matched range pairing;
  field-structure rules (extracted to shared/field-structure.ts and
  re-exported from the atomizer pipeline); structural placement rules;
  package invariants (relative rel-target resolution with
  TargetMode="External" exemption, [Content_Types].xml registration).
  Range-end milestones (OOXML CT_MarkupRange/CT_Markup) require only w:id.
- Shared Table A vocabulary constants consumed by the validator, save
  diagnostics, and revision-id seeding (previously three divergent sets).
- Clone-preflight guard in docx-mcp wired into all 10 revision-emitting
  write tools: mutations validate on a preview document and reject with
  AI_REVISION_VALIDATION_FAILED before the live session is touched.
- save fails with INVALID_AI_REVISIONS on AI-attributed errors; anomalies
  already present in the originally-loaded file demote to warnings so
  real-world documents are never bricked by pre-existing markup.

15/15 delta scenarios covered by TEST_FEATURE-tagged tests; spec-coverage,
allure-labels, openspec --strict, builds, and both package suites green.

Peer review (Codex + agy) pending.

Closes #121.
@github-actions

github-actions Bot commented Jun 11, 2026

Copy link
Copy Markdown
Contributor

LLM-Based Quality Gate

Overall: ⚠️ WARN (7 pass · 3 warn · 5 skipped · 15 total)

Check Verdict
⏭️ SKIPPED read_file response metadata parity paths not touched by this PR
⏭️ SKIPPED Live DOM namespace-safe OOXML writes paths not touched by this PR
⚠️ Deleted field markup keeps w:fldChar outside w:del Unable to verify: Gemini CLI failed to produce valid gate output across all retry attempts. Review this question manually.
Field validation per story, not global Field validation is run independently per ECMA story as splitStories at packages/docx-core/src/baselines/atomizer/pipeline.ts:390 segments footnotes and endnotes into FieldStory fragments, which are then validated per-story in validateFieldStructure at packages/docx-core/src/shared/field-structure.ts:54-63, ensuring global counter balance is not treated as sufficient.
Revision IDs seeded from all revision-bearing side parts The PR updates revision-ID seeding in packages/docx-mcp/src/session/manager.ts:189 to scan comments, footnotes, endnotes, glossary, and header/footer side parts via enumerateRevisionStoryPartPaths. It checks only revision-carrying elements via REVISION_ID_ELEMENT_NAME_SET at line 167 to ignore comment/bookmark IDs, and handles malformed parts gracefully with a try-catch block at line 197.
⚠️ Accept/reject sweep side parts and caches Unable to verify: Gemini CLI failed to produce valid gate output across all retry attempts. Review this question manually.
⏭️ SKIPPED DocumentViewNode.heading stays canonical paths not touched by this PR
⏭️ SKIPPED AI-author parity across entry points paths not touched by this PR
Property-change wrapper discipline In packages/docx-mcp/src/tools/clear_formatting.ts:122-127, tracked clear-formatting edits emit exactly one correct rPrChange wrapper carrying a snapshot of prior live properties, while removing stale wrappers to avoid stacking.
SUPPORT.md Table A drift vs. implementation The PR only introduces read-only AI revision validation logic and does not modify OOXML revision emission behavior in packages/docx-core/src/primitives/ or touch packages/docx-core/SUPPORT.md.
⏭️ SKIPPED Table A / Table B boundary on side-part revisions paths not touched by this PR
Canonical-emission surface completeness The PR does not add or change any tracked-edit surface or XML emission, but rather introduces a validation layer (packages/docx-core/src/primitives/validate_ai_revisions.ts:393) and preflight guard (packages/docx-mcp/src/tools/ai_revision_guard.ts:108) around the existing Table A surface.
Lean predicate drift against engine semantics (asymmetric) The PR touches packages/docx-core/src/baselines/atomizer/pipeline.ts:73-86 but does not shift TS engine semantics; validateFieldStructure in packages/docx-core/src/shared/field-structure.ts:57-65 explicitly filters out new text-placement rules to maintain perfect extensional alignment with the Lean-pinned model without requiring Lean-side updates.
Unit-test quality (avoid tautological / change-detector tests) a: ok; b: ok; c: ok; d: n/a. The new tests in validate_ai_revisions.test.ts and ai_revision_validation.test.ts construct expected values from first principles, avoid mocking the SUT, and make highly specific semantic assertions about validator codes and state change invariants.
⚠️ Re-derived facts vs canonical sources Unable to verify: Gemini CLI failed to produce valid gate output across all retry attempts. Review this question manually.
Full checklist questions
  1. read_file response metadata parity: If this PR touches packages/docx-mcp/src/tools/read_file.ts, budgeted pagination returns, or additive response metadata like warnings / comment_load_error, do every successful return path (default budgeted early return, non-budget fallthrough, explicit limit/node_ids) preserve the same additive diagnostic fields? read_file has multiple success exits; diagnostics have already disappeared on one path before. Reference: fix(docx-core): declare xmlns:w14/w15 on comments root before writing prefixed attributes (#154) #180 surfaced comment_load_error, fix(docx-mcp): warn when read_file budget is exceeded by a single node (closes #184) #186 added an early budget return + warnings, fix(docx-mcp): surface comment_load_error on the default budgeted read path (closes #189) #191 fixed the missing comment_load_error on the default budgeted path.

  2. Live DOM namespace-safe OOXML writes: If this PR touches packages/docx-core/src/primitives/comments.ts or writes prefixed OOXML attributes/elements (w14:*, w15:*, xmlns:*, comments.xml, commentsExtended.xml, people.xml), are prefixed OOXML names written with namespace-aware APIs — root aliases bound with setAttributeNS(XMLNS_NS, ...), prefixed attributes with setAttributeNS(W14_NS/W15_NS, ...), and is there a test that proves the live DOM works before serialization/reparse? String-prefixed attributes can serialize plausibly while the live DOM still throws namespace errors. Reference: fix(docx-core): declare xmlns:w14/w15 on comments root before writing prefixed attributes (#154) #180 (xmlns:w14/w15 declared on comments root before writing prefixed attrs).

  3. Deleted field markup keeps w:fldChar outside w:del: If this PR touches field atomization, validateFieldStructure, hasFldCharInsideDel, w:fldChar, w:instrText, w:delInstrText, or collapsed field comparison logic, does deleted field output stay ECMA-376-conformant — w:fldChar sibling-level (never inside w:del), deleted instructions use w:delInstrText only inside valid delete wrappers, accept/reject safety checks still reject malformed combined output? Word treats deleted field-state markup in the wrong container as document-corrupting. References: fix(docx-core): validate w:delInstrText placement and reject w:fldChar inside <w:del> #211, fix(docx-core): partition field-closure validation by ECMA-376 story (#212) #225, fix(docx-core): fragment w:fldChar outside w:del per ECMA-376 Part 4 #228.

  4. Field validation per story, not global: If this PR touches packages/docx-core/src/baselines/atomizer/pipeline.ts, splitStories, validateFieldStructure, side-part merge logic, or footnote/endnote field handling, is field validation run independently per ECMA story (document.xml, each footnote, each endnote), with sidecars from both original and revised archives considered, and global counter balance not treated as sufficient? A document can be globally balanced but have an invalid field sequence inside one story. References: fix(docx-core): partition field-closure validation by ECMA-376 story (#212) #225, fix(docx-core): fragment w:fldChar outside w:del per ECMA-376 Part 4 #228, feat(docx-core): sweep side-part revisions on accept/reject #218.

  5. Revision IDs seeded from all revision-bearing side parts: If this PR touches packages/docx-mcp/src/session/manager.ts (especially getRevisionContextForSession or FIXED_REVISION_ID_SEED_PARTS), createRevisionContext, revision-ID allocation, or MCP tools that create tracked changes/comments/footnotes, does revision-ID allocation scan all relevant package parts before issuing new IDs — comments, footnotes, endnotes, glossary, headers, footers — ignore non-revision w:id values (comment IDs, bookmarks), and handle malformed optional parts gracefully? Revision IDs are package-wide; document-only seeding collides with existing side-part revisions. Reference: fix(docx-mcp): seed revision ids from side parts #216 (seed revision ids from side parts).

  6. Accept/reject sweep side parts and caches: If this PR touches DocxDocument.acceptChanges, DocxDocument.rejectChanges, REVISION_STORY_PART_PATHS, accept_changes, reject_changes, or side-part revision markup, does accept/reject process every revision-bearing story — updating document.xml + footnotes.xml + endnotes.xml + comments.xml, writing back only changed side parts while refreshing cached XML, and pruning orphan footnotes without deleting reserved separator entries? Accepting only in the main document leaves stale revisions and dangling references in the package. References: feat(docx-core): sweep side-part revisions on accept/reject #218, fix(docx-mcp): seed revision ids from side parts #216, fix(docx-core): partition field-closure validation by ECMA-376 story (#212) #225.

  7. DocumentViewNode.heading stays canonical: If this PR touches packages/docx-core/src/primitives/document_view.ts, HeadingValue, heading heuristics, ListMetadata.header_style, or Google Docs document-view heading normalization, does node.heading remain a structural heading signal — exact Word styles Heading1Heading6 win, heuristic sources suppressed inside table cells while real Word heading styles still pass, ordinary body paragraphs omit the heading key? Consumers use node.heading != null as a structural test; heuristic false positives break downstream navigation. References: fix(docx-core): harden heading detection (#157 Phase 1) #178, fix(docx-core): suppress non-sectional false-positive headings (closes #187) #188, feat(docx-core): add derived heading object to DocumentViewNode (closes #179) #190.

  8. AI-author parity across entry points: If this PR touches packages/docx-mcp/src/server.ts, packages/docx-mcp/src/cli/tool_runner.ts, packages/docx-mcp/src/cli/commands/**, or adds any new new SessionManager(...) call site in docx-mcp, does every entry point that constructs a SessionManager resolve SAFE_DOCX_AI_AUTHOR with the same three-way semantics (set → use it; empty string → opt out to untracked; unset → defaultAiAuthor), or has a new entry path silently bypassed tracked emission? Each entry path looks locally correct while diverging from another; tracked emission has gone dark in one path before anyone noticed. References: feat(docx-mcp): wire configurable AI author through MCP layer (#142) #172 (production MCP wiring would have kept tracked emission dark), fix(docx-mcp): honor SAFE_DOCX_AI_AUTHOR in CLI entry points (#181) #182 (CLI runners constructing bare SessionManager() silently produced untracked edits).

  9. Property-change wrapper discipline: If this PR touches packages/docx-core/src/primitives/layout.ts, packages/docx-core/src/primitives/text.ts, packages/docx-mcp/src/tools/clear_formatting.ts, or packages/docx-core/src/primitives/track-changes-emitter.ts, do tracked formatting/property edits emit exactly one correct *PrChange wrapper (pPrChange / rPrChange / trPrChange / tcPrChange) carrying a snapshot of the prior live properties — not stacking stale wrappers, not stripping valid historical children (cellIns/cellDel/cellMerge), and not omitting the snapshot when the operation is formatting-aware? Emitted OOXML is visually plausible but subtle snapshot mistakes only surface during later accept/reject or in Word's tracked-changes UI. References: feat(docx-core): emit pPrChange/trPrChange/tcPrChange from layout setters (#140) #167 (duplicate pPrChange/trPrChange/tcPrChange stacking + over-broad tcPr exclusion), feat(docx-mcp): emit rPrChange from clear_formatting MCP tool (#141) #170 (clear_formatting failing to strip stale rPrChange), feat(docx-core): emit rPrChange for formatted paragraph replacements #215 (rPrChange for formatted paragraph replacements + filtering nested stale records).

  10. SUPPORT.md Table A drift vs. implementation: If this PR modifies OOXML revision emission behavior (w:ins, w:del, w:rPrChange, etc.) in packages/docx-core/src/primitives/**, or touches packages/docx-core/SUPPORT.md, does the PR symmetrically update Table A in SUPPORT.md when the supported revision-emission surface in primitives changed — added, removed, or weakened — or is the documented contract now lying about what's supported? Reviewers focus on TS AST correctness and golden tests; Markdown contract tables get treated as an afterthought, so the documented surface drifts from the actual surface. Reference: [120.8] Regression suite for canonical revision emission across the surface #143 review caught replaceParagraphTextRange should emit w:rPrChange when run formatting changes #173 (formatting mismatch in Table A) and addCommentReply should emit body revision markup OR SUPPORT.md should be softened #174 (comment body revision omission forcing a Table A softening) late in peer review.

  11. Table A / Table B boundary on side-part revisions: If this PR touches packages/docx-core/src/primitives/comments.ts, packages/docx-core/src/primitives/footnotes.ts, or other side-part primitives, and adds/changes revision markup (w:ins, w:del), does tracked-change revision logic stay scoped to Table A (document-body content inside the side part) without leaking revision markup into Table B (the side-part package bootstrap — comments.xml/footnotes.xml element registration itself)? Body runs and side-part package elements share nearly identical XML namespace schemas; revisions emitted in the wrong table corrupt the package contract while looking plausible. References: [120.3] Emit w:ins/w:del for comment body anchors #138 (comment-body straddle constraints), [120.4] Emit w:ins/w:del for footnote reference and text #139 (footnote-reference straddle constraints).

  12. Canonical-emission surface completeness: If this PR adds or changes a tracked-edit surface in packages/docx-core/src/primitives/** or packages/docx-mcp/src/tools/**, are the paired artifacts updated together — packages/docx-core/src/integration/canonical-emission-regression.test.ts, packages/docx-mcp/src/integration/canonical-emission-mcp.test.ts, and the documented emitter surface (Table A) — or is the rollout only partially wired? The primitive change looks done before the MCP path, regression matrix, and documented contract are wired through; partial rollouts ship undocumented surface that drifts. References: feat(docx-mcp): wire configurable AI author through MCP layer (#142) #172 (RevisionContext threaded through every Table A MCP tool), test(docx-core,docx-mcp): final regression suite for canonical emission (#143) #175 (24-test regression suite + verified write-time emitter rows), feat(docx-core): emit rPrChange for formatted paragraph replacements #215 (re-enabled rPrChange regression + updated support surface for replaceParagraphTextRange).

  13. Lean predicate drift against engine semantics (asymmetric): If this PR changes field-wrapper semantics, the proof boundary, or atomizer behavior — packages/docx-core/src/baselines/atomizer/**, verification/lean/LeanSpike/Spec.lean, verification/lean/Tier2/**, or packages/docx-core/src/integration/lean-spec-bridge.test.ts — and if the TS engine semantics shifted, did the PR also update the Lean residual predicate and bridge tests, or is the proof now pinned to a stale stronger/weaker assumption? Asymmetric: a TS change without a corresponding Lean update is WARN; a Lean-only change without a TS update should not fire. The Lean side can still compile while the abstraction boundary is subtly wrong for the next engine refactor. References: feat(verification): close inv_field_001 with Tier 2 OoxmlDoc subset #208 (closed inv_field_001 using stronger recursivelyWellformed), refactor(verification): weaken inv_field_001 axiom to document-level preservationFriendly (rebased follow-up to #208) #220 (weakened the axiom to document-level preservationFriendly to avoid breakage when field fragmentation lands).

  14. Unit-test quality (avoid tautological / change-detector tests): If this PR adds or modifies any **/*.test.ts (or other test files), are the test assertions independent of the system under test — expected values constructed from first principles rather than re-derived from the function under test, mocks limited to external boundaries (filesystem, network, clocks) rather than mocking the SUT itself, assertions making concrete semantic claims rather than just snapshotting current behavior or asserting non-null, and any test added alongside a bug fix actually exercising the bug? Tests that re-implement the production code as the "expected" value, or mock out the system under test, pass green while providing no regression protection.

  15. Re-derived facts vs canonical sources: If this PR adds logic that re-computes a fact the codebase already derives elsewhere — another fldChar begin/separate/end visible-content walk (canonical: getParagraphRuns in packages/docx-core/src/primitives/text.ts), a second footnote display-number map (buildFootnoteDisplayMap vs getFootnotes().displayNumber), or any sibling re-implementation of an existing helper instead of importing or exporting it — does the PR consume the canonical source, or explicitly justify the new derivation and add a test pinning the copies in agreement? Independent derivations of the same fact drift apart; read_file renders every footnote marker twice in text/tagged_text ([^N][^N]) #382's doubled footnote markers were two passes disagreeing about one fact, and the fix initially added a fifth copy of the field-state walk. References: read_file renders every footnote marker twice in text/tagged_text ([^N][^N]) #382, fix(docx-mcp): render each footnote marker exactly once in read_file output #386, refactor: expose footnote references as document-view node metadata — one derivation of "visible footnote refs", not five fldChar walks #393.

Estimated cost (this run): $0.0472 — 152,060 input + 614 output tokens (≈4 chars/token) on gemini-3.5-flash · 6 retry attempts. Char-count estimate, not provider telemetry.

- Count-based baseline comparison (agy BLOCKER): structural diagnostics
  carry no per-instance location, so introduced/pre-existing splitting now
  compares fingerprint multisets — N new instances can no longer hide
  behind one pre-existing instance of the same finding.
- Strict xsd:dateTime check (both reviewers): w:date now validates against
  the OOXML ST_DateTime shape instead of any JS-parseable date.
- Range-marker id validation (Codex): comment/perm/bookmark/custom-XML and
  move range markers now flag missing or non-integer w:id (previously
  silently skipped), severity classified by author/touched context.
- apply_plan preflight de-duplication (agy): per-step replace_text /
  insert_paragraph preflights are skipped via an internal flag because
  apply_plan already preflights the entire step sequence once — removes
  the N+1 serialize/parse cost; plan-level rejection covered by a new test.
@stevenobiajulu

Copy link
Copy Markdown
Member Author

Peer review (Codex + agy) — 2026-06-11

Both reviewers ran dynamic reviews (built both packages, ran both full test suites, and executed targeted probes; exit codes reported in their traces).

agy findings: BLOCKER — baseline fingerprints for location-less structural diagnostics were compared as a Set, so N newly-introduced instances of an error type could hide behind one pre-existing instance. MAJOR — apply_plan ran a full preflight per inner step on top of its plan-level preflight (N+1 serialize/parse cycles). MAJOR — Date.parse accepted non-ISO dates Word rejects. NIT — range pairing ignores marker ordering/interleaving.

Codex findings: MAJOR — w:date lax parsing (concurs with agy, probe: "04 Dec 1995 00:12:00 GMT" passed). MAJOR — comment/perm/bookmark range markers got pairing checks but no w:id validation (markers with missing ids were silently skipped). NIT — the static source-scan guard test is shallow (kept as supplementary; dynamic transactionality tests are the primary guard). Codex explicitly confirmed: transactional wiring sound across all guarded tools, double-mutation determinism holds (identical revision ids in preview vs live across replace_text/insert_paragraph/apply_plan/add_comment/add_footnote probes), baseline demotion works for pre-existing unattributable anomalies.

Resolution (commit 0dc3db9): count-based multiset baseline comparison; strict xsd:dateTime validation; range-marker id presence/integer checks with author/touched severity classification; apply_plan inner preflights skipped via an internal flag (plan-level preflight covers the whole sequence; new test asserts the plan-level rejection still fires). Accepted as-is: range-marker ordering/interleaving detection is deferred to the selective accept/reject work (#123), where interval tracking over sibling milestones is already a design requirement.

@stevenobiajulu stevenobiajulu marked this pull request as ready for review June 11, 2026 23:12
…ator

# Conflicts:
#	packages/docx-core/src/baselines/atomizer/pipeline.ts
… test fixtures

Two CI regressions from the validator work:

- lean-build: the Lean differential harness pins validateFieldStructure
  extensionally against Tier2.FieldStructure.validateFieldStructure, and
  the new text-placement rules (TEXT_INSIDE_DELETION,
  DELETED_TEXT_OUTSIDE_DELETION) changed its verdict on 13507/50008
  generated docs (first divergence: w:delText inside w:moveFrom — valid
  per the OOXML revision model). The boolean now filters the
  text-placement codes (restoring pre-PR semantics exactly), and those
  rules — used only by the AI revision validator — treat w:moveFrom as a
  deletion context alongside w:del.
- emitted-schema corpus: guard/save toBuffer snapshots captured the
  deliberately malformed foreign-revision test fixtures into the corpus
  (non-integer w:id, non-xsd dates violate the WML schema). Fixtures now
  use schema-valid markup that the validator still flags (missing w:date,
  which the schema makes optional but the validator requires).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant