feat(doc): add v2 XML content guards for bare ampersands and deprecated tags #822

Open
herbertliu wants to merge 2 commits into main from feat/docs-v2-xml-content-guard

@herbertliu
Collaborator

@herbertliu herbertliu commented May 11, 2026

Summary

Add pre-flight static checks for the v2 XML document path to catch two categories of silent failures before the API call is made.

Changes

  • shortcuts/doc/docs_update_check.go: Add CheckV2XMLBareAmpersand (hard error) and CheckV2XMLWarnings (non-fatal warnings) with table-driven tests
  • shortcuts/doc/docs_create_v2.go: Integrate bare-ampersand check in validateCreateV2 and XML warnings in executeCreateV2
  • shortcuts/doc/docs_update_v2.go: Same integration in validateUpdateV2 / executeUpdateV2
  • shortcuts/doc/docs_update_check_test.go: Add TestCheckV2XMLBareAmpersand and TestCheckV2XMLWarnings

Checks

CheckV2XMLBareAmpersand — hard error (returned from Validate):

  • Fires when content contains a & that is not a recognised XML entity (&amp;, &lt;, &gt;, &apos;, &quot;, &#N;, &#xH;).
  • The v2 XML parser rejects such requests outright; catching this early gives a clear error instead of an opaque API failure.
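The bare-ampersand rule described above can be sketched as follows. This is a minimal illustration, not the PR's CheckV2XMLBareAmpersand: the helper name hasBareAmpersand, the regex, and the "_" placeholder are assumptions chosen for the example. It blanks out every recognised entity and then treats any remaining & as bare.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// validEntityRe matches the entity forms listed in the PR description:
// the five predefined XML entities plus decimal (&#N;) and hexadecimal
// (&#xH;) character references.
var validEntityRe = regexp.MustCompile(`&(?:amp|lt|gt|apos|quot|#[0-9]+|#x[0-9A-Fa-f]+);`)

// hasBareAmpersand is a hypothetical helper (not the PR's function).
// It replaces every valid entity with a placeholder, so any & that is
// still present cannot belong to a recognised entity.
func hasBareAmpersand(content string) bool {
	if !strings.Contains(content, "&") {
		return false
	}
	stripped := validEntityRe.ReplaceAllString(content, "_")
	return strings.Contains(stripped, "&")
}

func main() {
	samples := []string{
		"a &amp; b",       // only a valid entity
		"a &amp; b & c",   // valid entity plus a bare ampersand
		"<p>AT&T</p>",     // bare ampersand inside text
		"&#65; and &#x41;", // numeric character references
	}
	for _, s := range samples {
		fmt.Printf("%q bare=%v\n", s, hasBareAmpersand(s))
	}
}
```

Because the check runs on the raw string rather than a parsed tree, it would also flag ampersands inside CDATA sections or comments; whether that matters depends on what the v2 parser accepts, which this sketch does not model.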

CheckV2XMLWarnings — non-fatal warnings (printed to stderr before the API call):

  1. <quote-container> — v2 silently drops the block; the correct tag is <blockquote>.
  2. <column width="N"> with an integer value — has no effect in v2; the correct attribute is width-ratio="0.5" (float 0–1).

Both checks only fire when --doc-format is xml (the default).
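A rough sketch of the two warning rules, assuming a hypothetical helper xmlWarnings rather than the PR's CheckV2XMLWarnings. The regexes and message wording are illustrative. The width value is captured and parsed with strconv.Atoi so that only whole-number values produce a warning; Go's regexp package has no backreferences, so each quote style gets its own alternative.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// quoteContainerRe finds an opening <quote-container> tag.
var quoteContainerRe = regexp.MustCompile(`<quote-container\b`)

// columnWidthRe captures the quoted value of a width attribute on <column>.
// Group 1 holds a double-quoted value, group 2 a single-quoted one.
var columnWidthRe = regexp.MustCompile(`<column\b[^>]*\swidth\s*=\s*(?:"([^"]*)"|'([^']*)')`)

// xmlWarnings is a hypothetical helper returning non-fatal warnings.
func xmlWarnings(content string) []string {
	var warnings []string
	if quoteContainerRe.MatchString(content) {
		warnings = append(warnings, "<quote-container> is dropped by v2; use <blockquote>")
	}
	for _, m := range columnWidthRe.FindAllStringSubmatch(content, -1) {
		val := m[1]
		if val == "" {
			val = m[2]
		}
		// Only a value that parses as a whole number triggers the warning.
		if _, err := strconv.Atoi(strings.TrimSpace(val)); err == nil {
			warnings = append(warnings,
				fmt.Sprintf("<column width=%q> has no effect in v2; use width-ratio with a float between 0 and 1", val))
		}
	}
	return warnings
}

func main() {
	doc := `<quote-container><p>hi</p></quote-container><grid><column width="50"/></grid>`
	for _, w := range xmlWarnings(doc) {
		fmt.Println("warning:", w)
	}
}
```

Requiring whitespace before width keeps attributes such as data-width and width-ratio from matching.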

Test Plan

  • go test ./shortcuts/doc/... — all pass, 66.4% coverage
  • go vet ./... — clean
  • gofmt -l . — clean

Summary by CodeRabbit

  • New Features

    • Enhanced XML validation for v2 documents: malformed bare ampersands now block submission when not using markdown.
    • Emits user-facing warnings for unsupported XML constructs and deprecated column width attributes prior to processing.
  • Tests

    • Added unit tests covering ampersand detection and warning conditions to ensure consistent validation and messaging.

Review Change Stack

…ed tags

Add pre-flight checks for the v2 XML document path:

- CheckV2XMLBareAmpersand: returns a hard error when content contains a
  bare & that is not a valid XML entity reference (&amp;, &lt;, &gt;,
  &apos;, &quot;, &#N;, &#xH;). Such bare ampersands cause the v2 XML
  parser to reject the request.

- CheckV2XMLWarnings: returns non-fatal warnings for two silently-wrong
  constructs — <quote-container> (v2 drops it; use <blockquote>) and
  <column width="N"> with an integer value (has no effect; use
  width-ratio="0.N").

Both checks are integrated into validateCreateV2/validateUpdateV2 (hard
error) and executeCreateV2/executeUpdateV2 (warnings to stderr). Only
fires when --doc-format is xml (the default).
@github-actions github-actions Bot added the domain/ccm (PR touches the ccm domain) and size/M (Single-domain feat or fix with limited business impact) labels May 11, 2026
@coderabbitai

coderabbitai Bot commented May 11, 2026

📝 Walkthrough

This PR adds XML content validation for v2 documents. Two new validators detect bare ampersands (fatal errors) and unsupported v2 constructs (non-fatal warnings). Both create and update v2 command paths now validate XML content and emit warnings before performing API operations.

Changes

XML V2 Validation and Warning System

| Layer / File(s) | Summary |
| --- | --- |
| Core XML Validators: shortcuts/doc/docs_update_check.go | CheckV2XMLBareAmpersand detects unescaped & using regex entity matching; CheckV2XMLWarnings identifies unsupported quote-container blocks and integer column width attributes (recommending width-ratio instead). |
| Validation Tests: shortcuts/doc/docs_update_check_test.go | TestCheckV2XMLBareAmpersand validates bare ampersand detection across valid entities and bare cases; TestCheckV2XMLWarnings validates warning detection for quote-containers and width attributes; containsStr helper checks substrings in warnings. |
| Create V2 Integration: shortcuts/doc/docs_create_v2.go | Adds fmt import; validateCreateV2 fails on bare ampersands when format is not markdown; executeCreateV2 emits warnings to stderr before calling create-document API. |
| Update V2 Integration: shortcuts/doc/docs_update_v2.go | validateUpdateV2 fails on bare ampersands when format is not markdown; executeUpdateV2 emits warnings to stderr before calling update API. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Suggested labels

size/M, domain/ccm

Suggested reviewers

  • fangshuyu-768

Poem

🐰 Ampersands now dance with care,
Bare ones caught before they snare,
Warnings bloom for width-ratio grace,
XML v2 finds its proper place!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 27.27% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title 'feat(doc): add v2 XML content guards for bare ampersands and deprecated tags' accurately and concisely describes the main objective of the PR: adding validation checks for v2 XML content. |
| Description check | ✅ Passed | The pull request description comprehensively covers all required sections: summary explains the motivation, changes detail the modifications across four files, checks describe the two validation functions with their behaviors, and test plan confirms verification steps. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@shortcuts/doc/docs_update_check.go`:
- Around line 313-317: The current regex columnIntWidthRe is too permissive and
misses valid forms; change it to require a preceding whitespace before the
attribute, allow optional spaces around '=', accept single or double quotes, and
capture the integer value (e.g. `<column\b[^>]*\swidth\s*=\s*(?:"(\d+)"|'(\d+)')`;
Go's RE2 engine has no backreferences, so each quote style is spelled out as its
own alternative) so it won't match attributes like data-width and will match
forms like width='50' or width = "50". Apply the same tightening to the other
related regex defined around lines 335-341 so both matchers use
`\swidth\s*=\s*(?:"(\d+)"|'(\d+)')` (or the float equivalent) and keep the
surrounding `<column\b[^>]*` context to limit matches to column elements.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: bc2981b2-3f9c-4a87-909d-8b36a95bd73c

📥 Commits

Reviewing files that changed from the base of the PR and between 25c72ce and a29a8bf.

📒 Files selected for processing (4)
  • shortcuts/doc/docs_create_v2.go
  • shortcuts/doc/docs_update_check.go
  • shortcuts/doc/docs_update_check_test.go
  • shortcuts/doc/docs_update_v2.go

Comment thread shortcuts/doc/docs_update_check.go
@codecov

codecov Bot commented May 11, 2026

Codecov Report

❌ Patch coverage is 70.58824% with 10 lines in your changes missing coverage. Please review.
✅ Project coverage is 65.94%. Comparing base (0ed63b0) to head (9fa94ab).
⚠️ Report is 6 commits behind head on main.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| shortcuts/doc/docs_update_v2.go | 0.00% | 6 Missing ⚠️ |
| shortcuts/doc/docs_create_v2.go | 33.33% | 2 Missing and 2 partials ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #822      +/-   ##
==========================================
+ Coverage   65.67%   65.94%   +0.27%     
==========================================
  Files         513      516       +3     
  Lines       47655    48930    +1275     
==========================================
+ Hits        31297    32269     +972     
- Misses      13652    13883     +231     
- Partials     2706     2778      +72     


@github-actions

github-actions Bot commented May 11, 2026

🚀 PR Preview Install Guide

🧰 CLI update

npm i -g https://pkg.pr.new/larksuite/cli/@larksuite/cli@9fa94ab3a26aad5efc77fec1fd210295c812bc6b

🧩 Skill update

npx skills add larksuite/cli#feat/docs-v2-xml-content-guard -y -g


@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (1)
shortcuts/doc/docs_update_check_test.go (1)

474-497: ⚡ Quick win

Use strings.Contains instead of custom substring scanner.

The containsStr function at line 487 is locally scoped to this test file, used only once at line 479, and the strings package is already imported. Replace it with strings.Contains to eliminate the unnecessary custom implementation.

Proposed simplification
@@
 			for _, sub := range tt.wantContains {
-				if !containsStr(combined, sub) {
+				if !strings.Contains(combined, sub) {
 					t.Errorf("expected warning to contain %q, got: %s", sub, combined)
 				}
 			}
 		})
 	}
 }
-
-func containsStr(s, sub string) bool {
-	return len(s) >= len(sub) && (s == sub || len(sub) == 0 ||
-		func() bool {
-			for i := 0; i+len(sub) <= len(s); i++ {
-				if s[i:i+len(sub)] == sub {
-					return true
-				}
-			}
-			return false
-		}())
-}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@shortcuts/doc/docs_update_check_test.go` around lines 474 - 497, The test
defines a local containsStr function and calls it to check substrings; replace
that call with the standard strings.Contains and remove the custom containsStr
function to simplify the test (ensure the strings package is imported and used
in the assertion where containsStr was invoked, and delete the containsStr
function declaration to avoid redundancy).

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: ae956902-4be3-4375-a43b-58e4b2cd6efa

📥 Commits

Reviewing files that changed from the base of the PR and between a29a8bf and 9fa94ab.

📒 Files selected for processing (2)
  • shortcuts/doc/docs_update_check.go
  • shortcuts/doc/docs_update_check_test.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • shortcuts/doc/docs_update_check.go

Collaborator

@fangshuyu-768 fangshuyu-768 left a comment


Thanks for adding these pre-flight checks — the overall approach of catching silent failures early is solid. I have a few concerns below, ordered by impact.

// also matched — that is acceptable for a non-blocking warning.
var columnIntWidthRe = regexp.MustCompile(`<column\b[^>]*\swidth\s*=\s*['"]?\d+['"]?`)

// CheckV2XMLWarnings returns a list of non-fatal warnings for v2 XML content.

columnIntWidthRe produces false positives on float values.

The regex <column\b[^>]*\swidth\s*=\s*['"]?\d+['"]? will match width="0.5" because \d+ greedily matches the 0 before the dot, producing a spurious warning for a valid float value.

Suggested fix: use FindAllStringSubmatch to capture the attribute value, then check in Go code whether it is a pure integer (no decimal point). Go's RE2 engine does not support backreferences such as \1, so each quote style is matched by its own alternative, e.g.:

var columnWidthRe = regexp.MustCompile(`<column\b[^>]*\swidth\s*=\s*(?:"([^"]*)"|'([^']*)')`)

// then in CheckV2XMLWarnings:
for _, m := range columnWidthRe.FindAllStringSubmatch(content, -1) {
    val := m[1] // double-quoted value
    if val == "" {
        val = m[2] // single-quoted value
    }
    if _, err := strconv.Atoi(val); err == nil {
        // pure integer → warn
    }
}

A pure-regex fix would need a negative lookahead such as \d+(?!\.\d) to exclude values containing a dot, but Go's RE2 engine does not support lookahead, so the code-based check is the way to go.

if content == "" || !strings.Contains(content, "&") {
return ""
}
// Replace every valid entity with its same-length placeholder so positional

Comment says "same-length placeholder" but "ENTITY" is not same-length.

The comment states: "Replace every valid entity with its same-length placeholder so positional byte offsets are preserved", but "ENTITY" (6 bytes) is not the same length as &lt; (4), &amp; (5), &#65; (5), etc. The parenthetical "(not required here, but avoids false positives)" also contradicts the stated purpose.

Since positional offsets aren't actually used anywhere, suggest simplifying the comment to something like:

// Replace every valid entity with a placeholder so that any remaining &
// must be a bare ampersand.

return true
}
}
return false

Custom containsStr helper is unnecessary — use strings.Contains.

This hand-rolled implementation replicates strings.Contains with added complexity (nested anonymous function). There's no performance concern here since this is test code.

Suggested replacement:

if !strings.Contains(combined, sub) {
    t.Errorf("expected warning to contain %q, got: %s", sub, combined)
}

if runtime.Str("parent-token") != "" && runtime.Str("parent-position") != "" {
return common.FlagErrorf("--parent-token and --parent-position are mutually exclusive")
}
if runtime.Str("doc-format") != "markdown" {

Inconsistent empty-content guard between create and update paths.

Here in validateCreateV2 there's no content != "" guard before calling CheckV2XMLBareAmpersand, but in validateUpdateV2 (line ~106) there is one:

if runtime.Str("doc-format") != "markdown" && content != "" {

While CheckV2XMLBareAmpersand handles empty strings internally, the inconsistency makes it look like an oversight. Same applies to the warning checks in executeCreateV2 vs executeUpdateV2.

Suggested: either add the guard here for consistency, or remove it from validateUpdateV2 and rely on the callee's internal check in both places.
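One way to keep the two paths from drifting is to route both through a single guard. The sketch below is illustrative only: needsXMLChecks, validateXMLContent, and xmlWarningsFor are hypothetical names, and the two checks are passed in as function values, assuming the string-returning and slice-returning shapes suggested by the snippets in this review rather than the PR's exact signatures.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// needsXMLChecks holds the guard in one place so the create and update
// paths apply exactly the same condition.
func needsXMLChecks(docFormat, content string) bool {
	return docFormat != "markdown" && content != ""
}

// validateXMLContent would be called from both validate functions.
// checkAmp stands in for the bare-ampersand check and is assumed to
// return an empty string when the content is fine.
func validateXMLContent(docFormat, content string, checkAmp func(string) string) error {
	if !needsXMLChecks(docFormat, content) {
		return nil
	}
	if msg := checkAmp(content); msg != "" {
		return errors.New(msg)
	}
	return nil
}

// xmlWarningsFor would be called from both execute functions.
// checkWarn stands in for the warnings check.
func xmlWarningsFor(docFormat, content string, checkWarn func(string) []string) []string {
	if !needsXMLChecks(docFormat, content) {
		return nil
	}
	return checkWarn(content)
}

func main() {
	// Stub checks, used only to exercise the guard.
	amp := func(s string) string { return "" }
	warn := func(s string) []string { return []string{"example warning"} }

	if err := validateXMLContent("xml", "<p>hi</p>", amp); err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
		os.Exit(1)
	}
	for _, w := range xmlWarningsFor("xml", "<p>hi</p>", warn) {
		fmt.Fprintln(os.Stderr, "warning:", w)
	}
}
```

With this shape the empty-content decision is made once, so neither call site needs its own content != "" condition.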

{
name: "column float width-ratio is fine",
content: `<grid><column width-ratio="0.5"><p>A</p></column></grid>`,
wantLen: 0,

Missing test case: mixed valid entities and bare ampersand.

There's no test for content that contains both valid entities and a bare &, e.g. "a &amp; b & c". This is a common real-world scenario and would help verify that the replace-then-check approach works correctly when valid and invalid references coexist.

Suggested addition:

{name: "mixed valid entity and bare ampersand flagged", content: "a &amp; b & c", wantErr: true},

{
name: "column single-quoted integer width triggers warning",
content: `<grid><column width='30'/></grid>`,
wantLen: 1,

Missing test case: width with float value should NOT trigger warning.

This is the flip side of the columnIntWidthRe false-positive issue. Adding a test for <column width="0.5"> would both document the intended behavior and catch the regex bug (it currently produces a false positive because \d+ matches the 0 in 0.5).

Suggested addition:

{
    name:    "column float width value is fine",
    content: `<grid><column width="0.5"><p>A</p></column></grid>`,
    wantLen: 0,
},
