feat(doc): add v2 XML content guards for bare ampersands and deprecated tags#822
feat(doc): add v2 XML content guards for bare ampersands and deprecated tags#822herbertliu wants to merge 2 commits into
Conversation
…ed tags Add pre-flight checks for the v2 XML document path: - CheckV2XMLBareAmpersand: returns a hard error when content contains a bare & that is not a valid XML entity reference (&, <, >, ', ", &#N;, &#xH;). Such bare ampersands cause the v2 XML parser to reject the request. - CheckV2XMLWarnings: returns non-fatal warnings for two silently-wrong constructs — <quote-container> (v2 drops it; use <blockquote>) and <column width="N"> with an integer value (has no effect; use width-ratio="0.N"). Both checks are integrated into validateCreateV2/validateUpdateV2 (hard error) and executeCreateV2/executeUpdateV2 (warnings to stderr). Only fires when --doc-format is xml (the default).
📝 WalkthroughWalkthroughThis PR adds XML content validation for v2 documents. Two new validators detect bare ampersands (fatal errors) and unsupported v2 constructs (non-fatal warnings). Both create and update v2 command paths now validate XML content and emit warnings before performing API operations. ChangesXML V2 Validation and Warning System
Estimated code review effort🎯 2 (Simple) | ⏱️ ~12 minutes Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Tip 💬 Introducing Slack Agent: The best way for teams to turn conversations into code.Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.
Built for teams:
One agent for your entire SDLC. Right inside Slack. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@shortcuts/doc/docs_update_check.go`:
- Around line 313-317: The current regex columnIntWidthRe is too permissive and
misses valid forms; change it to require a preceding whitespace before the
attribute, allow optional spaces around '=', accept single or double quotes, and
capture the integer value (e.g. `<column\b[^>]*\swidth\s*=\s*(['"])(\d+)\1`) so
it won't match attributes like data-width and will match forms like width='50'
or width = "50". Apply the same tightening to the other related regex defined
around lines 335-341 so both matchers use `\swidth\s*=\s*(['"])(\d+)\1` (or the
float equivalent) and keep the surrounding `<column\b[^>]*` context to limit
matches to column elements.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: bc2981b2-3f9c-4a87-909d-8b36a95bd73c
📒 Files selected for processing (4)
shortcuts/doc/docs_create_v2.goshortcuts/doc/docs_update_check.goshortcuts/doc/docs_update_check_test.goshortcuts/doc/docs_update_v2.go
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #822 +/- ##
==========================================
+ Coverage 65.67% 65.94% +0.27%
==========================================
Files 513 516 +3
Lines 47655 48930 +1275
==========================================
+ Hits 31297 32269 +972
- Misses 13652 13883 +231
- Partials 2706 2778 +72 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
🚀 PR Preview Install Guide🧰 CLI updatenpm i -g https://pkg.pr.new/larksuite/cli/@larksuite/cli@9fa94ab3a26aad5efc77fec1fd210295c812bc6b🧩 Skill updatenpx skills add larksuite/cli#feat/docs-v2-xml-content-guard -y -g |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
shortcuts/doc/docs_update_check_test.go (1)
474-497: ⚡ Quick winUse
strings.Containsinstead of custom substring scanner.The
containsStrfunction at line 487 is locally scoped to this test file, used only once at line 479, and thestringspackage is already imported. Replace it withstrings.Containsto eliminate the unnecessary custom implementation.Proposed simplification
@@ for _, sub := range tt.wantContains { - if !containsStr(combined, sub) { + if !strings.Contains(combined, sub) { t.Errorf("expected warning to contain %q, got: %s", sub, combined) } } }) } } - -func containsStr(s, sub string) bool { - return len(s) >= len(sub) && (s == sub || len(sub) == 0 || - func() bool { - for i := 0; i+len(sub) <= len(s); i++ { - if s[i:i+len(sub)] == sub { - return true - } - } - return false - }()) -}🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@shortcuts/doc/docs_update_check_test.go` around lines 474 - 497, The test defines a local containsStr function and calls it to check substrings; replace that call with the standard strings.Contains and remove the custom containsStr function to simplify the test (ensure the strings package is imported and used in the assertion where containsStr was invoked, and delete the containsStr function declaration to avoid redundancy).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@shortcuts/doc/docs_update_check_test.go`:
- Around line 474-497: The test defines a local containsStr function and calls
it to check substrings; replace that call with the standard strings.Contains and
remove the custom containsStr function to simplify the test (ensure the strings
package is imported and used in the assertion where containsStr was invoked, and
delete the containsStr function declaration to avoid redundancy).
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: ae956902-4be3-4375-a43b-58e4b2cd6efa
📒 Files selected for processing (2)
shortcuts/doc/docs_update_check.goshortcuts/doc/docs_update_check_test.go
🚧 Files skipped from review as they are similar to previous changes (1)
- shortcuts/doc/docs_update_check.go
fangshuyu-768
left a comment
There was a problem hiding this comment.
Thanks for adding these pre-flight checks — the overall approach of catching silent failures early is solid. I have a few concerns below, ordered by impact.
| // also matched — that is acceptable for a non-blocking warning. | ||
| var columnIntWidthRe = regexp.MustCompile(`<column\b[^>]*\swidth\s*=\s*['"]?\d+['"]?`) | ||
|
|
||
| // CheckV2XMLWarnings returns a list of non-fatal warnings for v2 XML content. |
There was a problem hiding this comment.
columnIntWidthRe produces false positives on float values.
The regex <column\b[^>]*\swidth\s*=\s*['"]?\d+['"]? will match width="0.5" because \d+ greedily matches the 0 before the dot, producing a spurious warning for a valid float value.
Suggested fix: use FindStringSubmatch to capture the attribute value and then check in Go code whether it's a pure integer (no decimal point), e.g.:
var columnWidthRe = regexp.MustCompile(`<column\b[^>]*\swidth\s*=\s*(['"])([^'"]+)\1`)
// then in CheckV2XMLWarnings:
for _, m := range columnWidthRe.FindAllStringSubmatch(content, -1) {
val := m[2]
if _, err := strconv.Atoi(val); err == nil {
// pure integer → warn
}
}Alternatively, if the regex approach is preferred, add a negative lookahead equivalent by excluding values containing a dot: \d+(?!\.\d) — but since Go's RE2 doesn't support lookahead, the code-based check is the way to go.
| if content == "" || !strings.Contains(content, "&") { | ||
| return "" | ||
| } | ||
| // Replace every valid entity with its same-length placeholder so positional |
There was a problem hiding this comment.
Comment says "same-length placeholder" but "ENTITY" is not same-length.
The comment states: "Replace every valid entity with its same-length placeholder so positional byte offsets are preserved", but "ENTITY" (6 bytes) is not the same length as < (4), & (5), A (5), etc. The parenthetical "(not required here, but avoids false positives)" also contradicts the stated purpose.
Since positional offsets aren't actually used anywhere, suggest simplifying the comment to something like:
// Replace every valid entity with a placeholder so that any remaining &
// must be a bare ampersand.| return true | ||
| } | ||
| } | ||
| return false |
There was a problem hiding this comment.
Custom containsStr helper is unnecessary — use strings.Contains.
This hand-rolled implementation replicates strings.Contains with added complexity (nested anonymous function). There's no performance concern here since this is test code.
Suggested replacement:
if !strings.Contains(combined, sub) {
t.Errorf("expected warning to contain %q, got: %s", sub, combined)
}| if runtime.Str("parent-token") != "" && runtime.Str("parent-position") != "" { | ||
| return common.FlagErrorf("--parent-token and --parent-position are mutually exclusive") | ||
| } | ||
| if runtime.Str("doc-format") != "markdown" { |
There was a problem hiding this comment.
Inconsistent empty-content guard between create and update paths.
Here in validateCreateV2 there's no content != "" guard before calling CheckV2XMLBareAmpersand, but in validateUpdateV2 (line ~106) there is one:
if runtime.Str("doc-format") != "markdown" && content != "" {While CheckV2XMLBareAmpersand handles empty strings internally, the inconsistency makes it look like an oversight. Same applies to the warning checks in executeCreateV2 vs executeUpdateV2.
Suggested: either add the guard here for consistency, or remove it from validateUpdateV2 and rely on the callee's internal check in both places.
| { | ||
| name: "column float width-ratio is fine", | ||
| content: `<grid><column width-ratio="0.5"><p>A</p></column></grid>`, | ||
| wantLen: 0, |
There was a problem hiding this comment.
Missing test case: mixed valid entities and bare ampersand.
There's no test for content that contains both valid entities and a bare &, e.g. "a & b & c". This is a common real-world scenario and would help verify that the replace-then-check approach works correctly when valid and invalid references coexist.
Suggested addition:
{name: "mixed valid entity and bare ampersand flagged", content: "a & b & c", wantErr: true},| { | ||
| name: "column single-quoted integer width triggers warning", | ||
| content: `<grid><column width='30'/></grid>`, | ||
| wantLen: 1, |
There was a problem hiding this comment.
Missing test case: width with float value should NOT trigger warning.
This is the flip side of the columnIntWidthRe false-positive issue. Adding a test for <column width="0.5"> would both document the intended behavior and catch the regex bug (it currently produces a false positive because \d+ matches the 0 in 0.5).
Suggested addition:
{
name: "column float width value is fine",
content: `<grid><column width="0.5"><p>A</p></column></grid>`,
wantLen: 0,
},
Summary
Add pre-flight static checks for the v2 XML document path to catch two categories of silent failures before the API call is made.
Changes
shortcuts/doc/docs_update_check.go: AddCheckV2XMLBareAmpersand(hard error) andCheckV2XMLWarnings(non-fatal warnings) with table-driven testsshortcuts/doc/docs_create_v2.go: Integrate bare-ampersand check invalidateCreateV2and XML warnings inexecuteCreateV2shortcuts/doc/docs_update_v2.go: Same integration invalidateUpdateV2/executeUpdateV2shortcuts/doc/docs_update_check_test.go: AddTestCheckV2XMLBareAmpersandandTestCheckV2XMLWarningsChecks
CheckV2XMLBareAmpersand— hard error (returned from Validate):&that is not a recognised XML entity (&,<,>,',",&#N;,&#xH;).CheckV2XMLWarnings— non-fatal warnings (printed to stderr before the API call):<quote-container>— v2 silently drops the block; the correct tag is<blockquote>.<column width="N">with an integer value — has no effect in v2; the correct attribute iswidth-ratio="0.5"(float 0–1).Both checks only fire when
--doc-formatisxml(the default).Test Plan
go test ./shortcuts/doc/...— all pass, 66.4% coveragego vet ./...— cleangofmt -l .— cleanSummary by CodeRabbit
New Features
Tests