fix: stop leaving stray commas when parsing legacy multi-group PR links #286
Conversation
I don't know that I like that we are adding even more regex parsing to what is already a pretty wild function. I wonder if it would be a lot easier to read if we parsed a changelog entry in steps (first, separate words from links; then strip out parentheses; then split by comma; then attempt to parse the rest). But... I guess that would require a larger refactor, so maybe this is fine for now. Thanks for fixing this. One question I have is, does this fix stray commas, or just prevent them from showing up going forward?
Thanks for the review! Two answers:
I considered adding a heuristic to also strip dangling commas in the parser, but decided against it: it'd be permanent code that exists only to compensate for a finite historical bug. On the refactor: agreed, the function is wild. Maybe we can refactor in a follow-up PR?
@hmalik88 Okay, no problem. I have plans to update this tool so it's more strict about formatting and so that the autofix functionality is more useful. So we can address fixing the extra commas then.
Summary
extractPrLinks leaves stray commas in change descriptions when parsing the legacy form ([#A](url)), ([#B](url)). The mangled output surfaces whenever auto-changelog re-stringifies a changelog that contains historical entries in that form. The release tool (@metamask/create-release-branch) hits this on every release run because updateChangelog calls parseChangelog({ shouldExtractPrLinks: true }).
Repro
Older changelog entry:
After parse + re-stringify:
One stray comma per inter-group separator. PR references are still extracted correctly. Only the description text is polluted.
Root cause
src/parse-changelog.ts matches long PR-link groups with this regex:
The pattern matches a single parenthesized group. When the input is ([#A]), ([#B]), ([#C]), each ([#X]) is matched as its own group. The "," separators sit between matches, outside the pattern, so String.replace(pattern, '') leaves them in place and they end up tacked onto the description.
Fix
Extend the pattern so one match consumes the full sequence of adjacent groups, including the comma separators between them:
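As a rough illustration of the difference, the sketch below strips links with two simplified patterns. The names `singleGroup`, `adjacentGroups`, and `stripLinks`, and the example.com URLs, are hypothetical and exist only for this example; these are not the regexes used in src/parse-changelog.ts.

```typescript
// Hypothetical, simplified patterns for illustration only.
// They are NOT the regexes in src/parse-changelog.ts.

// One markdown PR link, e.g. [#12](https://example.com/12)
const link = String.raw`\[#\d+\]\([^()\s]+\)`;
// One parenthesized group of comma-separated links: ([#1](url), [#2](url))
const group = String.raw`\(${link}(?:,\s*${link})*\)`;

// Single-group shape: each parenthesized group is its own match.
const singleGroup = new RegExp(String.raw`\s*${group}`, 'gu');
// Extended shape: one match spans adjacent groups and the commas between them.
const adjacentGroups = new RegExp(
  String.raw`\s*${group}(?:,\s*${group})*`,
  'gu',
);

// Remove every match of `pattern` from a change description.
function stripLinks(description: string, pattern: RegExp): string {
  return description.replace(pattern, '');
}

const legacyEntry =
  'Fix the thing ([#1](https://example.com/1)), ([#2](https://example.com/2)), ([#3](https://example.com/3))';

console.log(stripLinks(legacyEntry, singleGroup)); // "Fix the thing,,"
console.log(stripLinks(legacyEntry, adjacentGroups)); // "Fix the thing"
```

With the single-group pattern, the two separators between the three groups fall outside every match and survive as `,,`. The extended pattern consumes them as part of one match.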
Three shapes are now handled correctly:
- ([#A](url), [#B](url))
- ([#A](url)), ([#B](url))
- ([#A](url), [#B](url)), ([#C](url))
longMatchPattern (PR-number extraction) is unchanged. It finds individual [#X](url) parts regardless of grouping.
Use cases this fixes
Any caller of parseChangelog/updateChangelog whose input contains historical entries in the legacy multi-group form. Concretely:
- @metamask/create-release-branch running on monorepos. It calls updateChangelog per changed package, so any package with legacy-form entries in older release sections gets mangled on its next release.
- Any other caller that passes shouldExtractPrLinks: true to parseChangelog.
PR numbers are still extracted correctly, so no data is lost. The damage is to the description text, which has to be cleaned up by hand before merging the release PR.
Note
Medium Risk
Medium risk because it changes the core regex used to strip long-form PR links from changelog entries, which could affect parsing edge cases; added tests cover the new legacy and mixed formats.
Overview
Fixes extractPrLinks long-form PR link stripping to treat multiple adjacent parenthesized link groups separated by commas as a single match, preventing leftover comma characters in the resulting change description.
Adds regression tests for legacy ([#A](url)), ([#B](url)) and mixed canonical/legacy groupings, and documents the fix in CHANGELOG.md.
Reviewed by Cursor Bugbot for commit 71bee80. Bugbot is set up for automated code reviews on this repo.