Skip to content

fix: stop leaving stray commas when parsing legacy multi-group PR links#286

Merged
hmalik88 merged 4 commits into
mainfrom
hm/fix-oxfmt-formatting
May 15, 2026
Merged

fix: stop leaving stray commas when parsing legacy multi-group PR links#286
hmalik88 merged 4 commits into
mainfrom
hm/fix-oxfmt-formatting

Conversation

@hmalik88
Copy link
Copy Markdown
Contributor

@hmalik88 hmalik88 commented May 13, 2026

Summary

extractPrLinks leaves stray commas in change descriptions when parsing the legacy form ([#A](url)), ([#B](url)). The mangled output surfaces whenever auto-changelog re-stringifies a changelog that contains historical entries in that form. The release tool (@metamask/create-release-branch) hits this on every release run because updateChangelog calls parseChangelog({ shouldExtractPrLinks: true }).

Repro

Older changelog entry:

- Add new dependency `@metamask/keyring-sdk@1.2.0` ([#478](.../pull/478)), ([#482](.../pull/482)), ([#496](.../pull/496))

After parse + re-stringify:

- Add new dependency `@metamask/keyring-sdk@1.2.0`,,, ([#478](.../pull/478), [#482](.../pull/482), [#496](.../pull/496))

One stray comma per inter-group separator. PR references are still extracted correctly. Only the description text is polluted.

Root cause

src/parse-changelog.ts matches long PR-link groups with this regex:

\s+\(\s*(\[#\d+\]\([^)]*${repoName}[^)]*\)\s*,?\s*)+\)

The pattern matches a single parenthesized group. When the input is ([#A]), ([#B]), ([#C]), each ([#X]) is matched as its own group. The , separators sit between matches, outside the pattern, so String.replace(pattern, '') leaves them in place and they end up tacked onto the description.

Fix

Extend the pattern so one match consumes the full sequence of adjacent groups, including the comma separators between them:

const longLinkGroup = `\\(\\s*(?:\\[#\\d+\\]\\([^)]*${repoName}[^)]*\\)\\s*,?\\s*)+\\)`;
new RegExp(`\\s+${longLinkGroup}(?:\\s*,\\s*${longLinkGroup})*`, 'gu');

Three shapes are now handled correctly:

  1. Canonical, one group with comma-separated refs: ([#A](url), [#B](url))
  2. Legacy, adjacent single-ref groups: ([#A](url)), ([#B](url))
  3. Mixed: ([#A](url), [#B](url)), ([#C](url))

longMatchPattern (PR-number extraction) is unchanged. It finds individual [#X](url) parts regardless of grouping.

Use cases this fixes

Any caller of parseChangelog / updateChangelog whose input contains historical entries in the legacy multi-group form. Concretely:

  • @metamask/create-release-branch running on monorepos. It calls updateChangelog per changed package, so any package with legacy-form entries in older release sections gets mangled on its next release.
  • Direct callers that pass shouldExtractPrLinks: true to parseChangelog.

PR numbers are still extracted correctly, so no data is lost. The damage is to the description text, which has to be cleaned up by hand before merging the release PR.


Note

Medium Risk
Medium risk because it changes the core regex used to strip long-form PR links from changelog entries, which could affect parsing edge cases; added tests cover the new legacy and mixed formats.

Overview
Fixes extractPrLinks long-form PR link stripping to treat multiple adjacent parenthesized link groups separated by commas as a single match, preventing leftover comma characters in the resulting change description.

Adds regression tests for legacy ([#A](url)), ([#B](url)) and mixed canonical/legacy groupings, and documents the fix in CHANGELOG.md.

Reviewed by Cursor Bugbot for commit 71bee80. Bugbot is set up for automated code reviews on this repo. Configure here.

@hmalik88 hmalik88 marked this pull request as ready for review May 13, 2026 20:51
@hmalik88 hmalik88 requested a review from a team as a code owner May 13, 2026 20:51
Comment thread src/parse-changelog.ts Dismissed
@mcmire
Copy link
Copy Markdown
Contributor

mcmire commented May 13, 2026

I don't know that I like that we are adding even more regex parsing to what is already a pretty wild function. I wonder if it would be a lot easier to read if we parsed a changelog entry in steps (first, separate words from links; then strip out parentheses; then split by comma; then attempt to parse the rest).

But.... I guess that would require a larger refactor, so, maybe this is fine for now. Thanks for fixing this.

One question I have is, does this fix stray commas, or just prevent them from showing up going forward?

@hmalik88
Copy link
Copy Markdown
Contributor Author

I don't know that I like that we are adding even more regex parsing to what is already a pretty wild function. I wonder if it would be a lot easier to read if we parsed a changelog entry in steps (first, separate words from links; then strip out parentheses; then split by comma; then attempt to parse the rest).

But.... I guess that would require a larger refactor, so, maybe this is fine for now. Thanks for fixing this.

One question I have is, does this fix stray commas, or just prevent them from showing up going forward?

I don't know that I like that we are adding even more regex parsing to what is already a pretty wild function. I wonder if it would be a lot easier to read if we parsed a changelog entry in steps (first, separate words from links; then strip out parentheses; then split by comma; then attempt to parse the rest).

But.... I guess that would require a larger refactor, so, maybe this is fine for now. Thanks for fixing this.

One question I have is, does this fix stray commas, or just prevent them from showing up going forward?

Thanks for the review!

Two answers:

  • Legacy form sitting in older release sections (([#A](url)), ([#B](url)), never been corrupted yet): yes, this fix means the next create-release-branch run produces clean canonical output. Self-heals.
  • Already-corrupted descriptions (Add new dep, ([#A](url), [#B](url)), where the previous buggy parser wrote stray commas into the file): no, this fix doesn't retroactively clean them. From the parser's perspective those commas are now part of the description prose. Affected repos need a one-time manual scrub.

I considered adding a heuristic to also strip dangling commas in the parser, but decided against it, it'd be permanent code that exists only to compensate for a finite historical bug.

On the refactor: agreed the function is wild. Maybe we can refactor in a follow up PR?

@mcmire
Copy link
Copy Markdown
Contributor

mcmire commented May 15, 2026

@hmalik88 Okay no problem. I have plans to update this tool so it's more strict about formatting and so that the autofix functionality is more useful. So we can address fixing the extra commas then.

Copy link
Copy Markdown
Contributor

@mcmire mcmire left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@hmalik88 hmalik88 merged commit 16f7742 into main May 15, 2026
26 checks passed
@hmalik88 hmalik88 deleted the hm/fix-oxfmt-formatting branch May 15, 2026 16:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants