Skip to content

Strings cleaners improvements #OCTranspo #2Locales (Part 2)#37

Closed
mmathieum wants to merge 1 commit into
mm/strings_cleaner_improvments_2localesfrom
master
Closed

Strings cleaners improvements #OCTranspo #2Locales (Part 2)#37
mmathieum wants to merge 1 commit into
mm/strings_cleaner_improvments_2localesfrom
master

Conversation

@mmathieum

Copy link
Copy Markdown
Member

No description provided.

@mmathieum mmathieum self-assigned this Jun 4, 2026
@mmathieum mmathieum closed this Jun 4, 2026

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request refactors string cleaning utilities, including updating casing logic thresholds, avoiding lowercasing single letters, and migrating several regex replacements from CleanUtils to local Regex instances in StringsCleaner. It also updates the second language removal regex in OttawaOCTranspoProviderCommons and adds corresponding unit tests. Feedback suggests removing the redundant FR_STATION_AND_NAME regex since 'station' is spelled identically in English and French, and optimizing the regex in OttawaOCTranspoProviderCommons by using a greedy quantifier instead of a lazy one to prevent unnecessary backtracking.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment on lines +44 to +45
private val FR_STATION_AND_NAME = Regex("""(?U)(^|\s+)station\s+(\w+)""", RegexOption.IGNORE_CASE)
private const val FR_STATION_AND_NAME_REPLACEMENT = "$1$2"

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The regular expression FR_STATION_AND_NAME and its replacement string are completely identical to STATION_AND_NAME and STATION_AND_NAME_REPLACEMENT. Since "station" is spelled the same way in both English and French, you can remove these duplicate definitions and reuse the English ones to reduce redundancy.

1, // subway
-> {
tripHeadsign = CleanUtils.FR_CA_STATION.matcher(tripHeadsign).replaceAll(EMPTY)
tripHeadsign = FR_STATION_AND_NAME.replace(tripHeadsign, FR_STATION_AND_NAME_REPLACEMENT)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Use STATION_AND_NAME and STATION_AND_NAME_REPLACEMENT instead of the duplicate French definitions.

                    tripHeadsign = STATION_AND_NAME.replace(tripHeadsign, STATION_AND_NAME_REPLACEMENT)

when (routeType) {
1, // subway
-> {
stopName = FR_STATION_AND_NAME.replace(stopName, FR_STATION_AND_NAME_REPLACEMENT)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Use STATION_AND_NAME and STATION_AND_NAME_REPLACEMENT instead of the duplicate French definitions.

                    stopName = STATION_AND_NAME.replace(stopName, STATION_AND_NAME_REPLACEMENT)

private val FIX_CAIRINE_WILSON_REPLACEMENT = CleanUtils.cleanWordsReplacement("Cairine Wilson")

private val REMOVE_SECOND_LANGUAGE = Pattern.compile("( ~ .*$)") // FIXME i18n head-signs
private val REMOVE_SECOND_LANGUAGE = Pattern.compile("(\\s+~\\s+[^<>]+?)(?=\\s*<>|$)") // FIXME i18n head-signs

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The regular expression pattern uses a lazy quantifier +? inside [^<>]+?. Since the character class [^<>] already prevents matching across < or >, and it is followed by a lookahead (?=\s*<>|$), using a greedy quantifier + is more efficient and avoids unnecessary backtracking.

Suggested change
private val REMOVE_SECOND_LANGUAGE = Pattern.compile("(\\s+~\\s+[^<>]+?)(?=\\s*<>|$)") // FIXME i18n head-signs
private val REMOVE_SECOND_LANGUAGE = Pattern.compile("(\\s+~\\s+[^<>]+)(?=\\s*<>|$)") // FIXME i18n head-signs

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant