fix(intl): apply Collator comparison options#694
Conversation
|
The latest updates on your projects. Learn more about Vercel for GitHub. 1 Skipped Deployment
|
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (3)
💤 Files with no reviewable changes (1)
🚧 Files skipped from review as they are similar to previous changes (2)
📝 WalkthroughWalkthroughAdds ICU numeric and case-first collation configuration and Unicode-extension-aware Collator construction; integrates Collator into String.localeCompare; implements stable merge-sort for Array.sort/toSorted; updates supported collation values and adds related tests. ChangesCollation Options and Unicode Extension Support
Stable Array Sorting
Sequence Diagram(s)sequenceDiagram
participant Client
participant IntlCollator
participant UnicodeExt as UnicodeExtensionParser
participant ICU as ICUCollator
Client->>IntlCollator: new Intl.Collator(locale, {numeric, caseFirst})
IntlCollator->>UnicodeExt: extract -u- keys (kn,kf,co)
UnicodeExt-->>IntlCollator: return key values
IntlCollator->>IntlCollator: reconcile options vs extensions, build FICULocale
Client->>IntlCollator: compare(str1, str2)
IntlCollator->>ICU: TryICUCompareStrings(FICULocale, str1, str2, ASensitivity, AIgnorePunctuation, ANumeric, ACaseFirst)
ICU-->>IntlCollator: comparison result
IntlCollator-->>Client: numeric/case-first aware result
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Comment |
Suite TimingTest Runner (interpreted: 9,875 passed; bytecode: 9,875 passed)
MemoryGC rows aggregate the main thread plus all worker thread-local GCs. Test runner worker shutdown frees thread-local heaps in bulk; that shutdown reclamation is not counted as GC collections or collected objects.
Benchmarks (interpreted: 407; bytecode: 407)
MemoryGC rows aggregate the main thread plus all worker thread-local GCs. Benchmark runner performs explicit between-file collections, so collection and collected-object counts can be much higher than the test runner.
Measured on ubuntu-latest x64. |
Benchmark Results407 benchmarks Interpreted: 🟢 348 improved · 🔴 27 regressed · 32 unchanged · avg +10.7% arraybuffer.js — Interp: 🟢 9, 5 unch. · avg +6.7% · Bytecode: 🟢 3, 🔴 9, 2 unch. · avg -3.8%
arrays.js — Interp: 🟢 17, 2 unch. · avg +17.1% · Bytecode: 🟢 1, 🔴 10, 8 unch. · avg -0.8%
async-await.js — Interp: 🟢 5, 1 unch. · avg +11.6% · Bytecode: 🔴 5, 1 unch. · avg -7.0%
async-generators.js — Interp: 🟢 2 · avg +13.4% · Bytecode: 🔴 2 · avg -10.3%
base64.js — Interp: 🟢 3, 🔴 5, 2 unch. · avg +1.6% · Bytecode: 🔴 7, 3 unch. · avg -4.3%
classes.js — Interp: 🟢 29, 2 unch. · avg +9.7% · Bytecode: 🟢 5, 🔴 17, 9 unch. · avg -1.6%
closures.js — Interp: 🟢 9, 2 unch. · avg +8.1% · Bytecode: 🟢 1, 🔴 7, 3 unch. · avg -3.9%
collections.js — Interp: 🟢 12 · avg +10.9% · Bytecode: 🔴 12 · avg -11.4%
csv.js — Interp: 🟢 13 · avg +12.3% · Bytecode: 🟢 1, 🔴 1, 11 unch. · avg +0.8%
destructuring.js — Interp: 🟢 22 · avg +10.4% · Bytecode: 🔴 13, 9 unch. · avg -5.3%
fibonacci.js — Interp: 🟢 8 · avg +13.8% · Bytecode: 🔴 3, 5 unch. · avg -3.7%
float16array.js — Interp: 🟢 27, 5 unch. · avg +7.6% · Bytecode: 🟢 7, 🔴 21, 4 unch. · avg -0.4%
for-of.js — Interp: 🟢 6, 1 unch. · avg +5.7% · Bytecode: 🔴 3, 4 unch. · avg -3.2%
generators.js — Interp: 🟢 4 · avg +11.6% · Bytecode: 🔴 3, 1 unch. · avg -5.4%
iterators.js — Interp: 🟢 41, 1 unch. · avg +10.4% · Bytecode: 🟢 20, 🔴 5, 17 unch. · avg +1.6%
json.js — Interp: 🟢 20 · avg +10.8% · Bytecode: 🟢 1, 🔴 16, 3 unch. · avg -5.5%
jsx.jsx — Interp: 🟢 20, 1 unch. · avg +11.5% · Bytecode: 🔴 19, 2 unch. · avg -4.8%
modules.js — Interp: 🟢 9 · avg +12.3% · Bytecode: 🟢 4, 5 unch. · avg +1.9%
numbers.js — Interp: 🟢 11 · avg +13.0% · Bytecode: 🟢 5, 🔴 3, 3 unch. · avg +0.5%
objects.js — Interp: 🟢 7 · avg +13.8% · Bytecode: 🔴 7 · avg -7.1%
promises.js — Interp: 🟢 7, 🔴 1, 4 unch. · avg +2.3% · Bytecode: 🔴 9, 3 unch. · avg -7.2%
regexp.js — Interp: 🟢 5, 🔴 5, 1 unch. · avg +1.4% · Bytecode: 🟢 2, 🔴 6, 3 unch. · avg +0.4%
strings.js — Interp: 🟢 19 · avg +13.5% · Bytecode: 🟢 1, 🔴 8, 10 unch. · avg -2.9%
tsv.js — Interp: 🟢 6, 🔴 1, 2 unch. · avg +4.9% · Bytecode: 🟢 4, 🔴 1, 4 unch. · avg +1.8%
typed-arrays.js — Interp: 🟢 22 · avg +16.6% · Bytecode: 🟢 5, 🔴 13, 4 unch. · avg +1.0%
uint8array-encoding.js — Interp: 🔴 15, 3 unch. · avg -26.2% · Bytecode: 🟢 7, 🔴 4, 7 unch. · avg +16.5%
weak-collections.js — Interp: 🟢 15 · avg +65.5% · Bytecode: 🔴 14, 1 unch. · avg -10.7%
Deterministic profile diffarrays
Measured on ubuntu-latest x64. Benchmark ranges compare cached main-branch min/max ops/sec with the PR run; overlapping ranges are treated as unchanged noise. Percentage deltas are secondary context. |
test262 Conformance
Areas closest to 100%
Per-test deltas (+23 / -0)Newly passing (23):
Steady-state failures are non-blocking; regressions vs the cached main baseline (lower total pass count, or any PASS → non-PASS transition) fail the conformance gate. Measured on ubuntu-latest x64, bytecode mode. Areas grouped by the first two test262 path components; minimum 25 attempted tests, areas already at 100% excluded. Δ vs main compares against the most recent cached |
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@source/units/Goccia.Values.ArrayValue.pas`:
- Around line 896-917: StableSortElementsDefault currently uses DefaultCompare
on the raw values which can place "undefined" before strings; update the default
stable sort path to always treat undefined as greater than any other value so
undefineds are moved to the end. In practice, modify the comparison used inside
StableSortElementsDefault (and the analogous routines at the other occurrences)
to check for an undefined TGocciaValue (use whatever sentinel/variant check
exists for undefined in TGocciaValue/TGocciaValueList) and consider undefined >
any non-undefined value; if both are undefined treat them equal, otherwise defer
to DefaultCompare for non-undefined vs non-undefined comparisons. Ensure the
change is applied to the other same-pattern sorting functions referenced in the
review.
In `@source/units/Goccia.Values.IntlCollator.pas`:
- Around line 485-491: In TGocciaIntlCollatorValue.Create, after reading
'caseFirst' (CaseFirstOptionPresent and Ignored via TryReadStringOption) and
after the existing NUL check, validate Ignored against permitted values using
IsValidCaseFirstValue; if invalid call
ThrowRangeError(Format(SErrorIntlInvalidOption, [Ignored, 'caseFirst'])); only
then assign FCaseFirst := Ignored so IntlCollatorResolvedOptions returns a
validated value (mirror the existing -u-kf handling logic).
In `@source/units/Goccia.Values.StringObjectValue.pas`:
- Around line 1955-1960: TGocciaStringObjectValue.StringLocaleCompare currently
only accepts the third argument when it's already a TGocciaObjectValue, ignoring
primitive option values; change the logic so when AArgs.Length > 2 and the third
argument is not the undefined value you call the runtime ToObject coercion (or
equivalent helper) to convert primitives to an object and assign that to Options
(if it's already a TGocciaObjectValue keep the cast), so that Option property
access follows ECMA‑402 semantics before passing Options into
TGocciaIntlCollatorValue.Create.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 87843f84-7ad2-4a18-b2de-17e0714d2dfe
📒 Files selected for processing (10)
source/shared/IntlICU.passource/units/Goccia.Builtins.Intl.passource/units/Goccia.Values.ArrayValue.passource/units/Goccia.Values.IntlCollator.passource/units/Goccia.Values.StringObjectValue.pastests/built-ins/Array/prototype/sort.jstests/built-ins/Intl/Collator/constructor.jstests/built-ins/Intl/Collator/prototype/compare.jstests/built-ins/Intl/supportedValuesOf.jstests/built-ins/String/prototype/localeCompare.js
💤 Files with no reviewable changes (1)
- source/units/Goccia.Builtins.Intl.pas
…issue-602-intl-collator-options
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
source/units/Goccia.Builtins.Intl.pas (1)
215-229:⚠️ Potential issue | 🟠 Major | ⚡ Quick winFix
Intl.supportedValuesOf('collation')to not under-report supported collations
source/units/Goccia.Builtins.Intl.paslists 12 collation identifiers (compat, dict, emoji, eor, phonebk, phonetic, pinyin, searchjl, stroke, trad, unihan, zhuyin), butsource/units/Goccia.Values.IntlCollator.pas’sSupportedCollationscontains 17 identifiers and also includesbig5han,direct,ducet,gb2312, andreformed. Update thesupportedValuesOf('collation')output to match (or explicitly document why these five are omitted).🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@source/units/Goccia.Builtins.Intl.pas` around lines 215 - 229, The collation list in the Key = 'collation' branch of AddString calls in Goccia.Builtins.Intl.pas is missing five identifiers present in Goccia.Values.IntlCollator.pas.SupportedCollations; update the block that currently adds 'compat','dict','emoji','eor','phonebk','phonetic','pinyin','searchjl','stroke','trad','unihan','zhuyin' to also AddString('big5han'), AddString('direct'), AddString('ducet'), AddString('gb2312'), and AddString('reformed') so supportedValuesOf('collation') matches SupportedCollations (or alternatively add a comment documenting why any identifiers are intentionally omitted).
🧹 Nitpick comments (1)
source/units/Goccia.Builtins.Intl.pas (1)
708-757: ⚖️ Poor tradeoffConsider extracting the locale-argument helper to a shared unit.
CollatorLocaleArgumentToLocaleduplicates the same logic asLocaleCompareArgumentToLocaleinGoccia.Values.StringObjectValue.pas(lines 131–180). Both validate/canonicalize locale arguments with identical type checks and duplicate--u--extension handling.Since both implementations are identical and this PR is focused on collation options, defer extraction to a future refactor (e.g., move to
IntlLocaleResolveror a new shared locale-parsing utility). For now, the duplication is acceptable.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@source/units/Goccia.Builtins.Intl.pas` around lines 708 - 757, The locale-parsing/canonicalization logic in CollatorLocaleArgumentToLocale duplicates LocaleCompareArgumentToLocale; add a short comment/TODO at the top of CollatorLocaleArgumentToLocale (and optionally LocaleCompareArgumentToLocale) noting the duplication and that this logic should be extracted into a shared IntlLocaleResolver/locale-parsing utility in a future refactor, referencing the function names CollatorLocaleArgumentToLocale and LocaleCompareArgumentToLocale so maintainers can find and consolidate the code later.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@source/units/Goccia.Builtins.Intl.pas`:
- Around line 215-229: The collation list in the Key = 'collation' branch of
AddString calls in Goccia.Builtins.Intl.pas is missing five identifiers present
in Goccia.Values.IntlCollator.pas.SupportedCollations; update the block that
currently adds
'compat','dict','emoji','eor','phonebk','phonetic','pinyin','searchjl','stroke','trad','unihan','zhuyin'
to also AddString('big5han'), AddString('direct'), AddString('ducet'),
AddString('gb2312'), and AddString('reformed') so supportedValuesOf('collation')
matches SupportedCollations (or alternatively add a comment documenting why any
identifiers are intentionally omitted).
---
Nitpick comments:
In `@source/units/Goccia.Builtins.Intl.pas`:
- Around line 708-757: The locale-parsing/canonicalization logic in
CollatorLocaleArgumentToLocale duplicates LocaleCompareArgumentToLocale; add a
short comment/TODO at the top of CollatorLocaleArgumentToLocale (and optionally
LocaleCompareArgumentToLocale) noting the duplication and that this logic should
be extracted into a shared IntlLocaleResolver/locale-parsing utility in a future
refactor, referencing the function names CollatorLocaleArgumentToLocale and
LocaleCompareArgumentToLocale so maintainers can find and consolidate the code
later.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: a9e195a1-504b-4623-90be-e9a2be6f5e5d
📒 Files selected for processing (8)
source/units/Goccia.Builtins.Intl.passource/units/Goccia.Values.ArrayValue.passource/units/Goccia.Values.IntlCollator.passource/units/Goccia.Values.StringObjectValue.pastests/built-ins/Array/prototype/sort.jstests/built-ins/Array/prototype/toSorted.jstests/built-ins/Intl/Collator/constructor.jstests/built-ins/String/prototype/localeCompare.js
✅ Files skipped from review due to trivial changes (1)
- tests/built-ins/Array/prototype/toSorted.js
🚧 Files skipped from review as they are similar to previous changes (2)
- tests/built-ins/Intl/Collator/constructor.js
- source/units/Goccia.Values.IntlCollator.pas
Summary
Intl.Collator'snumericoption into ICU comparison and enableUCOL_NUMERIC_COLLATIONwhen requested.Testing
./build.pas loader && printf 'console.log(new Intl.Collator("en", {sensitivity: "base"}).compare("a", "A"));\nconsole.log(new Intl.Collator("en", {numeric: true}).compare("2", "10"));\n' | ./build/GocciaScriptLoader./build.pas testrunner && ./build/GocciaTestRunner tests/built-ins/Intl/Collator/prototype/compare.js./build/GocciaTestRunner tests/built-ins/Intl/Collator./fixtures/ffi/build.sh && ./build/GocciaTestRunner tests/built-ins/FFI./build.pas testrunner && ./build/GocciaTestRunner testsbun scripts/run_test262_suite.ts --suite-dir /tmp/goccia-test262-602 --categories intl402 --filter 'intl402/Collator/**/*sensitivity*' --mode bytecode --jobs 1 --verboseIntl.Collatoroptions.