
fix: normalize language aliases in ShikiProvider and highlight hook#522

Merged
mattrothenberg merged 4 commits into cloudflare:main from kale-stew:fix/language-alias-normalization
May 19, 2026


@kale-stew
Contributor

@kale-stew kale-stew commented May 19, 2026

Fixes CFSA-633

Summary

Fixes language alias resolution so common aliases like js, ts, sh, yml, py, md, gql work without warnings in ShikiProvider and the highlight hook.

Problem

When using ShikiProvider with languages like ['js', 'ts'] or when CodeHighlighted receives lang="js" from markdown code fences, users see warnings:

[Kumo CodeHighlighted] Language "js" is not in the ShikiProvider's languages list.

This happens because Shiki's grammar names are canonical (e.g., javascript, typescript) but markdown code fences commonly use short aliases (js, ts, sh, etc.).

Solution

  1. Add LANGUAGE_ALIASES map with common aliases → canonical names
  2. Add normalizeLanguage() function that:
    • Returns canonical name if input is already canonical
    • Returns mapped canonical name if input is a known alias
    • Returns null if input is unknown
  3. Normalize languages in ShikiProvider before preloading grammars
  4. Normalize language in highlight() hook before checking support
  5. Export normalizeLanguage for consumers who need it
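The three-way contract of normalizeLanguage() (canonical, alias, unknown) can be sketched as follows. This is a minimal illustration, not the PR's code: the CANONICAL_LANGUAGES list is a hypothetical subset of Shiki grammar names, and the alias map only mirrors the table in this description. mdx is listed as canonical so that it is never rewritten to markdown.

```typescript
// Hypothetical subset of canonical Shiki grammar names; the real
// SupportedLanguage union in the provider is not shown in this PR.
const CANONICAL_LANGUAGES = [
  "javascript",
  "typescript",
  "bash",
  "yaml",
  "python",
  "markdown",
  "graphql",
  "mdx", // canonical in its own right, never aliased to markdown
] as const;

type CanonicalLanguage = (typeof CANONICAL_LANGUAGES)[number];

// Alias -> canonical name, mirroring the "Supported Aliases" table.
const LANGUAGE_ALIASES: Record<string, CanonicalLanguage> = {
  js: "javascript",
  cjs: "javascript",
  mjs: "javascript",
  ts: "typescript",
  cts: "typescript",
  mts: "typescript",
  sh: "bash",
  zsh: "bash",
  yml: "yaml",
  py: "python",
  md: "markdown",
  gql: "graphql",
};

const CANONICAL_SET: ReadonlySet<string> = new Set<string>(CANONICAL_LANGUAGES);

function normalizeLanguage(input: string): CanonicalLanguage | null {
  // 1. Already canonical: return it unchanged.
  if (CANONICAL_SET.has(input)) return input as CanonicalLanguage;
  // 2. Known alias: return the mapped canonical name. hasOwnProperty keeps
  //    inherited keys such as "toString" from being treated as aliases.
  if (Object.prototype.hasOwnProperty.call(LANGUAGE_ALIASES, input)) {
    return LANGUAGE_ALIASES[input];
  }
  // 3. Unknown: return null so the caller can warn and fall back.
  return null;
}
```

Returning null rather than throwing leaves the "warn and render plain text" decision to the caller, which matches the graceful fallback described under Testing.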

Supported Aliases

  • js, cjs, mjs → javascript
  • ts, cts, mts → typescript
  • sh, zsh → bash
  • yml → yaml
  • py → python
  • md → markdown
  • gql → graphql

Intentionally Omitted

  • mdx → markdown: MDX has a distinct grammar (Markdown + JSX) that would lose JSX highlighting if mapped to plain markdown.

Testing

After this change:

  • <ShikiProvider languages={['js', 'ts']}> works the same as languages={['javascript', 'typescript']}
  • <CodeHighlighted lang="js" > highlights correctly without warnings
  • Unknown languages still fall back gracefully with a warning
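The first expectation above (aliases and canonical names preload the same grammars) hinges on normalizing the provider's list before loading. A minimal sketch, assuming a trimmed stand-in for normalizeLanguage() and a hypothetical normalizeLanguageList() helper; neither is the provider's actual code.

```typescript
// Hypothetical stand-in for normalizeLanguage(), trimmed to two aliases
// so the sketch runs on its own.
const ALIASES: Record<string, string> = { js: "javascript", ts: "typescript" };
const CANONICAL = new Set<string>(["javascript", "typescript"]);

function normalizeLanguage(input: string): string | null {
  if (CANONICAL.has(input)) return input;
  return Object.prototype.hasOwnProperty.call(ALIASES, input) ? ALIASES[input] : null;
}

// Hypothetical helper: normalize a provider's `languages` prop before
// preloading grammars. A Set deduplicates entries that resolve to the same
// grammar; unknown names are collected so the caller can warn about them.
function normalizeLanguageList(languages: readonly string[]): {
  valid: string[];
  unknown: string[];
} {
  const valid = new Set<string>();
  const unknown: string[] = [];
  for (const lang of languages) {
    const normalized = normalizeLanguage(lang);
    if (normalized === null) {
      unknown.push(lang);
    } else {
      valid.add(normalized); // "js" and "javascript" collapse into one entry
    }
  }
  return { valid: Array.from(valid), unknown };
}
```

Under this sketch, both ["js", "ts"] and ["javascript", "typescript"] produce valid = ["javascript", "typescript"], so the same grammars are loaded either way.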

Related


  • Reviews
    • bonk has reviewed the change
    • automated review not possible because:
  • Tests
    • Tests included/updated
    • Automated tests not possible - manual testing has been completed as follows:
    • Additional testing not necessary because:

kale-stew added 3 commits May 19, 2026 14:30
- Add LANGUAGE_ALIASES map for common aliases (js, ts, sh, yml, py, md, gql, etc.)
- Add normalizeLanguage() function exported from provider
- Normalize languages in ShikiProvider before preloading grammars
- Normalize language in highlight() hook before checking support
- Update highlight() type signature to accept string (any language identifier)
- Intentionally omit mdx alias since MDX has distinct grammar from markdown

Fixes language alias resolution so 'js', 'ts', etc. work without warnings.
@mattrothenberg mattrothenberg self-requested a review May 19, 2026 22:29
@mattrothenberg
Collaborator

/bonk please give this a thorough review!

@pkg-pr-new

pkg-pr-new Bot commented May 19, 2026

npm i https://pkg.pr.new/@cloudflare/kumo@522

commit: a2fb3f6

@ask-bonk
Contributor

ask-bonk Bot commented May 19, 2026

Thanks for this well-structured PR! The language alias normalization is a solid UX improvement. The implementation is clean, well-documented, and the test coverage is comprehensive.

However, I found one important bug that needs to be fixed before merging:

Bug: Context stores raw languages instead of normalized ones

In provider.tsx around line 200, the context value stores the raw languages prop (which may contain aliases), but useShikiHighlighter checks if the normalized language is in this array. This causes highlighting to fail when users pass aliases to ShikiProvider.

Example:

<ShikiProvider languages={["js", "ts"]}>
  <CodeHighlighted lang="js" code="const x = 1;" />
</ShikiProvider>

  1. Provider normalizes ["js", "ts"] → ["javascript", "typescript"] for loading grammars ✓
  2. But context.languages = ["js", "ts"] (raw prop with aliases)
  3. Hook normalizes lang="js" → "javascript"
  4. The check languages.includes("javascript") returns false because the array contains ["js", "ts"]
  5. Code renders as plain text with a warning ✗

Fix: Store the normalized validLanguages in the context instead of the raw languages prop. You'll need to track this in state since it's computed asynchronously in the effect.
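The failure can be reproduced outside React with plain arrays. This sketch assumes a simplified isSupported() check modeled on the review's description of the hook; it is not the hook's actual code, and the two-entry alias map is a trimmed stand-in.

```typescript
// Hypothetical stand-in for normalizeLanguage(); unknown names pass through.
const ALIASES: Record<string, string> = { js: "javascript", ts: "typescript" };
function normalizeLanguage(lang: string): string {
  return Object.prototype.hasOwnProperty.call(ALIASES, lang) ? ALIASES[lang] : lang;
}

// Simplified support check: normalize the requested language, then look it
// up in the `languages` array exposed by the provider context.
function isSupported(contextLanguages: readonly string[], lang: string): boolean {
  return contextLanguages.includes(normalizeLanguage(lang));
}

const providerProp = ["js", "ts"];

// Buggy: the context exposes the raw prop, so "javascript" is never found.
const buggyContext = providerProp;
// Fixed: the context exposes the normalized list the grammars were loaded with.
const fixedContext = providerProp.map(normalizeLanguage);

console.log(isSupported(buggyContext, "js")); // false
console.log(isSupported(fixedContext, "js")); // true
```

The only difference between the two cases is which array the context exposes, which is why the fix is confined to the provider's state and context value.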

Minor Suggestions

  1. Type consistency in types.ts:62: Now that aliases are supported, consider whether ShikiProviderProps["languages"] should accept string[] or remain strict to encourage canonical names.

  2. Test gap: The integration between provider normalization and hook checking isn't covered. The hook tests mock the context directly with canonical names, so this bug wasn't caught.

What I liked

  • The alias list is well-chosen (intentionally omitting mdx is the right call)
  • Nice use of Set for deduplication after normalization
  • Warning message suggests the normalized language name when an alias fails
  • Comprehensive test coverage for both unit and component-level tests

Overall this is great work; only the context language storage issue needs fixing!

@ask-bonk
Contributor

ask-bonk Bot commented May 19, 2026

Suggested fix for the context language storage issue:

The issue is that validLanguages (normalized) is computed in the effect but never stored. You need to track it in state:

// In provider.tsx, add to state:
const [state, setState] = useState<{
  highlighter: ShikiContextValue["highlighter"];
  isLoading: boolean;
  error: Error | null;
  languages: SupportedLanguage[]; // Add this
}>({
  highlighter: null,
  isLoading: true,
  error: null,
  languages: [], // Add this
});

// In initializeShiki success:
if (!cancelled) {
  setState({
    highlighter,
    isLoading: false,
    error: null,
    languages: validLanguages, // Store normalized languages
  });
}

// In context value:
const contextValue = useMemo<ShikiContextValue>(
  () => ({
    highlighter: state.highlighter,
    isLoading: state.isLoading,
    error: state.error,
    languages: state.languages, // Use normalized languages from state
    labels: mergedLabels,
  }),
  [state.highlighter, state.isLoading, state.error, state.languages, mergedLabels],
);

This ensures the context always contains canonical language names that match what the highlighter was actually loaded with.

@ask-bonk
Contributor

ask-bonk Bot commented May 19, 2026

I've completed my review of PR #522. Here's a summary:

Review Summary

Overall: This is a well-structured PR that adds valuable functionality. The implementation is clean, well-documented, and has comprehensive test coverage.

Key Issue Found 🔴

Bug: Context stores raw languages instead of normalized ones

In provider.tsx, the context value stores the raw languages prop (which may contain aliases like 'js'), but useShikiHighlighter checks if the normalized language is in this array. This causes highlighting to fail when users pass aliases to ShikiProvider:

<ShikiProvider languages={["js", "ts"]}>
  <CodeHighlighted lang="js" code="const x = 1;" />
</ShikiProvider>

The provider correctly normalizes for loading grammars, but the context retains ["js", "ts"]. When the hook normalizes "js" → "javascript" and checks languages.includes("javascript"), it fails because the array still contains aliases.

Fix: Store the normalized validLanguages in state and expose that in the context instead of the raw prop.

Positive Feedback

  • ✅ The alias list is well-chosen (intentionally omitting mdx is the right call)
  • ✅ Nice use of Set for deduplication after normalization
  • ✅ Warning message suggests the normalized language name when an alias fails
  • ✅ Comprehensive test coverage for both unit and component-level tests
  • ✅ Good documentation in code comments and changeset

Once the context language storage issue is fixed, this PR should be good to merge!


@github-actions
Contributor

github-actions Bot commented May 19, 2026

Docs Preview

View docs preview

Commit: a2fb3f6

@github-actions
Contributor

github-actions Bot commented May 19, 2026

Visual Regression Report — 5 changed, 33 unchanged

5 screenshot(s) with visual changes:

Button / Loading State

7 px (0.01%) changed


Dialog (Open)

0 px (0%) changed


Select / Select Loading

0 px (0%) changed


Select / Select Disabled Items

231 px (0.23%) changed


Select (Open)

3,293 px (0.01%) changed

33 screenshot(s) unchanged
  • Button / Basic
  • Button / Variant: Primary
  • Button / Variant: Secondary
  • Button / Variant: Ghost
  • Button / Variant: Destructive
  • Button / Variant: Outline
  • Button / Variant: Secondary Destructive
  • Button / Sizes
  • Button / With Icon
  • Button / Icon Only
  • Button / Disabled State
  • Button / Title
  • Button / Link as Button
  • Dialog / Dialog With Actions
  • Dialog / Dialog Basic
  • Dialog / Dialog Alert
  • Dialog / Dialog Confirmation
  • Dialog / Dialog With Select
  • Dialog / Dialog With Combobox
  • Dialog / Dialog With Dropdown
  • Select / Select Basic
  • Select / Select Sizes
  • Select / Select Without Label
  • Select / Select With Field
  • Select / Select Placeholder
  • Select / Select With Tooltip
  • Select / Select Custom Rendering
  • Select / Select Multiple
  • Select / Select Complex
  • Select / Select Disabled Options
  • Select / Select Grouped
  • Select / Select Grouped With Disabled
  • Select / Select Long List

Generated by Kumo Visual Regression

- Fix bug where context stored raw language aliases instead of normalized
  canonical names, causing highlight() to fail when ShikiProvider received
  aliases like ['js', 'ts']
- Move LANGUAGE_ALIASES to types.ts as const for single source of truth
- Add LanguageAlias and LanguageInput types derived from LANGUAGE_ALIASES
- Widen ShikiProviderProps.languages to accept LanguageInput[] (aliases + canonical)
- Widen CodeHighlightedProps.lang and highlight() to accept LanguageInput
- Add integration test covering alias-to-context round-tripping
- Export LANGUAGE_ALIASES, LanguageAlias, LanguageInput from @cloudflare/kumo/code
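The final commit moves LANGUAGE_ALIASES into types.ts `as const` and derives LanguageAlias and LanguageInput from it. A minimal sketch of that derivation pattern follows. The SupportedLanguage union shown here, ProviderPropsSketch, and toCanonical() are hypothetical stand-ins, not the actual contents of types.ts, and `satisfies` assumes TypeScript 4.9 or later.

```typescript
// Hypothetical canonical union; the real SupportedLanguage type in
// @cloudflare/kumo/code is not shown in this PR.
type SupportedLanguage =
  | "javascript"
  | "typescript"
  | "bash"
  | "yaml"
  | "python"
  | "markdown"
  | "graphql";

// `as const` keeps literal key and value types; `satisfies` checks that every
// value is a canonical name without widening those literals.
const LANGUAGE_ALIASES = {
  js: "javascript",
  cjs: "javascript",
  mjs: "javascript",
  ts: "typescript",
  cts: "typescript",
  mts: "typescript",
  sh: "bash",
  zsh: "bash",
  yml: "yaml",
  py: "python",
  md: "markdown",
  gql: "graphql",
} as const satisfies Record<string, SupportedLanguage>;

// Derived types: adding an alias to the map automatically widens both.
type LanguageAlias = keyof typeof LANGUAGE_ALIASES; // "js" | "cjs" | ... | "gql"
type LanguageInput = SupportedLanguage | LanguageAlias;

// Hypothetical props shape using the widened type.
interface ProviderPropsSketch {
  languages: LanguageInput[];
}

// Hypothetical helper resolving any accepted input to its canonical name.
function toCanonical(input: LanguageInput): SupportedLanguage {
  return Object.prototype.hasOwnProperty.call(LANGUAGE_ALIASES, input)
    ? LANGUAGE_ALIASES[input as LanguageAlias]
    : (input as SupportedLanguage);
}

const props: ProviderPropsSketch = { languages: ["js", "typescript", "yml"] };
// const bad: ProviderPropsSketch = { languages: ["cobol"] }; // compile-time error
const canonical = props.languages.map(toCanonical);
```

Deriving the types from the map keeps a single source of truth: a typo such as "javscript" in LANGUAGE_ALIASES fails the `satisfies` check instead of silently producing an unusable alias.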
@mattrothenberg
Copy link
Copy Markdown
Collaborator

/bonk re-review. I've addressed your concerns!

@mattrothenberg mattrothenberg merged commit 94d0c22 into cloudflare:main May 19, 2026
13 of 14 checks passed
@ask-bonk
Contributor

ask-bonk Bot commented May 19, 2026

@mattrothenberg Bonk workflow was cancelled.

View workflow run · To retry, trigger Bonk again.

@kale-stew kale-stew deleted the fix/language-alias-normalization branch May 20, 2026 02:02

Labels

None yet

Projects

None yet


2 participants