Lezer grammar: keyword-prefixed participant names (e.g. ifService, whileWorker) are split by the lexer, stripping the keyword prefix from the name #807

@MrCoder

Summary

Participant names that start with a ZenUML keyword prefix are silently truncated by the editor's Lezer lexer. For example, @Actor ifService is tokenized as IfKeyword("if") + Identifier("Service") rather than Identifier("ifService"), so the participant name recorded in the editor is "Service", not "ifService".

Steps to Reproduce

  1. Open the ZenUML editor.
  2. Type:
    @Actor ifService
    @Actor whileWorker
    
  3. On a new line type ifService. and observe the autocomplete popup.

Expected: ifService and whileWorker are offered as participant completions.

Actual: The popup shows Service and Worker (keyword prefix stripped). ifService and whileWorker are absent from the autocomplete list.

Verified with the Lezer parser directly

@Actor ifService  →  ⚠("if")  IfKeyword("if")  Name{ Identifier("Service") }
@Actor whileWorker →  ⚠("")   Loop("while")  WhileKeyword("while")  ⚠("")  Name{ Identifier("Worker") }

collectParticipants() reads the Name child text → adds "Service" and "Worker" to the set, not "ifService"/"whileWorker".

Location

  • Grammar definition: web/src/editor/grammar/zenuml.grammar lines 73–95 (keyword token definitions) and lines 104–142 (@precedence block inside @tokens).
  • Participant collector: web/src/editor/participantManager.ts collectParticipants() — correct logic, wrong input from the lexer.

Root Cause

Lezer keyword tokens defined as bare string literals (IfKeyword { "if" }) have no word-boundary requirement. Since keywords are listed before Identifier in the @tokens @precedence block, the lexer matches the keyword prefix of any identifier that begins with a keyword string and splits the token.
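
The effect can be imitated outside Lezer with a toy tokenizer. This is a minimal TypeScript sketch under stated assumptions, not Lezer's actual tokenizer: SplitToken, KEYWORD_LITERALS, IDENT_PATTERN, and lexKeywordFirst are hypothetical names, and only the two keywords mentioned in this issue are modeled. It tries keyword literals before the identifier pattern and requires no word boundary, which is the behavior described above.

```typescript
// Toy model of a keyword-first lexer with no word-boundary check.
// All names are hypothetical; this is not Lezer's implementation.
type SplitToken = { type: string; text: string };

// Keyword literals, tried before the identifier pattern.
const KEYWORD_LITERALS: ReadonlyArray<[string, string]> = [
  ["if", "IfKeyword"],
  ["while", "WhileKeyword"],
];

const IDENT_PATTERN = /^[a-zA-Z_][a-zA-Z_0-9]*/;

function lexKeywordFirst(input: string): SplitToken[] {
  const tokens: SplitToken[] = [];
  let rest = input;
  while (rest.length > 0) {
    // Skip whitespace between tokens.
    const ws = /^\s+/.exec(rest);
    if (ws) {
      rest = rest.slice(ws[0].length);
      continue;
    }

    // A keyword literal wins even when more identifier characters follow it.
    const current = rest;
    const kw = KEYWORD_LITERALS.find(([literal]) => current.startsWith(literal));
    if (kw) {
      tokens.push({ type: kw[1], text: kw[0] });
      rest = rest.slice(kw[0].length);
      continue;
    }

    const id = IDENT_PATTERN.exec(rest);
    if (id) {
      tokens.push({ type: "Identifier", text: id[0] });
      rest = rest.slice(id[0].length);
      continue;
    }

    // Anything else becomes a one-character error token.
    tokens.push({ type: "Error", text: rest.charAt(0) });
    rest = rest.slice(1);
  }
  return tokens;
}
```

In this model, lexKeywordFirst("ifService") yields an IfKeyword token for "if" followed by an Identifier token for "Service", mirroring the split reported for the real parser.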

Fix Sketch

Use Lezer's @specialize mechanism (the idiomatic approach) to derive keywords from the Identifier token rather than defining them as independent string literals. This guarantees the lexer only produces a keyword token when the full identifier is exactly the keyword string:

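// In @tokens: keep only Identifier as a primary token.
Identifier { $[a-zA-Z_] $[a-zA-Z_0-9]* }

// Outside @tokens: specialize keywords. @specialize is an expression, so it must
// appear inside a rule. The lowercase rule names here (ifKeyword, whileKeyword) are
// placeholders; lowercase rules add no extra node around the specialized token.
ifKeyword { @specialize[@name=IfKeyword]<Identifier, "if"> }
whileKeyword { @specialize[@name=WhileKeyword]<Identifier, "while"> }
// … etc. for all keywords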

With @specialize, ifService is first lexed as Identifier("ifService"); the specializer replaces an Identifier with IfKeyword only when its full text is exactly "if". This is how the official Lezer grammars (lezer/javascript, lezer/python) handle keyword/identifier disambiguation.
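
The lex-then-specialize order can be modeled in a few lines of TypeScript. This is a sketch under stated assumptions, not Lezer's implementation: Token, SPECIALIZED, IDENTIFIER, and lexWithSpecialize are hypothetical names, and only "if" and "while" are modeled. The identifier pattern is the one from the fix sketch above.

```typescript
// Toy model of specialization: lex the whole identifier first, then remap it
// to a keyword only on an exact match. All names are hypothetical.
type Token = { type: string; text: string };

// Exact identifier text -> keyword token type.
const SPECIALIZED: ReadonlyMap<string, string> = new Map([
  ["if", "IfKeyword"],
  ["while", "WhileKeyword"],
]);

const IDENTIFIER = /^[a-zA-Z_][a-zA-Z_0-9]*/;

function lexWithSpecialize(input: string): Token[] {
  const tokens: Token[] = [];
  let rest = input;
  while (rest.length > 0) {
    // Skip whitespace between tokens.
    const ws = /^\s+/.exec(rest);
    if (ws) {
      rest = rest.slice(ws[0].length);
      continue;
    }

    const id = IDENTIFIER.exec(rest);
    if (id) {
      const text = id[0];
      // The lookup uses the full identifier text, so "ifService" never matches "if".
      tokens.push({ type: SPECIALIZED.get(text) ?? "Identifier", text });
      rest = rest.slice(text.length);
      continue;
    }

    // Anything else becomes a one-character error token.
    tokens.push({ type: "Error", text: rest.charAt(0) });
    rest = rest.slice(1);
  }
  return tokens;
}
```

Here lexWithSpecialize("ifService") returns a single Identifier token, while lexWithSpecialize("if") returns a single IfKeyword token. No explicit word-boundary check is needed because the keyword decision is made only after the complete identifier has been consumed.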

Found via the 100-case browser-test campaign (catalog-extended.spec.ts, case U6); adversarially verified.
