| 1 | +# Semantic Tokens Reference |
| 2 | + |
| 3 | +This document lists the semantic token types and modifiers returned by each LSP backend. It is particularly useful when integrating with editors such as Monaco, which need to map token type and modifier indices to theme colors. |
| 4 | + |
| 5 | +## Overview |
| 6 | + |
| 7 | +Semantic tokens provide richer syntax highlighting than traditional TextMate grammars by leveraging the language server's understanding of the code. LSP encodes tokens as a flat integer array in which each token occupies five consecutive values. |
| 8 | + |
| 9 | +## Token Encoding Format |
| 10 | + |
| 11 | +Each token in the `data` array consists of 5 consecutive integers: |
| 12 | + |
| 13 | +| Position | Field | Description | |
| 14 | +|----------|-------|-------------| |
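| 15 | +| 0 | `deltaLine` | Line offset from the previous token (for the first token, its absolute zero-based line number) | |
| 16 | +| 1 | `deltaStart` | Start-character offset from the start of the previous token when both are on the same line; otherwise the absolute zero-based start character | |
| 17 | +| 2 | `length` | Token length in the negotiated position encoding's units (UTF-16 code units by default) | |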
| 18 | +| 3 | `tokenType` | Index into the legend's `tokenTypes` array | |
| 19 | +| 4 | `tokenModifiers` | Bitmask of modifiers from the legend's `tokenModifiers` array | |
| 20 | + |
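As a rough illustration of the table above, the sketch below turns the relative `data` array into absolute positions. `DecodedToken` and `decode_tokens` are hypothetical names introduced here, not part of any backend or of `lsp_types`, and the three-entry legend is sample input only.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class DecodedToken:
    line: int            # absolute, zero-based line
    start: int           # absolute, zero-based start character
    length: int
    token_type: str      # name looked up in the legend
    modifiers_mask: int  # raw bitmask, left undecoded here


def decode_tokens(data: list[int], token_types: list[str]) -> list[DecodedToken]:
    """Convert the flat, delta-encoded array into absolute tokens."""
    if len(data) % 5 != 0:
        raise ValueError("semantic token data length must be a multiple of 5")

    tokens: list[DecodedToken] = []
    line = 0
    start = 0
    for i in range(0, len(data), 5):
        delta_line, delta_start, length, type_index, mask = data[i:i + 5]
        if delta_line:
            # New line: deltaStart is measured from column 0.
            line += delta_line
            start = delta_start
        else:
            # Same line: deltaStart is measured from the previous token's start.
            start += delta_start
        tokens.append(DecodedToken(line, start, length, token_types[type_index], mask))
    return tokens


# Sample input: three tokens, two of them on the same line.
sample_types = ["namespace", "type", "class"]
sample_data = [2, 5, 3, 0, 3,  0, 5, 4, 1, 0,  3, 2, 7, 2, 0]
for token in decode_tokens(sample_data, sample_types):
    print(token.line, token.start, token.length, token.token_type)
```

Because the first token is measured from line 0, column 0, the running `line` and `start` values need no special case for it.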
| 21 | +### Decoding Token Modifiers |
| 22 | + |
| 23 | +The `tokenModifiers` value is a bitmask. To check if a modifier applies: |
| 24 | + |
| 25 | +```python |
| 26 | +def has_modifier(token_modifiers: int, modifier_index: int) -> bool: |
| 27 | + return (token_modifiers & (1 << modifier_index)) != 0 |
| 28 | +``` |
| 29 | + |
| 30 | +For example, if `tokenModifiers = 5` (binary `101`), modifiers at index 0 and 2 are active. |
| 31 | + |
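To go from a bitmask to modifier names, one option is to walk the legend and keep every name whose bit is set. This is a minimal sketch: `modifier_names` is a hypothetical helper, and the two lists are copied from the basedpyright and ty tables later in this document, so they are only as current as those tables.

```python
from __future__ import annotations


def modifier_names(mask: int, legend_modifiers: list[str]) -> list[str]:
    """Return the legend names whose bit is set in the mask."""
    return [name for bit, name in enumerate(legend_modifiers) if mask & (1 << bit)]


# Copied from the tables in this document (assumed up to date).
BASEDPYRIGHT_MODIFIERS = [
    "declaration", "definition", "readonly", "static", "async",
    "defaultLibrary", "builtin", "classMember", "parameter",
]
TY_MODIFIERS = ["definition", "readonly", "async", "documentation"]

print(modifier_names(5, BASEDPYRIGHT_MODIFIERS))  # ['declaration', 'readonly']
print(modifier_names(5, TY_MODIFIERS))            # ['definition', 'async']
```

The same mask decodes to different names under different legends, which is why the mask should always be interpreted against the legend of the backend that produced it.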
| 32 | +## How to Get the Legend |
| 33 | + |
| 34 | +The legend is provided by the server during initialization in `InitializeResult.capabilities.semanticTokensProvider.legend`. You can extract it using: |
| 35 | + |
| 36 | +```python |
| 37 | +from lsp_types.process import LSPProcess |
| 38 | + |
| 39 | +async with LSPProcess(process_info) as process: |
| 40 | + init_result = await process.send.initialize({...})  # {...} stands for your InitializeParams |
| 41 | + legend = init_result["capabilities"]["semanticTokensProvider"]["legend"] |
| 42 | + token_types = legend["tokenTypes"] # List of type names |
| 43 | + token_modifiers = legend["tokenModifiers"] # List of modifier names |
| 44 | +``` |
| 45 | + |
| 46 | +See `examples/extract_semantic_legends.py` for a complete working example. |
| 47 | + |
| 48 | +--- |
| 49 | + |
| 50 | +## Token Legends by Backend |
| 51 | + |
| 52 | +### Pyright (basedpyright) |
| 53 | + |
| 54 | +> Last verified: basedpyright 1.36.2 |
| 55 | + |
| 56 | +#### Token Types |
| 57 | + |
| 58 | +| Index | Token Type | |
| 59 | +|------:|------------| |
| 60 | +| 0 | `namespace` | |
| 61 | +| 1 | `type` | |
| 62 | +| 2 | `class` | |
| 63 | +| 3 | `enum` | |
| 64 | +| 4 | `typeParameter` | |
| 65 | +| 5 | `parameter` | |
| 66 | +| 6 | `variable` | |
| 67 | +| 7 | `property` | |
| 68 | +| 8 | `enumMember` | |
| 69 | +| 9 | `function` | |
| 70 | +| 10 | `method` | |
| 71 | +| 11 | `keyword` | |
| 72 | +| 12 | `decorator` | |
| 73 | +| 13 | `selfParameter` | |
| 74 | +| 14 | `clsParameter` | |
| 75 | + |
| 76 | +#### Token Modifiers |
| 77 | + |
| 78 | +| Bit | Modifier | |
| 79 | +|----:|----------| |
| 80 | +| 0 | `declaration` | |
| 81 | +| 1 | `definition` | |
| 82 | +| 2 | `readonly` | |
| 83 | +| 3 | `static` | |
| 84 | +| 4 | `async` | |
| 85 | +| 5 | `defaultLibrary` | |
| 86 | +| 6 | `builtin` | |
| 87 | +| 7 | `classMember` | |
| 88 | +| 8 | `parameter` | |
| 89 | + |
| 90 | +--- |
| 91 | + |
| 92 | +### Pyrefly |
| 93 | + |
| 94 | +> Last verified: Pyrefly 0.48.2 |
| 95 | +> Legend source: [semantic_tokens.rs](https://github.com/facebook/pyrefly/blob/main/pyrefly/lib/state/semantic_tokens.rs) |
| 96 | + |
| 97 | +Pyrefly does not advertise its legend via LSP initialization, but the token mappings are defined in source code. |
| 98 | + |
| 99 | +#### Token Types |
| 100 | + |
| 101 | +| Index | Token Type | |
| 102 | +|------:|------------| |
| 103 | +| 0 | `namespace` | |
| 104 | +| 1 | `type` | |
| 105 | +| 2 | `class` | |
| 106 | +| 3 | `enum` | |
| 107 | +| 4 | `interface` | |
| 108 | +| 5 | `struct` | |
| 109 | +| 6 | `typeParameter` | |
| 110 | +| 7 | `parameter` | |
| 111 | +| 8 | `variable` | |
| 112 | +| 9 | `property` | |
| 113 | +| 10 | `enumMember` | |
| 114 | +| 11 | `event` | |
| 115 | +| 12 | `function` | |
| 116 | +| 13 | `method` | |
| 117 | +| 14 | `macro` | |
| 118 | +| 15 | `keyword` | |
| 119 | +| 16 | `modifier` | |
| 120 | +| 17 | `comment` | |
| 121 | +| 18 | `string` | |
| 122 | +| 19 | `number` | |
| 123 | +| 20 | `regexp` | |
| 124 | +| 21 | `operator` | |
| 125 | +| 22 | `decorator` | |
| 126 | + |
| 127 | +#### Token Modifiers |
| 128 | + |
| 129 | +| Bit | Modifier | |
| 130 | +|----:|----------| |
| 131 | +| 0 | `declaration` | |
| 132 | +| 1 | `definition` | |
| 133 | +| 2 | `readonly` | |
| 134 | +| 3 | `static` | |
| 135 | +| 4 | `deprecated` | |
| 136 | +| 5 | `abstract` | |
| 137 | +| 6 | `async` | |
| 138 | +| 7 | `modification` | |
| 139 | +| 8 | `documentation` | |
| 140 | +| 9 | `defaultLibrary` | |
| 141 | + |
| 142 | +--- |
| 143 | + |
| 144 | +### ty |
| 145 | + |
| 146 | +> Last verified: ty 0.0.12 |
| 147 | + |
| 148 | +#### Token Types |
| 149 | + |
| 150 | +| Index | Token Type | |
| 151 | +|------:|------------| |
| 152 | +| 0 | `namespace` | |
| 153 | +| 1 | `class` | |
| 154 | +| 2 | `parameter` | |
| 155 | +| 3 | `selfParameter` | |
| 156 | +| 4 | `clsParameter` | |
| 157 | +| 5 | `variable` | |
| 158 | +| 6 | `property` | |
| 159 | +| 7 | `function` | |
| 160 | +| 8 | `method` | |
| 161 | +| 9 | `keyword` | |
| 162 | +| 10 | `string` | |
| 163 | +| 11 | `number` | |
| 164 | +| 12 | `decorator` | |
| 165 | +| 13 | `builtinConstant` | |
| 166 | +| 14 | `typeParameter` | |
| 167 | + |
| 168 | +#### Token Modifiers |
| 169 | + |
| 170 | +| Bit | Modifier | |
| 171 | +|----:|----------| |
| 172 | +| 0 | `definition` | |
| 173 | +| 1 | `readonly` | |
| 174 | +| 2 | `async` | |
| 175 | +| 3 | `documentation` | |
| 176 | + |
| 177 | +--- |
| 178 | + |
| 179 | +## Monaco Editor Integration |
| 180 | + |
| 181 | +When integrating with Monaco, register a `DocumentSemanticTokensProvider` that: |
| 182 | + |
| 183 | +1. Requests tokens via `session.get_semantic_tokens()` |
| 184 | +2. Returns the token data along with the legend |
| 185 | + |
| 186 | +```typescript |
| 187 | +// TypeScript example for Monaco |
| 188 | +monaco.languages.registerDocumentSemanticTokensProvider('python', { |
| 189 | + getLegend: () => ({ |
| 190 | + tokenTypes: ['namespace', 'type', 'class', /* ... */], // From backend legend |
| 191 | + tokenModifiers: ['declaration', 'definition', /* ... */] |
| 192 | + }), |
| 193 | + provideDocumentSemanticTokens: async (model) => { |
| 194 | + const tokens = await requestSemanticTokens(model.uri); // application-defined helper, not a Monaco API |
| 195 | + return { |
| 196 | + data: new Uint32Array(tokens.data), |
| 197 | + resultId: tokens.resultId |
| 198 | + }; |
| 199 | + }, |
| 200 | + releaseDocumentSemanticTokens: () => {} |
| 201 | +}); |
| 202 | +``` |
| 203 | + |
| 204 | +The token types and modifiers must be registered in the **exact same order** as the backend's legend for the indices to map correctly. |
| 205 | + |
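If a single Monaco provider has to serve more than one backend, one possible approach is to rewrite each backend's token data into a shared legend on the Python side before sending it to the editor. The sketch below assumes that setup. `union_legend` and `remap_data` are hypothetical helpers, and the short lists are excerpts of the ty and basedpyright tables used as sample input.

```python
from __future__ import annotations


def union_legend(*name_lists: list[str]) -> list[str]:
    """Ordered union of several name lists (first occurrence wins)."""
    merged: list[str] = []
    for names in name_lists:
        for name in names:
            if name not in merged:
                merged.append(name)
    return merged


def remap_data(
    data: list[int],
    src_types: list[str],
    src_mods: list[str],
    dst_types: list[str],
    dst_mods: list[str],
) -> list[int]:
    """Rewrite tokenType indices and tokenModifiers bits for another legend.

    Raises ValueError if the destination legend lacks a source name.
    Mask bits beyond the source legend's length are dropped.
    """
    type_map = [dst_types.index(name) for name in src_types]
    mod_map = [dst_mods.index(name) for name in src_mods]

    out = list(data)
    for i in range(0, len(out), 5):
        out[i + 3] = type_map[out[i + 3]]
        mask = out[i + 4]
        out[i + 4] = sum(
            1 << mod_map[bit] for bit in range(len(src_mods)) if mask & (1 << bit)
        )
    return out


# Sample input: excerpts of two legends from the tables above.
ty_types = ["namespace", "class", "parameter"]
ty_mods = ["definition", "readonly", "async"]
pyright_types = ["namespace", "type", "class", "enum", "typeParameter", "parameter"]
pyright_mods = ["declaration", "definition", "readonly", "static", "async"]

shared_types = union_legend(pyright_types, ty_types)
shared_mods = union_legend(pyright_mods, ty_mods)
print(remap_data([0, 0, 3, 1, 1, 1, 4, 2, 2, 5], ty_types, ty_mods, shared_types, shared_mods))
```

Positions and lengths pass through untouched; only the fourth and fifth value of each token change, so the delta encoding stays valid.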
| 206 | +--- |
| 207 | + |
| 208 | +## Updating This Document |
| 209 | + |
| 210 | +Run the extraction script to get the latest legends: |
| 211 | + |
| 212 | +```bash |
| 213 | +uv run python examples/extract_semantic_legends.py |
| 214 | +``` |
| 215 | + |
| 216 | +Update the tables above with the script output when backend versions change. |
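When pasting new legends into the tables, a small formatter can reduce copy errors. The following is a sketch only: `legend_table` is a hypothetical helper and is not claimed to be part of `examples/extract_semantic_legends.py`. It prints a legend in the same two-column layout used by the tables in this document.

```python
from __future__ import annotations


def legend_table(names: list[str], index_header: str, name_header: str) -> str:
    """Render a legend list as a two-column Markdown table."""
    header = f"| {index_header} | {name_header} |"
    # Right-align the index column, matching the existing tables.
    separator = f"|{'-' * (len(index_header) + 1)}:|{'-' * (len(name_header) + 2)}|"
    rows = [f"| {index} | `{name}` |" for index, name in enumerate(names)]
    return "\n".join([header, separator, *rows])


# Sample input: the ty modifier legend as listed in this document.
print(legend_table(["definition", "readonly", "async", "documentation"], "Bit", "Modifier"))
```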