
Commit 8372def

Mazyod and claude committed
docs: add semantic tokens reference with legends for all backends
- Move INTEGRATION_NOTES.md to docs/
- Create docs/SEMANTIC_TOKENS.md with token type/modifier mappings for Pyright (15 types, 9 modifiers), Pyrefly (23 types, 10 modifiers from source), and ty (15 types, 4 modifiers)
- Add examples/extract_semantic_legends.py script to extract legends
- Update README feature matrix with semantic tokens support notes

Pyrefly's legend is documented from source code as it doesn't advertise via LSP initialization.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
1 parent 7e4d585 commit 8372def

4 files changed

Lines changed: 389 additions & 2 deletions

File tree

README.md

Lines changed: 3 additions & 2 deletions
@@ -87,15 +87,16 @@ The following LSPs are available out of the box:
 | Completion Resolution | :white_check_mark: | :x: | :white_check_mark: | Pyrefly: not yet supported |
 | Signature Help | :white_check_mark: | :white_check_mark: | :white_check_mark: | |
 | Rename | :white_check_mark: | :warning: | :warning: | Pyrefly: disabled for external files; ty: requires files on disk |
-| Semantic Tokens | :white_check_mark:\* | :white_check_mark: | :white_check_mark: | \*basedpyright recommended for extended features |
+| Semantic Tokens | :white_check_mark:\* | :white_check_mark:\*\* | :white_check_mark: | \*basedpyright recommended; \*\*Pyrefly: legend not advertised (see docs) |
 | Go to Definition | :grey_question: | :grey_question: | :grey_question: | Not exposed in Session API |
 | Find References | :grey_question: | :grey_question: | :grey_question: | Not exposed in Session API |
 | Code Actions | :grey_question: | :grey_question: | :grey_question: | Not exposed in Session API |
 | Formatting | :grey_question: | :grey_question: | :grey_question: | Not exposed in Session API |
 
 > See [Feature Verification Guide](docs/FEATURE_VERIFICATION.md) for methodology on maintaining this table.
 
-For detailed backend limitations:
+For detailed documentation:
+- [Semantic Tokens Reference](docs/SEMANTIC_TOKENS.md) - Token types and modifiers for Monaco/editor integration
 - [Pyrefly Known Limitations](lsp_types/pyrefly/KNOWN_LIMITATIONS.md)
 - [ty Known Limitations](lsp_types/ty/KNOWN_LIMITATIONS.md)
 

File renamed without changes.

docs/SEMANTIC_TOKENS.md

Lines changed: 216 additions & 0 deletions
@@ -0,0 +1,216 @@
# Semantic Tokens Reference

This document provides a reference for the semantic token types and modifiers returned by each LSP backend. It is particularly useful when integrating with editors like Monaco that need to map token IDs to theme colors.

## Overview

Semantic tokens provide richer syntax highlighting than traditional TextMate grammars by leveraging the language server's understanding of the code. LSP encodes tokens as a compact integer array in which each token is represented by 5 values.

## Token Encoding Format

Each token in the `data` array consists of 5 consecutive integers:

| Position | Field | Description |
|----------|-------|-------------|
| 0 | `deltaLine` | Line offset from the previous token (for the first token, offset from line 0) |
| 1 | `deltaStart` | Start column offset from the previous token's start if on the same line, otherwise from column 0 |
| 2 | `length` | Token length in characters |
| 3 | `tokenType` | Index into the legend's `tokenTypes` array |
| 4 | `tokenModifiers` | Bitmask in which bit `i` corresponds to index `i` of the legend's `tokenModifiers` array |
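
Because positions are delta-encoded, a consumer usually has to accumulate them before it can use them. The following is a minimal sketch of that step; `DecodedToken` and `decode_tokens` are hypothetical names chosen for illustration and are not part of this library, and the sample data is made up.

```python
from typing import NamedTuple


class DecodedToken(NamedTuple):
    """One token with absolute (zero-based) line and start column."""
    line: int
    start: int
    length: int
    token_type: int
    token_modifiers: int


def decode_tokens(data: list[int]) -> list[DecodedToken]:
    """Convert the flat, delta-encoded `data` array into absolute positions."""
    if len(data) % 5 != 0:
        raise ValueError("semantic token data length must be a multiple of 5")
    tokens = []
    line = 0
    start = 0
    for i in range(0, len(data), 5):
        delta_line, delta_start, length, token_type, token_modifiers = data[i:i + 5]
        line += delta_line
        # deltaStart is relative to the previous token only when both
        # tokens are on the same line; otherwise it counts from column 0.
        start = start + delta_start if delta_line == 0 else delta_start
        tokens.append(DecodedToken(line, start, length, token_type, token_modifiers))
    return tokens


# Illustrative data: three tokens, the first two on the same line.
sample = [2, 5, 3, 0, 3, 0, 5, 4, 1, 0, 3, 2, 7, 2, 0]
for token in decode_tokens(sample):
    print(token)
```

In this sample the second token starts at column 10 (5 + 5) on line 2, and the third starts at column 2 on line 5 because a new line resets the column base.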

### Decoding Token Modifiers

The `tokenModifiers` value is a bitmask. To check if a modifier applies:

```python
def has_modifier(token_modifiers: int, modifier_index: int) -> bool:
    # Bit `modifier_index` is set when the modifier at that legend index applies.
    return (token_modifiers & (1 << modifier_index)) != 0
```

For example, if `tokenModifiers = 5` (binary `101`), modifiers at index 0 and 2 are active.
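
To turn a bitmask into modifier names, each set bit can be looked up in the legend. This is a small sketch; `decode_modifiers` is a hypothetical helper name, and the list is the basedpyright modifier legend as documented in this file.

```python
def decode_modifiers(token_modifiers: int, legend_modifiers: list[str]) -> list[str]:
    """Return the names of all modifiers whose bit is set in the bitmask."""
    return [
        name
        for bit, name in enumerate(legend_modifiers)
        if token_modifiers & (1 << bit)
    ]


# basedpyright modifier legend, in the order documented in this file.
PYRIGHT_MODIFIERS = [
    "declaration", "definition", "readonly", "static", "async",
    "defaultLibrary", "builtin", "classMember", "parameter",
]

print(decode_modifiers(5, PYRIGHT_MODIFIERS))  # ['declaration', 'readonly']
```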

## How to Get the Legend

The legend is provided by the server during initialization in `InitializeResult.capabilities.semanticTokensProvider.legend`. You can extract it using:

```python
from lsp_types.process import LSPProcess

# Run inside a coroutine; `process_info` and the initialize params are elided here.
async with LSPProcess(process_info) as process:
    init_result = await process.send.initialize({...})
    legend = init_result["capabilities"]["semanticTokensProvider"]["legend"]
    token_types = legend["tokenTypes"]  # List of type names
    token_modifiers = legend["tokenModifiers"]  # List of modifier names
```

See `examples/extract_semantic_legends.py` for a complete working example.

---

## Token Legends by Backend

### Pyright (basedpyright)

> Last verified: basedpyright 1.36.2

#### Token Types

| Index | Token Type |
|------:|------------|
| 0 | `namespace` |
| 1 | `type` |
| 2 | `class` |
| 3 | `enum` |
| 4 | `typeParameter` |
| 5 | `parameter` |
| 6 | `variable` |
| 7 | `property` |
| 8 | `enumMember` |
| 9 | `function` |
| 10 | `method` |
| 11 | `keyword` |
| 12 | `decorator` |
| 13 | `selfParameter` |
| 14 | `clsParameter` |

#### Token Modifiers

| Bit | Modifier |
|----:|----------|
| 0 | `declaration` |
| 1 | `definition` |
| 2 | `readonly` |
| 3 | `static` |
| 4 | `async` |
| 5 | `defaultLibrary` |
| 6 | `builtin` |
| 7 | `classMember` |
| 8 | `parameter` |

---

### Pyrefly

> Last verified: Pyrefly 0.48.2
> Legend source: [semantic_tokens.rs](https://github.com/facebook/pyrefly/blob/main/pyrefly/lib/state/semantic_tokens.rs)

Pyrefly does not advertise its legend in its LSP initialization response, so the mappings in this section are taken from its source code.

#### Token Types

| Index | Token Type |
|------:|------------|
| 0 | `namespace` |
| 1 | `type` |
| 2 | `class` |
| 3 | `enum` |
| 4 | `interface` |
| 5 | `struct` |
| 6 | `typeParameter` |
| 7 | `parameter` |
| 8 | `variable` |
| 9 | `property` |
| 10 | `enumMember` |
| 11 | `event` |
| 12 | `function` |
| 13 | `method` |
| 14 | `macro` |
| 15 | `keyword` |
| 16 | `modifier` |
| 17 | `comment` |
| 18 | `string` |
| 19 | `number` |
| 20 | `regexp` |
| 21 | `operator` |
| 22 | `decorator` |

#### Token Modifiers

| Bit | Modifier |
|----:|----------|
| 0 | `declaration` |
| 1 | `definition` |
| 2 | `readonly` |
| 3 | `static` |
| 4 | `deprecated` |
| 5 | `abstract` |
| 6 | `async` |
| 7 | `modification` |
| 8 | `documentation` |
| 9 | `defaultLibrary` |
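
Since the Pyrefly legend cannot be read from the initialization response, a client may need a hard-coded fallback. The sketch below shows one possible approach; `resolve_legend` and the constant names are hypothetical and not part of this library, and the lists simply mirror the Pyrefly tables in this section, so they must be re-checked when the Pyrefly version changes.

```python
# Fallback legend copied from the Pyrefly tables in this document.
PYREFLY_TOKEN_TYPES = [
    "namespace", "type", "class", "enum", "interface", "struct",
    "typeParameter", "parameter", "variable", "property", "enumMember",
    "event", "function", "method", "macro", "keyword", "modifier",
    "comment", "string", "number", "regexp", "operator", "decorator",
]
PYREFLY_TOKEN_MODIFIERS = [
    "declaration", "definition", "readonly", "static", "deprecated",
    "abstract", "async", "modification", "documentation", "defaultLibrary",
]


def resolve_legend(
    capabilities: dict,
    fallback_types: list[str],
    fallback_modifiers: list[str],
) -> tuple[list[str], list[str]]:
    """Prefer the legend advertised by the server; otherwise use the fallback."""
    provider = capabilities.get("semanticTokensProvider") or {}
    legend = provider.get("legend")
    if legend:
        return list(legend["tokenTypes"]), list(legend["tokenModifiers"])
    return list(fallback_types), list(fallback_modifiers)


# A server that advertises no legend falls back to the documented lists.
types, modifiers = resolve_legend({}, PYREFLY_TOKEN_TYPES, PYREFLY_TOKEN_MODIFIERS)
print(len(types), len(modifiers))  # 23 10
```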

---

### ty

> Last verified: ty 0.0.12

#### Token Types

| Index | Token Type |
|------:|------------|
| 0 | `namespace` |
| 1 | `class` |
| 2 | `parameter` |
| 3 | `selfParameter` |
| 4 | `clsParameter` |
| 5 | `variable` |
| 6 | `property` |
| 7 | `function` |
| 8 | `method` |
| 9 | `keyword` |
| 10 | `string` |
| 11 | `number` |
| 12 | `decorator` |
| 13 | `builtinConstant` |
| 14 | `typeParameter` |

#### Token Modifiers

| Bit | Modifier |
|----:|----------|
| 0 | `definition` |
| 1 | `readonly` |
| 2 | `async` |
| 3 | `documentation` |
176+
177+
---
178+
179+
## Monaco Editor Integration
180+
181+
When integrating with Monaco, register a `DocumentSemanticTokensProvider` that:
182+
183+
1. Requests tokens via `session.get_semantic_tokens()`
184+
2. Returns the token data along with the legend
185+
186+
```typescript
187+
// TypeScript example for Monaco
188+
monaco.languages.registerDocumentSemanticTokensProvider('python', {
189+
getLegend: () => ({
190+
tokenTypes: ['namespace', 'type', 'class', ...], // From backend legend
191+
tokenModifiers: ['declaration', 'definition', ...]
192+
}),
193+
provideDocumentSemanticTokens: async (model) => {
194+
const tokens = await requestSemanticTokens(model.uri);
195+
return {
196+
data: new Uint32Array(tokens.data),
197+
resultId: tokens.resultId
198+
};
199+
},
200+
releaseDocumentSemanticTokens: () => {}
201+
});
202+
```
203+
204+
The token types and modifiers must be registered in the **exact same order** as the backend's legend for the indices to map correctly.
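
One way to catch an ordering mistake early is to compare the legend registered in the editor against the backend's legend, position by position. This is only a sketch; `legend_mismatches` is a hypothetical helper, and the example lists are the first three token types from the basedpyright and ty tables in this document.

```python
from itertools import zip_longest
from typing import Optional


def legend_mismatches(
    server: list[str], client: list[str]
) -> list[tuple[int, Optional[str], Optional[str]]]:
    """Return (index, server_name, client_name) for every position that differs."""
    return [
        (index, server_name, client_name)
        for index, (server_name, client_name) in enumerate(zip_longest(server, client))
        if server_name != client_name
    ]


# Index 1 means `type` for basedpyright but `class` for ty, so a legend
# copied from one backend cannot be reused for the other.
pyright_head = ["namespace", "type", "class"]
ty_head = ["namespace", "class", "parameter"]
print(legend_mismatches(pyright_head, ty_head))
```

An empty result means the two legends agree at every index; a missing entry on either side shows up as `None`.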

---

## Updating This Document

Run the extraction script to get the latest legends:

```bash
uv run python examples/extract_semantic_legends.py
```

Update the tables above with the script output when backend versions change.
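
If you have a legend as plain lists of names, the tables can be regenerated rather than edited by hand. The helper below is a hypothetical sketch that reproduces the table layout used in this document; it makes no assumption about the output format of the extraction script.

```python
def legend_table(first_header: str, second_header: str, names: list[str]) -> str:
    """Render a legend list as a Markdown table with a right-aligned index column."""
    lines = [
        f"| {first_header} | {second_header} |",
        # Right-align the index column, matching the tables in this document.
        f"|{'-' * (len(first_header) + 1)}:|{'-' * (len(second_header) + 2)}|",
    ]
    lines.extend(f"| {index} | `{name}` |" for index, name in enumerate(names))
    return "\n".join(lines)


# ty modifier legend as documented in this file.
print(legend_table("Bit", "Modifier", ["definition", "readonly", "async", "documentation"]))
```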
