feat: add token counting for LLM context estimation#1307

Open
camjac251 wants to merge 1 commit into XAMPPRocky:master from camjac251:feat/token-display

Conversation

@camjac251

Summary

Add optional token counting using tiktoken's o200k_base encoding for LLM context estimation.

Builds upon #1268 by @Arichy, with runtime configuration and updated encoding.

Features:

  • tokens feature flag (included in all)
  • --tokens / -T flag for runtime opt-in
  • show_tokens config option in tokei.toml
  • --sort tokens / --rsort tokens support
  • Token counts in JSON/YAML output

Usage

cargo install tokei --features tokens
tokei . --tokens

Or via config (~/.config/tokei.toml):

show_tokens = true
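A fuller config might combine the new key with tokei's existing sort option; show_tokens and the "tokens" sort value are the additions from this PR, while sort itself is a pre-existing tokei.toml key (a sketch, not a verified config):

```toml
# ~/.config/tokei.toml
show_tokens = true   # new in this PR
sort = "tokens"      # existing key; "tokens" value added by this PR
```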

Example

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 Language              Files        Lines         Code     Comments       Blanks       Tokens
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 Rust                      1           42           34            1            7          362
 |- Markdown               1           11            0            8            3          102
 (Total)                               53           34            9           10          464
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 Total                     1           53           34            9           10          464
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Test plan

  • cargo test --features tokens passes
  • No performance impact when disabled (~11 ms without tokens vs ~113 ms with tokens enabled)

Related: #1268

Add optional token counting using tiktoken o200k_base encoding.
Build with `--features tokens` to enable, then use `--tokens` flag
or `show_tokens = true` in config to display the Tokens column.

- New `tokens` feature flag (included in `all` features)
- `--tokens` / `-T` flag to enable at runtime
- `show_tokens` config option in tokei.toml
- Tokens column shows LLM token count per file/language
- `--sort tokens` / `--rsort tokens` support
- Token counts in JSON/YAML output via serde
@camjac251

@XAMPPRocky thoughts?

@lucaspar

@XAMPPRocky I was looking for alternatives to count tokens like this, do you think we can merge it?

