
Commit f583731

Add token counting utilities and update documentation
Features:
- Add benchmark dependency group with tiktoken>=0.4.0 to pyproject.toml
- Export count_tokens, estimate_savings, and compare_formats utilities
- Implement token counting using tiktoken with o200k_base encoding (gpt5/gpt5-mini)

Documentation Updates:
- Add Token Counting & Comparison section to main README with examples
- Update docs/README.md with new utility functions in API reference list
- Add roadmap section announcing planned comprehensive benchmarks
- Add complete Utility Functions section to docs/api.md covering:
  * count_tokens() - Token counting with tiktoken
  * estimate_savings() - JSON vs TOON comparison metrics
  * compare_formats() - Formatted comparison tables
- Add Token Efficiency examples with cost estimation patterns
- Update LLM integration guide with Measuring Token Savings section
- Include cost calculation examples and integration patterns
- Update model references from GPT-4 to gpt5 throughout docs
- Add benchmark disclaimer noting comprehensive benchmarks coming soon

Technical Details:
- Update tokenizer documentation from GPT-4o/GPT-4 to gpt5/gpt5-mini
- Fix TypedDict usage examples in docs/api.md (EncodeOptions uses dict syntax)
- Clarify DecodeOptions is a class while EncodeOptions is a TypedDict
- Add toon-spec/ submodule files (CHANGELOG.md and SPEC.md v1.3)
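The commit message describes the counting implementation only in outline (tiktoken with the o200k_base encoding). A minimal sketch of how such a utility might look — an illustrative reconstruction, not the package's actual source; the character-based fallback is purely an assumption for environments without tiktoken:

```python
def count_tokens(text: str, encoding_name: str = "o200k_base") -> int:
    """Count tokens in text, preferring tiktoken's o200k_base encoding."""
    try:
        import tiktoken  # optional dependency (installed via the 'benchmark' extra)
        return len(tiktoken.get_encoding(encoding_name).encode(text))
    except ImportError:
        # Assumed fallback: roughly 4 characters per token for English text
        return max(1, len(text) // 4)
```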
1 parent d249dff commit f583731

6 files changed

Lines changed: 417 additions & 24 deletions

File tree

README.md

Lines changed: 30 additions & 1 deletion
@@ -10,9 +10,10 @@ Compact, human-readable serialization format for LLM contexts with **30-60% toke

```bash
pip install toon_format
+# or (recommended)
+uv add toon_format
```

-
## Quick Start

```python
@@ -72,6 +73,34 @@ decode("id: 123", {"indent": 2, "strict": True})
- `indent`: Expected indent size (default: `2`)
- `strict`: Validate syntax, lengths, delimiters (default: `True`)

+### Token Counting & Comparison
+
+Measure token efficiency and compare formats:
+
+```python
+from toon_format import encode, estimate_savings, compare_formats, count_tokens
+
+# Measure savings
+data = {"users": [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]}
+result = estimate_savings(data)
+print(f"Saves {result['savings_percent']:.1f}% tokens")  # Saves 42.3% tokens
+
+# Visual comparison
+print(compare_formats(data))
+# Format Comparison
+# ────────────────────────────────────────────────
+# Format    Tokens    Size (chars)
+# JSON      45        123
+# TOON      28        85
+# ────────────────────────────────────────────────
+# Savings: 17 tokens (37.8%)
+
+# Count tokens directly
+toon_str = encode(data)
+tokens = count_tokens(toon_str)  # Uses tiktoken (gpt5/gpt5-mini)
+```
+
+**Requires tiktoken:** `pip install tiktoken` or `pip install toon-format[benchmark]`

## Format Specification

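The `estimate_savings()` shown in the README section above takes a Python object and handles both encodings internally. The comparison arithmetic it reports can be sketched with pre-encoded strings and a pluggable counter — the function name, the stand-in counter, and the hand-written TOON string here are all illustrative assumptions, not the library's internals:

```python
import json

def rough_count(text: str) -> int:
    # Stand-in counter (~4 chars/token) so the sketch runs without tiktoken
    return max(1, len(text) // 4)

def savings_report(json_str: str, toon_str: str, count=rough_count) -> dict:
    """Compare token counts of two pre-encoded strings."""
    json_tokens = count(json_str)
    toon_tokens = count(toon_str)
    saved = json_tokens - toon_tokens
    return {
        "json_tokens": json_tokens,
        "toon_tokens": toon_tokens,
        "saved_tokens": saved,
        "savings_percent": 100.0 * saved / json_tokens if json_tokens else 0.0,
    }

data = {"users": [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]}
json_str = json.dumps(data)
toon_str = "users[2]{id,name}:\n  1,Alice\n  2,Bob"  # hand-written TOON equivalent
report = savings_report(json_str, toon_str)
```

The tabular TOON form drops the repeated `"id"`/`"name"` keys, which is where the bulk of the savings comes from.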

docs/README.md

Lines changed: 34 additions & 0 deletions
@@ -24,6 +24,9 @@ New to TOON? Start here:
Complete reference for all public functions and classes:
- `encode()` - Convert Python to TOON
- `decode()` - Convert TOON to Python
+- `count_tokens()` - Count tokens in text using tiktoken
+- `estimate_savings()` - Compare JSON vs TOON token counts
+- `compare_formats()` - Generate formatted comparison table
- `EncodeOptions` - Encoding configuration
- `DecodeOptions` - Decoding configuration
- `ToonDecodeError` - Error handling
@@ -53,6 +56,15 @@ Best practices for LLM usage:
- Performance metrics
- Debugging tips

+## Roadmap
+
+The following features are planned for future releases:
+
+- **Comprehensive Benchmarks**: Detailed token efficiency comparisons across various data structures and LLM models (gpt5, gpt5-mini, Claude)
+- **Official Documentation Site**: Dedicated documentation website with interactive examples and tutorials
+
+Stay tuned for updates!
+
## External Resources

- [Official TOON Specification](https://github.com/toon-format/spec) - Normative spec
@@ -95,6 +107,28 @@ decode("items[5]: a,b,c", {"strict": False})
# {'items': ['a', 'b', 'c']}  # Accepts length mismatch
```

+### Token Efficiency
+
+```python
+from toon_format import estimate_savings, compare_formats
+
+data = {"employees": [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]}
+
+# Get savings metrics
+result = estimate_savings(data)
+print(f"Saves {result['savings_percent']:.1f}% tokens")
+
+# Get formatted comparison
+print(compare_formats(data))
+# Format Comparison
+# ────────────────────────────────────────────────
+# Format    Tokens    Size (chars)
+# JSON      45        123
+# TOON      28        85
+# ────────────────────────────────────────────────
+# Savings: 17 tokens (37.8%)
+```
+
## Support

- **Bug Reports:** [GitHub Issues](https://github.com/toon-format/toon-python/issues)
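The cost-calculation patterns mentioned in the commit message generally reduce to multiplying saved tokens by a per-token price. A minimal sketch, where the price per million input tokens and the request volume are purely hypothetical parameters:

```python
def estimated_cost_saved(json_tokens: int, toon_tokens: int,
                         usd_per_million_input: float = 2.50) -> float:
    """Per-request dollar savings from sending TOON instead of JSON."""
    saved = json_tokens - toon_tokens
    return saved * usd_per_million_input / 1_000_000

# Using the 17-token saving from the comparison table above:
per_request = estimated_cost_saved(45, 28)
daily = per_request * 1_000_000  # at a hypothetical 1M requests/day
```

Savings are negligible per request but compound linearly with volume, which is why the docs pair the savings utilities with cost examples.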
