
LLM Integration Guide

Best practices for using TOON with Large Language Models to maximize token efficiency and response quality.

Why TOON for LLMs?

Traditional JSON wastes tokens on structural characters:

  • Braces & brackets: {}, []
  • Repeated quotes: Every key quoted in JSON
  • Commas everywhere: Between all elements

TOON eliminates this redundancy, achieving 30-60% token reduction while maintaining readability.


Quick Example

JSON (45 tokens with GPT-5):

{"users": [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]}

TOON (20 tokens with GPT-5, 56% reduction):

users[2,]{id,name}:
  1,Alice
  2,Bob
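The header line does the heavy lifting here: it carries the array name, the declared length, and the field list. A minimal parser for that header (plain Python, independent of the toon_format library, shown only to illustrate the pieces) looks like:

```python
import re

# Matches headers like "users[2,]{id,name}:" - name, declared length, field list.
HEADER_RE = re.compile(r"^(\w+)\[(\d+),?\]\{([^}]*)\}:$")

def parse_tabular_header(line):
    """Split a TOON tabular header into (name, length, fields)."""
    m = HEADER_RE.match(line.strip())
    if not m:
        raise ValueError(f"Not a tabular header: {line!r}")
    name, length, fields = m.groups()
    return name, int(length), fields.split(",")

print(parse_tabular_header("users[2,]{id,name}:"))
# ('users', 2, ['id', 'name'])
```

Every row after the header then only needs the values, which is where the token savings come from.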

Basic Integration Patterns

1. Prompting the Model

Explicit format instruction:

Respond using TOON format (Token-Oriented Object Notation):
- Use `key: value` for objects
- Use indentation for nesting
- Use `[N]` to indicate array lengths
- Use tabular format `[N,]{fields}:` for uniform arrays

Example:
users[2,]{id,name}:
  1,Alice
  2,Bob

2. Code Block Wrapping

Always wrap TOON in code blocks for clarity:

```toon
users[3,]{id,name,age}:
  1,Alice,30
  2,Bob,25
  3,Charlie,35
```

This helps the model distinguish TOON from natural language.

3. Validation with Length Markers

Use lengthMarker="#" for explicit validation hints:

from toon_format import encode

data = {"items": ["a", "b", "c"]}
toon = encode(data, {"lengthMarker": "#"})
# items[#3]: a,b,c

Tell the model:

"Array lengths are prefixed with #. Ensure your response matches these counts exactly."
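The marker also gives you a cheap client-side check before full decoding. A sketch of such a validator (pure Python, not part of toon_format) for flat length-marked arrays:

```python
import re

def check_length_marker(line):
    """Verify that a flat TOON array matches its '#'-prefixed declared length.

    Returns (name, ok) for lines like "items[#3]: a,b,c", or None if the
    line is not a length-marked flat array.
    """
    m = re.match(r"^(\w+)\[#(\d+)\]:\s*(.*)$", line.strip())
    if not m:
        return None
    name, declared, body = m.group(1), int(m.group(2)), m.group(3)
    actual = len(body.split(",")) if body else 0
    return name, declared == actual

print(check_length_marker("items[#3]: a,b,c"))  # ('items', True)
print(check_length_marker("items[#3]: a,b"))    # ('items', False)
```

A mismatch is a strong signal the model dropped or invented an item, and worth a regeneration request.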


Measuring Token Savings

Before integrating TOON with your LLM application, measure actual savings for your data:

Basic Measurement

from toon_format import estimate_savings

# Your actual data structure
user_data = {
    "users": [
        {"id": 1, "name": "Alice", "email": "alice@example.com", "active": True},
        {"id": 2, "name": "Bob", "email": "bob@example.com", "active": True},
        {"id": 3, "name": "Charlie", "email": "charlie@example.com", "active": False}
    ]
}

# Compare formats
result = estimate_savings(user_data)
print(f"JSON: {result['json_tokens']} tokens")
print(f"TOON: {result['toon_tokens']} tokens")
print(f"Savings: {result['savings_percent']:.1f}%")
# JSON: 112 tokens
# TOON: 68 tokens
# Savings: 39.3%

Cost Estimation

Calculate actual dollar savings based on your API usage:

from toon_format import estimate_savings

# Your typical prompt data
prompt_data = {
    "context": [
        {"role": "system", "content": "You are a helpful assistant"},
        {"role": "user", "content": "Analyze this data"}
    ],
    "data": [
        {"id": i, "value": f"Item {i}", "score": i * 10}
        for i in range(1, 101)  # 100 items
    ]
}

result = estimate_savings(prompt_data["data"])

# GPT-5 pricing (example: $0.01 per 1K tokens)
cost_per_1k = 0.01
json_cost = (result['json_tokens'] / 1000) * cost_per_1k
toon_cost = (result['toon_tokens'] / 1000) * cost_per_1k

print(f"JSON cost per request: ${json_cost:.4f}")
print(f"TOON cost per request: ${toon_cost:.4f}")
print(f"Savings per request: ${json_cost - toon_cost:.4f}")
print(f"Savings per 10,000 requests: ${(json_cost - toon_cost) * 10000:.2f}")

Detailed Comparison

Get a formatted report for documentation or analysis:

from toon_format import compare_formats

api_response = {
    "status": "success",
    "results": [
        {"id": 1, "score": 0.95, "category": "A"},
        {"id": 2, "score": 0.87, "category": "B"},
        {"id": 3, "score": 0.92, "category": "A"}
    ],
    "total": 3
}

print(compare_formats(api_response))
# Format Comparison
# ────────────────────────────────────────────────
# Format      Tokens    Size (chars)
# JSON            78             189
# TOON            48             112
# ────────────────────────────────────────────────
# Savings: 30 tokens (38.5%)

Integration Pattern

Use token counting in production to monitor savings:

import json
from toon_format import encode, count_tokens

def send_to_llm(data, use_toon=True):
    """Send data to LLM with optional TOON encoding."""
    if use_toon:
        formatted = encode(data)
        format_type = "TOON"
    else:
        formatted = json.dumps(data, indent=2)
        format_type = "JSON"

    tokens = count_tokens(formatted)
    print(f"[{format_type}] Sending {tokens} tokens")

    # Your LLM API call here
    # response = openai.ChatCompletion.create(...)

    return formatted, tokens

# Example usage
data = {"items": [{"id": 1}, {"id": 2}]}
formatted, token_count = send_to_llm(data, use_toon=True)

Real-World Use Cases

Use Case 1: Structured Data Extraction

Prompt:

Extract user information from the text below. Respond in TOON format.

Text: "Alice (age 30) works at ACME. Bob (age 25) works at XYZ."

Format:
users[N,]{name,age,company}:
  ...

Model Response:

users[2,]{name,age,company}:
  Alice,30,ACME
  Bob,25,XYZ

Processing:

from toon_format import decode

response = """users[2,]{name,age,company}:
  Alice,30,ACME
  Bob,25,XYZ"""

data = decode(response)
# {'users': [
#   {'name': 'Alice', 'age': 30, 'company': 'ACME'},
#   {'name': 'Bob', 'age': 25, 'company': 'XYZ'}
# ]}

Use Case 2: Configuration Generation

Prompt:

Generate a server configuration in TOON format with:
- app: "myapp"
- port: 8080
- database settings (host, port, name)
- enabled features: ["auth", "logging", "cache"]

Model Response:

app: myapp
port: 8080
database:
  host: localhost
  port: 5432
  name: myapp_db
features[3]: auth,logging,cache

Processing:

config = decode(response)
# Use config dict directly in your application

Use Case 3: API Response Formatting

Prompt:

Convert this data to TOON format for efficient transmission:

Products:
1. Widget A ($9.99, stock: 50)
2. Widget B ($14.50, stock: 30)
3. Widget C ($19.99, stock: 0)

Model Response:

products[3,]{id,name,price,stock}:
  1,"Widget A",9.99,50
  2,"Widget B",14.50,30
  3,"Widget C",19.99,0
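Processing works as in the earlier use cases via decode. The sketch below hand-parses the tabular rows with the standard csv module purely to illustrate how quoted fields containing spaces survive the comma delimiter (values stay strings here; toon_format's decoder would coerce types):

```python
import csv
import io

def parse_rows(header_fields, rows):
    """Zip tabular TOON rows against the declared field names."""
    parsed = []
    # csv handles the quoted fields ("Widget A") that contain no delimiter issues.
    for row in csv.reader(io.StringIO("\n".join(rows))):
        parsed.append(dict(zip(header_fields, row)))
    return parsed

fields = ["id", "name", "price", "stock"]
rows = ['1,"Widget A",9.99,50', '2,"Widget B",14.50,30', '3,"Widget C",19.99,0']
products = parse_rows(fields, rows)
print(products[0])
# {'id': '1', 'name': 'Widget A', 'price': '9.99', 'stock': '50'}
```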

Advanced Techniques

1. Few-Shot Learning

Provide examples in your prompt:

Convert the following to TOON format. Examples:

Input: {"name": "Alice", "age": 30}
Output:
name: Alice
age: 30

Input: [{"id": 1, "item": "A"}, {"id": 2, "item": "B"}]
Output:
[2,]{id,item}:
  1,A
  2,B

Now convert this: <your data>

2. Validation Instructions

Add explicit validation rules:

Respond in TOON format. Rules:
1. Array lengths MUST match actual count: [3] means exactly 3 items
2. Tabular arrays require uniform keys across all objects
3. Use quotes for: empty strings, keywords (null/true/false), numeric strings
4. Indentation: 2 spaces per level

If you cannot provide valid TOON, respond with an error message.

3. Delimiter Selection

Choose delimiters based on your data:

# For data with commas (addresses, descriptions)
encode(data, {"delimiter": "\t"})  # Use tab

# For data with tabs (code snippets)
encode(data, {"delimiter": "|"})   # Use pipe

# For general use
encode(data, {"delimiter": ","})   # Use comma (default)

Tell the model which delimiter to use:

"Use tab-separated values in tabular arrays due to commas in descriptions."


Error Handling

Graceful Degradation

Always wrap TOON decoding in error handling:

from toon_format import decode, ToonDecodeError

def safe_decode(toon_str):
    try:
        return decode(toon_str)
    except ToonDecodeError as e:
        print(f"TOON decode error: {e}")
        # Fall back to asking model to regenerate
        return None

Model Error Prompting

If decoding fails, ask the model to fix it:

The TOON you provided has an error: "Expected 3 items, but got 2"

Please regenerate with correct array lengths. Original:
items[3]: a,b

Should be either:
items[2]: a,b  (fix length)
OR
items[3]: a,b,c  (add missing item)
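This regenerate-on-error loop can be automated. A sketch, assuming an injectable generate(prompt) callable standing in for your real LLM client and a decode function that raises on count mismatches (both stubbed here purely for illustration):

```python
def decode_with_retry(generate, prompt, decode_fn, max_attempts=3):
    """Call the model, decode, and feed any decode error back as a fix-up prompt."""
    attempt_prompt = prompt
    for _ in range(max_attempts):
        raw = generate(attempt_prompt)
        try:
            return decode_fn(raw)
        except ValueError as e:  # stand-in for ToonDecodeError
            attempt_prompt = (
                f"The TOON you provided has an error: {e}\n"
                f"Please regenerate with correct array lengths. Original:\n{raw}"
            )
    raise RuntimeError("model never produced valid TOON")

# Demo with stubs: the first reply is one item short, the second is fixed.
replies = iter(["items[3]: a,b", "items[3]: a,b,c"])

def fake_generate(prompt):
    return next(replies)

def fake_decode(s):
    declared = int(s[s.index("[") + 1 : s.index("]")])
    items = s.split(":", 1)[1].strip().split(",")
    if len(items) != declared:
        raise ValueError(f"Expected {declared} items, but got {len(items)}")
    return items

result = decode_with_retry(fake_generate, "Extract the items", fake_decode)
print(result)  # ['a', 'b', 'c']
```

Injecting the client as a callable keeps the retry logic testable without any API calls.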

Token Efficiency Best Practices

1. Prefer Tabular Format

Less efficient (list format):

users[3]:
  - id: 1
    name: Alice
  - id: 2
    name: Bob
  - id: 3
    name: Charlie

More efficient (tabular format):

users[3,]{id,name}:
  1,Alice
  2,Bob
  3,Charlie
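The saving comes from naming each key once in the header instead of once per row; a quick character count over the two forms above makes this concrete:

```python
# The list form repeats "id:" and "name:" for every user.
list_form = (
    "users[3]:\n"
    "  - id: 1\n    name: Alice\n"
    "  - id: 2\n    name: Bob\n"
    "  - id: 3\n    name: Charlie\n"
)
# The tabular form declares the keys once in the header.
tabular_form = "users[3,]{id,name}:\n  1,Alice\n  2,Bob\n  3,Charlie\n"

print(len(list_form), len(tabular_form))
```

The gap widens with more rows and more fields, since the per-row key overhead of the list form scales with both.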

2. Minimize Nesting

Less efficient:

data:
  metadata:
    items:
      list[2]: a,b

More efficient:

items[2]: a,b

3. Use Compact Keys

Less efficient:

user_identification_number: 123
user_full_name: Alice

More efficient:

id: 123
name: Alice

Common Pitfalls

❌ Don't: Trust Model Without Validation

# BAD: No validation
response = llm.generate(prompt)
data = decode(response)  # May raise ToonDecodeError

# GOOD: Validate and handle errors
response = llm.generate(prompt)
try:
    data = decode(response, {"strict": True})
except ToonDecodeError:
    data = None  # Retry or fall back
❌ Don't: Mix Formats Mid-Conversation

First response: JSON
Second response: TOON

Be consistent - stick to TOON throughout the conversation.

❌ Don't: Forget Quoting Rules

Model might produce:

code: 123  # Wrong! Numeric string needs quotes

Should be:

code: "123"  # Correct

Solution: Explicitly mention quoting in prompts.
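You can also check the quoting rules on your side before re-prompting. A minimal checker implementing the rules listed in this guide (a sketch, not part of toon_format):

```python
def needs_quotes(value: str) -> bool:
    """Quoting rules from this guide: quote empty strings,
    the keywords null/true/false, and numeric-looking strings."""
    if value == "" or value in ("null", "true", "false"):
        return True
    try:
        float(value)  # numeric string like "123" or "9.99"
        return True
    except ValueError:
        return False

print(needs_quotes("123"))    # True
print(needs_quotes("Alice"))  # False
```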


Integration Examples

With OpenAI API

import openai
from toon_format import decode

def ask_for_toon_data(prompt):
    response = openai.ChatCompletion.create(
        model="gpt-5",
        messages=[
            {"role": "system", "content": "Respond using TOON format"},
            {"role": "user", "content": prompt}
        ]
    )

    toon_str = response.choices[0].message.content

    # Extract TOON from code blocks if wrapped
    if "```toon" in toon_str:
        toon_str = toon_str.split("```toon")[1].split("```")[0].strip()
    elif "```" in toon_str:
        toon_str = toon_str.split("```")[1].split("```")[0].strip()

    return decode(toon_str)

With Anthropic Claude API

import anthropic
from toon_format import decode

def claude_toon(prompt):
    client = anthropic.Anthropic()

    message = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        messages=[{
            "role": "user",
            "content": f"{prompt}\n\nRespond in TOON format (Token-Oriented Object Notation)."
        }]
    )

    toon_str = message.content[0].text

    # Remove code blocks if present
    if "```" in toon_str:
        toon_str = toon_str.split("```")[1].strip()
        if toon_str.startswith("toon\n"):
            toon_str = toon_str[5:]

    return decode(toon_str)
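Both integration examples repeat the fence-stripping logic with string splits. A small reusable extractor (pure Python, regex-based) can replace it in either client:

```python
import re

FENCE = "`" * 3  # triple backtick, built up to keep this example readable

def extract_toon(text: str) -> str:
    """Pull the first fenced code block out of a model reply,
    or return the whole text if it is not fenced."""
    pattern = FENCE + r"(?:toon)?\s*\n(.*?)" + FENCE
    m = re.search(pattern, text, re.DOTALL)
    return m.group(1).strip() if m else text.strip()

reply = f"Here you go:\n{FENCE}toon\nitems[2]: a,b\n{FENCE}"
print(extract_toon(reply))  # items[2]: a,b
```

Handling both the `toon`-tagged and untagged fences in one place also keeps the two API wrappers in sync.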

Performance Metrics

Based on testing with GPT-5 and Claude:

| Data Type | JSON Tokens | TOON Tokens | Reduction |
| --- | --- | --- | --- |
| Simple config (10 keys) | 45 | 28 | 38% |
| User list (50 users) | 892 | 312 | 65% |
| Nested structure | 234 | 142 | 39% |
| Mixed arrays | 178 | 95 | 47% |

Average reduction: 30-60% depending on data structure and tokenizer.

Note: Comprehensive benchmarks across GPT-5, GPT-5-mini, and other models are coming soon. See the roadmap for details.


Debugging Tips

1. Log Raw TOON

Always log the raw TOON before decoding:

print("Raw TOON from model:")
print(repr(toon_str))

try:
    data = decode(toon_str)
except ToonDecodeError as e:
    print(f"Decode error: {e}")

2. Test with Strict Mode

Enable strict validation during development:

decode(toon_str, {"strict": True})  # Strict validation

Disable for production if lenient parsing is acceptable:

decode(toon_str, {"strict": False})  # Lenient

3. Validate Against Schema

After decoding, validate the Python structure:

data = decode(toon_str)

# Validate structure
assert "users" in data
assert isinstance(data["users"], list)
assert all("id" in user for user in data["users"])

Summary

Key Takeaways:

  1. Explicit prompting - Tell the model to use TOON format clearly
  2. Validation - Always validate model output with error handling
  3. Examples - Provide few-shot examples in prompts
  4. Consistency - Use TOON throughout the conversation
  5. Tabular format - Prefer tabular arrays for maximum efficiency
  6. Error recovery - Handle decode errors gracefully

TOON can reduce LLM costs by 30-60% while maintaining readability and structure. Start with simple use cases and expand as you become familiar with the format.