
LLM Integration Guide

Best practices for using TOON with Large Language Models to maximize token efficiency and response quality.

Why TOON for LLMs?

Traditional JSON wastes tokens on structural characters:

  • Braces & brackets: {}, []
  • Repeated quotes: Every key quoted in JSON
  • Commas everywhere: Between all elements

TOON eliminates this redundancy, achieving 30-60% token reduction while maintaining readability.


Quick Example

JSON (45 tokens with GPT-5):

{"users": [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]}

TOON (20 tokens with GPT-5, 56% reduction):

users[2,]{id,name}:
  1,Alice
  2,Bob
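The header line does the heavy lifting here: it carries the array name, the declared length, and the field list. A minimal parser for that header (plain Python, independent of the toon_format library, shown only to illustrate the pieces) looks like:

```python
import re

# Matches headers like "users[2,]{id,name}:" - name, declared length, field list.
HEADER_RE = re.compile(r"^(\w+)\[(\d+),?\]\{([^}]*)\}:$")

def parse_tabular_header(line):
    """Split a TOON tabular header into (name, length, fields)."""
    m = HEADER_RE.match(line.strip())
    if not m:
        raise ValueError(f"Not a tabular header: {line!r}")
    name, length, fields = m.groups()
    return name, int(length), fields.split(",")

print(parse_tabular_header("users[2,]{id,name}:"))
# ('users', 2, ['id', 'name'])
```

Every row after the header then only needs the values, which is where the token savings come from.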

Basic Integration Patterns

1. Prompting the Model

Explicit format instruction:

Respond using TOON format (Token-Oriented Object Notation):
- Use `key: value` for objects
- Use indentation for nesting
- Use `[N]` to indicate array lengths
- Use tabular format `[N,]{fields}:` for uniform arrays

Example:
users[2,]{id,name}:
  1,Alice
  2,Bob

2. Code Block Wrapping

Always wrap TOON in code blocks for clarity:

```toon
users[3,]{id,name,age}:
  1,Alice,30
  2,Bob,25
  3,Charlie,35
```

This helps the model distinguish TOON from natural language.

3. Validation with Length Markers

Use lengthMarker="#" for explicit validation hints:

from toon_format import encode

data = {"items": ["a", "b", "c"]}
toon = encode(data, {"lengthMarker": "#"})
# items[#3]: a,b,c

Tell the model:

"Array lengths are prefixed with #. Ensure your response matches these counts exactly."
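The marker also gives you a cheap client-side check before full decoding. A sketch of such a validator (pure Python, not part of toon_format) for flat length-marked arrays:

```python
import re

def check_length_marker(line):
    """Verify that a flat TOON array matches its '#'-prefixed declared length.

    Returns (name, ok) for lines like "items[#3]: a,b,c", or None if the
    line is not a length-marked flat array.
    """
    m = re.match(r"^(\w+)\[#(\d+)\]:\s*(.*)$", line.strip())
    if not m:
        return None
    name, declared, body = m.group(1), int(m.group(2)), m.group(3)
    actual = len(body.split(",")) if body else 0
    return name, declared == actual

print(check_length_marker("items[#3]: a,b,c"))  # ('items', True)
print(check_length_marker("items[#3]: a,b"))    # ('items', False)
```

A mismatch is a strong signal the model dropped or invented an item, and worth a regeneration request.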


Measuring Token Savings

Before integrating TOON with your LLM application, measure actual savings for your data:

Basic Measurement

from toon_format import estimate_savings

# Your actual data structure
user_data = {
    "users": [
        {"id": 1, "name": "Alice", "email": "alice@example.com", "active": True},
        {"id": 2, "name": "Bob", "email": "bob@example.com", "active": True},
        {"id": 3, "name": "Charlie", "email": "charlie@example.com", "active": False}
    ]
}

# Compare formats
result = estimate_savings(user_data)
print(f"JSON: {result['json_tokens']} tokens")
print(f"TOON: {result['toon_tokens']} tokens")
print(f"Savings: {result['savings_percent']:.1f}%")
# JSON: 112 tokens
# TOON: 68 tokens
# Savings: 39.3%

Cost Estimation

Calculate actual dollar savings based on your API usage:

from toon_format import estimate_savings

# Your typical prompt data
prompt_data = {
    "context": [
        {"role": "system", "content": "You are a helpful assistant"},
        {"role": "user", "content": "Analyze this data"}
    ],
    "data": [
        {"id": i, "value": f"Item {i}", "score": i * 10}
        for i in range(1, 101)  # 100 items
    ]
}

result = estimate_savings(prompt_data["data"])

# GPT-5 pricing (example: $0.01 per 1K tokens)
cost_per_1k = 0.01
json_cost = (result['json_tokens'] / 1000) * cost_per_1k
toon_cost = (result['toon_tokens'] / 1000) * cost_per_1k

print(f"JSON cost per request: ${json_cost:.4f}")
print(f"TOON cost per request: ${toon_cost:.4f}")
print(f"Savings per request: ${json_cost - toon_cost:.4f}")
print(f"Savings per 10,000 requests: ${(json_cost - toon_cost) * 10000:.2f}")

Detailed Comparison

Get a formatted report for documentation or analysis:

from toon_format import compare_formats

api_response = {
    "status": "success",
    "results": [
        {"id": 1, "score": 0.95, "category": "A"},
        {"id": 2, "score": 0.87, "category": "B"},
        {"id": 3, "score": 0.92, "category": "A"}
    ],
    "total": 3
}

print(compare_formats(api_response))
# Format Comparison
# ────────────────────────────────────────────────
# Format      Tokens    Size (chars)
# JSON            78             189
# TOON            48             112
# ────────────────────────────────────────────────
# Savings: 30 tokens (38.5%)

Integration Pattern

Use token counting in production to monitor savings:

import json
from toon_format import encode, count_tokens

def send_to_llm(data, use_toon=True):
    """Send data to LLM with optional TOON encoding."""
    if use_toon:
        formatted = encode(data)
        format_type = "TOON"
    else:
        formatted = json.dumps(data, indent=2)
        format_type = "JSON"

    tokens = count_tokens(formatted)
    print(f"[{format_type}] Sending {tokens} tokens")

    # Your LLM API call here
    # response = openai.ChatCompletion.create(...)

    return formatted, tokens

# Example usage
data = {"items": [{"id": 1}, {"id": 2}]}
formatted, token_count = send_to_llm(data, use_toon=True)

Real-World Use Cases

Use Case 1: Structured Data Extraction

Prompt:

Extract user information from the text below. Respond in TOON format.

Text: "Alice (age 30) works at ACME. Bob (age 25) works at XYZ."

Format:
users[N,]{name,age,company}:
  ...

Model Response:

users[2,]{name,age,company}:
  Alice,30,ACME
  Bob,25,XYZ

Processing:

from toon_format import decode

response = """users[2,]{name,age,company}:
  Alice,30,ACME
  Bob,25,XYZ"""

data = decode(response)
# {'users': [
#   {'name': 'Alice', 'age': 30, 'company': 'ACME'},
#   {'name': 'Bob', 'age': 25, 'company': 'XYZ'}
# ]}

Use Case 2: Configuration Generation

Prompt:

Generate a server configuration in TOON format with:
- app: "myapp"
- port: 8080
- database settings (host, port, name)
- enabled features: ["auth", "logging", "cache"]

Model Response:

app: myapp
port: 8080
database:
  host: localhost
  port: 5432
  name: myapp_db
features[3]: auth,logging,cache

Processing:

config = decode(response)
# Use config dict directly in your application

Use Case 3: API Response Formatting

Prompt:

Convert this data to TOON format for efficient transmission:

Products:
1. Widget A ($9.99, stock: 50)
2. Widget B ($14.50, stock: 30)
3. Widget C ($19.99, stock: 0)

Model Response:

products[3,]{id,name,price,stock}:
  1,"Widget A",9.99,50
  2,"Widget B",14.50,30
  3,"Widget C",19.99,0
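Processing works as in the earlier use cases via decode. The sketch below hand-parses the tabular rows with the standard csv module purely to illustrate how quoted fields containing spaces survive the comma delimiter (values stay strings here; toon_format's decoder would coerce types):

```python
import csv
import io

def parse_rows(header_fields, rows):
    """Zip tabular TOON rows against the declared field names."""
    parsed = []
    # csv handles the quoted fields ("Widget A") that contain no delimiter issues.
    for row in csv.reader(io.StringIO("\n".join(rows))):
        parsed.append(dict(zip(header_fields, row)))
    return parsed

fields = ["id", "name", "price", "stock"]
rows = ['1,"Widget A",9.99,50', '2,"Widget B",14.50,30', '3,"Widget C",19.99,0']
products = parse_rows(fields, rows)
print(products[0])
# {'id': '1', 'name': 'Widget A', 'price': '9.99', 'stock': '50'}
```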

Advanced Techniques

1. Few-Shot Learning

Provide examples in your prompt:

Convert the following to TOON format. Examples:

Input: {"name": "Alice", "age": 30}
Output:
name: Alice
age: 30

Input: [{"id": 1, "item": "A"}, {"id": 2, "item": "B"}]
Output:
[2,]{id,item}:
  1,A
  2,B

Now convert this: <your data>

2. Validation Instructions

Add explicit validation rules:

Respond in TOON format. Rules:
1. Array lengths MUST match actual count: [3] means exactly 3 items
2. Tabular arrays require uniform keys across all objects
3. Use quotes for: empty strings, keywords (null/true/false), numeric strings
4. Indentation: 2 spaces per level

If you cannot provide valid TOON, respond with an error message.

3. Delimiter Selection

Choose delimiters based on your data:

# For data with commas (addresses, descriptions)
encode(data, {"delimiter": "\t"})  # Use tab

# For data with tabs (code snippets)
encode(data, {"delimiter": "|"})   # Use pipe

# For general use
encode(data, {"delimiter": ","})   # Use comma (default)

Tell the model which delimiter to use:

"Use tab-separated values in tabular arrays due to commas in descriptions."


Error Handling

Graceful Degradation

Always wrap TOON decoding in error handling:

from toon_format import decode, ToonDecodeError

def safe_decode(toon_str):
    try:
        return decode(toon_str)
    except ToonDecodeError as e:
        print(f"TOON decode error: {e}")
        # Fall back to asking model to regenerate
        return None

Model Error Prompting

If decoding fails, ask the model to fix it:

The TOON you provided has an error: "Expected 3 items, but got 2"

Please regenerate with correct array lengths. Original:
items[3]: a,b

Should be either:
items[2]: a,b  (fix length)
OR
items[3]: a,b,c  (add missing item)
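This regenerate-on-error loop can be automated. A sketch, assuming an injectable generate(prompt) callable standing in for your real LLM client and a decode function that raises on count mismatches (both stubbed here purely for illustration):

```python
def decode_with_retry(generate, prompt, decode_fn, max_attempts=3):
    """Call the model, decode, and feed any decode error back as a fix-up prompt."""
    attempt_prompt = prompt
    for _ in range(max_attempts):
        raw = generate(attempt_prompt)
        try:
            return decode_fn(raw)
        except ValueError as e:  # stand-in for ToonDecodeError
            attempt_prompt = (
                f"The TOON you provided has an error: {e}\n"
                f"Please regenerate with correct array lengths. Original:\n{raw}"
            )
    raise RuntimeError("model never produced valid TOON")

# Demo with stubs: the first reply is one item short, the second is fixed.
replies = iter(["items[3]: a,b", "items[3]: a,b,c"])

def fake_generate(prompt):
    return next(replies)

def fake_decode(s):
    declared = int(s[s.index("[") + 1 : s.index("]")])
    items = s.split(":", 1)[1].strip().split(",")
    if len(items) != declared:
        raise ValueError(f"Expected {declared} items, but got {len(items)}")
    return items

result = decode_with_retry(fake_generate, "Extract the items", fake_decode)
print(result)  # ['a', 'b', 'c']
```

Injecting the client as a callable keeps the retry logic testable without any API calls.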

Token Efficiency Best Practices

1. Prefer Tabular Format

Less efficient (list format):

users[3]:
  - id: 1
    name: Alice
  - id: 2
    name: Bob
  - id: 3
    name: Charlie

More efficient (tabular format):

users[3,]{id,name}:
  1,Alice
  2,Bob
  3,Charlie
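The saving comes from naming each key once in the header instead of once per row; a quick character count over the two forms above makes this concrete:

```python
# The list form repeats "id:" and "name:" for every user.
list_form = (
    "users[3]:\n"
    "  - id: 1\n    name: Alice\n"
    "  - id: 2\n    name: Bob\n"
    "  - id: 3\n    name: Charlie\n"
)
# The tabular form declares the keys once in the header.
tabular_form = "users[3,]{id,name}:\n  1,Alice\n  2,Bob\n  3,Charlie\n"

print(len(list_form), len(tabular_form))
```

The gap widens with more rows and more fields, since the per-row key overhead of the list form scales with both.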

2. Minimize Nesting

Less efficient:

data:
  metadata:
    items:
      list[2]: a,b

More efficient:

items[2]: a,b

3. Use Compact Keys

Less efficient:

user_identification_number: 123
user_full_name: Alice

More efficient:

id: 123
name: Alice

Common Pitfalls

❌ Don't: Trust Model Without Validation

# BAD: No validation
response = llm.generate(prompt)
data = decode(response)  # May raise ToonDecodeError

# GOOD: Validate and handle errors
response = llm.generate(prompt)
try:
    data = decode(response, {"strict": True})
except ToonDecodeError:
    data = None  # Retry or fall back
❌ Don't: Mix Formats Mid-Conversation

First response: JSON
Second response: TOON

Be consistent - stick to TOON throughout the conversation.

❌ Don't: Forget Quoting Rules

Model might produce:

code: 123  # Wrong! Numeric string needs quotes

Should be:

code: "123"  # Correct

Solution: Explicitly mention quoting in prompts.
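You can also check the quoting rules on your side before re-prompting. A minimal checker implementing the rules listed in this guide (a sketch, not part of toon_format):

```python
def needs_quotes(value: str) -> bool:
    """Quoting rules from this guide: quote empty strings,
    the keywords null/true/false, and numeric-looking strings."""
    if value == "" or value in ("null", "true", "false"):
        return True
    try:
        float(value)  # numeric string like "123" or "9.99"
        return True
    except ValueError:
        return False

print(needs_quotes("123"))    # True
print(needs_quotes("Alice"))  # False
```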


Integration Examples

With OpenAI API

import openai
from toon_format import decode

def ask_for_toon_data(prompt):
    response = openai.ChatCompletion.create(
        model="gpt-5",
        messages=[
            {"role": "system", "content": "Respond using TOON format"},
            {"role": "user", "content": prompt}
        ]
    )

    toon_str = response.choices[0].message.content

    # Extract TOON from code blocks if wrapped
    if "```toon" in toon_str:
        toon_str = toon_str.split("```toon")[1].split("```")[0].strip()
    elif "```" in toon_str:
        toon_str = toon_str.split("```")[1].split("```")[0].strip()

    return decode(toon_str)

With Anthropic Claude API

import anthropic
from toon_format import decode

def claude_toon(prompt):
    client = anthropic.Anthropic()

    message = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        messages=[{
            "role": "user",
            "content": f"{prompt}\n\nRespond in TOON format (Token-Oriented Object Notation)."
        }]
    )

    toon_str = message.content[0].text

    # Remove code blocks if present
    if "```" in toon_str:
        toon_str = toon_str.split("```")[1].strip()
        if toon_str.startswith("toon\n"):
            toon_str = toon_str[5:]

    return decode(toon_str)
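Both integration examples repeat the fence-stripping logic with string splits. A small reusable extractor (pure Python, regex-based) can replace it in either client:

```python
import re

FENCE = "`" * 3  # triple backtick, built up to keep this example readable

def extract_toon(text: str) -> str:
    """Pull the first fenced code block out of a model reply,
    or return the whole text if it is not fenced."""
    pattern = FENCE + r"(?:toon)?\s*\n(.*?)" + FENCE
    m = re.search(pattern, text, re.DOTALL)
    return m.group(1).strip() if m else text.strip()

reply = f"Here you go:\n{FENCE}toon\nitems[2]: a,b\n{FENCE}"
print(extract_toon(reply))  # items[2]: a,b
```

Handling both the `toon`-tagged and untagged fences in one place also keeps the two API wrappers in sync.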

Performance Metrics

Based on testing with GPT-5 and Claude:

| Data Type | JSON Tokens | TOON Tokens | Reduction |
| --- | --- | --- | --- |
| Simple config (10 keys) | 45 | 28 | 38% |
| User list (50 users) | 892 | 312 | 65% |
| Nested structure | 234 | 142 | 39% |
| Mixed arrays | 178 | 95 | 47% |

Average reduction: 30-60% depending on data structure and tokenizer.

Note: Comprehensive benchmarks across GPT-5, GPT-5-mini, and other models are coming soon. See the roadmap for details.


Debugging Tips

1. Log Raw TOON

Always log the raw TOON before decoding:

print("Raw TOON from model:")
print(repr(toon_str))

try:
    data = decode(toon_str)
except ToonDecodeError as e:
    print(f"Decode error: {e}")

2. Test with Strict Mode

Enable strict validation during development:

decode(toon_str, {"strict": True})  # Strict validation

Disable for production if lenient parsing is acceptable:

decode(toon_str, {"strict": False})  # Lenient

3. Validate Against Schema

After decoding, validate the Python structure:

data = decode(toon_str)

# Validate structure
assert "users" in data
assert isinstance(data["users"], list)
assert all("id" in user for user in data["users"])

Summary

Key Takeaways:

  1. Explicit prompting - Tell the model to use TOON format clearly
  2. Validation - Always validate model output with error handling
  3. Examples - Provide few-shot examples in prompts
  4. Consistency - Use TOON throughout the conversation
  5. Tabular format - Prefer tabular arrays for maximum efficiency
  6. Error recovery - Handle decode errors gracefully

TOON can reduce LLM costs by 30-60% while maintaining readability and structure. Start with simple use cases and expand as you become familiar with the format.