
Token Savings

Real Results

OpenAI API Testing (gpt-4o-mini, 5 scenarios):

| Variant  | Compliance | Avg Token Reduction | Use Case           |
|----------|------------|---------------------|--------------------|
| STRICT   | 80%        | 88.8%               | Production systems |
| BALANCED | 40%        | 95.4%               | General use        |
| MINIMAL  | 40%        | 95.7%               | Testing/learning   |
  • Range (STRICT): 75.0% - 94.2% token reduction across scenarios
  • Compliance: percentage of test runs that produced valid Vector-Native output (STRICT is the most reliable)
  • Measurement: completion tokens only; system-prompt overhead is tracked separately (STRICT has a larger prompt but the best compliance)
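As a sketch of how the summary figures roll up, the per-scenario reductions can be averaged and compliance counted directly. The token counts below are illustrative placeholders, not the actual test data (the raw results live in tests/test_results/):

```python
# Illustrative roll-up of per-scenario results into the summary metrics above.
# Each tuple: (english_completion_tokens, vector_native_tokens, valid_output).
# These numbers are made up for demonstration; they are NOT the real test data.
scenarios = [
    (412, 103, True),
    (388, 24, True),
    (501, 31, True),
    (296, 52, False),  # non-compliant run: output wasn't valid Vector-Native
    (450, 29, True),
]

# Per-scenario percentage reduction in completion tokens.
reductions = [(eng - vn) / eng * 100 for eng, vn, _ in scenarios]
avg_reduction = sum(reductions) / len(reductions)

# Compliance: share of runs that produced valid Vector-Native output.
compliance = sum(ok for _, _, ok in scenarios) / len(scenarios) * 100

print(f"avg token reduction: {avg_reduction:.1f}%")
print(f"compliance: {compliance:.0f}%")
```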

Cost Impact

At Scale (1M Output Tokens, gpt-4o-mini $0.60/1M):

  • English: ~$343
  • Vector-Native (avg across variants): ~$22
  • Savings: 93.6% ($321)

Why it compounds: an 88-95% reduction in completion tokens applies to every A2A message and system prompt in the pipeline. STRICT is reliable enough for production; BALANCED and MINIMAL maximize savings for testing.
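The cost arithmetic is straightforward to reproduce: apply the measured reduction to an output-token volume and multiply by the per-token price. The 500M tokens/month volume below is an arbitrary example, assuming gpt-4o-mini's $0.60 per 1M output tokens:

```python
# Sketch of the cost calculation, using gpt-4o-mini's $0.60 per 1M output
# tokens. The monthly volume is an illustrative assumption, not measured data.
PRICE_PER_TOKEN = 0.60 / 1_000_000  # USD per output token

def monthly_cost(output_tokens: int, reduction_pct: float = 0.0) -> float:
    """Cost after applying a completion-token reduction (0.0 = plain English)."""
    effective_tokens = output_tokens * (1 - reduction_pct)
    return effective_tokens * PRICE_PER_TOKEN

english = monthly_cost(500_000_000)                      # 500M output tokens
vector = monthly_cost(500_000_000, reduction_pct=0.936)  # avg 93.6% reduction

print(f"English: ${english:,.2f}  Vector-Native: ${vector:,.2f}  "
      f"saved: {100 * (1 - vector / english):.1f}%")
```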

What Gets Measured

  • Completions Only: measures output-token reduction and ignores system-prompt overhead. Total API cost therefore varies: STRICT's larger prompt nets ~70% overall savings, MINIMAL ~95%.
  • Test Setup: 5 diverse scenarios (analysis, task creation, multi-op). Raw data: tests/test_results/.
  • Scalability: Every internal message/prompt benefits—no user-facing impact.

Why This Matters

English wastes tokens on conversational filler in paths no human ever reads. Vector-Native skips the natural-language layer:

System Prompt Example:

```
English (~20 tokens):      "You are helpful. Provide details. Focus on needs."
Vector-Native (~8 tokens): ●assistant|mode:helpful|detail:high|attention:needs
```

Savings: ~60% per request, reaching ~93.6% at scale.
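A prompt in this shape can be emitted from structured data rather than written by hand. The helper below is a hypothetical sketch: the `●role|key:value|...` layout is inferred from the single example above, so the field names and ordering are assumptions, not the project's actual spec:

```python
# Hypothetical serializer for the compact prompt shown above. The
# ●role|key:value|... shape is inferred from one example; field names
# and ordering are assumptions, not the project's actual format spec.
def vector_native_prompt(role: str, **fields: str) -> str:
    """Serialize a role plus key:value fields into one compact prompt line."""
    parts = [f"{key}:{value}" for key, value in fields.items()]
    return "●" + "|".join([role, *parts])

prompt = vector_native_prompt(
    "assistant", mode="helpful", detail="high", attention="needs"
)
print(prompt)  # ●assistant|mode:helpful|detail:high|attention:needs
```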

Ideal for agent-to-agent, tools, prompts—where humans don't read.