Latency and throughput benchmarks for AI API models accessible through the Global API gateway. Measure TTFT, tokens/second, and cost efficiency from 6 global regions.
Quick Benchmark
from openai import OpenAI
import time

# The gateway is called through the standard OpenAI client by overriding base_url.
client = OpenAI(
    base_url="https://global-apis.com/v1",
    api_key="your-global-api-key",
)

models = ["deepseek-ai/DeepSeek-V4-Flash", "qwen/qwen3-32b", "moonshot/kimi-k2.5"]
prompt = "Write a Python function to sort a list of dictionaries by multiple keys."

for model in models:
    start = time.perf_counter()
    response = client.chat.completions.create(
        model=model, messages=[{"role": "user", "content": prompt}], max_tokens=200
    )
    # Total request latency in milliseconds (non-streaming, so this is not TTFT).
    elapsed = (time.perf_counter() - start) * 1000
    tokens = response.usage.completion_tokens
    print(f"{model}: {elapsed:.0f}ms, {tokens / elapsed * 1000:.1f} tok/s")
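The quick benchmark times the whole non-streaming response, so it reports total latency rather than time to first token. A minimal sketch of a streaming measurement follows, assuming the gateway supports the OpenAI client's `stream=True` option. The helper names `stream_text`, `time_stream`, and `StreamTiming` are hypothetical and not part of this repository, and the count is in streamed text pieces, which may not equal tokens.

```python
import time
from typing import Callable, Iterable, Iterator, NamedTuple


class StreamTiming(NamedTuple):
    ttft_ms: float   # time until the first non-empty piece of text
    total_ms: float  # time until the stream is exhausted
    pieces: int      # number of non-empty text pieces (not necessarily tokens)


def stream_text(client, model: str, prompt: str, max_tokens: int = 200) -> Iterator[str]:
    """Yield text deltas from a streaming chat completion (hypothetical helper)."""
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        max_tokens=max_tokens,
        stream=True,
    )
    for chunk in stream:
        # Some chunks carry no choices or an empty delta; skip those.
        if chunk.choices and chunk.choices[0].delta.content:
            yield chunk.choices[0].delta.content


def time_stream(
    pieces: Iterable[str],
    clock: Callable[[], float] = time.perf_counter,
) -> StreamTiming:
    """Time an iterable of text pieces; the clock starts just before iteration."""
    start = clock()
    first = None
    count = 0
    for piece in pieces:
        if not piece:
            continue
        if first is None:
            first = clock()  # first visible text arrived
        count += 1
    end = clock()
    ttft = (first - start) * 1000 if first is not None else float("nan")
    return StreamTiming(ttft, (end - start) * 1000, count)
```

Because `stream_text` is a generator, the request is only sent once `time_stream` starts iterating, so the connection and queueing time are included in the TTFT. With the `client` and `prompt` from the quick benchmark, `time_stream(stream_text(client, "qwen/qwen3-32b", prompt))` returns one `StreamTiming` per call.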
Latest Results (May 2026)
Speed Benchmark (TTFT in ms, lower is better)
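| Model | US East | US West | EU | Asia | Australia |
|---|---|---|---|---|---|
| DeepSeek V4 Flash | 420 | 450 | 480 | 320 | 510 |
| Qwen3-32B | 510 | 530 | 560 | 380 | 590 |
| GPT-4o | 680 | 710 | 740 | 850 | 760 |
| Kimi K2.5 | 560 | 590 | 620 | 410 | 650 |
| GLM-5 | 530 | 560 | 590 | 390 | 620 |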
Throughput (tokens/second, higher is better)
| Model | US East | US West | EU | Asia | Australia |
|---|---|---|---|---|---|
| DeepSeek V4 Flash | 85.2 | 82.1 | 79.5 | 92.3 | 78.4 |
| Qwen3-32B | 72.1 | 69.8 | 67.2 | 78.6 | 65.9 |
| GPT-4o | 45.3 | 43.1 | 41.8 | 48.2 | 40.5 |
| Kimi K2.5 | 60.3 | 58.1 | 55.7 | 65.4 | 53.2 |
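A single request gives a noisy tokens/second figure, so it can help to repeat each measurement and report a median. The sketch below is one way to do that. It is an assumption about methodology, since this README does not say how the table values were aggregated, and `tokens_per_second` and `summarize` are hypothetical helper names.

```python
from statistics import median
from typing import Dict, List, Tuple


def tokens_per_second(completion_tokens: int, elapsed_ms: float) -> float:
    """Convert a token count and elapsed milliseconds into tokens/second."""
    if elapsed_ms <= 0:
        raise ValueError("elapsed_ms must be positive")
    return completion_tokens / elapsed_ms * 1000


def summarize(runs: List[Tuple[int, float]]) -> Dict[str, float]:
    """Summarize repeated runs given as (completion_tokens, elapsed_ms) pairs."""
    rates = [tokens_per_second(tokens, ms) for tokens, ms in runs]
    return {"median": median(rates), "min": min(rates), "max": max(rates)}


# Made-up sample runs for illustration only, not measured results.
sample_runs = [(200, 2500.0), (200, 2000.0), (200, 4000.0)]
summary = summarize(sample_runs)
```

The `(completion_tokens, elapsed_ms)` pairs correspond to `response.usage.completion_tokens` and `elapsed` in the quick benchmark loop.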
Streaming vs Non-Streaming
| Model | Non-Stream TTFT | Stream TTFT | Improvement |
|---|---|---|---|
| DeepSeek V4 Flash | 420ms | 180ms | 2.3x faster |
| Qwen3-32B | 510ms | 240ms | 2.1x faster |
| GPT-4o | 680ms | 350ms | 1.9x faster |
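The Improvement column is consistent with dividing the non-streaming TTFT by the streaming TTFT and rounding to one decimal place. A short sketch that recomputes it from the table values, where `speedup` is a hypothetical helper name:

```python
def speedup(non_stream_ms: float, stream_ms: float) -> float:
    """Ratio of non-streaming TTFT to streaming TTFT."""
    if stream_ms <= 0:
        raise ValueError("stream_ms must be positive")
    return non_stream_ms / stream_ms


# TTFT values in milliseconds, copied from the table above.
ttft = {
    "DeepSeek V4 Flash": (420, 180),
    "Qwen3-32B": (510, 240),
    "GPT-4o": (680, 350),
}

improvement = {
    model: f"{speedup(non_stream, stream):.1f}x faster"
    for model, (non_stream, stream) in ttft.items()
}

for model, label in improvement.items():
    print(f"{model}: {label}")
```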
Why Global API?
- One endpoint, 184+ models: test without signing up for each provider
- Same latency as direct: no proxy slowdown, direct access to official APIs
- Pay with PayPal: no Chinese bank account needed
- Never-expiring credits: no monthly subscription traps