Overshoot
This is us. Overshoot is our own real-time inference fabric, built to close the loop on live camera, screen, and RTSP streams in under 200 ms end to end.
- US East (Virginia): 1.4 s
- US West (Oregon): 1.4 s
- EU West (Ireland): 1.4 s
- EU Central (Frankfurt): 1.5 s
- Asia Pacific (Tokyo): 1.5 s
- Asia Pacific (Mumbai): 1.5 s
- South America (São Paulo): 1.5 s
- Middle East (Dubai): 1.5 s
- Models on live video: 13
- Regions measured: 8
- Real-time models: 6 of 13
- Median output speed: 253 tok/s
- Median E2E (US East): 1.4 s
- Median blended $/1M: $4.45
Models on Overshoot
| # | Model | Provider | Output speed | Latency | E2E | Blended $/1M | Real-time |
|---|---|---|---|---|---|---|---|
| 1 | Gemma 4 26B A4B | Google | 429 tok/s | 30 ms | 86 ms | $0.12 | <200 ms |
| 2 | Holo3 35B A3B | H Company | 423 tok/s | 35 ms | 92 ms | $0.12 | <200 ms |
| 3 | Qwen3.6 35B A3B (Reasoning) | Alibaba (Qwen) | 406 tok/s | 35 ms | 2.9 s | $0.12 | – |
| 4 | GPT-5.4 nano (Reasoning) | OpenAI | 354 tok/s | 28 ms | 1.4 s | $3.20 | – |
| 5 | Gemma 4 31B | Google | 270 tok/s | 52 ms | 141 ms | $0.27 | <200 ms |
| 6 | Gemini 3 Flash | Google | 254 tok/s | 45 ms | 139 ms | $4.66 | <200 ms |
| 7 | Qwen3.6 27B | Alibaba (Qwen) | 253 tok/s | 45 ms | 140 ms | $0.24 | <200 ms |
| 8 | GPT-5.4 mini (Reasoning) | OpenAI | 248 tok/s | 46 ms | 2.6 s | $5.00 | – |
| 9 | Claude Haiku 4.5 | Anthropic | 228 tok/s | 36 ms | 141 ms | $4.45 | <200 ms |
| 10 | Gemini 3.1 Pro (Reasoning) | Google | 99 tok/s | 85 ms | 8.4 s | $5.90 | – |
| 11 | GPT-5.4 (Reasoning) | OpenAI | 96 tok/s | 87 ms | 8.9 s | $6.63 | – |
| 12 | Claude Sonnet 4.6 (Reasoning) | Anthropic | 93 tok/s | 85 ms | 7.5 s | $6.44 | – |
| 13 | Claude Opus 4.6 (Reasoning) | Anthropic | 55 tok/s | 120 ms | 11 s | $7.40 | – |
Top 12 of 13; the full list is in the table above.
Reasoning models are labeled "Reasoning" after the model name.
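The summary figures near the top of the page can be cross-checked against the table. The sketch below is a minimal check, assuming the second latency column in each row is the end-to-end (E2E) figure and the dollar column is the blended price per 1M tokens; the tuple layout and variable names are illustrative, not part of Overshoot.

```python
from statistics import median

# (model, output speed in tok/s, E2E latency in ms, blended $ per 1M tokens)
# Values copied from the table; which column is E2E is an assumption.
ROWS = [
    ("Gemma 4 26B A4B", 429, 86, 0.12),
    ("Holo3 35B A3B", 423, 92, 0.12),
    ("Qwen3.6 35B A3B", 406, 2900, 0.12),
    ("GPT-5.4 nano", 354, 1400, 3.20),
    ("Gemma 4 31B", 270, 141, 0.27),
    ("Gemini 3 Flash", 254, 139, 4.66),
    ("Qwen3.6 27B", 253, 140, 0.24),
    ("GPT-5.4 mini", 248, 2600, 5.00),
    ("Claude Haiku 4.5", 228, 141, 4.45),
    ("Gemini 3.1 Pro", 99, 8400, 5.90),
    ("GPT-5.4", 96, 8900, 6.63),
    ("Claude Sonnet 4.6", 93, 7500, 6.44),
    ("Claude Opus 4.6", 55, 11000, 7.40),
]

median_speed = median(r[1] for r in ROWS)      # tok/s
median_e2e_ms = median(r[2] for r in ROWS)     # milliseconds
median_price = median(r[3] for r in ROWS)      # dollars per 1M tokens
real_time = [r[0] for r in ROWS if r[2] < 200] # models under the 200 ms budget

print(median_speed, median_e2e_ms, median_price, len(real_time))
# prints: 253 1400 4.45 6
```

Under that reading, the medians match the stated 253 tok/s, 1.4 s, and $4.45, and six models fall under 200 ms.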
Real-time coverage
- US East (Virginia): 6 of 13 under 200 ms
- US West (Oregon): 6 of 13 under 200 ms
- EU West (Ireland): 6 of 13 under 200 ms
- EU Central (Frankfurt): 6 of 13 under 200 ms
- Asia Pacific (Tokyo): 6 of 13 under 200 ms
- Asia Pacific (Mumbai): 6 of 13 under 200 ms
- South America (São Paulo): 4 of 13 under 200 ms
- Middle East (Dubai): 6 of 13 under 200 ms
Latency columns use the US East (Virginia) reference region. Real-time marks a model that closes the loop under 200 ms end to end on Overshoot from at least one region.
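The real-time rule described above can be sketched as a small predicate. This is an illustration only: the function name, the constant, and the dictionary shape are hypothetical, and the page lists per-model latencies for US East only, so the example uses just that region.

```python
REAL_TIME_BUDGET_MS = 200  # threshold stated in the text


def is_real_time(e2e_ms_by_region: dict[str, float]) -> bool:
    """Return True if at least one region closes the loop under the budget."""
    return any(ms < REAL_TIME_BUDGET_MS for ms in e2e_ms_by_region.values())


# US East E2E values taken from the table (hypothetical input shape).
print(is_real_time({"US East (Virginia)": 86}))    # Gemma 4 26B A4B
print(is_real_time({"US East (Virginia)": 2900}))  # Qwen3.6 35B A3B
```

Because one qualifying region is enough, a model can carry the real-time mark while still missing the budget in a slower region, which is consistent with São Paulo showing 4 of 13 while the other regions show 6 of 13.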