Overshoot

This is us. Overshoot is our own real-time inference fabric, built to close the loop on live camera, screen, and RTSP streams in under 200 ms end to end.

Models on live video: 13
Regions measured: 8
Real-time models: 6 of 13
Median output speed: 253 tok/s
Median E2E (US East): 1.4 s
Median blended $/1M: $4.45

Models on Overshoot

| # | Model ID | Model | Provider | Reasoning | Output speed | Latency (US East) | E2E (US East) | Blended $/1M | Real-time |
|---|---|---|---|---|---|---|---|---|---|
| 1 | gemma-4-26b-a4b | Gemma 4 26B A4B | Google | No | 429 tok/s | 30 ms | 86 ms | $0.12 | <200 ms |
| 2 | holo3-35b-a3b | Holo3 35B A3B | H Company | No | 423 tok/s | 35 ms | 92 ms | $0.12 | <200 ms |
| 3 | qwen3-6-35b-a3b | Qwen3.6 35B A3B | Alibaba (Qwen) | Yes | 406 tok/s | 35 ms | 2.9 s | $0.12 | |
| 4 | gpt-5-4-nano | GPT-5.4 nano | OpenAI | Yes | 354 tok/s | 28 ms | 1.4 s | $3.20 | |
| 5 | gemma-4-31b | Gemma 4 31B | Google | No | 270 tok/s | 52 ms | 141 ms | $0.27 | <200 ms |
| 6 | gemini-3-flash | Gemini 3 Flash | Google | No | 254 tok/s | 45 ms | 139 ms | $4.66 | <200 ms |
| 7 | qwen3-6-27b | Qwen3.6 27B | Alibaba (Qwen) | No | 253 tok/s | 45 ms | 140 ms | $0.24 | <200 ms |
| 8 | gpt-5-4-mini | GPT-5.4 mini | OpenAI | Yes | 248 tok/s | 46 ms | 2.6 s | $5.00 | |
| 9 | claude-haiku-4-5 | Claude Haiku 4.5 | Anthropic | No | 228 tok/s | 36 ms | 141 ms | $4.45 | <200 ms |
| 10 | gemini-3-1-pro | Gemini 3.1 Pro | Google | Yes | 99 tok/s | 85 ms | 8.4 s | $5.90 | |
| 11 | gpt-5-4 | GPT-5.4 | OpenAI | Yes | 96 tok/s | 87 ms | 8.9 s | $6.63 | |
| 12 | claude-sonnet-4-6 | Claude Sonnet 4.6 | Anthropic | Yes | 93 tok/s | 85 ms | 7.5 s | $6.44 | |
| 13 | claude-opus-4-6 | Claude Opus 4.6 | Anthropic | Yes | 55 tok/s | 120 ms | 11 s | $7.40 | |

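
The headline figures at the top of the page appear to match the per-model medians of the table above. The following is a minimal sketch that recomputes them, assuming each headline is a plain median over the 13 rows; the `ROWS` mapping and the `headline_medians` helper are illustrative names, not part of Overshoot. Each tuple is read as output speed, the second latency value (taken here to be E2E in US East), and blended $/1M.

```python
from statistics import median

# (output tok/s, E2E seconds in US East, blended $/1M), copied from the table rows.
# Reading the second latency value as E2E is an assumption.
ROWS = {
    "gemma-4-26b-a4b": (429, 0.086, 0.12),
    "holo3-35b-a3b": (423, 0.092, 0.12),
    "qwen3-6-35b-a3b": (406, 2.9, 0.12),
    "gpt-5-4-nano": (354, 1.4, 3.20),
    "gemma-4-31b": (270, 0.141, 0.27),
    "gemini-3-flash": (254, 0.139, 4.66),
    "qwen3-6-27b": (253, 0.140, 0.24),
    "gpt-5-4-mini": (248, 2.6, 5.00),
    "claude-haiku-4-5": (228, 0.141, 4.45),
    "gemini-3-1-pro": (99, 8.4, 5.90),
    "gpt-5-4": (96, 8.9, 6.63),
    "claude-sonnet-4-6": (93, 7.5, 6.44),
    "claude-opus-4-6": (55, 11.0, 7.40),
}


def headline_medians(rows):
    """Return the median of each column across all models."""
    speeds, e2e_seconds, prices = zip(*rows.values())
    return {
        "tok_s": median(speeds),
        "e2e_s": median(e2e_seconds),
        "usd_per_1m": median(prices),
    }


print(headline_medians(ROWS))
```

With 13 rows the median is simply the seventh value in sorted order, so each result is an actual table entry rather than an average of two.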
Output speed by model

Top 12 of 13; the full list is in the table above.

[Bar chart: output speed by model, 12 bars. Highest: Gemma 4 26B A4B at 429 tok/s.]

Reasoning models are flagged as "Reasoning" in the table above.

Real-time coverage

Latency columns use the US East (Virginia) reference region. Real-time marks a model that closes the loop under 200 ms end to end on Overshoot from at least one region.
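
The real-time rule described above can be sketched as a small predicate. This is an illustrative reading of the rule, not Overshoot's implementation: the function name `is_real_time`, the region keys, and the dictionary shape are all assumptions, and only the US East values come from the table.

```python
REAL_TIME_THRESHOLD_MS = 200.0  # "under 200 ms end to end"


def is_real_time(e2e_ms_by_region, threshold_ms=REAL_TIME_THRESHOLD_MS):
    """True if the model closes the loop under the threshold from at least one region.

    e2e_ms_by_region maps a region name (hypothetical keys) to E2E latency in ms.
    """
    return any(ms < threshold_ms for ms in e2e_ms_by_region.values())


# US East values taken from the table.
gemma_26b = {"us-east": 86.0}
qwen_35b_reasoning = {"us-east": 2900.0}

# Hypothetical values: a model that misses the threshold in the reference
# region can still qualify through another region.
hypothetical_model = {"us-east": 250.0, "eu-west": 180.0}

print(is_real_time(gemma_26b))
print(is_real_time(qwen_35b_reasoning))
print(is_real_time(hypothetical_model))
```

Because the rule only needs one qualifying region, a model could carry the real-time mark even when its US East E2E figure is above 200 ms; in the table above, every marked model is already under 200 ms in US East.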