Output Speed

Output speed chart (13 bars; axis from 0 to 500 tok/s). Highest: Gemma 4 26B A4B at 429 tok/s.

Reasoning models are indicated by a lightbulb icon

#    Model               Provider         Output speed
1    Gemma 4 26B A4B     Overshoot        429 tok/s
2    Holo3 35B A3B       Overshoot        423 tok/s
3    Qwen3.6 35B A3B     Overshoot        406 tok/s
4    GPT-5.4 nano        OpenAI API       354 tok/s
5    Gemma 4 31B         Overshoot        270 tok/s
6    Qwen3.6 27B         Overshoot        253 tok/s
7    Gemini 3 Flash      Google Vertex    247 tok/s
8    GPT-5.4 mini        OpenAI API       245 tok/s
9    Claude Haiku 4.5    Anthropic API    221 tok/s
10   Gemini 3.1 Pro      Google Vertex    106 tok/s
11   Claude Sonnet 4.6   Anthropic API    95 tok/s
12   GPT-5.4             OpenAI API       91 tok/s
13   Claude Opus 4.6     Anthropic API    55 tok/s

Speed is measured at batch size 1 on the reference provider listed for each model on the board (Overshoot, OpenAI API, Anthropic API, or Google Vertex).
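As a rough illustration of what a tok/s figure on this board represents, the sketch below times a single streamed response (one request at a time, i.e. batch size 1) and divides the number of output tokens by the elapsed wall-clock time. The function name `measure_output_speed`, the `stream` iterable, and the injectable `clock` are hypothetical and used only for illustration. The board's exact methodology is not stated here; in particular, this sketch includes the wait for the first token in the elapsed time, which may differ from how the published numbers are computed.

```python
import time
from typing import Callable, Iterable


def measure_output_speed(
    stream: Iterable[str],
    clock: Callable[[], float] = time.perf_counter,
) -> float:
    """Return output tokens per second for one streamed response.

    `stream` is assumed to yield one output token per item
    (hypothetical; real provider APIs stream chunks in their own formats).
    """
    start = clock()
    token_count = 0
    for _ in stream:
        token_count += 1
    elapsed = clock() - start
    if elapsed <= 0:
        raise ValueError("elapsed time must be positive")
    return token_count / elapsed
```

Passing a fake clock makes the arithmetic easy to check: 858 tokens over 2 seconds gives 429 tok/s.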