Output Speed
| # | Model | Provider | Output speed |
|---|---|---|---|
| 1 | Gemma 4 26B A4B | Overshoot | 429 tok/s |
| 2 | Holo3 35B A3B | Overshoot | 423 tok/s |
| 3 | Qwen3.6 35B A3B | Overshoot | 406 tok/s |
| 4 | GPT-5.4 nano | OpenAI API | 354 tok/s |
| 5 | Gemma 4 31B | Overshoot | 270 tok/s |
| 6 | Qwen3.6 27B | Overshoot | 253 tok/s |
| 7 | Gemini 3 Flash | Google Vertex | 247 tok/s |
| 8 | GPT-5.4 mini | OpenAI API | 245 tok/s |
| 9 | Claude Haiku 4.5 | Anthropic API | 221 tok/s |
| 10 | Gemini 3.1 Pro | Google Vertex | 106 tok/s |
| 11 | Claude Sonnet 4.6 | Anthropic API | 95 tok/s |
| 12 | GPT-5.4 | OpenAI API | 91 tok/s |
| 13 | Claude Opus 4.6 | Anthropic API | 55 tok/s |
Speed is measured at batch size 1 (one request at a time) on the Overshoot API, the reference provider for every model on the board.
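As a rough illustration of what a tok/s figure like those in the table means, the sketch below times a single sequential request and divides the number of output tokens by the elapsed wall-clock time. It is not the board's actual methodology, which the page does not describe: `tokens_per_second`, `measure_output_speed`, and the `stream_tokens` callable are hypothetical names, and whether the clock should start at request time or at the first token is an assumption left open here.

```python
import time
from typing import Callable, Iterable


def tokens_per_second(n_tokens: int, elapsed_seconds: float) -> float:
    """Output speed as generated tokens divided by elapsed seconds."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return n_tokens / elapsed_seconds


def measure_output_speed(stream_tokens: Callable[[], Iterable[str]]) -> float:
    """Time one request at a time (batch size 1) and return tok/s.

    `stream_tokens` is a hypothetical callable that yields output tokens
    one by one; a real client would wrap a provider's streaming API.
    The clock starts before the request is issued (an assumption).
    """
    start = time.perf_counter()
    n_tokens = sum(1 for _ in stream_tokens())  # count tokens as they arrive
    elapsed = time.perf_counter() - start
    return tokens_per_second(n_tokens, elapsed)
```

For example, `tokens_per_second(858, 2.0)` returns `429.0`: 858 output tokens produced in 2 seconds corresponds to 429 tok/s.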