Self-host vLLM
Models on live video: 5
Regions measured: 2
Real-time models: 2 of 5
Median output speed: 212 tok/s
Median E2E (US East): 281 ms
Median blended $/1M: $0.08

Models on vLLM (self-host)

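| # | Model | ID | Provider | Output speed | Latency | E2E | Blended $/1M | Real-time |
|---|-------|----|----------|--------------|---------|-----|--------------|-----------|
| 1 | Gemma 4 26B A4B | gemma-4-26b-a4b | Google | 240 tok/s | 66 ms | 166 ms | $0.08 | <200 ms |
| 2 | Qwen3.6 35B A3B (Reasoning) | qwen3-6-35b-a3b | Alibaba (Qwen) | 219 tok/s | 66 ms | 2.7 s | $0.08 | |
| 3 | Holo3 35B A3B | holo3-35b-a3b | H Company | 212 tok/s | 74 ms | 188 ms | $0.08 | <200 ms |
| 4 | Gemma 4 31B | gemma-4-31b | Google | 137 tok/s | 106 ms | 281 ms | $0.17 | |
| 5 | Qwen3.6 27B | qwen3-6-27b | Alibaba (Qwen) | 136 tok/s | 105 ms | 281 ms | $0.17 | |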
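
The summary figures at the top of the page appear consistent with plain medians over the five listed models. The sketch below recomputes them from the per-model values, assuming the second latency value in each row is the end-to-end (E2E) figure and the dollar value is the blended price per 1M tokens. The tuple layout and variable names are illustrative, not part of the page.

```python
from statistics import median

# Values transcribed from the model list (US East reference region).
# Assumption: the second latency in each row is E2E; 2.7 s is written as 2700 ms.
rows = [
    # (model, output tok/s, E2E ms, blended $ per 1M tokens)
    ("Gemma 4 26B A4B", 240, 166, 0.08),
    ("Qwen3.6 35B A3B", 219, 2700, 0.08),
    ("Holo3 35B A3B", 212, 188, 0.08),
    ("Gemma 4 31B", 137, 281, 0.17),
    ("Qwen3.6 27B", 136, 281, 0.17),
]

median_speed = median(r[1] for r in rows)
median_e2e_ms = median(r[2] for r in rows)
median_price = median(r[3] for r in rows)

print(f"Median output speed: {median_speed} tok/s")
print(f"Median E2E: {median_e2e_ms} ms")
print(f"Median blended $/1M: ${median_price:.2f}")
```

With an odd number of rows the median is the middle value after sorting, so the single slow reasoning model (2.7 s) does not pull the E2E figure upward the way a mean would.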
Output speed by model
Bar chart, 5 bars. Highest: Gemma 4 26B A4B at 240 tok/s.
Values: Gemma 4 26B A4B 240 tok/s; Qwen3.6 35B A3B 219 tok/s; Holo3 35B A3B 212 tok/s; Gemma 4 31B 137 tok/s; Qwen3.6 27B 136 tok/s.

Reasoning models are indicated by a lightbulb icon (shown in this text as the label "Reasoning").

Real-time coverage

US East (Virginia): 2 of 5 under 200 ms
EU West (Ireland): 2 of 5 under 200 ms

Latency columns use the US East (Virginia) reference region. Real-time marks a model that closes the loop under 200 ms end to end on vLLM (self-host) from at least one region.
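
The real-time rule above can be sketched as a small check: a model qualifies if its end-to-end latency is under 200 ms in at least one measured region. This is a minimal illustration, not the page's actual implementation; the function name, the region keys, and the dictionary layout are assumptions. Only the US East values are filled in, because per-model EU West figures are not listed here.

```python
REAL_TIME_THRESHOLD_MS = 200  # threshold stated in the page's definition


def is_real_time(e2e_ms_by_region, threshold_ms=REAL_TIME_THRESHOLD_MS):
    """Return True if any region's end-to-end latency is under the threshold."""
    return any(ms < threshold_ms for ms in e2e_ms_by_region.values())


# Hypothetical layout: model name -> {region key: E2E latency in ms}.
# US East values are taken from the model list; 2.7 s is written as 2700 ms.
measurements = {
    "Gemma 4 26B A4B": {"us-east": 166},
    "Qwen3.6 35B A3B": {"us-east": 2700},
    "Holo3 35B A3B": {"us-east": 188},
    "Gemma 4 31B": {"us-east": 281},
    "Qwen3.6 27B": {"us-east": 281},
}

real_time = [name for name, regions in measurements.items() if is_real_time(regions)]
print(f"{len(real_time)} of {len(measurements)} real-time: {real_time}")
```

Because the rule uses "at least one region", adding a second region's measurements to a model's dictionary can only turn a model real-time, never remove the mark.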