vLLM (self-host)

Models on live video: 5
Regions measured: 2
Real-time models: 2 of 5
Median output speed: 212 tok/s
Median E2E · US East: 281 ms
Median blended $/1M: $0.08

Models on vLLM (self-host)

#	Model	Creator	Output speed	Latency (unlabeled)	E2E	Blended $/1M	Real-time
1	Gemma 4 26B A4B	Google	240 tok/s	66 ms	166 ms	$0.08	<200 ms
2	Qwen3.6 35B A3B (Reasoning)	Alibaba (Qwen)	219 tok/s	66 ms	2.7 s	$0.08	–
3	Holo3 35B A3B	H Company	212 tok/s	74 ms	188 ms	$0.08	<200 ms
4	Gemma 4 31B	Google	137 tok/s	106 ms	281 ms	$0.17	–
5	Qwen3.6 27B	Alibaba (Qwen)	136 tok/s	105 ms	281 ms	$0.17	–
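
The summary figures at the top of the page can be cross-checked against the rows above. The following is a minimal sketch in Python, assuming the summary values are plain medians over the five listed models and that the second latency column is the E2E figure for US East. The `rows` list and the variable names are illustrative and not part of the page.

```python
from statistics import median

# (model, output speed in tok/s, E2E latency in ms for US East, blended $ per 1M tokens)
# Values are copied from the table above; 2.7 s is written as 2700 ms.
rows = [
    ("Gemma 4 26B A4B", 240, 166, 0.08),
    ("Qwen3.6 35B A3B", 219, 2700, 0.08),
    ("Holo3 35B A3B", 212, 188, 0.08),
    ("Gemma 4 31B", 137, 281, 0.17),
    ("Qwen3.6 27B", 136, 281, 0.17),
]

# Assumption: each summary figure is the median of one column.
median_speed = median(r[1] for r in rows)   # tok/s
median_e2e = median(r[2] for r in rows)     # ms
median_price = median(r[3] for r in rows)   # $ per 1M tokens

print(f"Median output speed: {median_speed} tok/s")
print(f"Median E2E (US East): {median_e2e} ms")
print(f"Median blended $/1M: ${median_price:.2f}")
```

Under these assumptions the three medians match the summary block (212 tok/s, 281 ms, $0.08). The single reasoning model's 2.7 s E2E does not move the median, which is one reason a median rather than a mean suits this table.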

[Chart: Output speed by model. Reasoning models are marked with a lightbulb icon.]

Real-time coverage

US East (Virginia): 2 of 5 under 200 ms

Gemma 4 26B A4B, Holo3 35B A3B

EU West (Ireland): 2 of 5 under 200 ms

Gemma 4 26B A4B, Holo3 35B A3B

Latency columns use the US East (Virginia) reference region. Real-time marks a model whose end-to-end (E2E) latency on vLLM (self-host) is under 200 ms from at least one measured region.
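
The real-time rule in the note above can be written as a small predicate. This is a sketch only: the function name `is_real_time`, the constant name, and the dictionary layout are hypothetical, and the page does not say whether the comparison is strict. The example uses only the US East E2E values from the table, since per-region figures for EU West are not listed.

```python
REAL_TIME_BUDGET_MS = 200  # threshold stated in the note above


def is_real_time(e2e_ms_by_region: dict[str, float],
                 budget_ms: float = REAL_TIME_BUDGET_MS) -> bool:
    """Return True if E2E latency is under the budget in at least one region.

    Assumption: "under 200 ms" is treated as a strict comparison.
    """
    return any(ms < budget_ms for ms in e2e_ms_by_region.values())


# US East E2E latencies (ms) copied from the table; other regions would be
# added as extra keys if their measurements were available.
models = {
    "Gemma 4 26B A4B": {"us-east": 166},
    "Qwen3.6 35B A3B": {"us-east": 2700},
    "Holo3 35B A3B": {"us-east": 188},
    "Gemma 4 31B": {"us-east": 281},
    "Qwen3.6 27B": {"us-east": 281},
}

real_time_models = [name for name, lat in models.items() if is_real_time(lat)]
print(f"{len(real_time_models)} of {len(models)} real-time:", real_time_models)
```

Because the rule is "at least one region", a model stays real-time even if a second region exceeds the budget, which is why `any` is used rather than `all`.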