Compare models
Up to four models side by side, measured the way they run on live video.
| 2 of 4 models | | |
|---|---|---|
| Intelligence Index | 86 (best) | 85 |
| OCR & Text | 95 | 95 |
| Document & Chart | 87 (best) | 85 |
| Scene & Spatial | 84 | 86 (best) |
| Video QA | 86 (best) | 83 |
| Grounding & Detection | 76 | 76 |
| Structured Extraction | 84 | 85 (best) |
| Output Speed | 55 tok/s (-36 tok/s) | 91 tok/s (best) |
| TTFT | 207 ms (+70 ms) | 138 ms (best) |
| End-to-End | 12 s (+3.0 s) | 9.3 s (best) |
| Real-time (<200 ms) | No | No |
| Blended $/1M | $6.93 (+$0.25) | $6.68 (best) |
| Cost / Task | $5.11 | $4.93 (best) |
| Context | 400k | 400k |
| Params | – | – |
| Open? | Closed | Closed |
| License | Proprietary | Proprietary |
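
The best-in-row markers and the gaps shown next to some values can be reproduced with simple arithmetic. The sketch below is illustrative only: `annotate_row` is a hypothetical helper, not part of any published tooling, and the direction of "better" for each metric (higher for speed, lower for latency and price) is an assumption. Values are copied from the table as displayed, so a computed gap can differ slightly from the one shown (for example 69 ms here against the listed +70 ms), presumably because the page works from unrounded figures.

```python
# Hypothetical helper: find the best value in a row and each entry's gap to it.
# Tied rows (e.g. 95 vs 95) get no best marker, matching the table.

def annotate_row(values, higher_is_better):
    """Return (best_index, deltas), where deltas[i] = values[i] - best value.

    best_index is None when the best value is shared by more than one column.
    """
    best = max(values) if higher_is_better else min(values)
    best_index = values.index(best) if values.count(best) == 1 else None
    deltas = [round(v - best, 2) for v in values]
    return best_index, deltas


# Rows copied from the table; the boolean is the assumed "higher is better" flag.
rows = {
    "Output Speed (tok/s)": ([55, 91], True),
    "TTFT (ms)": ([207, 138], False),
    "Blended $/1M": ([6.93, 6.68], False),
    "OCR & Text": ([95, 95], True),
}

for name, (values, higher_is_better) in rows.items():
    best_index, deltas = annotate_row(values, higher_is_better)
    print(name, "| best column:", best_index, "| gaps:", deltas)
```

Column indices are zero-based, so a best column of `1` refers to the second model.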