Sample data: this page shows a Phase-1 sample snapshot. Official crawling and weekly benchmark jobs are not connected yet, so all price, latency, and score values serve only to validate the product structure and must be replaced with traceable production data before launch.
Model comparison
The `models=a,b,c` URL parameter already drives the comparison page; model selectors and saved comparisons are planned next.
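One way to read a comma-separated `models` parameter is sketched below. It assumes the page receives its full URL as a string. The function name `parseModelsParam` and the model slugs in the usage line are hypothetical and not taken from the product's code. The sketch trims whitespace, drops empty entries, and removes duplicates while keeping the order the user gave.

```typescript
// Hypothetical helper: reads the comma-separated `models` query parameter.
// Not the product's actual implementation; a sketch under stated assumptions.
function parseModelsParam(url: string): string[] {
  // URLSearchParams.get decodes percent-encoding (e.g. %2C -> ",").
  const raw = new URL(url).searchParams.get("models");
  if (raw === null) return []; // parameter absent: nothing to compare

  const seen = new Set<string>();
  const ids: string[] = [];
  for (const part of raw.split(",")) {
    const id = part.trim();
    // Skip empty segments such as "a,,b" and repeated IDs.
    if (id !== "" && !seen.has(id)) {
      seen.add(id);
      ids.push(id);
    }
  }
  return ids;
}
```

For example, `parseModelsParam("https://example.com/compare?models=qwen-2.5-72b,deepseek-v3")` returns `["qwen-2.5-72b", "deepseek-v3"]`. Keeping the user's order lets the comparison columns follow the URL exactly.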
| Model | Provider | Open/closed | Input ($/1M tokens) | Output ($/1M tokens) | TTFT (ms) | Context (tokens) | Value score | Updated |
|---|---|---|---|---|---|---|---|---|
| Qwen 2.5 72B | Alibaba Cloud | open | 0.35 | 0.70 | 156 | 128K | 91 | 2026-06-09 |
| DeepSeek V3 | DeepSeek | closed | 0.14 | 0.28 | 124 | 128K | 96 | 2026-06-09 |
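The input and output prices above are quoted per 1M tokens. The sketch below turns them into a per-request cost. It assumes simple linear per-token billing with no caching or volume discounts. The prices are the placeholder sample values from the table, and the names `Pricing` and `requestCostUsd` are hypothetical.

```typescript
// Hypothetical types and helper; prices are the sample (placeholder) values above.
interface Pricing {
  inputPerMillion: number;  // USD per 1M input tokens
  outputPerMillion: number; // USD per 1M output tokens
}

// Assumes linear billing: cost = tokens * (price per 1M tokens) / 1,000,000.
function requestCostUsd(p: Pricing, inputTokens: number, outputTokens: number): number {
  return (inputTokens * p.inputPerMillion + outputTokens * p.outputPerMillion) / 1_000_000;
}

const samplePricing: Record<string, Pricing> = {
  "Qwen 2.5 72B": { inputPerMillion: 0.35, outputPerMillion: 0.7 },
  "DeepSeek V3": { inputPerMillion: 0.14, outputPerMillion: 0.28 },
};
```

For a request with 10,000 input tokens and 2,000 output tokens, this gives about $0.0049 for Qwen 2.5 72B and about $0.00196 for DeepSeek V3.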
- Qwen 2.5 72B (Alibaba Cloud): Chinese-language performance is weighted higher in the Chinese task bucket.
- DeepSeek V3 (DeepSeek): strong value baseline for coding and Chinese tasks in the sample set.