Compare Models
Side-by-side comparison of benchmark performance across Chinese LLM providers.
Benchmark Comparison
Full Comparison Table
The table lists the top-performing models. On the original page, green highlights mark the best score in each benchmark column; a dash (-) means no score is listed for that model.
| Model | Provider | Parameters | Context | Open Source | MMLU | HumanEval | GSM8K | MATH | MMLU-Pro | GPQA |
|---|---|---|---|---|---|---|---|---|---|---|
| DeepSeek-R1-0528 | DeepSeek | 671B (37B active) | 131K | Yes | 90.8% | 89.5% | 97% | 94.5% | - | - |
| Qwen3-235B-A22B | Qwen | 235B (22B active) | 131K | Yes | 86.5% | 92.1% | 95.2% | 90.3% | - | - |
| DeepSeek-V3.2 | DeepSeek | 671B (37B active) | 131K | No | 88.5% | 90% | 95.5% | - | - | - |
| Kimi K2 | Kimi | 1T (32B active) | 131K | Yes | 87.5% | 90.5% | 95% | - | - | - |
| DeepSeek-V3-0324 | DeepSeek | 671B (37B active) | 131K | Yes | 87.5% | 88% | - | - | 81.2% | 68.4% |
| DeepSeek-V3 | DeepSeek | 671B (37B active) | 131K | Yes | 87.1% | 86.5% | 93% | - | - | - |
| QwQ-32B | Qwen | 32B | 131K | Yes | 85% | 88.5% | 94% | 87.5% | - | - |
| Qwen2.5-72B-Instruct | Qwen | 72B | 131K | Yes | 86.1% | 86.4% | 91.2% | 83.1% | - | - |
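
Because the green highlighting does not survive in plain text, the best score per benchmark can be recomputed from the table. The following is a minimal sketch, not the page's own logic: the `SCORES` dictionary and the `best_per_benchmark` helper are hypothetical names introduced here, the values are copied from the rows above, and scores shown as "-" are simply left out. Ties are resolved in favor of the model listed first, which is an assumption.

```python
# Scores (in percent) copied from the comparison table.
# A benchmark is omitted for a model when the table shows "-".
SCORES = {
    "DeepSeek-R1-0528": {"MMLU": 90.8, "HumanEval": 89.5, "GSM8K": 97.0, "MATH": 94.5},
    "Qwen3-235B-A22B": {"MMLU": 86.5, "HumanEval": 92.1, "GSM8K": 95.2, "MATH": 90.3},
    "DeepSeek-V3.2": {"MMLU": 88.5, "HumanEval": 90.0, "GSM8K": 95.5},
    "Kimi K2": {"MMLU": 87.5, "HumanEval": 90.5, "GSM8K": 95.0},
    "DeepSeek-V3-0324": {"MMLU": 87.5, "HumanEval": 88.0, "MMLU-Pro": 81.2, "GPQA": 68.4},
    "DeepSeek-V3": {"MMLU": 87.1, "HumanEval": 86.5, "GSM8K": 93.0},
    "QwQ-32B": {"MMLU": 85.0, "HumanEval": 88.5, "GSM8K": 94.0, "MATH": 87.5},
    "Qwen2.5-72B-Instruct": {"MMLU": 86.1, "HumanEval": 86.4, "GSM8K": 91.2, "MATH": 83.1},
}


def best_per_benchmark(scores):
    """Return {benchmark: (model, score)} holding the highest listed score.

    Hypothetical helper: on a tie, the model that appears first wins.
    """
    best = {}
    for model, results in scores.items():
        for benchmark, value in results.items():
            # Replace the current leader only on a strictly higher score.
            if benchmark not in best or value > best[benchmark][1]:
                best[benchmark] = (model, value)
    return best


if __name__ == "__main__":
    for benchmark, (model, value) in best_per_benchmark(SCORES).items():
        print(f"{benchmark}: {model} ({value}%)")
```

Note that MMLU-Pro and GPQA each have only one listed score, so the "best" entry in those columns reflects missing data rather than a real comparison.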