Compare Models

Side-by-side comparison of benchmark performance across Chinese LLM providers.

Benchmark Comparison

Showing top-performing models. Green highlights indicate the best score in each category; a dash (-) means no score is listed.

Model	Provider	Parameters	Context	Open Source	MMLU	HumanEval	GSM8K	MATH	MMLU-Pro	GPQA
DeepSeek-R1-0528	DeepSeek	671B (37B active)	131K	Yes	90.8%	89.5%	97%	94.5%	-	-
Qwen3-235B-A22B	Qwen	235B (22B active)	131K	Yes	86.5%	92.1%	95.2%	90.3%	-	-
DeepSeek-V3.2	DeepSeek	671B (37B active)	131K	No	88.5%	90%	95.5%	-	-	-
Kimi K2	Kimi	1T (32B active)	131K	Yes	87.5%	90.5%	95%	-	-	-
DeepSeek-V3-0324	DeepSeek	671B (37B active)	131K	Yes	87.5%	88%	-	-	81.2%	68.4%
DeepSeek-V3	DeepSeek	671B (37B active)	131K	Yes	87.1%	86.5%	93%	-	-	-
QwQ-32B	Qwen	32B	131K	Yes	85%	88.5%	94%	87.5%	-	-
Qwen2.5-72B-Instruct	Qwen	72B	131K	Yes	86.1%	86.4%	91.2%	83.1%	-	-
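
The "best score in category" highlighting can be recomputed from the table itself. The following is a minimal sketch, not the site's actual logic: it assumes a trimmed, tab-separated copy of the rows above (model name plus the six benchmark columns), treats "-" as a missing score, and lets the first-listed model win a tie. The names `TABLE`, `parse_score`, and `best_per_benchmark` are hypothetical helpers introduced for illustration.

```python
import csv
import io

# Trimmed copy of the table above: model name plus the six benchmark columns.
# "-" marks a score that the table does not list.
ROWS = [
    "Model\tMMLU\tHumanEval\tGSM8K\tMATH\tMMLU-Pro\tGPQA",
    "DeepSeek-R1-0528\t90.8%\t89.5%\t97%\t94.5%\t-\t-",
    "Qwen3-235B-A22B\t86.5%\t92.1%\t95.2%\t90.3%\t-\t-",
    "DeepSeek-V3.2\t88.5%\t90%\t95.5%\t-\t-\t-",
    "Kimi K2\t87.5%\t90.5%\t95%\t-\t-\t-",
    "DeepSeek-V3-0324\t87.5%\t88%\t-\t-\t81.2%\t68.4%",
    "DeepSeek-V3\t87.1%\t86.5%\t93%\t-\t-\t-",
    "QwQ-32B\t85%\t88.5%\t94%\t87.5%\t-\t-",
    "Qwen2.5-72B-Instruct\t86.1%\t86.4%\t91.2%\t83.1%\t-\t-",
]
TABLE = "\n".join(ROWS)


def parse_score(cell):
    """Convert a cell such as '90.8%' to a float; return None for '-'."""
    cell = cell.strip()
    if cell in ("", "-"):
        return None
    return float(cell.rstrip("%"))


def best_per_benchmark(table_text):
    """Map each benchmark column to (model, score) for its highest listed score.

    Assumption: ties go to the model listed first in the table.
    """
    reader = csv.DictReader(io.StringIO(table_text), delimiter="\t")
    benchmarks = [name for name in reader.fieldnames if name != "Model"]
    best = {}
    for row in reader:
        for name in benchmarks:
            score = parse_score(row[name])
            if score is None:
                continue  # no score listed for this model on this benchmark
            if name not in best or score > best[name][1]:
                best[name] = (row["Model"], score)
    return best


if __name__ == "__main__":
    for benchmark, (model, score) in best_per_benchmark(TABLE).items():
        print(f"{benchmark}: {model} ({score}%)")
```

Because MMLU-Pro and GPQA are listed for only one model, the "best" entry in those two columns is simply the only entry, so it says nothing about how the other models would compare.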