Chinese LLM Tracker

Track and compare the latest releases from China's leading AI labs. Qwen, DeepSeek, Kimi, GLM, and Baichuan - all in one place.

5 Providers · 26 Models Tracked · 19 Open Source · Latest Release: Dec 2025

Providers

GLM

GLM-4.7

400B parameters

Open Source · API
💻 Coding 🧠 Reasoning 🔢 Math 🤖 Agents

Major release competing with GPT-5.2 on coding. Ranked #1 on Code Arena among open-source and domestic models. AIME 2025: 95.7% accuracy.

  • LiveCodeBench: 84.9% (beats Claude 4.5)
  • SWE-bench Verified: 73.8% (SOTA open-source)
  • HLE benchmark: 42.8%

Benchmarks: HumanEval 91.5% · GSM8K 97% · MATH 95.7%

Released Dec 22, 2025 · 200K context · HuggingFace

Kimi

Kimi K2 Thinking

1T (32B active) parameters

Open Source · API
🧠 Reasoning 🤖 Agents 💻 Coding

Reasoning-focused, tool-using "thinking" agent. Can execute 200-300 sequential tool calls without human intervention, with state-of-the-art agentic capabilities (a minimal sketch of this kind of tool loop follows the list below).

  • 200-300 sequential tool calls
  • LiveCodeBench-v6: 83.1%
  • Advanced agentic reasoning
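
As a rough illustration of what a 200-300-step agentic loop looks like in practice, here is a generic sketch. It is not Kimi's actual API: `call_model` and the tool registry are hypothetical stand-ins, and only the control flow is the point.

```python
import json

def run_agent(call_model, tools, user_task, max_steps=300):
    """Sequential tool-calling loop, capped at `max_steps` calls.

    call_model(messages) -> {"tool": name, "args": {...}} to request
    a tool, or {"answer": text} when the model is done.
    tools: mapping of tool name to a plain Python callable.
    """
    messages = [{"role": "user", "content": user_task}]
    for _ in range(max_steps):
        reply = call_model(messages)
        if "answer" in reply:              # model decided to finish
            return reply["answer"]
        result = tools[reply["tool"]](**reply["args"])
        # Append both the request and the observation so the next
        # model call can condition on the full tool history.
        messages.append({"role": "assistant", "content": json.dumps(reply)})
        messages.append({"role": "tool", "content": json.dumps(result)})
    raise RuntimeError("tool-call budget exhausted")
```

K2 Thinking's real interface presumably differs; the takeaway is just that the model, not a human, decides when the loop stops.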

Benchmarks: HumanEval 91% · GSM8K 96.5%

Released Nov 1, 2025 · 131K context

Kimi

Kimi Linear

48B (3B active) parameters

Open Source · API
💬 Chat 📄 Long Context

Uses Kimi Delta Attention (KDA) for efficient long-context processing. Reduces memory usage and improves generation speed at longer context windows (a rough sketch of the underlying delta-rule idea follows the list below).

  • 1M token context window
  • Novel Delta Attention mechanism
  • Efficient memory usage
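
KDA's published details aren't reproduced here, but it sits in the delta-rule family of linear attention, where a fixed-size matrix state replaces the growing KV cache. Below is a minimal NumPy sketch of a generic delta-rule update, with illustrative names and shapes, not Kimi's exact formulation.

```python
import numpy as np

def delta_rule_attention(q, k, v, beta):
    """Recurrent delta-rule linear attention over a sequence.

    q, k : (T, d_k)  queries / keys (keys assumed normalized)
    v    : (T, d_v)  values
    beta : (T,)      per-token write strengths in [0, 1]
    """
    T, d_k = k.shape
    d_v = v.shape[1]
    S = np.zeros((d_k, d_v))          # fixed-size fast-weight memory
    out = np.empty((T, d_v))
    for t in range(T):
        # Delta rule: overwrite the value currently stored under k[t]
        # with v[t], at rate beta[t], instead of blindly accumulating.
        pred = S.T @ k[t]             # what memory currently returns for k[t]
        S = S + beta[t] * np.outer(k[t], v[t] - pred)
        out[t] = S.T @ q[t]           # read the memory with the query
    return out
```

The state `S` stays (d_k, d_v) no matter how long the sequence grows, which is where the memory and speed gains described on this card come from.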

Benchmarks: MMLU 78%

Released Oct 1, 2025 · 1M context

DeepSeek

DeepSeek-V3.2

671B (37B active) parameters

API
💬 Chat 🧠 Reasoning 💻 Coding

Latest V3 iteration with improved general capabilities. Enhanced reasoning and coding performance over V3.1.

  • SWE-bench Verified: 73.1%
  • Improved knowledge and academic tasks
  • LiveCodeBench-v6: 83.3%

Benchmarks: MMLU 88.5% · HumanEval 90% · GSM8K 95.5%

Released Sep 1, 2025 · 131K context

Kimi

Kimi K2

1T (32B active) parameters

Open Source · API
💬 Chat 🧠 Reasoning 💻 Coding

1-trillion-parameter MoE with 32B active per token. State-of-the-art open-source performance on coding benchmarks. Trained for $4.6M, rivaling ChatGPT and Claude (a toy sketch of sparse expert routing follows the list below).

  • SOTA open-source coding performance
  • Trained for only $4.6M
  • Beats GPT-4o on multiple benchmarks
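
To make the "1T total, 32B active" arithmetic concrete, here is a toy sketch of top-k expert routing. It is illustrative only: the expert count, gating scheme, and shapes are simplified stand-ins, not K2's actual design.

```python
import numpy as np

def moe_forward(x, experts, router_w, k=2):
    """Sparse MoE layer: route each token to its top-k experts.

    x        : (T, d) token activations
    experts  : list of callables, each mapping (n, d) -> (n, d)
    router_w : (d, n_experts) router projection
    """
    logits = x @ router_w                       # (T, n_experts)
    topk = np.argsort(logits, axis=-1)[:, -k:]  # chosen expert indices
    # Softmax over just the selected experts' logits.
    sel = np.take_along_axis(logits, topk, axis=-1)
    gates = np.exp(sel - sel.max(-1, keepdims=True))
    gates /= gates.sum(-1, keepdims=True)
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        for j in range(k):
            e = topk[t, j]                      # only k experts run per token
            out[t] += gates[t, j] * experts[e](x[t:t+1])[0]
    return out
```

With hundreds of experts and only a handful selected per token, each forward pass touches a few percent of the total weights, which is how a 1T-parameter model can run with roughly 32B "active" parameters.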

Benchmarks: MMLU 87.5% · HumanEval 90.5% · GSM8K 95%

Released Jul 14, 2025 · 131K context · HuggingFace

Kimi

Kimi-Dev

72B parameters

Open Source · API
💻 Coding

Coding-focused model built on Qwen2.5-72B. State-of-the-art among open-source models on SWE-bench Verified.

  • SOTA open-source on SWE-bench Verified
  • Built on Qwen2.5-72B foundation
  • Specialized for software development

Benchmarks: HumanEval 89%

Released Jun 1, 2025 · 131K context

Benchmark Leaders

Top MMLU Scores

1. DeepSeek-R1-0528 (DeepSeek): 90.8%
2. DeepSeek-V3.2 (DeepSeek): 88.5%
3. DeepSeek-V3-0324 (DeepSeek): 87.5%
4. Kimi K2 (Kimi): 87.5%
5. DeepSeek-V3 (DeepSeek): 87.1%

Top HumanEval Scores

1. Qwen2.5-Coder-32B-Instruct (Qwen): 92.7%
2. Qwen3-235B-A22B (Qwen): 92.1%
3. GLM-4.7 (GLM): 91.5%
4. Kimi K2 Thinking (Kimi): 91%
5. Kimi K2 (Kimi): 90.5%
