Chinese LLM Tracker

Track and compare the latest releases from China's leading AI labs. Qwen, DeepSeek, Kimi, GLM, and Baichuan - all in one place.

5 Providers · 26 Models Tracked · 19 Open Source · Latest Release: Dec 2025

Providers

GLM

GLM-4.7

400B parameters

Open Source · API
💻 Coding 🧠 Reasoning 🔢 Math 🤖 Agents

Major release competing with GPT-5.2 on coding. Ranked #1 on Code Arena among open-source and domestic models. AIME 2025: 95.7% accuracy.

  • LiveCodeBench: 84.9% (beats Claude 4.5)
  • SWE-bench Verified: 73.8% (SOTA open-source)
  • HLE benchmark: 42.8%

Benchmarks: HumanEval 91.5% · GSM8K 97% · MATH 95.7%

Released Dec 22, 2025 · 200K context · HuggingFace

Kimi

Kimi K2 Thinking

1T (32B active) parameters

Open Source · API
🧠 Reasoning 🤖 Agents 💻 Coding

Reasoning-focused, tool-using "thinking" agent. Can execute 200-300 sequential tool calls without human intervention, with state-of-the-art agentic capabilities (a minimal sketch of this kind of tool loop follows the list below).

  • 200-300 sequential tool calls
  • LiveCodeBench-v6: 83.1%
  • Advanced agentic reasoning
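
As a rough illustration of what a 200-300-step agentic loop looks like in practice, here is a generic sketch. It is not Kimi's actual API: `call_model` and the tool registry are hypothetical stand-ins, and only the control flow is the point.

```python
import json

def run_agent(call_model, tools, user_task, max_steps=300):
    """Sequential tool-calling loop, capped at `max_steps` calls.

    call_model(messages) -> {"tool": name, "args": {...}} to request
    a tool, or {"answer": text} when the model is done.
    tools: mapping of tool name to a plain Python callable.
    """
    messages = [{"role": "user", "content": user_task}]
    for _ in range(max_steps):
        reply = call_model(messages)
        if "answer" in reply:              # model decided to finish
            return reply["answer"]
        result = tools[reply["tool"]](**reply["args"])
        # Append both the request and the observation so the next
        # model call can condition on the full tool history.
        messages.append({"role": "assistant", "content": json.dumps(reply)})
        messages.append({"role": "tool", "content": json.dumps(result)})
    raise RuntimeError("tool-call budget exhausted")
```

K2 Thinking's real interface presumably differs; the takeaway is just that the model, not a human, decides when the loop stops.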

Benchmarks: HumanEval 91% · GSM8K 96.5%

Released Nov 1, 2025 · 131K context

Kimi

Kimi Linear

48B (3B active) parameters

Open Source · API
💬 Chat 📄 Long Context

Uses Kimi Delta Attention (KDA) for efficient long-context processing. Reduces memory usage and improves generation speed at longer context windows (a rough sketch of the underlying delta-rule idea follows the list below).

  • 1M token context window
  • Novel Delta Attention mechanism
  • Efficient memory usage
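
KDA's published details aren't reproduced here, but it sits in the delta-rule family of linear attention, where a fixed-size matrix state replaces the growing KV cache. Below is a minimal NumPy sketch of a generic delta-rule update, with illustrative names and shapes, not Kimi's exact formulation.

```python
import numpy as np

def delta_rule_attention(q, k, v, beta):
    """Recurrent delta-rule linear attention over a sequence.

    q, k : (T, d_k)  queries / keys (keys assumed normalized)
    v    : (T, d_v)  values
    beta : (T,)      per-token write strengths in [0, 1]
    """
    T, d_k = k.shape
    d_v = v.shape[1]
    S = np.zeros((d_k, d_v))          # fixed-size fast-weight memory
    out = np.empty((T, d_v))
    for t in range(T):
        # Delta rule: overwrite the value currently stored under k[t]
        # with v[t], at rate beta[t], instead of blindly accumulating.
        pred = S.T @ k[t]             # what memory currently returns for k[t]
        S = S + beta[t] * np.outer(k[t], v[t] - pred)
        out[t] = S.T @ q[t]           # read the memory with the query
    return out
```

The state `S` stays (d_k, d_v) no matter how long the sequence grows, which is where the memory and speed gains described on this card come from.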

Benchmarks: MMLU 78%

Released Oct 1, 2025 · 1M context

DeepSeek

DeepSeek-V3.2

671B (37B active) parameters

API
💬 Chat 🧠 Reasoning 💻 Coding

Latest V3 iteration with improved general capabilities. Enhanced reasoning and coding performance over V3.1.

  • SWE-bench Verified: 73.1%
  • Improved knowledge and academic tasks
  • LiveCodeBench-v6: 83.3%

Benchmarks: MMLU 88.5% · HumanEval 90% · GSM8K 95.5%

Released Sep 1, 2025 · 131K context

Kimi

Kimi K2

1T (32B active) parameters

Open Source · API
💬 Chat 🧠 Reasoning 💻 Coding

1-trillion-parameter MoE with 32B active per token. State-of-the-art open-source performance on coding benchmarks. Trained for $4.6M, rivaling ChatGPT and Claude (a toy sketch of sparse expert routing follows the list below).

  • SOTA open-source coding performance
  • Trained for only $4.6M
  • Beats GPT-4o on multiple benchmarks
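
To make the "1T total, 32B active" arithmetic concrete, here is a toy sketch of top-k expert routing. It is illustrative only: the expert count, gating scheme, and shapes are simplified stand-ins, not K2's actual design.

```python
import numpy as np

def moe_forward(x, experts, router_w, k=2):
    """Sparse MoE layer: route each token to its top-k experts.

    x        : (T, d) token activations
    experts  : list of callables, each mapping (n, d) -> (n, d)
    router_w : (d, n_experts) router projection
    """
    logits = x @ router_w                       # (T, n_experts)
    topk = np.argsort(logits, axis=-1)[:, -k:]  # chosen expert indices
    # Softmax over just the selected experts' logits.
    sel = np.take_along_axis(logits, topk, axis=-1)
    gates = np.exp(sel - sel.max(-1, keepdims=True))
    gates /= gates.sum(-1, keepdims=True)
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        for j in range(k):
            e = topk[t, j]                      # only k experts run per token
            out[t] += gates[t, j] * experts[e](x[t:t+1])[0]
    return out
```

With hundreds of experts and only a handful selected per token, each forward pass touches a few percent of the total weights, which is how a 1T-parameter model can run with roughly 32B "active" parameters.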

Benchmarks: MMLU 87.5% · HumanEval 90.5% · GSM8K 95%

Released Jul 14, 2025 · 131K context · HuggingFace

Kimi

Kimi-Dev

72B parameters

Open Source · API
💻 Coding

Coding-focused model built on Qwen2.5-72B. State-of-the-art among open-source models on SWE-bench Verified.

  • SOTA open-source on SWE-bench Verified
  • Built on Qwen2.5-72B foundation
  • Specialized for software development

Benchmarks: HumanEval 89%

Released Jun 1, 2025 · 131K context

Benchmark Leaders

Top MMLU Scores

1. DeepSeek-R1-0528 (DeepSeek): 90.8%
2. DeepSeek-V3.2 (DeepSeek): 88.5%
3. DeepSeek-V3-0324 (DeepSeek): 87.5%
4. Kimi K2 (Kimi): 87.5%
5. DeepSeek-V3 (DeepSeek): 87.1%

Top HumanEval Scores

1. Qwen2.5-Coder-32B-Instruct (Qwen): 92.7%
2. Qwen3-235B-A22B (Qwen): 92.1%
3. GLM-4.7 (GLM): 91.5%
4. Kimi K2 Thinking (Kimi): 91%
5. Kimi K2 (Kimi): 90.5%
