Qwen
Alibaba Cloud
Alibaba's flagship LLM series, known for strong multilingual capabilities, a wide range of model sizes from 0.5B to 235B parameters, and industry-leading open-source coding models.
Models
Qwen3-235B-A22B
235B (22B active) parameters
Flagship MoE reasoning model. Hybrid thinking/non-thinking modes. Trained on 36T tokens across 119 languages. Outperforms DeepSeek-R1 on 17/23 benchmarks.
- State-of-the-art open-source reasoning model
- Hybrid thinking mode for complex problems
- Roughly 35% of the total and 60% of the activated parameters of DeepSeek-R1
Benchmarks
MMLU: 86.5%
HumanEval: 92.1%
GSM8K: 95.2%
MATH: 90.3%
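In thinking mode, Qwen3-family models emit their chain of thought inside `<think>...</think>` tags before the final answer, so client code usually separates the two. A minimal sketch, assuming that tag format (the helper name `split_thinking` is my own):

```python
import re

def split_thinking(text: str) -> tuple[str, str]:
    """Split a Qwen3-style response into (reasoning, answer).

    Assumes reasoning is wrapped in <think>...</think>; if no such
    block is present (non-thinking mode), reasoning is empty.
    """
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if match is None:
        return "", text.strip()
    reasoning = match.group(1).strip()
    answer = text[match.end():].strip()
    return reasoning, answer
```

In non-thinking mode the same parser degrades gracefully: the whole response is returned as the answer.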
QwQ-32B
32B parameters
Medium-sized reasoning model achieving performance comparable to DeepSeek-R1 (671B). Strong step-by-step reasoning capabilities.
- Competitive with 671B DeepSeek-R1 using only 32B parameters
- ArenaHard: 89.5
- CodeForces Elo: 1982
Benchmarks
MMLU: 85.0%
HumanEval: 88.5%
GSM8K: 94.0%
MATH: 87.5%
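The CodeForces Elo of 1982 listed above can be put in perspective with the standard Elo expected-score formula; a quick sketch (the opponent rating below is purely illustrative):

```python
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that player A beats player B under the Elo model:
    E_A = 1 / (1 + 10^((R_B - R_A) / 400))."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

# e.g. a 1982-rated player vs. a 1600-rated competitor (illustrative)
win_probability = elo_expected_score(1982, 1600)
```

A 382-point gap implies roughly a 90% expected score against that opponent under this model.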
Qwen2.5-Coder-32B-Instruct
32B parameters
State-of-the-art open-source code LLM. Trained on 5.5T tokens. Performance matches GPT-4o on coding tasks. Supports 40+ programming languages.
- HumanEval pass@1: 92.7%
- Aider benchmark: 73.7 (comparable to GPT-4o)
- McEval multilingual: 65.9
Benchmarks
HumanEval: 92.7%
MBPP: 90.2%
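pass@1 figures like the 92.7% above are conventionally computed with the unbiased pass@k estimator introduced alongside HumanEval: generate n samples per problem, count the c correct ones, and estimate the chance that at least one of k drawn samples passes. A sketch of that estimator:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem.

    n: total samples generated, c: samples that pass the tests.
    pass@k = 1 - C(n - c, k) / C(n, k)
    """
    if n - c < k:
        return 1.0  # too few failures to fill a k-sample draw
    return 1.0 - comb(n - c, k) / comb(n, k)
```

The benchmark score is the mean of this quantity over all problems; with n = k = 1 it reduces to the plain pass rate.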
Qwen2.5-72B-Instruct
72B parameters
Major upgrade over the previous generation, with improved reasoning and a 128K-token context window. Surpasses Llama-3.1-405B on several benchmarks; MMLU improved from 84.2 (previous generation) to 86.1.
- Beats Llama-3.1-405B despite smaller size
- 128K context window
- Apache 2.0 license for most sizes in the Qwen2.5 series
Benchmarks
MMLU: 86.1%
HumanEval: 86.4%
GSM8K: 91.2%
MATH: 83.1%
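A 128K-token window still needs budgeting on the client side. A minimal sketch of a pre-flight length check, assuming a crude ~4-characters-per-token heuristic for English text (the function name and heuristic are illustrative; use the model's actual tokenizer for a real count):

```python
def fits_context(prompt: str, max_tokens: int = 131_072,
                 chars_per_token: float = 4.0) -> bool:
    """Rough check that a prompt fits a 128K-token context window.

    chars_per_token ~= 4 is a coarse English-text heuristic, not the
    model's real tokenization; it only catches order-of-magnitude
    overruns before an expensive request is sent.
    """
    estimated_tokens = len(prompt) / chars_per_token
    return estimated_tokens <= max_tokens
```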
Qwen2-VL-72B
72B parameters
Multimodal vision-language model. Strong image understanding, document analysis, and visual reasoning capabilities.
- State-of-the-art open-source VLM
- Dynamic resolution support
- Video understanding capability
Benchmarks
MMLU: 82.0%
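Dynamic resolution means the visual token count scales with image size rather than being fixed. A rough sketch of that scaling, assuming the patching scheme described in the Qwen2-VL report (14x14 ViT patches with 2x2 patch merging, i.e. about one token per 28x28 pixel block; the helper name is my own and real counts depend on the model's resizing rules):

```python
import math

def estimate_visual_tokens(width: int, height: int) -> int:
    """Approximate visual token count under dynamic resolution.

    Assumption: 14x14 patches merged 2x2 -> roughly one token per
    28x28 pixel block of the (already resized) input image.
    """
    return math.ceil(width / 28) * math.ceil(height / 28)
```

Larger or taller images therefore cost proportionally more tokens, which is what lets the model trade compute for detail on dense documents.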