Evaluation Leaderboard

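Dedicated to exploring the most advanced large models and providing the industry and research communities with a comprehensive, objective, and neutral evaluation reference.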

Rank Model Score
1 Claude4.5-sonnet-thinking 37.65
2 Gemini3-pro 37.02
3 Doubao-Seed-1.6-Thinking 36.79
4 o4-mini 34.51
5 Grok4 34.34
6 Hunyuan-2.0-Thinking 33.38
7 DeepSeek-V3.2-thinking 32.6
8 DeepSeek-V3.2-Speciale 31.42
9 GPT-5.2-Thinking 29.98
10 DeepSeek-R1 29.43
11 Llama4-Maverick 28.87
12 Qwen3-Max 28.66
13 Gemini2.5-pro 28.48
14 Qwen3-32B 26.12
15 ERNIE4.0 22.69
16 Doubao-pro 19.13
17 GLM4-32B 14.82
18 Baichuan4 13.88
Rank Model Score
1 Claude4.5-sonnet-thinking 30.61
2 DeepSeek-V3.2-Speciale 29
3 DeepSeek-V3.2-thinking 25.87
4 Doubao-Seed-1.6-Thinking 24.72
5 Gemini3-pro 23.9
6 Grok4 23.61
7 Hunyuan-2.0-Thinking 23
8 o4-mini 21.36
9 ERNIE4.0 20.6
10 Qwen3-Max 20.44
11 Llama4-Maverick 18.76
12 GPT-5.2-Thinking 18.65
13 Gemini2.5-pro 18.42
14 DeepSeek-R1 17.14
15 Baichuan4 14.95
16 GLM4-32B 12.36
17 Qwen3-32B 11.32
18 Doubao-pro 9.68
Rank Model Score
1 o4-mini 55.07
2 Doubao-Seed-1.6-Thinking 53.5
3 DeepSeek-V3.2-thinking 51.6
4 Hunyuan-2.0-Thinking 47.61
5 Qwen3-32B 47.33
6 DeepSeek-V3.2-Speciale 46.89
7 Claude4.5-sonnet-thinking 46.61
8 DeepSeek-R1 45.53
9 Grok4 45.11
10 Gemini3-pro 41.83
11 GPT-5.2-Thinking 37.56
12 Qwen3-Max 35.98
13 Llama4-Maverick 35.43
14 Gemini2.5-pro 34
15 GLM4-32B 27.33
16 ERNIE4.0 27.33
17 Doubao-pro 27.07
18 Baichuan4 20.67
Rank Model Score
1 Claude4.5-sonnet-thinking 53
2 GPT-5.2-Thinking 53
3 Gemini3-pro 48.33
4 Hunyuan-2.0-Thinking 48.33
5 o4-mini 45.4
6 Grok4 45
7 Qwen3-Max 43
8 Doubao-Seed-1.6-Thinking 41.44
9 DeepSeek-V3.2-thinking 41
10 DeepSeek-V3.2-Speciale 39
11 DeepSeek-R1 37
12 Gemini2.5-pro 37
13 Llama4-Maverick 35
14 Qwen3-32B 33
15 GLM4-32B 26.5
16 Baichuan4 15.8
17 Doubao-pro 5.5
18 ERNIE4.0 2
Rank Model Score
1 Gemini2.5-pro 29.92
2 DeepSeek-V3.2-thinking 27.92
3 Gemini3-pro 26
4 Grok4 24
5 Qwen3-32B 22
6 DeepSeek-R1 20.4
7 Llama4-Maverick 20
8 Doubao-Seed-1.6-Thinking 20
9 DeepSeek-V3.2-Speciale 19.83
10 o4-mini 18.09
11 GPT-5.2-Thinking 15.38
12 Baichuan4 14
13 Qwen3-Max 12
14 Hunyuan-2.0-Thinking 11.54
15 Doubao-pro 11.1
16 Claude4.5-sonnet-thinking 8
17 ERNIE4.0 7.18
18 GLM4-32B 3.38
Rank Model Score
1 Gemini3-pro 62.44
2 Claude4.5-sonnet-thinking 48.96
3 Doubao-Seed-1.6-Thinking 47.96
4 Llama4-Maverick 45.7
5 Grok4 44.85
6 GPT-5.2-Thinking 42.74
7 Doubao-pro 41.68
8 Hunyuan-2.0-Thinking 41.4
9 Gemini2.5-pro 40.38
10 Qwen3-Max 39.4
11 o4-mini 39.32
12 ERNIE4.0 39.04
13 DeepSeek-R1 36.72
14 Qwen3-32B 29.92
15 DeepSeek-V3.2-thinking 19.1
16 DeepSeek-V3.2-Speciale 16.28
17 GLM4-32B 2.08
18 Baichuan4 0

Domain Leaderboards

Rank Model Score
1 Doubao-Seed-1.6-Thinking 45.42
2 Grok4 44.79
3 Gemini3-pro 43.98
4 Llama4-Maverick 43.12
5 o4-mini 42.93
6 Claude4.5-sonnet-thinking 41.42
7 Gemini2.5-pro 37.02
8 Hunyuan-2.0-Thinking 36.02
9 Qwen3-32B 35.47
10 GPT-5.2-Thinking 34.08
11 DeepSeek-R1 33.39
12 Qwen3-Max 33.14
13 Doubao-pro 26.75
14 DeepSeek-V3.2-thinking 26.59
15 ERNIE4.0 24.01
16 DeepSeek-V3.2-Speciale 23.87
17 GLM4-32B 14.44
18 Baichuan4 13.62
Rank Model Score
1 Doubao-Seed-1.6-Thinking 48.33
2 DeepSeek-V3.2-Speciale 48.33
3 Gemini3-pro 43.33
4 Hunyuan-2.0-Thinking 41.67
5 Claude4.5-sonnet-thinking 35
6 DeepSeek-V3.2-thinking 31.67
7 ERNIE4.0 30
8 Grok4 30
9 Gemini2.5-pro 26.67
10 Qwen3-Max 25
11 GPT-5.2-Thinking 23.33
12 GLM4-32B 22
13 Qwen3-32B 22
14 o4-mini 21.67
15 Baichuan4 20
16 DeepSeek-R1 20
17 Llama4-Maverick 16.67
18 Doubao-pro 16.67
Rank Model Score
1 Gemini3-pro 32
2 DeepSeek-V3.2-Speciale 30
3 DeepSeek-V3.2-thinking 26
4 Claude4.5-sonnet-thinking 22
5 Hunyuan-2.0-Thinking 20
6 Qwen3-Max 18
7 GPT-5.2-Thinking 18
8 Gemini2.5-pro 8
9 DeepSeek-R1 6
10 Qwen3-32B 2
11 Doubao-Seed-1.6-Thinking 2
12 GLM4-32B 0
13 Baichuan4 0
14 Llama4-Maverick 0
15 Doubao-pro 0
16 o4-mini 0
17 ERNIE4.0 0
18 Grok4 0
Rank Model Score
1 Claude4.5-sonnet-thinking 33.75
2 DeepSeek-V3.2-Speciale 30
3 Qwen3-Max 29.6
4 ERNIE4.0 29.5
5 DeepSeek-V3.2-thinking 29.17
6 Hunyuan-2.0-Thinking 26.66
7 GPT-5.2-Thinking 25.95
8 DeepSeek-R1 25.4
9 Grok4 24.01
10 o4-mini 21.28
11 Llama4-Maverick 18.9
12 Gemini3-pro 18.75
13 Baichuan4 17.9
14 Gemini2.5-pro 17.56
15 GLM4-32B 15.9
16 Doubao-Seed-1.6-Thinking 15.55
17 Doubao-pro 14.55
18 Qwen3-32B 10.3
Rank Model Score
1 DeepSeek-V3.2-thinking 87.8
2 DeepSeek-V3.2-Speciale 75.67
3 o4-mini 74.2
4 Claude4.5-sonnet-thinking 70.83
5 Doubao-Seed-1.6-Thinking 67.5
6 Hunyuan-2.0-Thinking 57.83
7 DeepSeek-R1 55.6
8 GPT-5.2-Thinking 46.67
9 Gemini3-pro 41.5
10 Grok4 41.32
11 GLM4-32B 34
12 Qwen3-32B 34
13 ERNIE4.0 32
14 Qwen3-Max 30.93
15 Gemini2.5-pro 20
16 Baichuan4 15
17 Llama4-Maverick 13.3
18 Doubao-pro 12.2