Evaluation Leaderboard
Dedicated to exploring state-of-the-art large models and providing a comprehensive, objective, and neutral evaluation reference for industry and research.
Rank | Model | Score | WTQ | PeopleRelQA | ReportFixer | KCQAD | AttributionNLI | ASPBench-ASC | AffairQA | BioTextQA | MatTextQA | PharmKGQA | ChineseLawFact | VersiCode | UAQFact | TaxReasoner | ElaBench |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | o4-mini | 54.79 | 79.80 | 3.50 | 49.30 | 41.20 | 69.67 | 35.95 | 44.50 | 88.10 | 71.43 | 42.86 | 69.63 | 67.75 | 62.00 | 27.00 | 69.23 |
2 | GPT-5-mini | 53.48 | 82.03 | 4.30 | 84.73 | 43.60 | 58.00 | 17.00 | 42.00 | 88.57 | 77.62 | 47.37 | 73.50 | 55.31 | 56.00 | 26.00 | 46.15 |
3 | DeepSeek-R1 | 47.74 | 74.30 | 6.80 | 59.70 | 64.44 | 71.20 | 32.20 | 45.50 | 33.81 | 50.48 | 31.37 | 58.00 | 65.60 | 46.00 | 46.00 | 30.77 |
4 | GPT-5-nano | 46.65 | 82.20 | 2.89 | 51.47 | 31.60 | 56.00 | 9.00 | 42.00 | 86.67 | 73.81 | 39.71 | 65.00 | 49.29 | 44.00 | 20.00 | 46.15 |
5 | Llama4-Maverick | 46.51 | 72.00 | 3.50 | 32.10 | 36.20 | 52.00 | 30.67 | 43.50 | 82.38 | 71.43 | 40.48 | 73.12 | 64.78 | 48.00 | 20.00 | 27.46 |
6 | DeepSeek-V3 | 45.07 | 69.90 | 2.60 | 57.90 | 56.20 | 56.00 | 48.90 | 42.50 | 55.71 | 39.90 | 39.04 | 53.87 | 37.40 | 36.00 | 34.00 | 46.15 |
7 | GPT4o | 43.56 | 69.40 | 3.20 | 44.70 | 64.00 | 63.00 | 23.40 | 41.00 | 43.81 | 61.43 | 39.23 | 56.63 | 66.50 | 42.00 | 12.00 | 23.08 |
8 | QWQ-32B | 43.46 | 70.50 | 3.00 | 32.30 | 65.53 | 51.20 | 18.90 | 45.00 | 76.67 | 62.38 | 45.67 | 69.00 | 23.70 | 30.00 | 27.00 | 31.01 |
9 | Qwen2.5-72B | 43.00 | 65.50 | 2.50 | 38.90 | 58.40 | 57.30 | 15.30 | 45.00 | 81.43 | 62.86 | 38.09 | 70.50 | 35.90 | 32.00 | 21.00 | 20.38 |
10 | Llama3.1-70B | 39.42 | 47.70 | 2.20 | 24.20 | 47.80 | 54.50 | 30.70 | 40.00 | 88.57 | 71.43 | 34.33 | 59.38 | 50.90 | 28.00 | 9.00 | 2.64 |
11 | Doubao-pro | 37.96 | 46.00 | 0.00 | 25.30 | 41.08 | 60.10 | 15.20 | 40.00 | 83.33 | 50.00 | 27.14 | 57.50 | 63.10 | 28.00 | 25.00 | 7.69 |
12 | GLM4-9B | 33.41 | 39.10 | 0.20 | 6.60 | 55.00 | 45.10 | 10.20 | 38.50 | 80.95 | 58.10 | 17.70 | 66.25 | 58.50 | 14.00 | 10.00 | 0.88 |
13 | ERNIE3.5 | 31.77 | 7.30 | 0.10 | 19.50 | 25.20 | 46.80 | 2.00 | 41.50 | 80.19 | 49.72 | 27.47 | 78.00 | 59.78 | 26.00 | 13.00 | — |
14 | Qwen2.5-7B | 29.21 | 30.60 | 0.50 | 17.00 | 42.00 | 35.80 | 24.50 | 46.00 | 50.95 | 37.50 | 31.55 | 62.88 | 17.80 | 28.00 | 10.00 | 3.02 |
15 | Llama3.1-8B | 26.90 | 35.70 | 0.20 | 2.50 | 50.20 | 32.60 | 10.00 | 42.00 | 55.23 | 55.98 | 23.53 | 57.13 | 11.80 | 18.00 | 8.00 | 0.60 |
Rank | Model | Score | WTQ | PeopleRelQA | ReportFixer | KCQAD | AttributionNLI | ASPBench-ASC | AffairQA | BioTextQA | MatTextQA | PharmKGQA | ChineseLawFact | VersiCode | UAQFact | TaxReasoner | ElaBench |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | o3 | 29.86 | 53.00 | 20.00 | 67.70 | 56.82 | 22.91 | 28.55 | 2.00 | 41.90 | 38.46 | 22.49 | 23.33 | 29.58 | 32.00 | 4.00 | 5.17 |
2 | GPT-5 | 27.89 | 46.00 | 12.64 | 79.17 | 33.33 | 14.00 | 10.42 | 2.00 | 43.81 | 73.33 | 21.53 | 18.00 | 16.00 | 36.00 | 6.00 | 6.12 |
3 | o4-mini | 27.30 | 45.40 | 4.00 | 74.20 | 42.54 | 20.83 | 39.76 | 0.00 | 43.33 | 36.67 | 21.90 | 21.67 | 18.09 | 36.00 | 0.00 | 5.17 |
4 | Claude4-sonnet | 27.12 | 44.32 | 12.00 | 79.26 | 10.00 | 33.33 | 14.00 | 6.00 | 84.76 | 49.53 | 33.65 | 30.00 | 6.00 | 4.00 | 0.00 | 0.00 |
5 | Qwen3-32B | 26.44 | 33.00 | 4.00 | 34.00 | 12.00 | 31.25 | 8.00 | 0.00 | 85.71 | 60.38 | 34.31 | 22.00 | 22.00 | 48.00 | 2.00 | 0.00 |
6 | GPT-5-mini | 26.31 | 51.00 | 10.00 | 82.48 | 43.60 | 12.00 | 12.00 | 0.00 | 45.24 | 38.57 | 23.44 | 26.00 | 16.36 | 32.00 | 2.00 | 0.02 |
7 | GPT4.1 | 23.68 | 44.00 | 4.00 | 47.30 | 51.46 | 25.00 | 22.28 | 2.00 | 24.29 | 35.71 | 18.57 | 23.33 | 17.33 | 40.00 | 0.00 | 0.00 |
8 | o1 | 22.11 | 41.00 | 8.00 | 36.30 | 22.60 | 27.10 | 26.20 | 0.00 | 42.45 | 33.65 | 21.15 | 18.33 | 8.80 | 44.00 | 2.00 | 0.00 |
9 | Qwen3-14B | 22.01 | 30.00 | 4.00 | 26.00 | 6.00 | 25.00 | 4.00 | 0.00 | 91.59 | 60.38 | 39.22 | 24.00 | 12.00 | 8.00 | 0.00 | 0.00 |
10 | GPT-5-nano | 21.42 | 45.00 | 4.00 | 75.71 | 31.60 | 8.00 | 4.00 | 0.00 | 43.33 | 37.62 | 18.18 | 20.00 | 4.00 | 28.00 | 0.00 | 1.79 |
11 | Grok3 | 20.76 | 44.30 | 6.00 | 70.80 | 4.33 | 14.60 | 24.80 | 3.00 | 40.00 | 29.52 | 19.62 | 11.67 | 14.80 | 28.00 | 0.00 | 0.00 |
12 | Gemini2.5-pro | 19.93 | 37.00 | 8.00 | 20.00 | 22.33 | 20.83 | 12.24 | 2.00 | 41.90 | 30.48 | 21.53 | 26.67 | 29.92 | 12.00 | 8.00 | 6.12 |
13 | Llama4-Maverick | 19.61 | 35.00 | 6.00 | 13.30 | 39.32 | 12.50 | 28.43 | 2.00 | 40.48 | 33.81 | 18.57 | 16.67 | 20.00 | 28.00 | 0.00 | 0.00 |
14 | Qwen3-8B | 19.21 | 30.40 | 0.00 | 16.00 | 10.00 | 22.91 | 4.00 | 0.00 | 78.10 | 52.88 | 31.87 | 32.00 | 10.00 | 0.00 | 0.00 | 0.00 |
15 | ERNIE4.0 | 18.02 | 2.00 | 0.00 | 32.00 | 14.00 | 10.41 | 4.00 | 0.00 | 88.10 | 60.11 | 22.53 | 30.00 | 7.18 | 0.00 | 0.00 | — |
16 | Baichuan4 | 17.73 | 15.80 | 1.00 | 15.00 | 18.93 | 8.33 | 0.00 | 0.00 | 84.98 | 59.05 | 16.88 | 20.00 | 14.00 | 12.00 | 0.00 | 0.00 |
17 | DeepSeek-R1 | 17.73 | 37.00 | 8.00 | 55.60 | 8.89 | 25.00 | 32.70 | 0.00 | 9.52 | 1.43 | 5.39 | 20.00 | 20.40 | 36.00 | 6.00 | 0.00 |
18 | DeepSeek-Prover-V2 | 17.05 | 29.00 | 2.00 | 36.00 | 32.00 | 14.00 | 10.00 | 2.00 | 39.52 | 35.24 | 16.75 | 20.00 | 7.27 | 12.00 | 0.00 | 0.00 |
19 | GLM4 | 16.66 | 12.10 | 4.00 | 18.00 | 4.37 | 6.25 | 4.08 | 0.00 | 85.71 | 53.33 | 22.12 | 28.00 | 8.00 | 4.00 | 0.00 | 0.00 |
20 | GPT4o | 16.54 | 33.00 | 2.50 | 39.30 | 11.00 | 18.80 | 22.40 | 0.00 | 23.33 | 30.48 | 19.62 | 11.67 | 0.00 | 36.00 | 0.00 | 0.00 |
21 | Qwen2.5-72B | 15.90 | 24.50 | 8.10 | 21.30 | 5.40 | 12.50 | 15.00 | 0.00 | 40.95 | 31.90 | 21.43 | 6.67 | 14.00 | 32.00 | 0.00 | 4.72 |
22 | GLM4-32B | 15.85 | 26.50 | 2.00 | 34.00 | 8.00 | 10.41 | 4.17 | 2.00 | 62.86 | 28.16 | 26.26 | 22.00 | 3.38 | 8.00 | 0.00 | 0.00 |
23 | DeepSeek-V3 | 15.69 | 6.00 | 2.00 | 37.80 | 5.00 | 12.50 | 49.30 | 0.00 | 28.10 | 20.10 | 17.65 | 6.67 | 16.50 | 28.00 | 4.00 | 1.79 |
24 | ERNIE3.5 | 14.89 | 6.00 | 2.00 | 50.00 | 14.00 | 8.31 | 2.00 | 0.00 | 40.10 | 23.76 | 27.47 | 18.00 | 7.69 | 24.00 | 0.00 | — |
25 | Claude3.7-sonnet | 14.83 | 39.20 | 0.40 | 3.20 | 3.60 | 18.80 | 33.80 | 0.00 | 40.00 | 22.01 | 18.36 | 23.33 | 3.70 | 16.00 | 0.00 | 0.00 |
26 | Llama3.1-70B | 14.34 | 18.40 | 0.80 | 9.80 | 4.40 | 8.30 | 32.20 | 0.00 | 44.29 | 35.24 | 15.42 | 20.00 | 4.50 | 20.00 | 0.00 | 1.79 |
27 | Hunyuan-turbo | 13.79 | 29.50 | 3.00 | 0.80 | 3.40 | 2.10 | 24.90 | 4.00 | 41.43 | 29.05 | 15.53 | 23.33 | 9.80 | 20.00 | 0.00 | 0.00 |
28 | QWQ-32B | 12.60 | 32.00 | 0.40 | 12.00 | 11.62 | 16.70 | 0.00 | 0.00 | 38.57 | 28.57 | 21.63 | 21.67 | 1.90 | 4.00 | 0.00 | 0.00 |
29 | Doubao-pro | 11.48 | 5.50 | 0.00 | 12.20 | 2.61 | 6.30 | 16.70 | 0.00 | 42.38 | 22.86 | 11.90 | 16.67 | 11.10 | 24.00 | 0.00 | 0.00 |
30 | GLM4-9B | 11.46 | 15.90 | 0.00 | 3.10 | 5.00 | 2.10 | 22.10 | 0.00 | 41.43 | 28.57 | 9.09 | 26.67 | 7.40 | 8.00 | 0.00 | 2.52 |
31 | Llama3.1-8B | 10.10 | 2.00 | 0.00 | 0.00 | 4.20 | 2.10 | 9.00 | 0.00 | 28.10 | 26.79 | 11.76 | 31.67 | 0.00 | 32.00 | 0.00 | 3.81 |
32 | Qwen2.5-7B | 9.77 | 9.50 | 2.00 | 0.00 | 2.20 | 10.40 | 22.10 | 0.00 | 26.67 | 17.79 | 14.29 | 21.67 | 0.00 | 20.00 | 0.00 | 0.00 |
Rank | Model | Score |
---|---|---|
1 | Qwen3-32B | 30.48 |
2 | Claude4-sonnet | 29.66 |
3 | Qwen3-14B | 29.57 |
4 | ERNIE4.0 | 28.95 |
5 | Qwen3-8B | 27.98 |
6 | GPT-5 | 27.80 |
7 | o3 | 27.51 |
8 | Baichuan4 | 27.33 |
9 | GLM4 | 25.38 |
10 | o4-mini | 24.32 |
11 | GPT-5-mini | 23.92 |
12 | GPT4.1 | 22.83 |
13 | Gemini2.5-pro | 22.33 |
14 | o1 | 20.88 |
15 | Llama4-Maverick | 20.40 |
16 | GPT-5-nano | 20.33 |
17 | DeepSeek-Prover-V2 | 20.11 |
18 | GLM4-32B | 18.77 |
19 | QWQ-32B | 16.73 |
20 | Llama3.1-70B | 16.29 |
21 | Claude3.7-sonnet | 15.39 |
22 | GLM4-9B | 15.18 |
23 | ERNIE3.5 | 14.88 |
24 | Qwen2.5-72B | 14.59 |
25 | Grok3 | 14.30 |
26 | Hunyuan-turbo | 14.19 |
27 | Llama3.1-8B | 13.81 |
28 | GPT4o | 13.61 |
29 | Doubao-pro | 12.97 |
30 | Qwen2.5-7B | 11.25 |
31 | DeepSeek-V3 | 11.17 |
32 | DeepSeek-R1 | 10.12 |
Rank | Model | Score |
---|---|---|
1 | GPT-5 | 30.27 |
2 | GPT-5-mini | 29.58 |
3 | o3 | 28.84 |
4 | o4-mini | 27.22 |
5 | Claude4-sonnet | 26.98 |
6 | Grok3 | 25.48 |
7 | GPT-5-nano | 25.18 |
8 | Qwen3-32B | 24.06 |
9 | GPT4.1 | 22.37 |
10 | o1 | 21.89 |
11 | DeepSeek-R1 | 21.00 |
12 | ERNIE3.5 | 20.69 |
13 | GPT4o | 19.48 |
14 | DeepSeek-V3 | 17.09 |
15 | Qwen2.5-72B | 16.57 |
16 | Qwen3-14B | 15.44 |
17 | GLM4-32B | 14.45 |
18 | DeepSeek-Prover-V2 | 13.75 |
19 | Llama4-Maverick | 13.57 |
20 | Gemini2.5-pro | 12.71 |
21 | ERNIE4.0 | 10.91 |
22 | GLM4 | 9.62 |
23 | Doubao-pro | 9.62 |
24 | Qwen3-8B | 9.57 |
25 | Llama3.1-70B | 9.20 |
26 | Baichuan4 | 8.98 |
27 | Llama3.1-8B | 8.75 |
28 | Hunyuan-turbo | 8.67 |
29 | QWQ-32B | 7.61 |
30 | Claude3.7-sonnet | 7.59 |
31 | Qwen2.5-7B | 7.26 |
32 | GLM4-9B | 4.04 |
WTQ
Rank | Model | Score
---|---|---|
1 | o3 | 53.00 |
2 | GPT-5-mini | 51.00 |
3 | GPT-5 | 46.00 |
4 | o4-mini | 45.40 |
5 | GPT-5-nano | 45.00 |
6 | Claude4-sonnet | 44.32 |
7 | Grok3 | 44.30 |
8 | GPT4.1 | 44.00 |
9 | o1 | 41.00 |
10 | Claude3.7-sonnet | 39.20 |
11 | DeepSeek-R1 | 37.00 |
12 | Gemini2.5-pro | 37.00 |
13 | Llama4-Maverick | 35.00 |
14 | Qwen3-32B | 33.00 |
15 | GPT4o | 33.00 |
16 | QWQ-32B | 32.00 |
17 | Qwen3-8B | 30.40 |
18 | Qwen3-14B | 30.00 |
19 | Hunyuan-turbo | 29.50 |
20 | DeepSeek-Prover-V2 | 29.00 |
21 | GLM4-32B | 26.50 |
22 | Qwen2.5-72B | 24.50 |
23 | Llama3.1-70B | 18.40 |
24 | GLM4-9B | 15.90 |
25 | Baichuan4 | 15.80 |
26 | GLM4 | 12.10 |
27 | Qwen2.5-7B | 9.50 |
28 | DeepSeek-V3 | 6.00 |
29 | ERNIE3.5 | 6.00 |
30 | Doubao-pro | 5.50 |
31 | Llama3.1-8B | 2.00 |
32 | ERNIE4.0 | 2.00 |
VersiCode
Rank | Model | Score
---|---|---|
1 | Gemini2.5-pro | 29.92 |
2 | o3 | 29.58 |
3 | Qwen3-32B | 22.00 |
4 | DeepSeek-R1 | 20.40 |
5 | Llama4-Maverick | 20.00 |
6 | o4-mini | 18.09 |
7 | GPT4.1 | 17.33 |
8 | DeepSeek-V3 | 16.50 |
9 | GPT-5-mini | 16.36 |
10 | GPT-5 | 16.00 |
11 | Grok3 | 14.80 |
12 | Baichuan4 | 14.00 |
13 | Qwen2.5-72B | 14.00 |
14 | Qwen3-14B | 12.00 |
15 | Doubao-pro | 11.10 |
16 | Qwen3-8B | 10.00 |
17 | Hunyuan-turbo | 9.80 |
18 | o1 | 8.80 |
19 | GLM4 | 8.00 |
20 | ERNIE3.5 | 7.69 |
21 | GLM4-9B | 7.40 |
22 | DeepSeek-Prover-V2 | 7.27 |
23 | ERNIE4.0 | 7.18 |
24 | Claude4-sonnet | 6.00 |
25 | Llama3.1-70B | 4.50 |
26 | GPT-5-nano | 4.00 |
27 | Claude3.7-sonnet | 3.70 |
28 | GLM4-32B | 3.38 |
29 | QWQ-32B | 1.90 |
30 | Llama3.1-8B | 0.00 |
31 | Qwen2.5-7B | 0.00 |
32 | GPT4o | 0.00 |
ASPBench-ASC
Rank | Model | Score
---|---|---|
1 | DeepSeek-V3 | 49.30 |
2 | o4-mini | 39.76 |
3 | Claude3.7-sonnet | 33.80 |
4 | DeepSeek-R1 | 32.70 |
5 | Llama3.1-70B | 32.20 |
6 | o3 | 28.55 |
7 | Llama4-Maverick | 28.43 |
8 | o1 | 26.20 |
9 | Hunyuan-turbo | 24.90 |
10 | Grok3 | 24.80 |
11 | GPT4o | 22.40 |
12 | GPT4.1 | 22.28 |
13 | GLM4-9B | 22.10 |
14 | Qwen2.5-7B | 22.10 |
15 | Doubao-pro | 16.70 |
16 | Qwen2.5-72B | 15.00 |
17 | Claude4-sonnet | 14.00 |
18 | Gemini2.5-pro | 12.24 |
19 | GPT-5-mini | 12.00 |
20 | GPT-5 | 10.42 |
21 | DeepSeek-Prover-V2 | 10.00 |
22 | Llama3.1-8B | 9.00 |
23 | Qwen3-32B | 8.00 |
24 | GLM4-32B | 4.17 |
25 | GLM4 | 4.08 |
26 | Qwen3-8B | 4.00 |
27 | Qwen3-14B | 4.00 |
28 | ERNIE4.0 | 4.00 |
29 | GPT-5-nano | 4.00 |
30 | ERNIE3.5 | 2.00 |
31 | Baichuan4 | 0.00 |
32 | QWQ-32B | 0.00 |
Domain Leaderboards
Rank | Model | Score |
---|---|---|
1 | o3 | 40.14 |
2 | o4-mini | 37.53 |
3 | GPT-5-mini | 34.73 |
4 | GPT4.1 | 33.43 |
5 | GPT-5 | 33.08 |
6 | o1 | 29.31 |
7 | DeepSeek-R1 | 29.03 |
8 | Claude4-sonnet | 28.13 |
9 | GPT-5-nano | 28.04 |
10 | Grok3 | 27.55 |
11 | Qwen3-32B | 24.32 |
12 | GPT4o | 23.29 |
13 | Llama4-Maverick | 23.22 |
14 | DeepSeek-V3 | 20.09 |
15 | DeepSeek-Prover-V2 | 19.29 |
16 | Gemini2.5-pro | 18.91 |
17 | Qwen2.5-72B | 16.97 |
18 | Claude3.7-sonnet | 16.43 |
19 | ERNIE3.5 | 15.19 |
20 | Qwen3-14B | 14.71 |
21 | Llama3.1-70B | 13.41 |
22 | GLM4-32B | 13.30 |
23 | Hunyuan-turbo | 11.96 |
24 | Qwen3-8B | 11.90 |
25 | QWQ-32B | 10.96 |
26 | Baichuan4 | 10.15 |
27 | Doubao-pro | 9.62 |
28 | Qwen2.5-7B | 9.46 |
29 | ERNIE4.0 | 8.92 |
30 | GLM4-9B | 8.03 |
31 | GLM4 | 7.54 |
32 | Llama3.1-8B | 7.04 |
AffairQA
Rank | Model | Score
---|---|---|
1 | Claude4-sonnet | 6.00 |
2 | Hunyuan-turbo | 4.00 |
3 | Grok3 | 3.00 |
4 | GLM4-32B | 2.00 |
5 | Llama4-Maverick | 2.00 |
6 | DeepSeek-Prover-V2 | 2.00 |
7 | GPT4.1 | 2.00 |
8 | o3 | 2.00 |
9 | Gemini2.5-pro | 2.00 |
10 | GPT-5 | 2.00 |
11 | GLM4-9B | 0.00 |
12 | GLM4 | 0.00 |
13 | Baichuan4 | 0.00 |
14 | Llama3.1-70B | 0.00 |
15 | Llama3.1-8B | 0.00 |
16 | Qwen2.5-72B | 0.00 |
17 | Qwen2.5-7B | 0.00 |
18 | QWQ-32B | 0.00 |
19 | Qwen3-8B | 0.00 |
20 | Qwen3-14B | 0.00 |
21 | Qwen3-32B | 0.00 |
22 | DeepSeek-V3 | 0.00 |
23 | DeepSeek-R1 | 0.00 |
24 | Doubao-pro | 0.00 |
25 | GPT4o | 0.00 |
26 | o1 | 0.00 |
27 | o4-mini | 0.00 |
28 | Claude3.7-sonnet | 0.00 |
29 | ERNIE3.5 | 0.00 |
30 | ERNIE4.0 | 0.00 |
31 | GPT-5-mini | 0.00 |
32 | GPT-5-nano | 0.00 |
Rank | Model | Score |
---|---|---|
1 | Qwen3-14B | 63.73 |
2 | Qwen3-32B | 60.14 |
3 | ERNIE4.0 | 56.91 |
4 | Claude4-sonnet | 55.98 |
5 | Qwen3-8B | 54.28 |
6 | GLM4 | 53.72 |
7 | Baichuan4 | 53.64 |
8 | GPT-5 | 46.22 |
9 | GLM4-32B | 39.09 |
10 | GPT-5-mini | 35.75 |
11 | o3 | 34.28 |
12 | o4-mini | 33.97 |
13 | GPT-5-nano | 33.04 |
14 | o1 | 32.42 |
15 | Llama3.1-70B | 31.65 |
16 | Qwen2.5-72B | 31.43 |
17 | Gemini2.5-pro | 31.30 |
18 | Llama4-Maverick | 30.95 |
19 | DeepSeek-Prover-V2 | 30.50 |
20 | ERNIE3.5 | 30.44 |
21 | Grok3 | 29.71 |
22 | QWQ-32B | 29.59 |
23 | Hunyuan-turbo | 28.67 |
24 | Claude3.7-sonnet | 26.79 |
25 | GLM4-9B | 26.36 |
26 | GPT4.1 | 26.19 |
27 | Doubao-pro | 25.71 |
28 | GPT4o | 24.48 |
29 | Llama3.1-8B | 22.22 |
30 | DeepSeek-V3 | 21.95 |
31 | Qwen2.5-7B | 19.58 |
32 | DeepSeek-R1 | 5.45 |
ChineseLawFact
Rank | Model | Score
---|---|---|
1 | Qwen3-8B | 32.00 |
2 | Llama3.1-8B | 31.67 |
3 | Claude4-sonnet | 30.00 |
4 | ERNIE4.0 | 30.00 |
5 | GLM4 | 28.00 |
6 | GLM4-9B | 26.67 |
7 | Gemini2.5-pro | 26.67 |
8 | GPT-5-mini | 26.00 |
9 | Qwen3-14B | 24.00 |
10 | Hunyuan-turbo | 23.33 |
11 | GPT4.1 | 23.33 |
12 | o3 | 23.33 |
13 | Claude3.7-sonnet | 23.33 |
14 | GLM4-32B | 22.00 |
15 | Qwen3-32B | 22.00 |
16 | Qwen2.5-7B | 21.67 |
17 | QWQ-32B | 21.67 |
18 | o4-mini | 21.67 |
19 | Baichuan4 | 20.00 |
20 | Llama3.1-70B | 20.00 |
21 | DeepSeek-R1 | 20.00 |
22 | DeepSeek-Prover-V2 | 20.00 |
23 | GPT-5-nano | 20.00 |
24 | o1 | 18.33 |
25 | ERNIE3.5 | 18.00 |
26 | GPT-5 | 18.00 |
27 | Llama4-Maverick | 16.67 |
28 | Doubao-pro | 16.67 |
29 | GPT4o | 11.67 |
30 | Grok3 | 11.67 |
31 | Qwen2.5-72B | 6.67 |
32 | DeepSeek-V3 | 6.67 |
TaxReasoner
Rank | Model | Score
---|---|---|
1 | Gemini2.5-pro | 8.00 |
2 | DeepSeek-R1 | 6.00 |
3 | GPT-5 | 6.00 |
4 | DeepSeek-V3 | 4.00 |
5 | o3 | 4.00 |
6 | Qwen3-32B | 2.00 |
7 | o1 | 2.00 |
8 | GPT-5-mini | 2.00 |
9 | GLM4-9B | 0.00 |
10 | GLM4-32B | 0.00 |
11 | GLM4 | 0.00 |
12 | Baichuan4 | 0.00 |
13 | Llama3.1-70B | 0.00 |
14 | Llama3.1-8B | 0.00 |
15 | Llama4-Maverick | 0.00 |
16 | Qwen2.5-72B | 0.00 |
17 | Qwen2.5-7B | 0.00 |
18 | QWQ-32B | 0.00 |
19 | Qwen3-8B | 0.00 |
20 | Qwen3-14B | 0.00 |
21 | DeepSeek-Prover-V2 | 0.00 |
22 | Doubao-pro | 0.00 |
23 | Hunyuan-turbo | 0.00 |
24 | GPT4o | 0.00 |
25 | GPT4.1 | 0.00 |
26 | o4-mini | 0.00 |
27 | Grok3 | 0.00 |
28 | Claude3.7-sonnet | 0.00 |
29 | Claude4-sonnet | 0.00 |
30 | ERNIE3.5 | 0.00 |
31 | ERNIE4.0 | 0.00 |
32 | GPT-5-nano | 0.00 |
ElaBench
Rank | Model | Score
---|---|---|
1 | Gemini2.5-pro | 6.12 |
2 | GPT-5 | 6.12 |
3 | o3 | 5.17 |
4 | o4-mini | 5.17 |
5 | Qwen2.5-72B | 4.72 |
6 | Llama3.1-8B | 3.81 |
7 | GLM4-9B | 2.52 |
8 | Llama3.1-70B | 1.79 |
9 | DeepSeek-V3 | 1.79 |
10 | GPT-5-nano | 1.79 |
11 | GPT-5-mini | 0.02 |
12 | GLM4-32B | 0.00 |
13 | GLM4 | 0.00 |
14 | Baichuan4 | 0.00 |
15 | Llama4-Maverick | 0.00 |
16 | Qwen2.5-7B | 0.00 |
17 | QWQ-32B | 0.00 |
18 | Qwen3-8B | 0.00 |
19 | Qwen3-14B | 0.00 |
20 | Qwen3-32B | 0.00 |
21 | DeepSeek-R1 | 0.00 |
22 | DeepSeek-Prover-V2 | 0.00 |
23 | Doubao-pro | 0.00 |
24 | Hunyuan-turbo | 0.00 |
25 | GPT4o | 0.00 |
26 | GPT4.1 | 0.00 |
27 | o1 | 0.00 |
28 | Grok3 | 0.00 |
29 | Claude3.7-sonnet | 0.00 |
30 | Claude4-sonnet | 0.00 |
31 | ERNIE3.5 | 0.00 |
32 | ERNIE4.0 | 0.00 |