Evaluation Leaderboard

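Dedicated to exploring state-of-the-art large models and providing industry and research with comprehensive, objective, and neutral evaluation references.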

Rank Model Overall Score WTQ PersonRelQA ReportFixer MedQA AffairQA BioTextQA MatTextQA PharmKGQA ChineseLawFact VersiCode
1 Grok 3 55.82% 76.50% 4.70% 77.80% 49.00% 45.50% 80.00% 64.29% 42.11% 54.25% 64.00%
2 QWQ-32B 50.65% 70.50% 3.00% 32.30% 78.30% 45.00% 76.67% 62.38% 45.67% 69.00% 23.70%
3 Hunyuan-turbo 50.10% 55.10% 1.40% 2.20% 84.50% 43.00% 85.71% 60.95% 32.52% 83.87% 51.70%
4 Qwen2.5-72B 50.02% 65.50% 2.50% 38.90% 59.50% 45.00% 81.43% 62.86% 38.09% 70.50% 35.90%
5 GPT-4o 48.49% 69.40% 3.20% 44.70% 59.00% 41.00% 43.81% 61.43% 39.23% 56.63% 66.50%
6 DeepSeek-R1-671B 47.36% 74.30% 6.80% 59.70% 48.00% 45.50% 33.81% 50.48% 31.37% 58.00% 65.60%
7 DeepSeek-V3 45.83% 69.90% 2.60% 57.90% 59.50% 42.50% 55.71% 39.90% 39.04% 53.87% 37.40%
8 Llama3.1-70B 44.57% 47.70% 2.20% 24.20% 27.00% 40.00% 88.57% 71.43% 34.33% 59.38% 50.90%
9 Doubao-pro 44.54% 46.00% 0.00% 25.30% 53.00% 40.00% 83.33% 50.00% 27.14% 57.50% 63.10%
10 GLM4-9B 41.24% 39.10% 0.20% 6.60% 46.50% 38.50% 80.95% 58.10% 17.70% 66.25% 58.50%
11 Claude 3.7 Sonnet 38.48% 28.30% 0.50% 42.30% 46.00% 22.10% 78.10% 48.80% 40.10% 60.38% 18.20%
12 Qwen2.5-7B 32.93% 30.60% 0.50% 17.00% 34.50% 46.00% 50.95% 37.50% 31.55% 62.88% 17.80%
13 Llama3.1-8B 30.11% 35.70% 0.20% 2.50% 17.00% 42.00% 55.23% 55.98% 23.53% 57.13% 11.80%
14 Baichuan2-7B 24.80% 4.80% 0.00% 12.00% 20.00% 43.50% 51.43% 50.95% 21.43% 43.87% 0.00%
15 Baichuan2-13B 24.74% 14.80% 0.00% 13.60% 26.50% 37.00% 57.14% 22.86% 14.76% 56.63% 4.10%
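The Overall Score column above is consistent with an unweighted mean of the ten task scores (for Grok 3, 558.15 / 10 = 55.815, shown as 55.82%). The sketch below reproduces that arithmetic. The equal weighting is inferred from the published numbers, the half-up rounding to two decimals is an assumption, and the helper name `overall_score` is illustrative. It uses `Decimal` instead of floats so that midpoint values such as 55.815 round predictably.

```python
from decimal import Decimal, ROUND_HALF_UP

# Grok 3's per-task scores (percent), copied from the table above, in column order:
# WTQ, PersonRelQA, ReportFixer, MedQA, AffairQA, BioTextQA, MatTextQA,
# PharmKGQA, ChineseLawFact, VersiCode
GROK3 = ["76.50", "4.70", "77.80", "49.00", "45.50", "80.00", "64.29", "42.11", "54.25", "64.00"]


def overall_score(task_scores):
    """Unweighted mean of the task scores, rounded half-up to two decimals.

    Both the equal weighting and the rounding mode are assumptions inferred
    from the leaderboard values, not a documented scoring rule.
    """
    values = [Decimal(s) for s in task_scores]
    mean = sum(values) / len(values)
    return mean.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


print(overall_score(GROK3))  # 55.82
```

Applying the same function to any other row (for example Claude 3.7 Sonnet, whose task scores sum to 384.78) reproduces its Overall Score as well.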

Rank Model Overall Score
1 Grok 3 26.57%
2 OpenAI o1 26.17%
3 Hunyuan-turbo 23.64%
4 QWQ-32B 22.07%
5 DeepSeek-R1-671B 19.70%
6 Qwen2.5-72B 18.89%
7 GPT-4o 17.79%
8 Doubao-pro 16.46%
9 Llama3.1-70B 16.45%
10 Claude 3.7 Sonnet 15.62%
11 GLM4-9B 15.42%
12 DeepSeek-V3 13.68%
13 Llama3.1-8B 10.83%
14 Baichuan2-13B 10.58%
15 Qwen2.5-7B 9.59%
16 Baichuan2-7B 9.45%

Rank Model Text Reasoning MedicalQA BioQA MaterialQA ChineseLawFact
1 Hunyuan-turbo 78.76% 84.50% 85.71% 60.95% 83.87%
2 QWQ-32B 71.59% 78.30% 76.67% 62.38% 69.00%
3 Qwen2.5-72B 68.57% 59.50% 81.43% 62.86% 70.50%
4 GLM4-9B 62.95% 46.50% 80.95% 58.10% 66.25%
5 Grok 3 61.89% 49.00% 80.00% 64.29% 54.25%
6 Llama3.1-70B 61.60% 27.00% 88.57% 71.43% 59.38%
7 Doubao-pro 60.96% 53.00% 83.33% 50.00% 57.50%
8 Claude 3.7 Sonnet 58.32% 46.00% 78.10% 48.80% 60.38%
9 GPT-4o 55.22% 59.00% 43.81% 61.43% 56.63%
10 DeepSeek-V3 52.25% 59.50% 55.71% 39.90% 53.87%
11 DeepSeek-R1-671B 47.57% 48.00% 33.81% 50.48% 58.00%
12 Qwen2.5-7B 46.46% 34.50% 50.95% 37.50% 62.88%
13 Llama3.1-8B 46.34% 17.00% 55.23% 55.98% 57.13%
14 Baichuan2-7B 41.56% 20.00% 51.43% 50.95% 43.87%
15 Baichuan2-13B 40.78% 26.50% 57.14% 22.86% 56.63%

Rank Model Knowledge Graph Reasoning PersonQA Report PoliticalQA PharmKGQA
1 Grok 3 42.53% 4.70% 77.80% 45.50% 42.11%
2 DeepSeek-R1-671B 35.84% 6.80% 59.70% 45.50% 31.37%
3 DeepSeek-V3 35.51% 2.60% 57.90% 42.50% 39.04%
4 GPT-4o 32.03% 3.20% 44.70% 41.00% 39.23%
5 QWQ-32B 31.49% 3.00% 32.30% 45.00% 45.67%
6 Qwen2.5-72B 31.12% 2.50% 38.90% 45.00% 38.09%
7 Claude 3.7 Sonnet 26.25% 0.50% 42.30% 22.10% 40.10%
8 Llama3.1-70B 25.18% 2.20% 24.20% 40.00% 34.33%
9 Qwen2.5-7B 23.76% 0.50% 17.00% 46.00% 31.55%
10 Doubao-pro 23.11% 0.00% 25.30% 40.00% 27.14%
11 Hunyuan-turbo 19.78% 1.40% 2.20% 43.00% 32.52%
12 Baichuan2-7B 19.23% 0.00% 12.00% 43.50% 21.43%
13 Llama3.1-8B 17.06% 0.20% 2.50% 42.00% 23.53%
14 Baichuan2-13B 16.34% 0.00% 13.60% 37.00% 14.76%
15 GLM4-9B 15.75% 0.20% 6.60% 38.50% 17.70%
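The category tables are likewise consistent with each category score being the unweighted mean of its task columns: Text Reasoning averages MedicalQA, BioQA, MaterialQA and ChineseLawFact, and Knowledge Graph Reasoning averages PersonQA, Report, PoliticalQA and PharmKGQA. Table Reasoning and Code Reasoning each match a single task (WTQ and VersiCode respectively). A minimal sketch, assuming that grouping and half-up rounding to two decimals; `CATEGORIES` and `category_score` are illustrative names, not part of the leaderboard.

```python
from decimal import Decimal, ROUND_HALF_UP

# Assumed mapping from category to task columns, inferred from the category tables
CATEGORIES = {
    "Text Reasoning": ["MedicalQA", "BioQA", "MaterialQA", "ChineseLawFact"],
    "Knowledge Graph Reasoning": ["PersonQA", "Report", "PoliticalQA", "PharmKGQA"],
}


def category_score(task_scores, category):
    """Unweighted mean of one category's task scores, rounded half-up to two decimals."""
    values = [Decimal(task_scores[task]) for task in CATEGORIES[category]]
    mean = sum(values) / len(values)
    return mean.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


# Hunyuan-turbo's per-task scores (percent), copied from the two category tables above
hunyuan_turbo = {
    "MedicalQA": "84.50", "BioQA": "85.71", "MaterialQA": "60.95", "ChineseLawFact": "83.87",
    "PersonQA": "1.40", "Report": "2.20", "PoliticalQA": "43.00", "PharmKGQA": "32.52",
}

for name in CATEGORIES:
    print(f"{name}: {category_score(hunyuan_turbo, name)}%")
# Text Reasoning: 78.76%
# Knowledge Graph Reasoning: 19.78%
```

Both printed values match Hunyuan-turbo's rows in the Text Reasoning and Knowledge Graph Reasoning tables.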

Rank Model Table Reasoning
1 Grok 3 76.50%
2 DeepSeek-R1-671B 74.30%
3 QWQ-32B 70.50%
4 DeepSeek-V3 69.90%
5 GPT-4o 69.40%
6 Qwen2.5-72B 65.50%
7 Hunyuan-turbo 55.10%
8 Llama3.1-70B 47.70%
9 Doubao-pro 46.00%
10 GLM4-9B 39.10%
11 Llama3.1-8B 35.70%
12 Qwen2.5-7B 30.60%
13 Claude 3.7 Sonnet 28.30%
14 Baichuan2-13B 14.80%
15 Baichuan2-7B 4.80%

Rank Model Code Reasoning
1 GPT-4o 66.50%
2 DeepSeek-R1-671B 65.60%
3 Grok 3 64.00%
4 Doubao-pro 63.10%
5 GLM4-9B 58.50%
6 Hunyuan-turbo 51.70%
7 Llama3.1-70B 50.90%
8 DeepSeek-V3 37.40%
9 Qwen2.5-72B 35.90%
10 QWQ-32B 23.70%
11 Claude 3.7 Sonnet 18.20%
12 Qwen2.5-7B 17.80%
13 Llama3.1-8B 11.80%
14 Baichuan2-13B 4.10%
15 Baichuan2-7B 0.00%