Dynamic Leaderboard

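Dedicated to exploring the most advanced large models and to providing industry and research communities with a comprehensive, objective, and neutral evaluation reference.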

| Rank | Model | Overall Score | Text Reasoning | KG Reasoning | Table Reasoning |
|------|-------|---------------|----------------|--------------|-----------------|
| 1 | Gemini3-pro | 46.40% | 42.20% | 40.00% | 75.50% |
| 2 | Claude-Sonnet-4.5 | 37.40% | 27.80% | 31.50% | 80.90% |
| 3 | QWQ-32B | 34.60% | 20.50% | 35.00% | 74.80% |
| 4 | HunYuan2.0 | 34.50% | 26.30% | 29.20% | 72.10% |
| 5 | GPT-5.2 | 34.30% | 26.90% | 27.60% | 73.80% |
| 6 | Qwen2.5-72B | 33.90% | 17.80% | 42.40% | 58.50% |
| 7 | Llama3.1-70B | 31.70% | 21.30% | 38.30% | 44.90% |
| 8 | QWEN3-235B | 28.70% | 18.90% | 29.80% | 54.10% |
| 9 | Doubao-Seed-1.6 | 28.10% | 25.10% | 24.10% | 47.60% |
| 10 | DeepSeek-V3.2 | 27.70% | 14.80% | 33.20% | 50.30% |
| 11 | Llama3.1-8B | 24.70% | 17.60% | 25.90% | 42.20% |