Dynamic Leaderboard
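Dedicated to exploring state-of-the-art large models and providing industry and research communities with a comprehensive, objective, and neutral evaluation reference.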
| Rank | Model | Overall Score | Text Reasoning | KG Reasoning | Table Reasoning |
|---|---|---|---|---|---|
| 1 | Gemini3-pro | 46.40% | 42.20% | 40.00% | 75.50% |
| 2 | Claude-Sonnet-4.5 | 37.40% | 27.80% | 31.50% | 80.90% |
| 3 | QWQ-32B | 34.60% | 20.50% | 35.00% | 74.80% |
| 4 | HunYuan2.0 | 34.50% | 26.30% | 29.20% | 72.10% |
| 5 | GPT-5.2 | 34.30% | 26.90% | 27.60% | 73.80% |
| 6 | Qwen2.5-72B | 33.90% | 17.80% | 42.40% | 58.50% |
| 7 | Llama3.1-70B | 31.70% | 21.30% | 38.30% | 44.90% |
| 8 | QWEN3-235B | 28.70% | 18.90% | 29.80% | 54.10% |
| 9 | Doubao-Seed-1.6 | 28.10% | 25.10% | 24.10% | 47.60% |
| 10 | DeepSeek-V3.2 | 27.70% | 14.80% | 33.20% | 50.30% |
| 11 | Llama3.1-8B | 24.70% | 17.60% | 25.90% | 42.20% |
