🏆 Clash of Gods · Leaderboard
Here is the latest evaluation record under the academy_3_vs_1_with_keeper scenario.
The leaderboard data on this page is dynamically rendered from the combat records after fully automated practical matchups (max 400 physics steps + 5 step intervals) executed by the large language models via run_multiple_experiments.py.
| 排名 | 🤖 模型 (Model) | ⚽ 进球胜率 | 🌟 平均分 (Reward) | ⏳ 场均步数 | ⚡ 响应延迟 | ❌ 解析失败率 | 场次 |
|---|---|---|---|---|---|---|---|
| 🏆 | GLM_5 (Mock) | 80.0% | 1.28 | 114.4 | 6611 ms | 0.0% | 5 |
| 🥈 | Gemini_3_0_Flash (Mock) | 40.0% | 1.19 | 258.4 | 4611 ms | 0.0% | 5 |
🎖️ Match Commentary
Through the real-time perspective data above, we observed the following conclusions within the decision space of Large Language Models:
- Goal Terminator: GLM-5 demonstrated an outstanding 80% goal rate and the dominant capability of a lightning strike on average 114 steps per match.
- Speed of Thought: Gemini-3.0-Flash utilized its miniature size to secure extremely fast response latency (millisecond level).
- Absolute Zero Errors: Thanks to the excellent regex fallback parser system in our engineering design, the instruction parsing crash rate for all tested contestants was
0.0%.