Agent Leaderboard

Only traceable scores based on public benchmarks are shown; source links are listed in agents-benchmark-sources.md.

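| Rank | Agent | Vendor | Run Mode | Overall | Office | Code | Research | Tool Calling | Stability | Speed | Cost-Effectiveness |
|------|-------|--------|----------|---------|--------|------|----------|--------------|-----------|-------|--------------------|
| #1 | Claude Opus 4.6 | anthropic | API/Agent | 72.7 | 72.7 | - | - | - | - | - | 72.7 |
| #2 | Claude Sonnet 4.6 | anthropic | API/Agent | 72.5 | 72.5 | - | - | - | - | - | 72.5 |
| #3 | Qwen3 VL 235B A22B Instruct | qwen | API/Agent | 66.7 | 66.7 | - | - | - | - | - | 66.7 |
| #4 | Claude 3.5 Sonnet (20241022) | anthropic | API/Agent | 57.6 | - | - | - | - | 57.6 | - | 57.6 |
| #5 | GPT-4o | openai | API/Agent | 51.2 | - | 8.1 | - | - | 51.2 | - | 29.7 |
| #6 | GPT-4 (0613) | openai | API/Agent | 44.0 | - | - | - | - | - | - | - |
| #7 | o1 | openai | API/Agent | 28.4 | - | 28.4 | - | - | - | - | 28.4 |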