WebDev: 90 models · 381K votes
Text: 661 models · Top 30 shown
Vision: 131 models · 1M votes
Agent: 27 models · 755K sessions
Text-to-Image: 70 models · 5.4M votes
🏆 WebDev Leaderboard · Top 50
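Web development tasks (agentic coding workflows with multi-step reasoning and tool calls) · Data as of 2026-06-16 · 90 models · 381,168 votes
3 templates: Overall / HTML / React · 7 domain filters · Confidence intervals / head-to-head win-loss / A vs. B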
| # | Model | Provider | Score ± CI | Votes | $/M tokens (input / output) | Context |
|---|---|---|---|---|---|---|
| 1 | claude-fable-5 | Anthropic | 1654±16 | 2.1K | $10 / $50 | 1M |
| 2 | glm-5.2 (max) | Z.ai | 1595±16 | 1.6K | $1.4 / $4.4 | 1M |
| 3 | claude-opus-4-7-thinking | Anthropic | 1566±8 | 7.4K | $5 / $25 | 1M |
| 4 | claude-opus-4-8-thinking | Anthropic | 1561±13 | 2.6K | $5 / $25 | 1M |
| 5 | claude-opus-4-7 | Anthropic | 1556±8 | 6.8K | $5 / $25 | 1M |
| 6 | claude-opus-4-6-thinking | Anthropic | 1541±7 | 9.9K | $5 / $25 | 1M |
| 7 | claude-opus-4-8 | Anthropic | 1541±12 | 3.0K | $5 / $25 | 1M |
| 8 | claude-opus-4-6 | Anthropic | 1538±7 | 11.0K | $5 / $25 | 1M |
| 9 | glm-5.1 | Z.ai | 1531±11 | 3.6K | $1.4 / $4.4 | 202.8K |
| 10 | qwen3.7-max-20260517 | Alibaba | 1531±11 | 3.4K | $1.25 / $3.75 | 1M |
| 11 | claude-sonnet-4-6 | Anthropic | 1522±6 | 13.2K | $3 / $15 | 1M |
| 12 | kimi-k2.6 | Moonshot | 1513±9 | 5.6K | $0.95 / $4 | 262.1K |
| 13 | minimax-m3 | MiniMax | 1511±12 | 2.8K | $0.6 / $2.4 | — |
| 14 | muse-spark | Meta | 1507±16 | 1.6K | — / — | — |
| 15 | gemini-3.5-flash | Google | 1506±13 | 2.2K | $1.5 / $9 | 1M |
| 16 | gpt-5.5-xhigh (codex-harness) | OpenAI | 1502±9 | 6.1K | — / — | — |
| 17 | claude-opus-4-5-20251101-thinking-32k | Anthropic | 1490±7 | 13.1K | $5 / $25 | 200K |
| 18 | qwen3.6-max-preview | Alibaba | 1484±12 | 2.5K | $1.04 / $6.24 | 262.1K |
| 19 | gpt-5.5-high (codex-harness) | OpenAI | 1483±8 | 6.3K | — / — | — |
| 20 | kimi-k2.7-code | Moonshot | 1478±14 | 1.9K | $0.74 / $3.5 | 262.1K |
| 21 | mimo-v2.5-pro | Xiaomi | 1470±8 | 6.5K | $0.43 / $0.87 | 1M |
| 22 | claude-opus-4-5-20251101 | Anthropic | 1466±6 | 15.3K | $5 / $25 | 200K |
| 23 | qwen3.6-plus | Alibaba | 1462±7 | 8.0K | $0.33 / $1.95 | 1M |
| 24 | deepseek-v4-pro-thinking | DeepSeek | 1459±9 | 5.9K | $0.43 / $0.87 | 1M |
| 25 | gpt-5.4-high (codex-harness) | OpenAI | 1457±17 | 1.5K | $2.5 / $15 | 1.1M |
| 26 | gpt-5.5 (codex-harness) | OpenAI | 1450±8 | 6.1K | — / — | — |
| 27 | gemini-3.1-pro-preview | Google | 1447±6 | 12.4K | $2 / $12 | 1M |
| 28 | glm-4.7 | Z.ai | 1440±10 | 4.9K | $0.4 / $1.75 | 202.8K |
| 29 | gemini-3-pro | Google | 1439±7 | 17.2K | $2 / $12 | 1M |
| 30 | gpt-5.4-medium (codex-harness) | OpenAI | 1437±16 | 1.4K | $2.5 / $15 | 1.1M |
| 31 | gemini-3-flash | Google | 1437±7 | 13.3K | $0.5 / $3 | 1M |
| 32 | glm-5 | Z.ai | 1435±8 | 6.6K | — / — | — |
| 33 | mimo-v2.5 | Xiaomi | 1433±9 | 5.6K | — / — | — |
| 34 | mimo-v2-pro | Xiaomi | 1432±8 | 6.8K | — / — | — |
| 35 | kimi-k2.5-thinking | Moonshot | 1430±6 | 12.4K | — / — | — |
| 36 | kimi-k2.5-instant | Moonshot | 1408±11 | 3.6K | — / — | — |
| 37 | gpt-5.3-codex (codex-harness) | OpenAI | 1407±12 | 3.0K | — / — | — |
| 38 | gpt-5.2 | OpenAI | 1405±17 | 1.5K | — / — | — |
| 39 | gpt-5.4-mini-high | OpenAI | 1398±8 | 7.3K | — / — | — |
| 40 | gpt-5.4 | OpenAI | 1398±30 | 406 | — / — | — |
| 41 | gpt-5-medium | OpenAI | 1394±13 | 3.8K | — / — | — |
| 42 | qwen3.5-397b-a17b | Alibaba | 1394±6 | 11.6K | — / — | — |
| 43 | minimax-m2.7 | MiniMax | 1394±7 | 7.9K | — / — | — |
| 44 | minimax-m2.1-preview | MiniMax | 1392±8 | 9.3K | — / — | — |
| 45 | gpt-5.1-medium | OpenAI | 1392±9 | 6.1K | — / — | — |
| 46 | claude-sonnet-4-5-20250929-thinking-32k | Anthropic | 1388±7 | 15.7K | — / — | — |
| 47 | gemini-3-flash (thinking-minimal) | Google | 1388±5 | 18.4K | — / — | — |
| 48 | grok-4.20-beta-0309-reasoning | xAI | 1387±7 | 9.1K | — / — | — |
| 49 | claude-opus-4-1-20250805 | Anthropic | 1386±9 | 8.6K | — / — | — |
| 50 | claude-sonnet-4-5-20250929 | Anthropic | 1386±6 | 18.4K | — / — | — |
* Only the Top 50 are shown; see the original site for the full list of 90 models.
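For readers who want to work with a snapshot of the table programmatically, the following is a minimal sketch of parsing one row into typed fields. The function names `parse_votes` and `parse_row` are hypothetical, and the sketch assumes each row has the seven pipe-separated columns shown in the header; rows that deviate from that layout raise an error rather than being guessed at.

```python
import re


def parse_votes(text: str) -> int:
    """Convert a vote count such as '2.1K' or '406' to an integer."""
    text = text.strip()
    multipliers = {"K": 1_000, "M": 1_000_000}
    suffix = text[-1].upper()
    if suffix in multipliers:
        return round(float(text[:-1]) * multipliers[suffix])
    return int(text.replace(",", ""))


def parse_row(line: str) -> dict:
    """Parse one table row into rank, model, provider, score, CI and votes."""
    # Drop the outer pipes, then split the remaining text into cells.
    cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
    if len(cells) != 7:
        raise ValueError(f"expected 7 columns, got {len(cells)}: {line!r}")
    rank, model, provider, score_ci, votes = cells[:5]
    # The score column has the form '<score>±<half-width of the interval>'.
    match = re.fullmatch(r"(\d+)\s*±\s*(\d+)", score_ci)
    if match is None:
        raise ValueError(f"unrecognized score format: {score_ci!r}")
    return {
        "rank": int(rank),
        "model": model,
        "provider": provider,
        "score": int(match.group(1)),
        "ci": int(match.group(2)),
        "votes": parse_votes(votes),
    }


row = parse_row("| 1 | claude-fable-5 | Anthropic | 1654±16 | 2.1K | $10 / $50 | 1M |")
print(row)
```

Vote counts in the table are rounded (for example `2.1K`), so the parsed integers are approximations of the true counts.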
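About the data: the Arena.ai leaderboard is built from blind pairwise human votes between models. It is run by the LMSYS Chatbot Arena team and is one of the most trusted Elo rankings in LLM evaluation. Every score carries a 95% confidence interval, and the more votes a model has, the more stable its score. The data on this page is a snapshot; a monthly refresh is suggested (add a cron job if you want to scrape it automatically).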
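To make the scores and intervals easier to interpret, here is a minimal sketch under two assumptions that the page does not confirm: that scores follow the conventional Elo logistic curve with a 400-point scale, and that the value after ± is the half-width of the 95% interval. The function names are hypothetical. Checking whether two intervals overlap is only a rough heuristic for "too close to call", not a formal significance test.

```python
def expected_win_probability(score_a: float, score_b: float, scale: float = 400.0) -> float:
    """Probability that model A is preferred over model B under the Elo logistic curve."""
    return 1.0 / (1.0 + 10 ** ((score_b - score_a) / scale))


def intervals_overlap(score_a: float, ci_a: float, score_b: float, ci_b: float) -> bool:
    """True if [score - ci, score + ci] for A and for B share at least one point."""
    return abs(score_a - score_b) <= ci_a + ci_b


# Rank 1 (1654±16) against rank 2 (1595±16), values taken from the table.
p = expected_win_probability(1654, 1595)
print(f"expected win rate of #1 over #2: {p:.3f}")
print("intervals overlap:", intervals_overlap(1654, 16, 1595, 16))

# Ranks 6 and 7 share the score 1541 (±7 and ±12), so their intervals overlap.
print("intervals overlap:", intervals_overlap(1541, 7, 1541, 12))
```

Under these assumptions, a gap of about 60 points corresponds to a win rate only modestly above 50%, which is why the interval width matters when comparing neighboring ranks.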