Interactive comparison of leading foundation models — Updated 18 Sep 2025
Full Score Table
“—” means no credible score had been published as of 18 Sep 2025.
Benchmark | ChatGPT-5 | Claude Opus 4.1 | Claude Sonnet 4 | OpenAI o3 | GPT-4.1 | Gemini 2.5 Pro | Llama 3.3 70B | Mistral Large 2
Notes: Scores are percentages unless noted. Evaluation setup varies by vendor (number of attempts, tool access, benchmark subsets), so treat the scores as directional rather than absolute. “No tools” means the model was run without external tools.
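For readers who copy the table into a script, the cell conventions described here (“—” for unpublished, percentages otherwise) can be handled with a small parser. This is a minimal sketch, not part of the original page: the function name `parse_score` and the sample cell strings are hypothetical, and it assumes every cell is either the “—” placeholder or a number with an optional trailing `%`. Cells reported in other units are not handled.

```python
from typing import Optional

NOT_PUBLISHED = "—"  # placeholder the table uses for scores not credibly published


def parse_score(cell: str) -> Optional[float]:
    """Convert one table cell to a float percentage, or None if unpublished.

    Assumption: a cell is either the "—" placeholder, empty, or a number
    with an optional trailing "%". Other units are out of scope here.
    """
    text = cell.strip()
    if text == NOT_PUBLISHED or text == "":
        return None
    return float(text.rstrip("%").strip())


# Hypothetical cell strings for illustration only, not real benchmark results.
cells = ["—", "50%", " 12.5 "]
parsed = [parse_score(c) for c in cells]
print(parsed)
```

Returning `None` rather than `0.0` for “—” keeps unpublished scores from being mistaken for a real result when averaging or ranking.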