AI Model Benchmark Dashboard
Interactive comparison of leading foundation models (updated 18 Sep 2025)
Full Score Table
A “—” entry means no credible published score was available as of 18 Sep 2025.

| Benchmark | ChatGPT-5 | Claude Opus 4.1 | Claude Sonnet 4 | OpenAI o3 | GPT-4.1 | Gemini 2.5 Pro | Llama 3.3 70B | Mistral Large 2 |
|---|---|---|---|---|---|---|---|---|
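Notes: Scores are percentages unless otherwise noted. Evaluation setups vary by vendor (number of attempts, tool access, benchmark subsets), so treat the scores as directional indicators rather than absolute, directly comparable measures. “No tools” means the model was evaluated on its own, without access to external tools.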