Overall Champion
gemini-3.1-pro-preview
Best Overall: 60.30% (CoT)
Leaderboard
Best Open-source
InternVL3-78B
Best Overall: 46.41% (CoT)
Best NoCoT
gemini-3.1-pro-preview
NoCoT Overall: 57.98%
| Rank | Model | NoCoT Overall (%) | CoT Overall (%) | Best Mode |
|---|---|---|---|---|
| #1 | gemini-3.1-pro-preview | 57.98 | 60.30 | CoT |
| #2 | gemini-3-flash-preview | 56.60 | 55.74 | NoCoT |
| #3 | InternVL3-78B | 42.28 | 46.41 | CoT |
| #4 | InternVL3-38B | 42.57 | 45.69 | CoT |
| #5 | InternVL3-14B | 39.96 | 44.58 | CoT |
| #6 | InternVL3-8B | 37.35 | 41.91 | CoT |
| #7 | InternVL3_5-8B | 35.39 | 40.35 | CoT |
| #8 | InternVL2_5-8B | 37.64 | 40.07 | CoT |
| #9 | Qwen2.5-VL-7B | 34.45 | 34.31 | NoCoT |
| #10 | Qwen3-VL-8B | 31.69 | 32.08 | CoT |
| #11 | LLaVA-NeXT-7B | 24.03 | 24.05 | CoT |
Updated on April 2, 2026 using the latest submitted aggregate results (`n`, `correct`, `acc`) for both NoCoT and CoT modes.
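
As a rough illustration of how the table columns could relate to the submitted aggregates, the Python sketch below assumes that `acc` is `correct / n` expressed as a percentage, that Best Mode is whichever mode has the higher overall accuracy, and that rank follows the best-mode score in descending order. These rules are inferred from the table rather than stated by the leaderboard, and the names `ModeResult`, `best_mode`, and `rank_models` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ModeResult:
    """Aggregate result for one model in one mode (hypothetical container)."""
    n: int        # number of evaluated questions
    correct: int  # number answered correctly

    @property
    def acc(self) -> float:
        # Assumption: acc is the percentage of correct answers.
        return 100.0 * self.correct / self.n if self.n else 0.0


def best_mode(nocot_acc: float, cot_acc: float) -> str:
    # Assumption: the higher overall accuracy wins; ties fall back to NoCoT.
    return "CoT" if cot_acc > nocot_acc else "NoCoT"


def rank_models(rows: dict[str, tuple[float, float]]) -> list[tuple]:
    """rows maps model name -> (NoCoT overall %, CoT overall %)."""
    # Assumption: models are ordered by their best-mode score, descending.
    ordered = sorted(rows.items(), key=lambda kv: max(kv[1]), reverse=True)
    return [
        (position, model, nocot, cot, best_mode(nocot, cot))
        for position, (model, (nocot, cot)) in enumerate(ordered, start=1)
    ]


if __name__ == "__main__":
    # Three rows copied from the table above.
    sample = {
        "Qwen2.5-VL-7B": (34.45, 34.31),
        "gemini-3.1-pro-preview": (57.98, 60.30),
        "gemini-3-flash-preview": (56.60, 55.74),
    }
    for row in rank_models(sample):
        print(row)
```

Under these assumptions the three sample rows come out in the same relative order and with the same Best Mode labels as in the table. The actual tie-breaking and ranking rules used by the leaderboard may differ.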