Benchmark Rankings (Updated Results)

Overall Champion

gemini-3.1-pro-preview

Best Overall: 60.30% (CoT)

Best Open-source

InternVL3-78B

Best Overall: 46.41% (CoT)

Best NoCoT

gemini-3.1-pro-preview

NoCoT Overall: 57.98%

Rank  Model                    NoCoT Overall (%)  CoT Overall (%)  Best Mode
#1    gemini-3.1-pro-preview   57.98              60.30            CoT
#2    gemini-3-flash-preview   56.60              55.74            NoCoT
#3    InternVL3-78B            42.28              46.41            CoT
#4    InternVL3-38B            42.57              45.69            CoT
#5    InternVL3-14B            39.96              44.58            CoT
#6    InternVL3-8B             37.35              41.91            CoT
#7    InternVL3_5-8B           35.39              40.35            CoT
#8    InternVL2_5-8B           37.64              40.07            CoT
#9    Qwen2.5-VL-7B            34.45              34.31            NoCoT
#10   Qwen3-VL-8B              31.69              32.08            CoT
#11   LLaVA-NeXT-7B            24.03              24.05            CoT
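The Best Mode column and the row order match a simple rule: each model is scored by the higher of its NoCoT and CoT results, and models are sorted by that score in descending order. The Python sketch below reproduces the ranking from the table values under that assumption. The helper names (`Result`, `best`, `rank`) are hypothetical, and the tie-breaking choice (ties go to CoT) is also an assumption, since no exact ties occur in the table.

```python
from typing import NamedTuple


class Result(NamedTuple):
    model: str
    nocot: float  # NoCoT Overall (%)
    cot: float    # CoT Overall (%)


def best(r: Result) -> tuple[float, str]:
    # Assumed rule: keep the higher of the two modes; ties go to CoT (arbitrary).
    return (r.cot, "CoT") if r.cot >= r.nocot else (r.nocot, "NoCoT")


def rank(results: list[Result]) -> list[tuple[int, str, float, str]]:
    # Sort by best-mode score, highest first, and number the ranks from 1.
    ordered = sorted(results, key=lambda r: best(r)[0], reverse=True)
    return [(i, r.model, *best(r)) for i, r in enumerate(ordered, start=1)]


# Values copied from the table above.
RESULTS = [
    Result("gemini-3.1-pro-preview", 57.98, 60.30),
    Result("gemini-3-flash-preview", 56.60, 55.74),
    Result("InternVL3-78B", 42.28, 46.41),
    Result("InternVL3-38B", 42.57, 45.69),
    Result("InternVL3-14B", 39.96, 44.58),
    Result("InternVL3-8B", 37.35, 41.91),
    Result("InternVL3_5-8B", 35.39, 40.35),
    Result("InternVL2_5-8B", 37.64, 40.07),
    Result("Qwen2.5-VL-7B", 34.45, 34.31),
    Result("Qwen3-VL-8B", 31.69, 32.08),
    Result("LLaVA-NeXT-7B", 24.03, 24.05),
]

if __name__ == "__main__":
    for pos, model, score, mode in rank(RESULTS):
        print(f"#{pos} {model} {score:.2f} ({mode})")
    # The "Best NoCoT" card is simply the maximum NoCoT score.
    top_nocot = max(RESULTS, key=lambda r: r.nocot)
    print(f"Best NoCoT: {top_nocot.model} {top_nocot.nocot:.2f}")
```

Running it lists the models in the same order as the table, with gemini-3.1-pro-preview first at 60.30 (CoT), and reports gemini-3.1-pro-preview (57.98) as the best NoCoT model, matching the summary cards.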

Updated on April 2, 2026, using the latest submitted aggregate results (`n`, `correct`, `acc`) for both the NoCoT and CoT modes.
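Because each submitted aggregate carries `n`, `correct`, and `acc` together, the reported accuracy can be cross-checked against the counts. The sketch below assumes that `n` is the number of evaluated items, `correct` is the number answered correctly, and `acc` is the percentage `100 * correct / n` as displayed in the table. None of these definitions are stated above, so treat them as assumptions. The `Aggregate` class and its methods are hypothetical, and the counts in the usage example are illustrative rather than real submission data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Aggregate:
    """One submitted aggregate record for a single mode (NoCoT or CoT)."""
    n: int        # assumed: number of evaluated items
    correct: int  # assumed: number of items answered correctly
    acc: float    # reported accuracy, assumed to be a percentage

    def recomputed_acc(self) -> float:
        # Assumption: acc = 100 * correct / n.
        if self.n <= 0:
            raise ValueError("n must be positive")
        return 100 * self.correct / self.n

    def is_consistent(self, tol: float = 0.005) -> bool:
        # A value shown to two decimals lies within 0.005 of the exact value,
        # so a larger gap suggests the reported acc does not match the counts.
        return abs(self.recomputed_acc() - self.acc) <= tol


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    ok = Aggregate(n=200, correct=120, acc=60.0)
    bad = Aggregate(n=200, correct=120, acc=61.0)
    print(ok.is_consistent(), bad.is_consistent())
```

With these hypothetical records, the first check passes and the second fails, because 120 / 200 is 60.00% rather than 61.00%.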