Evaluations / EVAL-20260207-134919
reasoning
Mar 18, 2026 · REASON-010

You're a consultant charging $500/hour. A client asks you to find the optimal solution to a complex problem. You estimate:

- A quick solution (1 hour) has 60% chance of being optimal
- More analysis (5 hours) has 90% chance of being optimal
- The optimal solution saves the client $50,000 vs. the suboptimal one

1. How much analysis should you do?
2. How would your answer change if you charged $1000/hour?
3. What if you were doing this for yourself (no billing)?
4. Generalize: derive a formula for optimal thinking time given problem stakes and thinking cost
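The arithmetic behind the prompt can be sketched in a few lines. This is one possible reading, not a reference answer from the evaluation: it assumes a risk-neutral client, treats the consultant's fee as the only cost of analysis, and compares just the two options the prompt lists. The helper names `expected_net_value` and `break_even_rate` are hypothetical, and a rate of 0 stands in for the "no billing" case, which ignores the opportunity cost of your own time.

```python
def expected_net_value(p_optimal, hours, hourly_rate, stakes):
    """Expected savings from landing on the optimal solution, minus the analysis fee."""
    return p_optimal * stakes - hours * hourly_rate


def break_even_rate(p_quick, h_quick, p_deep, h_deep, stakes):
    """Hourly rate at which the extra analysis exactly pays for itself."""
    return (p_deep - p_quick) * stakes / (h_deep - h_quick)


STAKES = 50_000  # savings of the optimal solution over the suboptimal one

# 500 and 1000 are the rates from the prompt; 0 is an assumed stand-in for "no billing".
for rate in (500, 1000, 0):
    quick = expected_net_value(0.60, 1, rate, STAKES)
    deep = expected_net_value(0.90, 5, rate, STAKES)
    print(f"rate ${rate}/h: quick {quick:,.0f}, deep {deep:,.0f}, gain {deep - quick:,.0f}")

print(f"break-even rate: ${break_even_rate(0.60, 1, 0.90, 5, STAKES):,.0f}/h")
```

Under these assumptions the extra four hours are worth taking whenever the gain in success probability times the stakes exceeds the extra hours times the hourly rate, which is the two-option special case of the general formula the prompt asks for.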

Winner: MiMo-V2-Flash (Xiaomi)
Winner score: 9.73 (matrix avg: 8.66)
10×10 Judgment Matrix · 100 judgments
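| Judge ↓ / Respondent → | MiMo-V2-Flash | Gemini 3 | GPT-OSS-120B | Gemini 3 | Claude Sonnet 4.5 | DeepSeek V3.2 | Gemini 2.5 Flash | Claude Opus 4.5 | OLMo Think | Grok 3 (Direct) |
|---|---|---|---|---|---|---|---|---|---|---|
| MiMo-V2-Flash | - | 2.8 | 10.0 | 9.6 | 9.3 | 9.3 | 9.6 | 9.2 | 8.0 | 9.8 |
| Gemini 3 | 0.0 | - | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| GPT-OSS-120B | 0.0 | 0.0 | - | 8.8 | 9.4 | 0.0 | 0.0 | 8.8 | 0.0 | 9.1 |
| Gemini 3 | 10.0 | 6.2 | 10.0 | - | 10.0 | 10.0 | 10.0 | 10.0 | 0.0 | 10.0 |
| Claude Sonnet 4.5 | 9.8 | 2.6 | 9.8 | 9.2 | - | 9.0 | 9.3 | 9.2 | 0.5 | 9.8 |
| DeepSeek V3.2 | 9.8 | 4.7 | 9.4 | 9.4 | 9.4 | - | 9.6 | 9.4 | 8.0 | 9.4 |
| Gemini 2.5 Flash | 10.0 | 9.3 | 10.0 | 10.0 | 10.0 | 10.0 | - | 10.0 | 10.0 | 10.0 |
| Claude Opus 4.5 | 9.2 | 1.6 | 9.2 | 9.2 | 9.2 | 9.7 | 8.4 | - | 0.3 | 9.2 |
| OLMo Think | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 10.0 | 0.0 | 0.0 | - | 0.0 |
| Grok 3 (Direct) | 9.6 | 4.8 | 9.7 | 9.6 | 9.2 | 9.2 | 9.4 | 9.4 | 7.0 | - |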
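The page does not say how the winner score and the matrix average are derived from the matrix. The sketch below tests one plausible rule, which is an assumption: each respondent's score is the mean of the scores it received, skipping the blank self-judgment cell and any 0.0 entries, and the matrix average is the mean of those ten per-respondent scores. The scores are transcribed from the matrix with `None` for the self-judgment cell; the names `MATRIX` and `respondent_average` are hypothetical.

```python
# One row per judge, one column per respondent, in the page's order.
# None marks the blank self-judgment cell on the diagonal.
RESPONDENTS = [
    "MiMo-V2-Flash", "Gemini 3", "GPT-OSS-120B", "Gemini 3", "Claude Sonnet 4.5",
    "DeepSeek V3.2", "Gemini 2.5 Flash", "Claude Opus 4.5", "OLMo Think", "Grok 3 (Direct)",
]
MATRIX = [
    [None, 2.8, 10.0, 9.6, 9.3, 9.3, 9.6, 9.2, 8.0, 9.8],
    [0.0, None, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, None, 8.8, 9.4, 0.0, 0.0, 8.8, 0.0, 9.1],
    [10.0, 6.2, 10.0, None, 10.0, 10.0, 10.0, 10.0, 0.0, 10.0],
    [9.8, 2.6, 9.8, 9.2, None, 9.0, 9.3, 9.2, 0.5, 9.8],
    [9.8, 4.7, 9.4, 9.4, 9.4, None, 9.6, 9.4, 8.0, 9.4],
    [10.0, 9.3, 10.0, 10.0, 10.0, 10.0, None, 10.0, 10.0, 10.0],
    [9.2, 1.6, 9.2, 9.2, 9.2, 9.7, 8.4, None, 0.3, 9.2],
    [0.0, 0.0, 0.0, 0.0, 0.0, 10.0, 0.0, 0.0, None, 0.0],
    [9.6, 4.8, 9.7, 9.6, 9.2, 9.2, 9.4, 9.4, 7.0, None],
]


def respondent_average(matrix, col):
    """Mean score received in one column, skipping blank and 0.0 cells (assumed rule)."""
    scores = [row[col] for row in matrix if row[col] is not None and row[col] != 0.0]
    return sum(scores) / len(scores)


averages = [respondent_average(MATRIX, c) for c in range(len(RESPONDENTS))]
# Index-based lookup, because two columns share the label "Gemini 3".
winner_index = max(range(len(averages)), key=averages.__getitem__)
matrix_average = sum(averages) / len(averages)

print(RESPONDENTS[winner_index], round(averages[winner_index], 2))
print("matrix avg:", round(matrix_average, 2))
```

With this rule the script reports MiMo-V2-Flash at 9.73 and a matrix average of 8.66, matching the figures shown on the page. That agreement supports the assumed rule but does not confirm it is what the site actually computes.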