Evaluation EVAL-20260402-164327
reasoning
Apr 02, 2026 · REASON-015

A superintelligent predictor offers you two boxes. Box A is transparent and contains $1,000. Box B is opaque. The predictor has already either put $1,000,000 in Box B (if it predicted you'd take only Box B) or left it empty (if it predicted you'd take both). The predictor has been right 99% of the time. Do you take only Box B or both boxes? Argue for both positions (one-boxing vs two-boxing) and explain which decision theory each relies on.
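The arithmetic behind the two positions can be sketched as follows. This is a minimal illustration, assuming the standard pairing of one-boxing with evidential decision theory (EDT) and two-boxing with causal decision theory (CDT). The function names and the `p_full` parameter are hypothetical and are not part of the evaluation itself.

```python
# Payoffs and predictor accuracy as stated in the prompt.
SMALL = 1_000        # Box A (transparent)
LARGE = 1_000_000    # Box B, if the predictor foresaw one-boxing
ACCURACY = 0.99      # predictor's stated track record


def edt_values(accuracy=ACCURACY):
    """Evidential view: the chosen act is evidence about what was predicted,
    so the contents of Box B are conditioned on the act."""
    one_box = accuracy * LARGE                  # Box B is probably full
    two_box = (1 - accuracy) * LARGE + SMALL    # Box B is probably empty
    return one_box, two_box


def cdt_values(p_full):
    """Causal view: Box B is already filled or empty. p_full is the agent's
    credence that it is full (an assumed parameter), and the choice made now
    cannot change it."""
    one_box = p_full * LARGE
    two_box = p_full * LARGE + SMALL            # always SMALL more
    return one_box, two_box


if __name__ == "__main__":
    print("EDT (one-box, two-box):", edt_values())
    for p in (0.0, 0.5, 1.0):
        print(f"CDT with p_full={p}:", cdt_values(p))
```

Under the evidential calculation, one-boxing is worth about $990,000 in expectation against about $11,000 for two-boxing. Under the causal calculation, two-boxing comes out $1,000 ahead for every value of `p_full`, which is the dominance argument.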

Winner: Claude Opus 4.6 (openrouter)
Winner score: 9.26
Matrix avg: 8.80
10×10 Judgment Matrix · 79 judgments
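| Judge ↓ / Respondent → | Gemini 3.1 Pro | DeepSeek V4 | Claude Opus 4.6 | GPT-OSS-120B | GPT-5.4 | Grok 4.20 | Claude Sonnet 4.6 | MiMo-V2-Flash | Gemini 2.5 Flash | MiniMax M2.5 |
|---|---|---|---|---|---|---|---|---|---|---|
| Gemini 3.1 Pro |  | 7.6 | 10.0 | 7.9 | 10.0 | 10.0 | 9.8 | 9.2 | 9.7 | · |
| DeepSeek V4 | 9.4 |  | 9.7 | 9.7 | 9.4 | · | 9.7 | 9.3 | 9.4 | · |
| Claude Opus 4.6 | 7.7 | 7.6 |  | 8.2 | 9.0 | 8.6 | 8.9 | 8.0 | 7.8 | · |
| GPT-OSS-120B | 7.5 | 6.5 | 8.4 |  | 8.4 | 8.4 | 7.7 | 8.3 | 8.1 | · |
| GPT-5.4 | 7.0 | 9.0 | 9.7 | 6.7 |  | 9.0 | 9.2 | 9.0 | 9.0 | · |
| Grok 4.20 | 8.3 | 8.3 | 8.7 | 8.7 | 9.0 |  | 8.8 | 8.7 | 8.7 | · |
| Claude Sonnet 4.6 | 8.6 | 8.0 | 9.4 | 8.8 | 9.0 | 9.0 |  | 8.3 | 8.3 | · |
| MiMo-V2-Flash | 9.2 | 9.0 | 9.4 | 9.0 | 9.2 | 9.2 | 9.0 |  | · | · |
| Gemini 2.5 Flash | 9.4 | 9.4 | 9.4 | 9.8 | 10.0 | 9.4 | 9.8 | 9.4 |  | · |
| MiniMax M2.5 | 8.4 | 8.4 | 8.7 | 8.8 | 8.7 | 8.4 | 9.0 | 8.6 | 8.4 |  |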
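The headline numbers can be cross-checked against the matrix with a short script. This is a sketch under two assumptions that the page does not state: each judge's own column is blank (so every row holds nine cells), and a respondent's score is the plain mean of the judgments it received. The names `MODELS`, `MATRIX`, `respondent_mean`, and `matrix_mean` are illustrative.

```python
# Scores transcribed from the judgment matrix. None marks either a missing
# judgment ("·") or the judge's own column (assumed blank on the diagonal).
N = None
MODELS = [
    "Gemini 3.1 Pro", "DeepSeek V4", "Claude Opus 4.6", "GPT-OSS-120B",
    "GPT-5.4", "Grok 4.20", "Claude Sonnet 4.6", "MiMo-V2-Flash",
    "Gemini 2.5 Flash", "MiniMax M2.5",
]
MATRIX = {  # judge -> scores given to each respondent, in MODELS order
    "Gemini 3.1 Pro":    [N, 7.6, 10.0, 7.9, 10.0, 10.0, 9.8, 9.2, 9.7, N],
    "DeepSeek V4":       [9.4, N, 9.7, 9.7, 9.4, N, 9.7, 9.3, 9.4, N],
    "Claude Opus 4.6":   [7.7, 7.6, N, 8.2, 9.0, 8.6, 8.9, 8.0, 7.8, N],
    "GPT-OSS-120B":      [7.5, 6.5, 8.4, N, 8.4, 8.4, 7.7, 8.3, 8.1, N],
    "GPT-5.4":           [7.0, 9.0, 9.7, 6.7, N, 9.0, 9.2, 9.0, 9.0, N],
    "Grok 4.20":         [8.3, 8.3, 8.7, 8.7, 9.0, N, 8.8, 8.7, 8.7, N],
    "Claude Sonnet 4.6": [8.6, 8.0, 9.4, 8.8, 9.0, 9.0, N, 8.3, 8.3, N],
    "MiMo-V2-Flash":     [9.2, 9.0, 9.4, 9.0, 9.2, 9.2, 9.0, N, N, N],
    "Gemini 2.5 Flash":  [9.4, 9.4, 9.4, 9.8, 10.0, 9.4, 9.8, 9.4, N, N],
    "MiniMax M2.5":      [8.4, 8.4, 8.7, 8.8, 8.7, 8.4, 9.0, 8.6, 8.4, N],
}


def respondent_mean(name):
    """Mean of all scores a respondent received (its column)."""
    col = MODELS.index(name)
    scores = [row[col] for row in MATRIX.values() if row[col] is not None]
    return sum(scores) / len(scores)


def matrix_mean():
    """Mean over every recorded judgment, plus the judgment count."""
    scores = [s for row in MATRIX.values() for s in row if s is not None]
    return sum(scores) / len(scores), len(scores)


if __name__ == "__main__":
    avg, count = matrix_mean()
    print(f"judgments: {count}, matrix avg: {avg:.2f}")
    print(f"Claude Opus 4.6: {respondent_mean('Claude Opus 4.6'):.2f}")
```

With this reading, the script counts 79 judgments and a matrix average of about 8.80, both matching the page. The column mean for Claude Opus 4.6 comes to about 9.27 against the reported 9.26, presumably because the displayed cells are rounded to one decimal.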