EVAL-20260402-210327
analysis
Apr 02, 2026 · ANALYSIS-026

Country X is debating a points-based immigration system. Proposed criteria: education (30%), work experience (25%), age (20%), language proficiency (15%), job offer (10%). (1) Analyze potential biases in this system. (2) What outcomes would you measure to evaluate success after 5 years? (3) Country Y uses a lottery system instead. Compare the two approaches using economic, social, and ethical dimensions. (4) Design a hybrid system that addresses weaknesses of both.
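The proposed weights can be read as a weighted sum of per-criterion sub-scores. The sketch below is only an illustration of that arithmetic: the prompt does not say how each criterion is scored, so the 0-100 sub-score scale, the `points_score` function, and the sample applicant are all hypothetical.

```python
# Weights taken from the prompt; they sum to 1.0.
WEIGHTS = {
    "education": 0.30,
    "work_experience": 0.25,
    "age": 0.20,
    "language_proficiency": 0.15,
    "job_offer": 0.10,
}


def points_score(subscores: dict[str, float]) -> float:
    """Weighted sum of sub-scores (assumed to be on a 0-100 scale)."""
    missing = WEIGHTS.keys() - subscores.keys()
    if missing:
        raise ValueError(f"missing criteria: {sorted(missing)}")
    return sum(WEIGHTS[name] * subscores[name] for name in WEIGHTS)


# Hypothetical applicant: strong on age and education, no job offer.
applicant = {
    "education": 80,
    "work_experience": 60,
    "age": 100,
    "language_proficiency": 50,
    "job_offer": 0,
}
total = points_score(applicant)
print(f"total points: {total:.1f}")
```

Because the job offer carries only 10% of the weight, an applicant with no offer can still score well on the other criteria, which is one place the bias analysis in part (1) can start.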

Winner: Grok 4.20 (openrouter)
Winner score: 9.26
Matrix avg: 8.49
10×10 Judgment Matrix · 86 judgments
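| Judge ↓ / Respondent → | MiMo-V2-Flash | Gemini 3.1 Pro | Claude Opus 4.6 | GPT-5.4 | DeepSeek V4 | Claude Sonnet 4.6 | Grok 4.20 | GPT-OSS-120B | Gemini 3 | MiniMax M2.5 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| MiMo-V2-Flash | | 8.4 | 7.8 | 7.8 | 9.0 | 7.8 | 9.3 | 8.8 | 9.0 | 9.2 |
| Gemini 3.1 Pro | 9.8 | | 7.5 | 7.3 | 9.7 | 7.9 | 10.0 | 9.1 | 10.0 | 9.7 |
| Claude Opus 4.6 | 8.4 | 6.3 | | 7.8 | 8.0 | 8.1 | 9.0 | 8.1 | · | · |
| GPT-5.4 | 8.4 | 5.8 | · | | 8.4 | 6.8 | 9.2 | 8.2 | 8.8 | 8.2 |
| DeepSeek V4 | 9.2 | 8.4 | 8.8 | 8.8 | | 8.7 | 9.0 | 9.0 | 8.8 | 9.0 |
| Claude Sonnet 4.6 | 8.8 | 7.8 | 8.4 | 8.7 | 8.0 | | 9.2 | 9.2 | 8.8 | 8.2 |
| Grok 4.20 | 8.7 | 7.5 | · | 8.7 | 8.4 | 8.8 | | 8.7 | 8.7 | 8.3 |
| GPT-OSS-120B | 8.4 | 5.5 | 6.0 | 8.4 | 7.8 | 7.5 | 8.8 | | 8.3 | 8.0 |
| Gemini 3 | 9.8 | 8.7 | 9.4 | 9.4 | 9.8 | 9.7 | 9.8 | 9.8 | | 9.0 |
| MiniMax M2.5 | 8.8 | 6.8 | 7.8 | 8.8 | 8.2 | 7.7 | 9.0 | 9.0 | 8.4 | |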
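
The page does not state how the winner score or the matrix average is derived from the matrix above. The sketch below tries one plausible reading: each respondent's score is the mean of the judgments it received, each row omits the judge's own column (no self-judgment), "·" marks a missing judgment, and the matrix average is the mean of the per-respondent means. All of these are assumptions, and the names `respondents`, `matrix`, and `column_scores` are introduced only for illustration.

```python
from statistics import mean

respondents = [
    "MiMo-V2-Flash", "Gemini 3.1 Pro", "Claude Opus 4.6", "GPT-5.4",
    "DeepSeek V4", "Claude Sonnet 4.6", "Grok 4.20", "GPT-OSS-120B",
    "Gemini 3", "MiniMax M2.5",
]

# Rows are judges, in the same order as `respondents`.
# N marks a cell with no score: the judge's own column
# (assumed: no self-judgment) or a "·" entry.
N = None
matrix = [
    [N,   8.4, 7.8, 7.8, 9.0, 7.8, 9.3,  8.8, 9.0,  9.2],
    [9.8, N,   7.5, 7.3, 9.7, 7.9, 10.0, 9.1, 10.0, 9.7],
    [8.4, 6.3, N,   7.8, 8.0, 8.1, 9.0,  8.1, N,    N],
    [8.4, 5.8, N,   N,   8.4, 6.8, 9.2,  8.2, 8.8,  8.2],
    [9.2, 8.4, 8.8, 8.8, N,   8.7, 9.0,  9.0, 8.8,  9.0],
    [8.8, 7.8, 8.4, 8.7, 8.0, N,   9.2,  9.2, 8.8,  8.2],
    [8.7, 7.5, N,   8.7, 8.4, 8.8, N,    8.7, 8.7,  8.3],
    [8.4, 5.5, 6.0, 8.4, 7.8, 7.5, 8.8,  N,   8.3,  8.0],
    [9.8, 8.7, 9.4, 9.4, 9.8, 9.7, 9.8,  9.8, N,    9.0],
    [8.8, 6.8, 7.8, 8.8, 8.2, 7.7, 9.0,  9.0, 8.4,  N],
]


def column_scores(names, rows):
    """Mean score each respondent received, skipping empty cells."""
    scores = {}
    for col, name in enumerate(names):
        received = [row[col] for row in rows if row[col] is not None]
        scores[name] = mean(received)
    return scores


scores = column_scores(respondents, matrix)
winner = max(scores, key=scores.get)
judgment_count = sum(v is not None for row in matrix for v in row)
matrix_avg = mean(list(scores.values()))

print(f"judgments: {judgment_count}")
print(f"winner: {winner} ({scores[winner]:.2f})")
print(f"matrix avg: {matrix_avg:.2f}")
```

Under these assumptions the sketch reproduces the figures reported on the page: 86 judgments, a 9.26 winner score for Grok 4.20, and a matrix average of 8.49. That agreement supports the reading but does not confirm it is the method actually used.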