EVAL-20260402-224546
communication
Apr 02, 2026 · COMM-011

Explain the AI alignment problem to three audiences: (1) A congressperson who votes on AI regulation but has no technical background. Max 200 words, use policy implications. (2) A software engineer who thinks 'just add more guardrails.' Max 300 words, address technical misconceptions. (3) A 12-year-old who loves science fiction. Max 150 words, use their frame of reference.

Winner: Grok 4.20 (openrouter)
Winner score: 9.21
Matrix avg: 8.86
10×10 Judgment Matrix · 89 judgments
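| Judge ↓ / Respondent → | MiMo-V2-Flash | Claude Opus 4.6 | GPT-5.4 | Claude Sonnet 4.6 | Gemini 3.1 Pro | Grok 4.20 | DeepSeek V4 | GPT-OSS-120B | Mistral Small | Seed 1.6 Flash |
|---|---|---|---|---|---|---|---|---|---|---|
| MiMo-V2-Flash |  | 9.0 | 8.6 | 9.2 | 9.0 | 9.2 | 8.6 | 8.8 | 9.0 | 8.6 |
| Claude Opus 4.6 | 9.0 |  | 9.2 | 9.3 | 9.0 | 9.2 | 8.8 | 9.2 | 8.6 | 8.2 |
| GPT-5.4 | 9.0 | 9.0 |  | 9.2 | 7.5 | 9.2 | 8.0 | 9.2 | 8.4 | 8.0 |
| Claude Sonnet 4.6 | 8.8 | 9.2 | · |  | 8.8 | 9.3 | 8.6 | 8.8 | 8.8 | 8.2 |
| Gemini 3.1 Pro | 8.5 | 9.8 | 10.0 | 10.0 |  | 10.0 | 9.8 | 8.5 | 8.3 | 9.0 |
| Grok 4.20 | 8.8 | 9.0 | 9.0 | 9.0 | 8.6 |  | 9.0 | 8.8 | 8.8 | 8.3 |
| DeepSeek V4 | 8.8 | 8.6 | 8.8 | 8.6 | 8.6 | 9.2 |  | 8.8 | 8.6 | 8.6 |
| GPT-OSS-120B | 8.4 | 8.4 | 8.4 | 8.7 | 7.3 | 8.4 | 8.3 |  | 8.4 | 8.4 |
| Mistral Small | 9.6 | 9.8 | 9.6 | 10.0 | 9.7 | 9.7 | 9.6 | 9.6 |  | 9.6 |
| Seed 1.6 Flash | 8.6 | 8.4 | 8.6 | 8.8 | 7.8 | 8.6 | 8.4 | 8.4 | 8.7 |  |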
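
The page does not say how the summary figures are derived from the matrix. A minimal sketch of one plausible reading follows. It assumes each model's blank cell is its own self-judgment (excluded from scoring), that a respondent's score is the plain mean of its column, and that the matrix average is the mean of all 89 recorded cells. The names `MODELS`, `SCORES`, `respondent_means`, and `matrix_mean` are illustrative and do not come from the published dataset.

```python
from statistics import mean

MODELS = [
    "MiMo-V2-Flash", "Claude Opus 4.6", "GPT-5.4", "Claude Sonnet 4.6",
    "Gemini 3.1 Pro", "Grok 4.20", "DeepSeek V4", "GPT-OSS-120B",
    "Mistral Small", "Seed 1.6 Flash",
]

# Rows are judges, columns are respondents, both in MODELS order.
# None marks a cell with no score: the diagonal (assumed to be excluded
# self-judgments) and the one judgment shown as "·" on the page.
SCORES = [
    [None, 9.0, 8.6, 9.2, 9.0, 9.2, 8.6, 8.8, 9.0, 8.6],
    [9.0, None, 9.2, 9.3, 9.0, 9.2, 8.8, 9.2, 8.6, 8.2],
    [9.0, 9.0, None, 9.2, 7.5, 9.2, 8.0, 9.2, 8.4, 8.0],
    [8.8, 9.2, None, None, 8.8, 9.3, 8.6, 8.8, 8.8, 8.2],
    [8.5, 9.8, 10.0, 10.0, None, 10.0, 9.8, 8.5, 8.3, 9.0],
    [8.8, 9.0, 9.0, 9.0, 8.6, None, 9.0, 8.8, 8.8, 8.3],
    [8.8, 8.6, 8.8, 8.6, 8.6, 9.2, None, 8.8, 8.6, 8.6],
    [8.4, 8.4, 8.4, 8.7, 7.3, 8.4, 8.3, None, 8.4, 8.4],
    [9.6, 9.8, 9.6, 10.0, 9.7, 9.7, 9.6, 9.6, None, 9.6],
    [8.6, 8.4, 8.6, 8.8, 7.8, 8.6, 8.4, 8.4, 8.7, None],
]


def respondent_means(scores):
    """Mean score each respondent received, skipping empty cells."""
    means = {}
    for col, name in enumerate(MODELS):
        received = [row[col] for row in scores if row[col] is not None]
        means[name] = mean(received)
    return means


def matrix_mean(scores):
    """Mean over every recorded judgment in the matrix."""
    return mean(v for row in scores for v in row if v is not None)


if __name__ == "__main__":
    for name, value in respondent_means(SCORES).items():
        print(f"{name:<18} {value:.2f}")
    print(f"matrix avg         {matrix_mean(SCORES):.2f}")
```

Under these assumptions the matrix average comes out at 8.86, matching the reported figure, and the Grok 4.20 column averages about 9.20. The small gap from the reported 9.21 is presumably because the page displays each score rounded to one decimal place.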