Evaluation EVAL-20260402-232602
communication · COMM-019 · Apr 02, 2026

Your company's AI product generated offensive content that went viral. Write: (1) An immediate public statement (first 2 hours — acknowledge, no excuses), (2) A detailed follow-up 24 hours later (root cause, what you're doing about it), (3) An internal all-hands message to employees who are demoralized. Each must be genuine, take responsibility, and not use passive voice or the phrase 'we take this seriously' (which everyone uses and nobody believes).

Winner: GPT-5.4 (via OpenRouter)
Winner score: 9.38
Matrix average: 8.80
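10×10 judgment matrix · 90 judgments (rows are judges, columns are respondents; no model scores its own response, so the diagonal is empty)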
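| Judge ↓ / Respondent → | MiMo-V2-Flash | Claude Opus 4.6 | GPT-5.4 | Claude Sonnet 4.6 | Grok 4.20 | Gemini 3.1 Pro | DeepSeek V4 | GPT-OSS-120B | Mistral Small | Seed 1.6 Flash |
|---|---|---|---|---|---|---|---|---|---|---|
| MiMo-V2-Flash | - | 9.2 | 9.6 | 9.6 | 9.6 | 6.0 | 9.0 | 8.8 | 9.6 | 9.6 |
| Claude Opus 4.6 | 9.6 | - | 10.0 | 9.3 | 9.6 | 4.3 | 8.6 | 9.2 | 9.6 | 9.2 |
| GPT-5.4 | 8.8 | 7.2 | - | 8.4 | 9.2 | 4.7 | 8.6 | 8.8 | 9.0 | 8.2 |
| Claude Sonnet 4.6 | 9.6 | 9.6 | 9.6 | - | 9.6 | 5.3 | 8.8 | 8.8 | 9.3 | 8.8 |
| Grok 4.20 | 8.8 | 8.8 | 8.8 | 8.8 | - | 6.0 | 8.4 | 8.8 | 9.0 | 8.8 |
| Gemini 3.1 Pro | 9.1 | 8.1 | 9.1 | 8.6 | 9.2 | - | 9.1 | 7.0 | 8.8 | 7.9 |
| DeepSeek V4 | 9.8 | 10.0 | 9.8 | 9.8 | 9.3 | 8.3 | - | 9.8 | 10.0 | 9.8 |
| GPT-OSS-120B | 8.3 | 9.2 | 8.8 | 8.8 | 8.8 | 4.9 | 8.3 | - | 8.8 | 8.8 |
| Mistral Small | 9.8 | 10.0 | 10.0 | 10.0 | 10.0 | 8.7 | 9.6 | 9.8 | - | 9.8 |
| Seed 1.6 Flash | 8.8 | 8.8 | 8.8 | 8.6 | 8.8 | 7.8 | 8.8 | 8.8 | 8.8 | - |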
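The page does not say how the winner score and matrix average are derived from the matrix. The Python sketch below is a guess, assuming the winner score is the mean of the nine scores a respondent receives and the matrix average is the mean of all 90 judgments. Under that assumption GPT-5.4 averages about 9.389 and the whole matrix about 8.809; these match the displayed 9.38 and 8.80 only when cut (not rounded) to two decimals, so the hypothetical `truncate2` helper encodes that assumed display rule. The page may instead aggregate unrounded per-judge scores. The names `MODELS`, `SCORES`, `respondent_means`, `matrix_mean`, and `truncate2` are illustrative only.

```python
import math

# Scores transcribed from the matrix above: rows are judges, columns are
# respondents, and None marks the empty self-judgment cell on the diagonal.
MODELS = [
    "MiMo-V2-Flash", "Claude Opus 4.6", "GPT-5.4", "Claude Sonnet 4.6",
    "Grok 4.20", "Gemini 3.1 Pro", "DeepSeek V4", "GPT-OSS-120B",
    "Mistral Small", "Seed 1.6 Flash",
]
SCORES = [
    [None, 9.2, 9.6, 9.6, 9.6, 6.0, 9.0, 8.8, 9.6, 9.6],
    [9.6, None, 10.0, 9.3, 9.6, 4.3, 8.6, 9.2, 9.6, 9.2],
    [8.8, 7.2, None, 8.4, 9.2, 4.7, 8.6, 8.8, 9.0, 8.2],
    [9.6, 9.6, 9.6, None, 9.6, 5.3, 8.8, 8.8, 9.3, 8.8],
    [8.8, 8.8, 8.8, 8.8, None, 6.0, 8.4, 8.8, 9.0, 8.8],
    [9.1, 8.1, 9.1, 8.6, 9.2, None, 9.1, 7.0, 8.8, 7.9],
    [9.8, 10.0, 9.8, 9.8, 9.3, 8.3, None, 9.8, 10.0, 9.8],
    [8.3, 9.2, 8.8, 8.8, 8.8, 4.9, 8.3, None, 8.8, 8.8],
    [9.8, 10.0, 10.0, 10.0, 10.0, 8.7, 9.6, 9.8, None, 9.8],
    [8.8, 8.8, 8.8, 8.6, 8.8, 7.8, 8.8, 8.8, 8.8, None],
]


def respondent_means(scores):
    """Mean score each respondent (column) received, skipping empty cells."""
    n = len(scores)
    means = []
    for col in range(n):
        received = [scores[row][col] for row in range(n)
                    if scores[row][col] is not None]
        means.append(sum(received) / len(received))
    return means


def matrix_mean(scores):
    """Mean over every non-empty cell (90 judgments in a 10x10 matrix)."""
    cells = [v for row in scores for v in row if v is not None]
    return sum(cells) / len(cells)


def truncate2(x):
    """Cut (not round) to two decimals: an assumed display rule, unconfirmed."""
    return math.floor(x * 100) / 100


means = respondent_means(SCORES)
# Pick the respondent with the highest mean received score.
winner = max(range(len(MODELS)), key=lambda i: means[i])
print(f"winner: {MODELS[winner]}, score {truncate2(means[winner]):.2f}")
print(f"matrix average: {truncate2(matrix_mean(SCORES)):.2f}")
```

Run as-is, this prints `winner: GPT-5.4, score 9.38` and `matrix average: 8.80`. Swapping `truncate2` for `round(x, 2)` gives 9.39 and 8.81 instead, which is why the truncation rule is flagged as an assumption rather than the page's documented method.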