communication
Apr 02, 2026
COMM-019
Your company's AI product generated offensive content that went viral. Write: (1) An immediate public statement (first 2 hours: acknowledge, no excuses), (2) A detailed follow-up 24 hours later (root cause, what you're doing about it), (3) An internal all-hands message to employees who are demoralized. Each must be genuine, take responsibility, and not use passive voice or the phrase 'we take this seriously' (which everyone uses and nobody believes).
Winner: GPT-5.4 (openrouter)
Winner score: 9.38
matrix avg: 8.80
10×10 Judgment Matrix · 90 judgments
| Judge ↓ / Respondent → | MiMo-V2-Flash | Claude Opus 4.6 | GPT-5.4 | Claude Sonnet 4.6 | Grok 4.20 | Gemini 3.1 Pro | DeepSeek V4 | GPT-OSS-120B | Mistral Small | Seed 1.6 Flash |
|---|---|---|---|---|---|---|---|---|---|---|
| MiMo-V2-Flash | — | 9.2 | 9.6 | 9.6 | 9.6 | 6.0 | 9.0 | 8.8 | 9.6 | 9.6 |
| Claude Opus 4.6 | 9.6 | — | 10.0 | 9.3 | 9.6 | 4.3 | 8.6 | 9.2 | 9.6 | 9.2 |
| GPT-5.4 | 8.8 | 7.2 | — | 8.4 | 9.2 | 4.7 | 8.6 | 8.8 | 9.0 | 8.2 |
| Claude Sonnet 4.6 | 9.6 | 9.6 | 9.6 | — | 9.6 | 5.3 | 8.8 | 8.8 | 9.3 | 8.8 |
| Grok 4.20 | 8.8 | 8.8 | 8.8 | 8.8 | — | 6.0 | 8.4 | 8.8 | 9.0 | 8.8 |
| Gemini 3.1 Pro | 9.1 | 8.1 | 9.1 | 8.6 | 9.2 | — | 9.1 | 7.0 | 8.8 | 7.9 |
| DeepSeek V4 | 9.8 | 10.0 | 9.8 | 9.8 | 9.3 | 8.3 | — | 9.8 | 10.0 | 9.8 |
| GPT-OSS-120B | 8.3 | 9.2 | 8.8 | 8.8 | 8.8 | 4.9 | 8.3 | — | 8.8 | 8.8 |
| Mistral Small | 9.8 | 10.0 | 10.0 | 10.0 | 10.0 | 8.7 | 9.6 | 9.8 | — | 9.8 |
| Seed 1.6 Flash | 8.8 | 8.8 | 8.8 | 8.6 | 8.8 | 7.8 | 8.8 | 8.8 | 8.8 | — |
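
The summary figures above can be checked against the matrix. The sketch below rests on two assumptions that the page does not state: that the winner score is the mean of the nine scores a respondent receives from the other judges (its column), and that the matrix average is the mean of all 90 off-diagonal judgments. The names `respondent_means` and `matrix_mean` are illustrative and are not part of the benchmark's tooling.

```python
from statistics import mean

MODELS = [
    "MiMo-V2-Flash", "Claude Opus 4.6", "GPT-5.4", "Claude Sonnet 4.6",
    "Grok 4.20", "Gemini 3.1 Pro", "DeepSeek V4", "GPT-OSS-120B",
    "Mistral Small", "Seed 1.6 Flash",
]

# Rows are judges, columns are respondents, in the order of MODELS.
# None marks the diagonal, where a model would be judging itself.
SCORES = [
    [None, 9.2, 9.6, 9.6, 9.6, 6.0, 9.0, 8.8, 9.6, 9.6],
    [9.6, None, 10.0, 9.3, 9.6, 4.3, 8.6, 9.2, 9.6, 9.2],
    [8.8, 7.2, None, 8.4, 9.2, 4.7, 8.6, 8.8, 9.0, 8.2],
    [9.6, 9.6, 9.6, None, 9.6, 5.3, 8.8, 8.8, 9.3, 8.8],
    [8.8, 8.8, 8.8, 8.8, None, 6.0, 8.4, 8.8, 9.0, 8.8],
    [9.1, 8.1, 9.1, 8.6, 9.2, None, 9.1, 7.0, 8.8, 7.9],
    [9.8, 10.0, 9.8, 9.8, 9.3, 8.3, None, 9.8, 10.0, 9.8],
    [8.3, 9.2, 8.8, 8.8, 8.8, 4.9, 8.3, None, 8.8, 8.8],
    [9.8, 10.0, 10.0, 10.0, 10.0, 8.7, 9.6, 9.8, None, 9.8],
    [8.8, 8.8, 8.8, 8.6, 8.8, 7.8, 8.8, 8.8, 8.8, None],
]


def respondent_means(models, scores):
    """Average each column, skipping the self-judgment cell."""
    result = {}
    for col, name in enumerate(models):
        received = [row[col] for row in scores if row[col] is not None]
        result[name] = mean(received)
    return result


def matrix_mean(scores):
    """Average every off-diagonal judgment in the matrix."""
    return mean([v for row in scores for v in row if v is not None])


col_means = respondent_means(MODELS, SCORES)
winner = max(col_means, key=col_means.get)
print(f"winner: {winner} ({col_means[winner]:.2f})")
print(f"matrix avg: {matrix_mean(SCORES):.2f}")
```

Under these assumptions the GPT-5.4 column averages about 9.39 and the whole matrix about 8.81, which is close to the reported 9.38 and 8.80. The small gap may come from truncation or from the displayed cells being rounded to one decimal place.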