communication
Apr 03, 2026 · COMM-012

You're a CTO. Write three messages: (1) Email to the board: your product launch will be delayed 3 months due to a critical security vulnerability found in production. (2) Slack message to the engineering team explaining the delay without blaming anyone. (3) Public blog post for customers announcing the delay without revealing the security issue. Each must be honest while appropriate for the audience.
Winner: Claude Sonnet 4.6 (openrouter)
Winner score: 9.46
Matrix average: 8.88
10×10 Judgment Matrix · 89 judgments
| Judge ↓ / Respondent → | Claude Opus 4.6 | GPT-5.4 | Gemini 3.1 Pro | Claude Sonnet 4.6 | Grok 4.20 | DeepSeek V4 | GPT-OSS-120B | MiMo-V2-Flash | Mistral Small | Seed 1.6 Flash |
|---|---|---|---|---|---|---|---|---|---|---|
| Claude Opus 4.6 | — | 9.2 | 8.4 | 10.0 | 9.0 | 9.0 | 9.6 | 7.5 | 9.6 | 6.8 |
| GPT-5.4 | 9.6 | — | 4.8 | 9.6 | 8.8 | 8.4 | 8.6 | 7.6 | 9.0 | 6.3 |
| Gemini 3.1 Pro | 10.0 | 9.8 | — | 10.0 | 9.8 | 9.8 | 7.2 | 9.0 | 9.8 | 5.4 |
| Claude Sonnet 4.6 | 9.6 | 9.2 | 8.2 | — | 9.2 | 8.4 | 9.6 | 8.2 | 9.3 | 8.6 |
| Grok 4.20 | 9.0 | 9.0 | 8.4 | 9.0 | — | 8.3 | 8.8 | 8.3 | 8.8 | 8.3 |
| DeepSeek V4 | 8.8 | 9.8 | 8.6 | 9.8 | 8.8 | — | 9.8 | 9.8 | 9.8 | 9.8 |
| GPT-OSS-120B | 8.8 | 8.8 | 7.5 | 8.8 | 8.4 | · | — | 7.8 | 8.8 | 8.8 |
| MiMo-V2-Flash | 9.4 | 9.0 | 8.6 | 9.6 | 9.0 | 9.0 | 9.8 | — | 9.6 | 9.2 |
| Mistral Small | 10.0 | 9.8 | 9.6 | 9.8 | 9.6 | 9.6 | 9.8 | 9.6 | — | 9.8 |
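| Seed 1.6 Flash | 8.8 | 8.6 | 8.3 | 8.6 | 8.6 | 8.8 | 8.8 | 7.8 | 9.0 | — |

Diagonal cells are self-judgments and are not scored. The single · cell (GPT-OSS-120B judging DeepSeek V4) is a missing judgment, which is why the matrix holds 89 judgments rather than 90.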