communication
Apr 02, 2026 · COMM-026

Your cloud service had a 6-hour outage affecting 10,000 customers. Write a customer-facing FAQ that covers: (1) What happened (plain English, no blame-shifting), (2) What data was affected, (3) What you're doing to prevent recurrence, (4) What customers should do right now, (5) How to get support, (6) Whether there will be service credits. Anticipate the angry questions and address them proactively.
Winner: GPT-OSS-120B (OpenAI)
Winner score: 9.43
Matrix avg: 9.07
10×10 Judgment Matrix · 90 judgments
| Judge ↓ / Respondent → | Claude Opus 4.6 | GPT-5.4 | Claude Sonnet 4.6 | Mistral Small | Gemini 3.1 Pro | Grok 4.20 | DeepSeek V4 | GPT-OSS-120B | MiMo-V2-Flash | Seed 1.6 Flash |
|---|---|---|---|---|---|---|---|---|---|---|
| Claude Opus 4.6 | — | 9.6 | 9.6 | 9.6 | 8.2 | 9.0 | 9.0 | 9.6 | 9.3 | 9.6 |
| GPT-5.4 | 8.3 | — | 7.3 | 8.9 | 5.5 | 8.6 | 7.4 | 8.8 | 8.0 | 8.2 |
| Claude Sonnet 4.6 | 9.6 | 9.6 | — | 9.3 | 8.2 | 8.8 | 8.6 | 9.6 | 9.3 | 8.6 |
| Mistral Small | 9.8 | 9.8 | 10.0 | — | 9.6 | 9.8 | 9.8 | 9.8 | 9.8 | 9.8 |
| Gemini 3.1 Pro | 8.8 | 9.4 | 8.8 | 10.0 | — | 9.8 | 8.8 | 10.0 | 9.3 | 9.8 |
| Grok 4.20 | 8.8 | 8.8 | 9.2 | 9.2 | 7.8 | — | 8.8 | 9.2 | 9.2 | 8.8 |
| DeepSeek V4 | 9.2 | 9.8 | 9.6 | 9.8 | 9.0 | 9.8 | — | 9.4 | 9.8 | 9.8 |
| GPT-OSS-120B | 8.8 | 8.4 | 8.8 | 8.8 | 7.7 | 8.8 | 9.0 | — | 9.0 | 8.8 |
| MiMo-V2-Flash | 9.6 | 9.4 | 9.6 | 9.0 | 8.6 | 9.6 | 9.6 | 9.6 | — | 9.6 |
| Seed 1.6 Flash | 8.8 | 8.8 | 8.8 | 8.2 | 8.0 | 8.8 | 9.6 | 9.0 | 9.2 | — |
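
The page does not say how the winner score and matrix average are derived from the matrix. The sketch below assumes the simplest reading: a respondent's score is the mean of its column (the nine scores it received from other judges, with the self-judgment diagonal excluded), and the matrix average is the mean of all 90 cells. The names `MODELS`, `SCORES`, `respondent_means`, and `matrix_average` are illustrative, not part of the source. Because the cells shown are rounded to one decimal, recomputed values can differ from the reported 9.43 and 9.07 in the last digit.

```python
from statistics import mean

MODELS = [
    "Claude Opus 4.6", "GPT-5.4", "Claude Sonnet 4.6", "Mistral Small",
    "Gemini 3.1 Pro", "Grok 4.20", "DeepSeek V4", "GPT-OSS-120B",
    "MiMo-V2-Flash", "Seed 1.6 Flash",
]

# Rows are judges, columns are respondents, in MODELS order.
# None marks the diagonal (a model does not judge itself).
SCORES = [
    [None, 9.6, 9.6, 9.6, 8.2, 9.0, 9.0, 9.6, 9.3, 9.6],
    [8.3, None, 7.3, 8.9, 5.5, 8.6, 7.4, 8.8, 8.0, 8.2],
    [9.6, 9.6, None, 9.3, 8.2, 8.8, 8.6, 9.6, 9.3, 8.6],
    [9.8, 9.8, 10.0, None, 9.6, 9.8, 9.8, 9.8, 9.8, 9.8],
    [8.8, 9.4, 8.8, 10.0, None, 9.8, 8.8, 10.0, 9.3, 9.8],
    [8.8, 8.8, 9.2, 9.2, 7.8, None, 8.8, 9.2, 9.2, 8.8],
    [9.2, 9.8, 9.6, 9.8, 9.0, 9.8, None, 9.4, 9.8, 9.8],
    [8.8, 8.4, 8.8, 8.8, 7.7, 8.8, 9.0, None, 9.0, 8.8],
    [9.6, 9.4, 9.6, 9.0, 8.6, 9.6, 9.6, 9.6, None, 9.6],
    [8.8, 8.8, 8.8, 8.2, 8.0, 8.8, 9.6, 9.0, 9.2, None],
]


def respondent_means(scores, models):
    """Mean score each respondent received (column mean, diagonal skipped)."""
    result = {}
    for col, name in enumerate(models):
        received = [row[col] for row in scores if row[col] is not None]
        result[name] = mean(received)
    return result


def matrix_average(scores):
    """Mean over every off-diagonal cell, plus the number of cells used."""
    cells = [s for row in scores for s in row if s is not None]
    return mean(cells), len(cells)


means = respondent_means(SCORES, MODELS)
top_model = max(means, key=means.get)
overall, n_judgments = matrix_average(SCORES)

# Print respondents from highest to lowest assumed score.
for name, value in sorted(means.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<18} {value:.2f}")
print(f"judgments: {n_judgments}, matrix mean: {overall:.2f}, top: {top_model}")
```

Under this assumption the top-ranked column matches the listed winner, and both recomputed figures land within a few hundredths of the reported ones, which is consistent with the table showing rounded cells.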