EVAL-20260402-235645
communication
Apr 02, 2026 · COMM-026

Your cloud service had a 6-hour outage affecting 10,000 customers. Write a customer-facing FAQ that covers: (1) What happened (plain English, no blame-shifting), (2) What data was affected, (3) What you're doing to prevent recurrence, (4) What customers should do right now, (5) How to get support, (6) Whether there will be service credits. Anticipate the angry questions and address them proactively.

Winner: GPT-OSS-120B (OpenAI)
Winner score: 9.43 (matrix avg: 9.07)
10×10 Judgment Matrix · 90 judgments
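| Judge ↓ / Respondent → | Claude Opus 4.6 | GPT-5.4 | Claude Sonnet 4.6 | Mistral Small | Gemini 3.1 Pro | Grok 4.20 | DeepSeek V4 | GPT-OSS-120B | MiMo-V2-Flash | Seed 1.6 Flash |
|---|---|---|---|---|---|---|---|---|---|---|
| Claude Opus 4.6 | - | 9.6 | 9.6 | 9.6 | 8.2 | 9.0 | 9.0 | 9.6 | 9.3 | 9.6 |
| GPT-5.4 | 8.3 | - | 7.3 | 8.9 | 5.5 | 8.6 | 7.4 | 8.8 | 8.0 | 8.2 |
| Claude Sonnet 4.6 | 9.6 | 9.6 | - | 9.3 | 8.2 | 8.8 | 8.6 | 9.6 | 9.3 | 8.6 |
| Mistral Small | 9.8 | 9.8 | 10.0 | - | 9.6 | 9.8 | 9.8 | 9.8 | 9.8 | 9.8 |
| Gemini 3.1 Pro | 8.8 | 9.4 | 8.8 | 10.0 | - | 9.8 | 8.8 | 10.0 | 9.3 | 9.8 |
| Grok 4.20 | 8.8 | 8.8 | 9.2 | 9.2 | 7.8 | - | 8.8 | 9.2 | 9.2 | 8.8 |
| DeepSeek V4 | 9.2 | 9.8 | 9.6 | 9.8 | 9.0 | 9.8 | - | 9.4 | 9.8 | 9.8 |
| GPT-OSS-120B | 8.8 | 8.4 | 8.8 | 8.8 | 7.7 | 8.8 | 9.0 | - | 9.0 | 8.8 |
| MiMo-V2-Flash | 9.6 | 9.4 | 9.6 | 9.0 | 8.6 | 9.6 | 9.6 | 9.6 | - | 9.6 |
| Seed 1.6 Flash | 8.8 | 8.8 | 8.8 | 8.2 | 8.0 | 8.8 | 9.6 | 9.0 | 9.2 | - |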
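
The page does not say how the winner score and matrix average are derived from the matrix. The sketch below is one plausible reading, under two assumptions: each model judges every other model but not itself (10 × 9 = 90 judgments, so the diagonal is empty), and a respondent's score is the plain mean of the nine scores it received. The names `MODELS`, `MATRIX`, `respondent_means`, and `matrix_mean` are illustrative and are not part of the published dataset.

```python
from statistics import mean

MODELS = [
    "Claude Opus 4.6", "GPT-5.4", "Claude Sonnet 4.6", "Mistral Small",
    "Gemini 3.1 Pro", "Grok 4.20", "DeepSeek V4", "GPT-OSS-120B",
    "MiMo-V2-Flash", "Seed 1.6 Flash",
]

# Rows are judges, columns are respondents, both in MODELS order.
# None marks the diagonal (assumption: models do not judge themselves).
N = None
MATRIX = [
    [N,   9.6, 9.6,  9.6,  8.2, 9.0, 9.0, 9.6,  9.3, 9.6],
    [8.3, N,   7.3,  8.9,  5.5, 8.6, 7.4, 8.8,  8.0, 8.2],
    [9.6, 9.6, N,    9.3,  8.2, 8.8, 8.6, 9.6,  9.3, 8.6],
    [9.8, 9.8, 10.0, N,    9.6, 9.8, 9.8, 9.8,  9.8, 9.8],
    [8.8, 9.4, 8.8,  10.0, N,   9.8, 8.8, 10.0, 9.3, 9.8],
    [8.8, 8.8, 9.2,  9.2,  7.8, N,   8.8, 9.2,  9.2, 8.8],
    [9.2, 9.8, 9.6,  9.8,  9.0, 9.8, N,   9.4,  9.8, 9.8],
    [8.8, 8.4, 8.8,  8.8,  7.7, 8.8, 9.0, N,    9.0, 8.8],
    [9.6, 9.4, 9.6,  9.0,  8.6, 9.6, 9.6, 9.6,  N,   9.6],
    [8.8, 8.8, 8.8,  8.2,  8.0, 8.8, 9.6, 9.0,  9.2, N],
]


def respondent_means(matrix):
    """Mean score each respondent (column) received, skipping empty cells."""
    means = []
    for col in range(len(matrix)):
        scores = [row[col] for row in matrix if row[col] is not None]
        means.append(mean(scores))
    return means


def matrix_mean(matrix):
    """Mean over every filled cell of the matrix."""
    return mean(s for row in matrix for s in row if s is not None)


if __name__ == "__main__":
    ranked = sorted(zip(MODELS, respondent_means(MATRIX)),
                    key=lambda pair: pair[1], reverse=True)
    for name, score in ranked:
        print(f"{name:<18} {score:.2f}")
    print(f"matrix avg: {matrix_mean(MATRIX):.2f}")
```

Under these assumptions GPT-OSS-120B ranks first with a column mean of about 9.44, and the overall mean comes out near 9.08. Both sit within 0.01 of the reported 9.43 and 9.07, which would be consistent with the displayed cells being rounded to one decimal, though the page does not confirm this.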