Evaluation EVAL-20260402-230713
communication
Apr 02, 2026 · COMM-015

You're writing the same product announcement for three markets: (1) US tech audience (direct, data-driven, features-first), (2) Japanese enterprise audience (relationship-focused, indirect, hierarchy-aware), (3) German engineering audience (precision-focused, specification-heavy, skeptical of marketing). Write all three versions of a 200-word announcement for a new AI coding assistant.

Winner: Claude Sonnet 4.6 (openrouter)
Winner score: 9.23
Matrix avg: 8.66
10×10 Judgment Matrix · 80 judgments
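| Judge ↓ / Respondent → | Grok 4.20 | DeepSeek V4 | Claude Opus 4.6 | GPT-5.4 | Claude Sonnet 4.6 | Gemini 3.1 Pro | GPT-OSS-120B | MiMo-V2-Flash | Mistral Small | Seed 1.6 Flash |
|---|---|---|---|---|---|---|---|---|---|---|
| Grok 4.20 |  | 8.3 | 8.6 | · | 9.2 | 7.7 | · | 8.8 | 9.0 | 8.8 |
| DeepSeek V4 | 8.8 |  | 8.8 | 8.8 | 9.0 | 8.6 | · | 9.0 | 9.8 | 9.8 |
| Claude Opus 4.6 | 9.3 | 8.9 |  | 9.2 | 9.8 | 7.3 | · | 9.2 | 9.6 | 8.9 |
| GPT-5.4 | 8.0 | 6.5 | 8.4 |  | 8.2 | 3.3 | · | 8.8 | 8.0 | 7.5 |
| Claude Sonnet 4.6 | 8.9 | 8.6 | 9.3 | 9.3 |  | 7.6 | · | 9.2 | 9.3 | 8.6 |
| Gemini 3.1 Pro | 8.3 | 8.3 | 10.0 | 10.0 | 10.0 |  | · | 9.6 | 10.0 | 9.8 |
| GPT-OSS-120B | 8.0 | 8.6 | 8.8 | 8.8 | 8.8 | 4.6 |  | 8.6 | 8.8 | 7.8 |
| MiMo-V2-Flash | 9.2 | 9.0 | 8.8 | 9.0 | 9.6 | 7.3 | · |  | 9.6 | 9.0 |
| Mistral Small | 9.4 | 9.6 | 9.8 | 9.6 | 9.8 | 9.3 | · | 9.8 |  | 9.6 |
| Seed 1.6 Flash | 7.6 | 8.2 | 8.4 | 8.2 | 8.8 | 2.8 | · | 8.3 | 8.4 |  |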
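
The summary figures can be cross-checked against the matrix with a short sketch. It rests on assumptions not stated on the page: that each judge's own column is left empty, that "·" marks a cell with no score, that a respondent's score is the plain mean of the scores it received, and that the matrix average is the plain mean of all recorded judgments. The names `MODELS`, `MATRIX`, and `respondent_scores` are illustrative only.

```python
from statistics import mean

MODELS = [
    "Grok 4.20", "DeepSeek V4", "Claude Opus 4.6", "GPT-5.4",
    "Claude Sonnet 4.6", "Gemini 3.1 Pro", "GPT-OSS-120B",
    "MiMo-V2-Flash", "Mistral Small", "Seed 1.6 Flash",
]

# Rows are judges; columns follow MODELS (respondents).
# N = no score: the judge's own column (assumed), or a "·" cell.
N = None
MATRIX = {
    "Grok 4.20":         [N, 8.3, 8.6, N, 9.2, 7.7, N, 8.8, 9.0, 8.8],
    "DeepSeek V4":       [8.8, N, 8.8, 8.8, 9.0, 8.6, N, 9.0, 9.8, 9.8],
    "Claude Opus 4.6":   [9.3, 8.9, N, 9.2, 9.8, 7.3, N, 9.2, 9.6, 8.9],
    "GPT-5.4":           [8.0, 6.5, 8.4, N, 8.2, 3.3, N, 8.8, 8.0, 7.5],
    "Claude Sonnet 4.6": [8.9, 8.6, 9.3, 9.3, N, 7.6, N, 9.2, 9.3, 8.6],
    "Gemini 3.1 Pro":    [8.3, 8.3, 10.0, 10.0, 10.0, N, N, 9.6, 10.0, 9.8],
    "GPT-OSS-120B":      [8.0, 8.6, 8.8, 8.8, 8.8, 4.6, N, 8.6, 8.8, 7.8],
    "MiMo-V2-Flash":     [9.2, 9.0, 8.8, 9.0, 9.6, 7.3, N, N, 9.6, 9.0],
    "Mistral Small":     [9.4, 9.6, 9.8, 9.6, 9.8, 9.3, N, 9.8, N, 9.6],
    "Seed 1.6 Flash":    [7.6, 8.2, 8.4, 8.2, 8.8, 2.8, N, 8.3, 8.4, N],
}


def respondent_scores(name):
    """Scores a respondent received from all judges, skipping empty cells."""
    col = MODELS.index(name)
    return [row[col] for row in MATRIX.values() if row[col] is not None]


# Every recorded judgment, flattened across the matrix.
all_scores = [s for row in MATRIX.values() for s in row if s is not None]
matrix_avg = mean(all_scores)

# Mean received score per respondent; respondents with no scores are skipped.
averages = {m: mean(respondent_scores(m)) for m in MODELS if respondent_scores(m)}
winner = max(averages, key=averages.get)

print(len(all_scores), round(matrix_avg, 2))
print(winner, round(averages[winner], 2))
```

Under these assumptions the sketch counts 80 judgments with a mean of 8.66 and ranks Claude Sonnet 4.6 first. Its column mean from the displayed values is 9.24 rather than the reported 9.23, a gap that may simply reflect the one-decimal rounding of the displayed scores.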