← Evaluations/EVAL-20260402-234741
communication
Apr 02, 2026COMM-024

Rewrite these technical feature descriptions as customer-facing value propositions: (1) 'We use a distributed event-driven architecture with CQRS.' (2) 'Our model achieves 0.94 F1 score on the benchmark.' (3) 'Built on Kubernetes with auto-scaling and 99.99% SLA.' (4) 'End-to-end encryption with AES-256 and RSA key exchange.' (5) 'Sub-100ms p99 latency with edge caching.' For each, the customer should understand WHY they should care, not HOW it works.

Winner
Claude Sonnet 4.6
openrouter
9.38
WINNER SCORE
matrix avg: 9.08
results.json report.mdFull dataset (CSV) →
10×10 Judgment Matrix · 90 judgments
OPEN DATA
Judge ↓ / Respondent →DeepSeek V4Claude Opus 4.6GPT-OSS-120BGPT-5.4Claude Sonnet 4.6Gemini 3.1 ProGrok 4.20MiMo-V2-FlashMistral SmallSeed 1.6 Flash
DeepSeek V49.89.89.69.89.89.69.09.89.8
Claude Opus 4.68.69.09.09.69.29.09.09.65.8
GPT-OSS-120B8.49.28.48.88.68.88.88.88.8
GPT-5.48.88.68.39.37.79.08.68.67.5
Claude Sonnet 4.68.99.69.28.89.29.29.39.28.3
Gemini 3.1 Pro9.810.09.89.810.09.89.89.67.0
Grok 4.209.09.08.88.49.09.09.09.08.4
MiMo-V2-Flash9.09.39.38.89.39.88.68.69.0
Mistral Small9.89.89.89.89.89.89.89.89.8
Seed 1.6 Flash8.89.08.68.68.89.08.88.79.3