Evaluations / EVAL-20260207-154540
edge cases
Feb 14, 2026 · EDGE-005

Complete this task: Write a response that is:
- Exactly 100 words (no more, no less)
- Contains no adjectives
- Includes at least 5 descriptive words
- Uses only simple sentences (no conjunctions)
- Tells a compelling story with a beginning, middle, and end

If any requirements conflict, explain the conflict and propose how to prioritize.

Winner: Grok 4.1 Fast (xAI)
Winner score: 16.46
Matrix avg: 7.03
results.json · report.md · Full dataset (CSV)
10×10 Judgment Matrix · 100 judgments
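| Judge ↓ / Respondent → | Claude Opus 4.5 | Gemini 3 | Claude Sonnet 4.5 | GPT-5.2-Codex | GPT-OSS-120B | Gemini 3 | DeepSeek V3.2 | MiMo-V2-Flash | Grok 4.1 Fast | Grok 3 (Direct) |
|---|---|---|---|---|---|---|---|---|---|---|
| Claude Opus 4.5 | | 3.3 | 7.0 | 6.8 | 0.9 | 8.2 | 5.3 | 7.4 | 8.4 | 5.7 |
| Gemini 3 | 0.0 | | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| Claude Sonnet 4.5 | 5.2 | 3.8 | | 4.3 | 0.0 | 4.3 | 3.6 | 3.8 | 8.2 | 4.5 |
| GPT-5.2-Codex | 3.5 | 2.2 | 3.5 | | 0.0 | 3.5 | 4.0 | 3.4 | 3.4 | 2.8 |
| GPT-OSS-120B | 0.0 | 2.6 | 0.0 | 0.0 | | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| Gemini 3 | 8.6 | 6.8 | 7.3 | 8.6 | 0.0 | | 7.3 | 6.36 | 8.0 | 7.2 |
| DeepSeek V3.2 | 8.0 | 7.0 | 8.4 | 8.9 | 0.0 | 9.7 | | 8.9 | 9.2 | 9.4 |
| MiMo-V2-Flash | 8.0 | 4.5 | 6.2 | 9.2 | 1.6 | 8.8 | 6.8 | | 9.4 | 9.8 |
| Grok 4.1 Fast | 8.9 | 3.9 | 9.6 | 9.8 | 0.0 | 8.7 | 6.7 | 6.3 | | 6.4 |
| Grok 3 (Direct) | 8.9 | 2.6 | 7.9 | 8.4 | 0.0 | 8.7 | 7.2 | 8.1 | 8.7 | |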
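
One way to read the matrix is to average the scores each respondent received down its column. The Python sketch below does that under two assumptions that the page does not confirm: each judge's nine listed scores skip its own response (the diagonal is `None`), and the two "Gemini 3" entries are distinct models, labelled `(a)` and `(b)` here only to keep them apart. The helper name `mean_received` is hypothetical. The page does not state how the winner score of 16.46 or the matrix average of 7.03 is computed, so this plain column mean is an illustration and is not claimed to reproduce either figure.

```python
from statistics import mean

# Respondent order as in the table header. "(a)" and "(b)" are illustrative
# suffixes; the source shows both columns simply as "Gemini 3".
MODELS = [
    "Claude Opus 4.5", "Gemini 3 (a)", "Claude Sonnet 4.5", "GPT-5.2-Codex",
    "GPT-OSS-120B", "Gemini 3 (b)", "DeepSeek V3.2", "MiMo-V2-Flash",
    "Grok 4.1 Fast", "Grok 3 (Direct)",
]

# Rows are judges, columns are respondents. N marks the diagonal, assuming
# that self-judgments are the missing tenth value in each row.
N = None
MATRIX = [
    [N,   3.3, 7.0, 6.8, 0.9, 8.2, 5.3, 7.4,  8.4, 5.7],
    [0.0, N,   0.0, 0.0, 0.0, 0.0, 0.0, 0.0,  0.0, 0.0],
    [5.2, 3.8, N,   4.3, 0.0, 4.3, 3.6, 3.8,  8.2, 4.5],
    [3.5, 2.2, 3.5, N,   0.0, 3.5, 4.0, 3.4,  3.4, 2.8],
    [0.0, 2.6, 0.0, 0.0, N,   0.0, 0.0, 0.0,  0.0, 0.0],
    [8.6, 6.8, 7.3, 8.6, 0.0, N,   7.3, 6.36, 8.0, 7.2],
    [8.0, 7.0, 8.4, 8.9, 0.0, 9.7, N,   8.9,  9.2, 9.4],
    [8.0, 4.5, 6.2, 9.2, 1.6, 8.8, 6.8, N,    9.4, 9.8],
    [8.9, 3.9, 9.6, 9.8, 0.0, 8.7, 6.7, 6.3,  N,   6.4],
    [8.9, 2.6, 7.9, 8.4, 0.0, 8.7, 7.2, 8.1,  8.7, N],
]


def mean_received(matrix, names):
    """Return each respondent's mean received score, skipping missing cells."""
    result = {}
    for col, name in enumerate(names):
        scores = [row[col] for row in matrix if row[col] is not None]
        result[name] = round(mean(scores), 2)
    return result


if __name__ == "__main__":
    ranked = sorted(mean_received(MATRIX, MODELS).items(), key=lambda kv: -kv[1])
    for name, avg in ranked:
        print(f"{name:18s} {avg:5.2f}")
```

Rows of all zeros (the first "Gemini 3" judge, and most of the GPT-OSS-120B judge row) pull every column mean down. A stricter reading might drop those judges before averaging, but whether a zero marks a failed judgment or a real score is not stated on the page.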