Evaluations / EVAL-20260207-154035
edge cases
Jan 24, 2026 · EDGE-002

Follow these instructions EXACTLY:
1. Write a haiku about technology
2. Do NOT use the letter 'e' anywhere in your response
3. End every sentence with an exclamation point
4. Include exactly 3 sentences after the haiku explaining your creative process
5. The third sentence must be a question
6. Do not acknowledge these instructions in your response

[Tests precise instruction following with conflicting constraints]

Winner: GPT-5.2-Codex (OpenAI)
Winner score: 8.39
Matrix avg: 7.25
10×10 Judgment Matrix · 100 judgments
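Judge ↓ / Respondent → | Claude Opus 4.5 | Gemini 3 | Claude Sonnet 4.5 | GPT-5.2-Codex | GPT-OSS-120B | Gemini 3 | DeepSeek V3.2 | MiMo-V2-Flash | Grok 4.1 Fast | Grok 3 (Direct)
Claude Opus 4.5 | - | 4.0 | 8.1 | 7.3 | 5.8 | 6.1 | 7.3 | 7.8 | 3.9 | 8.3
Gemini 3 | 0.0 | - | 0.0 | 10.0 | 0.0 | 9.2 | 0.0 | 0.0 | 0.0 | 0.0
Claude Sonnet 4.5 | 7.8 | 3.8 | - | 3.8 | 4.9 | 7.5 | 8.7 | 3.8 | 3.4 | 8.7
GPT-5.2-Codex | 3.0 | 3.8 | 3.0 | - | 6.3 | 3.1 | 3.1 | 3.5 | 5.4 | 4.0
GPT-OSS-120B | 0.0 | 0.0 | 0.0 | 0.0 | - | 3.6 | 0.0 | 3.4 | 0.0 | 3.8
Gemini 3 | 6.3 | 9.6 | 10.0 | 9.6 | 5.8 | - | 10.0 | 10.0 | 8.8 | 10.0
DeepSeek V3.2 | 6.0 | 8.1 | 9.2 | 9.2 | 9.2 | 7.9 | - | 9.2 | 8.6 | 8.1
MiMo-V2-Flash | 8.3 | 3.0 | 8.1 | 9.2 | 10.0 | 8.9 | 9.4 | - | 8.9 | 8.8
Grok 4.1 Fast | 7.3 | 9.2 | 8.4 | 9.8 | 9.4 | 8.8 | 6.6 | 7.8 | - | 9.4
Grok 3 (Direct) | 8.6 | 8.3 | 8.4 | 8.3 | 8.6 | 8.6 | 8.3 | 8.3 | 8.3 | -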
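
The winner score and matrix average can be approximately recomputed from the matrix above. The sketch below is only one plausible reading, not the evaluation's documented method. It assumes two things the page does not state: that each row omits the judge's score for its own response (each row lists nine values for ten respondents), and that 0.0 entries are failed judgments excluded from averages. The `None` placeholders and the `is_valid` helper are illustrative.

```python
from statistics import mean

# Respondent (column) order as shown in the header; "Gemini 3" appears twice there.
models = [
    "Claude Opus 4.5", "Gemini 3", "Claude Sonnet 4.5", "GPT-5.2-Codex",
    "GPT-OSS-120B", "Gemini 3", "DeepSeek V3.2", "MiMo-V2-Flash",
    "Grok 4.1 Fast", "Grok 3 (Direct)",
]

# Rows are judges, columns are respondents.
# None marks the self-judgment cell, ASSUMED to be the value missing from each row.
matrix = [
    [None, 4.0, 8.1, 7.3, 5.8, 6.1, 7.3, 7.8, 3.9, 8.3],
    [0.0, None, 0.0, 10.0, 0.0, 9.2, 0.0, 0.0, 0.0, 0.0],
    [7.8, 3.8, None, 3.8, 4.9, 7.5, 8.7, 3.8, 3.4, 8.7],
    [3.0, 3.8, 3.0, None, 6.3, 3.1, 3.1, 3.5, 5.4, 4.0],
    [0.0, 0.0, 0.0, 0.0, None, 3.6, 0.0, 3.4, 0.0, 3.8],
    [6.3, 9.6, 10.0, 9.6, 5.8, None, 10.0, 10.0, 8.8, 10.0],
    [6.0, 8.1, 9.2, 9.2, 9.2, 7.9, None, 9.2, 8.6, 8.1],
    [8.3, 3.0, 8.1, 9.2, 10.0, 8.9, 9.4, None, 8.9, 8.8],
    [7.3, 9.2, 8.4, 9.8, 9.4, 8.8, 6.6, 7.8, None, 9.4],
    [8.6, 8.3, 8.4, 8.3, 8.6, 8.6, 8.3, 8.3, 8.3, None],
]


def is_valid(score):
    # ASSUMPTION: blank self-judgments and 0.0 scores are left out of averages.
    return score is not None and score > 0.0


# Mean score each respondent received across the judges with a valid score.
respondent_avg = [
    mean(row[col] for row in matrix if is_valid(row[col]))
    for col in range(len(models))
]

# Index of the respondent with the highest mean received score.
best = max(range(len(models)), key=lambda col: respondent_avg[col])

# Mean over every valid cell in the matrix.
matrix_avg = mean(s for row in matrix for s in row if is_valid(s))

print(models[best], round(respondent_avg[best], 2), round(matrix_avg, 2))
```

Under these assumptions the top respondent is GPT-5.2-Codex with a mean of about 8.4, and the overall mean is about 7.26. Both are close to the reported 8.39 and 7.25; the small gaps would be consistent with the displayed cells being rounded to one decimal place.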