EVAL-20260207-154722
edge cases
Feb 21, 2026 · EDGE-006

Calculate and explain any issues with:
1. 0.1 + 0.2 = ?
2. 2^53 + 1 in JavaScript
3. 1/3 represented as a finite decimal
4. sqrt(-1) in Python without importing cmath
5. 10^309 in most programming languages
6. What's the result of: (-1) % 10 in Python vs JavaScript?

Don't just give answers - explain WHY these are problematic.
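For orientation, the six cases in the prompt can be reproduced with a minimal Python sketch. It assumes CPython 3 with IEEE 754 double-precision floats. The JavaScript behaviors are emulated with Python floats and `math.fmod` rather than run in a JavaScript engine, and all variable names are illustrative, not taken from the evaluation.

```python
import math
from fractions import Fraction

# 1. 0.1 and 0.2 have no exact binary representation, so the sum is not exactly 0.3.
float_sum = 0.1 + 0.2
sum_equals_03 = float_sum == 0.3                # exact comparison fails
sum_close_to_03 = math.isclose(float_sum, 0.3)  # tolerance-based comparison succeeds

# 2. A double has a 53-bit significand, so 2^53 + 1 is not representable.
#    A JavaScript Number is a double; a Python float is used here to emulate it.
two_53 = float(2**53)
increment_lost = (two_53 + 1.0) == two_53  # the +1 is rounded away
exact_int = 2**53 + 1                      # Python ints are arbitrary precision

# 3. 1/3 has no finite decimal (or binary) expansion; the float is only an approximation.
third_is_exact = Fraction(1 / 3) == Fraction(1, 3)

# 4. math.sqrt rejects negative input; the ** operator returns a complex number instead.
try:
    math.sqrt(-1)
    sqrt_error = None
except ValueError as exc:
    sqrt_error = type(exc).__name__
complex_root = (-1) ** 0.5  # close to 1j, with a tiny real part from rounding

# 5. 10^309 exceeds the largest double (about 1.8e308).
overflow_literal = 1e309  # the float literal becomes inf
try:
    float(10**309)        # the exact int exists, but cannot be converted
    conversion_error = None
except OverflowError as exc:
    conversion_error = type(exc).__name__

# 6. Python's % takes the sign of the divisor; JavaScript's % takes the sign of the dividend.
python_mod = (-1) % 10            # floored modulo
js_style_mod = math.fmod(-1, 10)  # truncated remainder, as JavaScript's % would give
```

Printing any of these variables shows the behavior directly; the sketch demonstrates the symptoms only and is not a substitute for the explanations the prompt asks for.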

Winner: Claude Sonnet 4.5 (Anthropic)
Winner score: 9.83 (matrix avg: 9.35)
10×10 Judgment Matrix · 100 judgments
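| Judge ↓ / Respondent → | Claude Opus 4.5 | Gemini 3 | Claude Sonnet 4.5 | GPT-5.2-Codex | GPT-OSS-120B | Gemini 3 | DeepSeek V3.2 | MiMo-V2-Flash | Grok 4.1 Fast | Grok 3 (Direct) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Claude Opus 4.5 |  | 7.5 | 9.8 | 9.3 | 8.8 | 9.8 | 9.4 | 9.6 | 10.0 | 9.8 |
| Gemini 3 | 10.0 |  | 10.0 | 0.0 | 0.0 | 0.0 | 9.8 | 0.0 | 0.0 | 0.0 |
| Claude Sonnet 4.5 | 10.0 | 8.3 |  | 9.8 | 9.8 | 9.8 | 9.0 | 10.0 | 9.8 | 9.8 |
| GPT-5.2-Codex | 9.0 | 5.0 | 9.8 |  | 7.5 | 8.8 | 9.4 | 8.0 | 9.2 | 8.6 |
| GPT-OSS-120B | 0.0 | 0.0 | 9.4 | 9.3 |  | 9.2 | 0.0 | 8.8 | 0.0 | 0.0 |
| Gemini 3 | 10.0 | 8.1 | 10.0 | 9.8 | 9.6 |  | 9.8 | 9.6 | 10.0 | 9.8 |
| DeepSeek V3.2 | 10.0 | 8.8 | 10.0 | 9.1 | 9.2 | 9.8 |  | 9.8 | 9.8 | 9.8 |
| MiMo-V2-Flash | 9.8 | 8.6 | 9.8 | 9.8 | 9.0 | 8.6 | 9.8 |  | 10.0 | 9.0 |
| Grok 4.1 Fast | 10.0 | 7.5 | 10.0 | 10.0 | 9.2 | 10.0 | 9.8 | 10.0 |  | 10.0 |
| Grok 3 (Direct) | 9.7 | 7.7 | 9.6 | 9.4 | 8.8 | 9.4 | 8.8 | 9.7 | 9.7 |  |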