EVAL-20260402-140852
code · Apr 02, 2026 · CODE-021

Write a Python function that parses unified diff format (the output of `git diff`) and returns a structured representation: files changed, lines added/removed/modified, hunks with context. Handle edge cases: binary files, renamed files, mode changes, and empty diffs. Include tests with real diff examples.
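The following is a minimal sketch of a parser for the task above. It is written for illustration and is not any model's submitted answer. It assumes git's default `a/` and `b/` path prefixes and paths that do not contain ` b/`. The names `parse_unified_diff`, `FileDiff` and `Hunk` are hypothetical. Hunk bodies are consumed using the line counts from each `@@` header, so a removed line that happens to read `--- x` is not mistaken for a file header.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# "@@ -old_start[,old_count] +new_start[,new_count] @@ optional section text"
HUNK_RE = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@ ?(.*)$")


@dataclass
class Hunk:
    old_start: int
    old_count: int
    new_start: int
    new_count: int
    section: str = ""
    # (kind, text) pairs; kind is " " (context), "+" (added) or "-" (removed)
    lines: List[Tuple[str, str]] = field(default_factory=list)


@dataclass
class FileDiff:
    old_path: str
    new_path: str
    status: str = "modified"  # "added", "deleted", "renamed" or "modified"
    old_mode: Optional[str] = None
    new_mode: Optional[str] = None
    is_binary: bool = False
    hunks: List[Hunk] = field(default_factory=list)

    def _count(self, kind: str) -> int:
        return sum(1 for h in self.hunks for k, _ in h.lines if k == kind)

    @property
    def added(self) -> int:
        return self._count("+")

    @property
    def removed(self) -> int:
        return self._count("-")


def parse_unified_diff(text: str) -> List[FileDiff]:
    """Parse `git diff` output into FileDiff records; empty input gives []."""
    files: List[FileDiff] = []
    cur: Optional[FileDiff] = None
    hunk: Optional[Hunk] = None
    old_left = new_left = 0  # body lines still expected for the current hunk

    for line in text.splitlines():
        if line.startswith("diff --git "):
            # Assumes git's default "a/" and "b/" prefixes (hypothetical limit).
            old, _, new = line[len("diff --git a/"):].partition(" b/")
            cur = FileDiff(old_path=old, new_path=new)
            files.append(cur)
            hunk, old_left, new_left = None, 0, 0
        elif cur is None:
            continue  # ignore any preamble before the first file
        elif line.startswith("\\"):
            continue  # "\ No newline at end of file"
        elif hunk is not None and (old_left > 0 or new_left > 0):
            # Hunk body: a blank line is treated as an empty context line.
            kind, body = (line[0], line[1:]) if line else (" ", "")
            hunk.lines.append((kind, body))
            if kind != "+":
                old_left -= 1
            if kind != "-":
                new_left -= 1
        elif (m := HUNK_RE.match(line)):
            # An omitted count means 1, per the unified diff format.
            old_left, new_left = int(m.group(2) or 1), int(m.group(4) or 1)
            hunk = Hunk(int(m.group(1)), old_left, int(m.group(3)), new_left, m.group(5))
            cur.hunks.append(hunk)
        elif line.startswith("new file mode "):
            cur.status, cur.new_mode = "added", line.split()[-1]
        elif line.startswith("deleted file mode "):
            cur.status, cur.old_mode = "deleted", line.split()[-1]
        elif line.startswith("old mode "):
            cur.old_mode = line.split()[-1]
        elif line.startswith("new mode "):
            cur.new_mode = line.split()[-1]
        elif line.startswith("rename from "):
            cur.status, cur.old_path = "renamed", line[len("rename from "):]
        elif line.startswith("rename to "):
            cur.status, cur.new_path = "renamed", line[len("rename to "):]
        elif line.startswith("Binary files ") or line == "GIT binary patch":
            cur.is_binary = True
        # Remaining headers ("index", "similarity index", "---", "+++") are skipped.
    return files
```

This sketch does not split add/remove pairs into "modified" lines. A common heuristic counts a run of `-` lines followed directly by a run of `+` lines as modifications.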

Winner: Gemini 3 Flash Preview (Google)
Winner score: 8.03 · matrix average: 6.01
10×10 Judgment Matrix · 74 judgments
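| Judge ↓ / Respondent → | GPT-5.4 | Claude Opus 4.6 | Gemini 3.1 Pro | Claude Sonnet 4.6 | Gemini 3 | Grok 4.20 | DeepSeek V4 | GPT-OSS-120B | MiniMax M2.5 | MiMo-V2-Flash |
|---|---|---|---|---|---|---|---|---|---|---|
| GPT-5.4 | | 2.3 | 0.7 | 1.6 | 6.8 | 6.0 | 5.0 | 2.8 | · | 2.5 |
| Claude Opus 4.6 | 7.8 | | 0.7 | 4.5 | 7.8 | 6.4 | 5.4 | 5.3 | · | 3.3 |
| Gemini 3.1 Pro | 7.4 | 4.9 | | 4.7 | 9.0 | 7.3 | · | 4.5 | · | 3.3 |
| Claude Sonnet 4.6 | 8.0 | 6.3 | 0.2 | | 7.3 | 7.2 | 6.4 | 7.0 | · | 4.0 |
| Gemini 3 | 9.0 | 8.4 | 1.6 | 7.0 | | 9.2 | 8.2 | 7.0 | · | 5.8 |
| Grok 4.20 | 7.0 | 7.9 | · | · | 7.8 | | 6.2 | · | · | 4.4 |
| DeepSeek V4 | 8.6 | 9.0 | 4.0 | 8.6 | 8.6 | · | | 8.8 | · | 8.6 |
| GPT-OSS-120B | 7.8 | 4.8 | 1.4 | 4.2 | 8.3 | 7.0 | 6.2 | | · | 2.9 |
| MiniMax M2.5 | · | 6.5 | · | 6.3 | 8.0 | 7.5 | 7.3 | 6.7 | | 4.5 |
| MiMo-V2-Flash | 7.8 | 8.6 | 2.8 | 7.0 | 8.6 | 8.6 | 7.2 | 8.1 | · | |

Rows are judges and columns are respondents. Self-judgments on the diagonal are left blank, and `·` marks a judgment that was not recorded; no judge scored MiniMax M2.5.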
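The page does not define its two headline figures. One reading is that a respondent's score is the mean of the judgments it received, and that the matrix average is the mean of those per-respondent scores. Under that assumption, both figures can be rechecked from the matrix with self-judgments excluded. The `Gemini 3` column is taken to be the winner listed as Gemini 3 Flash Preview. On this reading, Gemini 3 averages about 8.02 over nine judgments, and the reported 8.03 presumably comes from unrounded cell values. The average over the nine scored respondents comes out to 6.01. The names below are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Optional

# Respondents in matrix column order; judges use the same order for rows.
RESPONDENTS = ["GPT-5.4", "Claude Opus 4.6", "Gemini 3.1 Pro", "Claude Sonnet 4.6",
               "Gemini 3", "Grok 4.20", "DeepSeek V4", "GPT-OSS-120B",
               "MiniMax M2.5", "MiMo-V2-Flash"]

N = None  # blank diagonal (self-judgment) or a "·" cell
MATRIX: List[List[Optional[float]]] = [
    [N, 2.3, 0.7, 1.6, 6.8, 6.0, 5.0, 2.8, N, 2.5],
    [7.8, N, 0.7, 4.5, 7.8, 6.4, 5.4, 5.3, N, 3.3],
    [7.4, 4.9, N, 4.7, 9.0, 7.3, N, 4.5, N, 3.3],
    [8.0, 6.3, 0.2, N, 7.3, 7.2, 6.4, 7.0, N, 4.0],
    [9.0, 8.4, 1.6, 7.0, N, 9.2, 8.2, 7.0, N, 5.8],
    [7.0, 7.9, N, N, 7.8, N, 6.2, N, N, 4.4],
    [8.6, 9.0, 4.0, 8.6, 8.6, N, N, 8.8, N, 8.6],
    [7.8, 4.8, 1.4, 4.2, 8.3, 7.0, 6.2, N, N, 2.9],
    [N, 6.5, N, 6.3, 8.0, 7.5, 7.3, 6.7, N, 4.5],
    [7.8, 8.6, 2.8, 7.0, 8.6, 8.6, 7.2, 8.1, N, N],
]


def respondent_means(matrix: List[List[Optional[float]]]) -> Dict[str, float]:
    """Mean score each respondent received, skipping empty cells (assumed rule)."""
    out: Dict[str, float] = {}
    for j, name in enumerate(RESPONDENTS):
        scores = [row[j] for row in matrix if row[j] is not None]
        if scores:  # respondents with no judgments (MiniMax M2.5) are left out
            out[name] = mean(scores)
    return out


means = respondent_means(MATRIX)
winner = max(means, key=means.get)
matrix_avg = mean(means.values())
judgments = sum(cell is not None for row in MATRIX for cell in row)
```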