← Evaluations/EVAL-20260402-140137
code
Apr 02, 2026CODE-020

Refactor this 'working but unmaintainable' code into clean, testable, well-structured code. Explain every design decision. ```python def process(data, mode, flag1=False, flag2=True, output_type='json'): results = [] for item in data: if mode == 'fast': if item.get('type') == 'A': val = item['value'] * 1.1 if flag1 else item['value'] if val > 100: if flag2: results.append({'id': item['id'], 'val': val, 'status': 'high'}) else: results.append({'id': item['id'], 'val': val * 0.9, 'status': 'adjusted'}) else: results.append({'id': item['id'], 'val': val, 'status': 'normal'}) elif item.get('type') == 'B': val = item['value'] * 0.95 results.append({'id': item['id'], 'val': val, 'status': 'discounted'}) else: if flag1 and flag2: results.append({'id': item['id'], 'val': 0, 'status': 'skip'}) elif mode == 'careful': try: val = float(item.get('value', 0)) if val < 0: raise ValueError('negative') results.append({'id': item['id'], 'val': val, 'status': 'validated'}) except: results.append({'id': item['id'], 'val': 0, 'status': 'error'}) if output_type == 'json': import json; return json.dumps(results) elif output_type == 'csv': return '\n'.join([f"{r['id']},{r['val']},{r['status']}" for r in results]) return results ```

Winner
GPT-5.4
openrouter
9.18
WINNER SCORE
matrix avg: 8.36
results.json report.mdFull dataset (CSV) →
10×10 Judgment Matrix · 69 judgments
OPEN DATA
Judge ↓ / Respondent →DeepSeek V4GPT-OSS-120BMiMo-V2-FlashGPT-5.4Claude Opus 4.6Gemini 3.1 ProClaude Sonnet 4.6Grok 4.20Gemini 3MiniMax M2.5
DeepSeek V49.89.89.89.88.610.0·9.8·
GPT-OSS-120B8.38.68.86.68.06.2···
MiMo-V2-Flash9.27.89.28.88.09.3·9.310.0
GPT-5.46.33.86.24.32.04.8·8.0·
Claude Opus 4.68.08.28.89.24.98.2·9.0·
Gemini 3.1 Pro8.27.58.0·7.27.5···
Claude Sonnet 4.68.88.29.09.09.27.9·9.2·
Grok 4.208.88.78.89.08.87.9·8.8·
Gemini 39.89.69.69.610.06.79.8··
MiniMax M2.58.86.77.88.87.53.18.1·8.6