EVAL-20260402-123643
code · Feb 24, 2026 · CODE-007

Explain what this code does in plain English. Then identify any bugs or design issues.

```python
def f(x, n=3, m=None):
    m = m or {}
    if n == 0:
        return [[]]
    if x in m:
        return m[x]
    r = []
    for i in range(len(x)):
        for p in f(x[:i] + x[i+1:], n-1, m):
            r.append([x[i]] + p)
    m[x] = r
    return r

def g(s, k):
    from collections import Counter
    c = Counter(s)
    h = []
    import heapq
    for ch, cnt in c.items():
        heapq.heappush(h, (-cnt, ch))
    r = []
    while h and len(r) < k:
        cnt, ch = heapq.heappop(h)
        r.append(ch)
    return ''.join(r)
```
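For reference, here is what the prompt's code computes. f returns every ordered selection of n items from x (the n-permutations, each as a list; an empty list when n > len(x)). g returns the k most frequent characters of s, most frequent first, with ties going to the smaller character. The sketch that follows is a minimal, hypothetical cleanup and not part of the evaluation; the names `f_fixed`, `g_fixed` and `memo` are illustrative. It assumes x is a `str` or `tuple`, since both the original and this version need x to be hashable and sliceable. It changes f in two ways. First, `m = m or {}` swaps any empty dict for a fresh one, and every recursive call is made while the memo is still empty, so with the default `m=None` the cache never produces a hit. Second, the cache key ignores n, so a memo dict reused across calls with different n can return results of the wrong length. g keeps its ordering but uses `heapq.nsmallest` in place of the hand-built heap, with the imports moved to module level.

```python
import heapq
from collections import Counter


def f_fixed(x, n=3, memo=None):
    """Hypothetical rewrite of f: ordered selections of n items from x, as lists.

    The memo is created only when None, so an empty dict is still shared
    across the recursion, and the key includes n so different lengths never mix.
    x must be hashable and sliceable (e.g. str or tuple).
    """
    if memo is None:
        memo = {}
    if n == 0:
        return [[]]  # exactly one way to choose nothing
    key = (x, n)
    if key in memo:
        return memo[key]
    result = []
    for i in range(len(x)):
        rest = x[:i] + x[i + 1:]  # x with the i-th item removed
        for tail in f_fixed(rest, n - 1, memo):
            result.append([x[i]] + tail)
    memo[key] = result
    return result


def g_fixed(s, k):
    """Hypothetical rewrite of g: the k most frequent characters of s.

    Most frequent first; ties go to the smaller character, the same order
    g gets by popping (-count, char) tuples off a min-heap.
    """
    counts = Counter(s)
    top = heapq.nsmallest(k, counts.items(), key=lambda item: (-item[1], item[0]))
    return ''.join(ch for ch, _ in top)


if __name__ == "__main__":
    print(f_fixed("abc", 2))
    # [['a', 'b'], ['a', 'c'], ['b', 'a'], ['b', 'c'], ['c', 'a'], ['c', 'b']]
    print(g_fixed("mississippi", 2))
    # is
```

With one dict shared across calls, `f_fixed("ab", 2, memo)` followed by `f_fixed("ab", 1, memo)` returns `[['a'], ['b']]` for the second call. The original f, given a shared dict that already holds an entry, would instead return the cached length-2 list for `"ab"`.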

Winner: GPT-5.4 (openrouter)
Winner score: 9.14
Matrix avg: 7.23
10×10 Judgment Matrix · 63 judgments
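| Judge ↓ / Respondent → | GPT-5.4 | Claude Opus 4.6 | Gemini 3.1 Pro | Claude Sonnet 4.6 | Grok 4.20 | DeepSeek V4 | GPT-OSS-120B | Gemini 3 | MiniMax M2.5 | MiMo-V2-Flash |
|---|---|---|---|---|---|---|---|---|---|---|
| GPT-5.4 | | 8.4 | 0.3 | 8.6 | · | 6.5 | 7.8 | 7.8 | · | · |
| Claude Opus 4.6 | 10.0 | | 1.0 | 9.2 | · | 6.9 | 9.2 | 8.9 | · | 1.3 |
| Gemini 3.1 Pro | 8.6 | 8.4 | | 8.3 | · | 5.3 | · | 9.2 | · | · |
| Claude Sonnet 4.6 | 9.6 | 8.6 | · | | · | 7.5 | 9.0 | 8.2 | · | 2.3 |
| Grok 4.20 | 8.8 | 8.8 | 1.9 | 8.8 | | · | 8.8 | 8.4 | · | 6.0 |
| DeepSeek V4 | 9.0 | 8.4 | 5.0 | 9.2 | · | | 9.4 | 8.8 | · | 8.1 |
| GPT-OSS-120B | 8.0 | 8.4 | 1.0 | · | · | 6.1 | | 7.0 | · | · |
| Gemini 3 | 10.0 | 10.0 | 2.2 | 9.8 | · | 9.0 | 9.6 | | · | · |
| MiniMax M2.5 | 8.8 | 8.4 | 2.5 | 8.6 | · | · | 8.6 | 7.5 | | 6.8 |
| MiMo-V2-Flash | 9.6 | 8.0 | 1.9 | 9.0 | · | 8.6 | 8.6 | 9.2 | · | |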