Evaluations / EVAL-20260402-121140
code
Jan 13, 2026 · CODE-001

This Python async function has 3 bugs: a race condition, an unhandled exception, and a resource leak. Find all three and explain why each is problematic.

```python
import asyncio
import aiohttp

class DataFetcher:
    def __init__(self):
        self.cache = {}
        self.session = aiohttp.ClientSession()

    async def fetch_data(self, urls):
        results = []
        for url in urls:
            if url in self.cache:
                results.append(self.cache[url])
            else:
                async with self.session.get(url) as response:
                    data = await response.json()
                    self.cache[url] = data
                    results.append(data)
        return results

    async def fetch_parallel(self, urls):
        tasks = [self.fetch_single(url) for url in urls]
        return await asyncio.gather(*tasks)

    async def fetch_single(self, url):
        if url in self.cache:
            return self.cache[url]
        async with self.session.get(url) as response:
            data = await response.json()
            self.cache[url] = data
            return data
```

Winner: Grok 4.20 (openrouter)
Winner score: 9.44
Matrix avg: 8.51
10×10 Judgment Matrix · 86 judgments
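| Judge ↓ / Respondent → | GPT-5.4 | Claude Opus 4.6 | Gemini 3.1 Pro | Claude Sonnet 4.6 | Grok 4.20 | DeepSeek V4 | GPT-OSS-120B | Gemini 3 | MiniMax M2.5 | MiMo-V2-Flash |
|---|---|---|---|---|---|---|---|---|---|---|
| GPT-5.4 | | 8.8 | 8.3 | 9.0 | 9.0 | 7.8 | 0.3 | 8.6 | 6.7 | 8.3 |
| Claude Opus 4.6 | 9.2 | | 8.2 | 9.6 | 8.9 | 8.1 | 0.7 | 9.2 | 7.5 | 9.2 |
| Gemini 3.1 Pro | 9.7 | 10.0 | | 10.0 | 10.0 | 8.0 | · | 10.0 | 6.3 | 8.5 |
| Claude Sonnet 4.6 | 9.0 | 9.3 | 8.2 | | 9.0 | 8.0 | · | 8.8 | 8.3 | 8.8 |
| Grok 4.20 | 8.6 | 8.3 | 8.6 | 8.0 | | 8.6 | 4.2 | 9.0 | 8.6 | 6.8 |
| DeepSeek V4 | 9.4 | 8.8 | 9.4 | 10.0 | 9.6 | | 7.6 | 9.6 | 9.6 | 10.0 |
| GPT-OSS-120B | 8.4 | 9.3 | 8.1 | 8.8 | 9.3 | 8.8 | | 8.6 | 7.3 | 8.8 |
| Gemini 3 | 9.8 | 10.0 | 9.6 | 10.0 | 9.8 | 9.8 | · | | 8.6 | 10.0 |
| MiniMax M2.5 | 8.8 | 9.4 | 8.4 | 9.0 | 9.6 | 10.0 | · | 9.8 | | 10.0 |
| MiMo-V2-Flash | 9.2 | 10.0 | 10.0 | 9.6 | 9.8 | 9.0 | 8.8 | 9.8 | 8.6 | |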
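
The page does not say how the winner score and the matrix average are derived from the matrix above. The sketch below is one reading that reproduces both figures. It assumes that each judge's row skips the judge's own column, that "·" marks a missing judgment, that a respondent's score is the mean of the scores it received, and that the matrix average is the mean of those per-respondent means. The names `MODELS`, `MATRIX`, and `respondent_averages` are illustrative, not part of the evaluation's published tooling.

```python
from statistics import mean

MODELS = [
    "GPT-5.4", "Claude Opus 4.6", "Gemini 3.1 Pro", "Claude Sonnet 4.6",
    "Grok 4.20", "DeepSeek V4", "GPT-OSS-120B", "Gemini 3",
    "MiniMax M2.5", "MiMo-V2-Flash",
]

# Rows are judges, columns are respondents, both in MODELS order.
# N marks a cell with no score: the diagonal (assumed to be skipped
# self-judgments) and the cells shown as "·" on the page.
N = None
MATRIX = [
    [N,   8.8,  8.3,  9.0,  9.0,  7.8,  0.3, 8.6,  6.7, 8.3],
    [9.2, N,    8.2,  9.6,  8.9,  8.1,  0.7, 9.2,  7.5, 9.2],
    [9.7, 10.0, N,    10.0, 10.0, 8.0,  N,   10.0, 6.3, 8.5],
    [9.0, 9.3,  8.2,  N,    9.0,  8.0,  N,   8.8,  8.3, 8.8],
    [8.6, 8.3,  8.6,  8.0,  N,    8.6,  4.2, 9.0,  8.6, 6.8],
    [9.4, 8.8,  9.4,  10.0, 9.6,  N,    7.6, 9.6,  9.6, 10.0],
    [8.4, 9.3,  8.1,  8.8,  9.3,  8.8,  N,   8.6,  7.3, 8.8],
    [9.8, 10.0, 9.6,  10.0, 9.8,  9.8,  N,   N,    8.6, 10.0],
    [8.8, 9.4,  8.4,  9.0,  9.6,  10.0, N,   9.8,  N,   10.0],
    [9.2, 10.0, 10.0, 9.6,  9.8,  9.0,  8.8, 9.8,  8.6, N],
]


def respondent_averages(matrix, names):
    """Mean score each respondent received, ignoring empty cells."""
    averages = {}
    for col, name in enumerate(names):
        received = [row[col] for row in matrix if row[col] is not None]
        averages[name] = mean(received)
    return averages


averages = respondent_averages(MATRIX, MODELS)
winner = max(averages, key=averages.get)
matrix_avg = mean(list(averages.values()))
judgments = sum(score is not None for row in MATRIX for score in row)

print(winner, round(averages[winner], 2))  # Grok 4.20 9.44
print(round(matrix_avg, 2))                # 8.51
print(judgments)                           # 86
```

Under these assumptions the output matches the reported winner, winner score, matrix average, and judgment count. Averaging all 86 cells directly gives about 8.71 instead, so the reported 8.51 appears to be a mean of per-respondent means rather than a mean over individual judgments.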