EVAL-20260207-135408
code
Jan 13, 2026 · CODE-001

This Python async function has 3 bugs: a race condition, an unhandled exception, and a resource leak. Find all three and explain why each is problematic.

```python
import asyncio
import aiohttp

class DataFetcher:
    def __init__(self):
        self.cache = {}
        self.session = aiohttp.ClientSession()

    async def fetch_data(self, urls):
        results = []
        for url in urls:
            if url in self.cache:
                results.append(self.cache[url])
            else:
                async with self.session.get(url) as response:
                    data = await response.json()
                    self.cache[url] = data
                    results.append(data)
        return results

    async def fetch_parallel(self, urls):
        tasks = [self.fetch_single(url) for url in urls]
        return await asyncio.gather(*tasks)

    async def fetch_single(self, url):
        if url in self.cache:
            return self.cache[url]
        async with self.session.get(url) as response:
            data = await response.json()
            self.cache[url] = data
            return data
```
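The race condition named in the prompt can be illustrated without network access. The sketch below assumes the intended race is the check-then-await-then-set pattern in `fetch_single` when `fetch_parallel` receives duplicate URLs. `FakeFetcher`, `fake_get`, `fetch_single_racy`, `fetch_single_dedup`, and the example URL are hypothetical names introduced for illustration; `fake_get` stands in for the `aiohttp` request, and the in-flight dictionary is one possible mitigation, not necessarily the answer the evaluation expected.

```python
import asyncio


class FakeFetcher:
    """Hypothetical stand-in for DataFetcher; fake_get replaces the aiohttp call."""

    def __init__(self):
        self.cache = {}
        self.calls = 0      # number of simulated network requests
        self.inflight = {}  # url -> Task for a request already in progress

    async def fake_get(self, url):
        self.calls += 1
        await asyncio.sleep(0.01)  # yields control, like a real HTTP round trip
        return {"url": url}

    async def fetch_single_racy(self, url):
        # Same shape as fetch_single in the prompt: check, await, then set.
        if url in self.cache:
            return self.cache[url]
        data = await self.fake_get(url)
        self.cache[url] = data
        return data

    async def fetch_single_dedup(self, url):
        if url in self.cache:
            return self.cache[url]
        if url not in self.inflight:
            # No await between the check and the assignment,
            # so only one task is created per URL.
            self.inflight[url] = asyncio.create_task(self.fake_get(url))
        try:
            data = await self.inflight[url]
        finally:
            self.inflight.pop(url, None)
        self.cache[url] = data
        return data


async def demo():
    urls = ["https://example.com/a"] * 5  # five requests for the same URL
    racy = FakeFetcher()
    await asyncio.gather(*(racy.fetch_single_racy(u) for u in urls))
    dedup = FakeFetcher()
    await asyncio.gather(*(dedup.fetch_single_dedup(u) for u in urls))
    return racy.calls, dedup.calls


racy_calls, dedup_calls = asyncio.run(demo())
print(racy_calls, dedup_calls)
```

In the racy variant every coroutine passes the cache check before any of them finishes, so the same URL is requested once per caller. Sharing a single in-flight task collapses those into one request.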

Winner: GPT-5.2-Codex (OpenAI)
Winner score: 9.79 (matrix avg: 9.00)
10×10 Judgment Matrix · 100 judgments
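| Judge ↓ / Respondent → | GLM-4-7 | Gemini 3 | Grok Code Fast | Claude Opus 4.5 | Claude Sonnet 4.5 | Gemini 3 | MiniMax M2 | DeepSeek V3.2 | GPT-5.2-Codex | Grok 3 (Direct) |
|---|---|---|---|---|---|---|---|---|---|---|
| GLM-4-7 | - | 9.2 | 8.7 | 10.0 | 9.6 | 8.4 | 9.8 | 10.0 | 9.7 | 0.0 |
| Gemini 3 | 0.0 | - | 9.8 | 9.8 | 10.0 | 9.6 | 9.8 | 9.8 | 9.8 | 9.8 |
| Grok Code Fast | 2.0 | 9.8 | - | 9.6 | 10.0 | 9.8 | 10.0 | 10.0 | 10.0 | 10.0 |
| Claude Opus 4.5 | 0.7 | 9.8 | 9.8 | - | 9.8 | 8.8 | 9.6 | 9.6 | 9.6 | 9.8 |
| Claude Sonnet 4.5 | 2.6 | 9.8 | 9.3 | 9.8 | - | 9.6 | 9.8 | 9.6 | 9.8 | 10.0 |
| Gemini 3 | 0.0 | 0.0 | 10.0 | 0.0 | 0.0 | - | 0.0 | 0.0 | 10.0 | 0.0 |
| MiniMax M2 | 9.0 | 9.8 | 8.6 | 10.0 | 10.0 | 6.3 | - | 9.8 | 9.8 | 10.0 |
| DeepSeek V3.2 | 8.6 | 10.0 | 9.2 | 9.6 | 9.3 | 8.6 | 9.6 | - | 9.8 | 9.6 |
| GPT-5.2-Codex | 0.0 | 8.8 | 8.7 | 8.8 | 8.8 | 7.3 | 8.8 | 8.0 | - | 8.8 |
| Grok 3 (Direct) | 6.5 | 8.8 | 9.2 | 9.6 | 9.6 | 8.8 | 9.7 | 8.8 | 9.7 | - |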
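As a rough cross-check of the reported winner score, the snippet below averages the nine scores that the other judges gave GPT-5.2-Codex. It rests on two assumptions that the page does not state: that each row omits the judge's score for itself, and that the winner score is the mean of the respondent's column. The variable names are illustrative.

```python
# Scores given to GPT-5.2-Codex by the nine other judges, read from the matrix
# rows in judge order (GLM-4-7 through Grok 3 (Direct)). The self-judgment
# cell is assumed to be absent from each row.
scores_for_winner = [9.7, 9.8, 10.0, 9.6, 9.8, 10.0, 9.8, 9.8, 9.7]

column_mean = sum(scores_for_winner) / len(scores_for_winner)
print(f"{column_mean:.2f}")  # prints 9.80
```

The result is within rounding distance of the reported 9.79, which would be consistent with the displayed cells being rounded to one decimal place.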