Evaluation EVAL-20260402-151320
code
Apr 02, 2026 · CODE-029

Build a simple but production-worthy task queue in Python with: async worker pool, retry with exponential backoff, dead letter queue for failed tasks, priority levels, task deduplication, and graceful shutdown. Use only asyncio and standard library (no Celery/RQ). Include a demonstration with 3 worker types.
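For orientation, here is a minimal sketch of the kind of queue the prompt describes. It is not any respondent's submission. It covers only a worker pool, priorities, deduplication, retry with exponential backoff, a dead letter list, and drain-then-cancel shutdown; the three worker types and signal handling are left out. All names (`TaskQueue`, `submit`, `run_until_drained`, `max_attempts`, `base_delay`) are hypothetical.

```python
import asyncio
import itertools


class TaskQueue:
    """Illustrative sketch only; names and defaults are assumptions."""

    def __init__(self, workers=3, max_attempts=3, base_delay=0.01):
        self._queue = asyncio.PriorityQueue()
        self._seen = set()               # task ids already accepted (deduplication)
        self._order = itertools.count()  # tie-breaker: FIFO within one priority
        self._workers = workers
        self._max_attempts = max_attempts
        self._base_delay = base_delay
        self.results = {}                # task_id -> return value
        self.dead_letters = []           # (task_id, error) after the final attempt

    def submit(self, task_id, func, priority=5):
        """Queue an async callable; lower priority numbers run first."""
        if task_id in self._seen:
            return False                 # duplicate id, rejected
        self._seen.add(task_id)
        self._queue.put_nowait((priority, next(self._order), task_id, func, 0))
        return True

    async def _worker(self):
        while True:
            priority, _, task_id, func, attempt = await self._queue.get()
            try:
                self.results[task_id] = await func()
            except Exception as exc:
                if attempt + 1 >= self._max_attempts:
                    self.dead_letters.append((task_id, repr(exc)))
                else:
                    # Exponential backoff: base_delay * 1, 2, 4, ...
                    await asyncio.sleep(self._base_delay * 2 ** attempt)
                    self._queue.put_nowait(
                        (priority, next(self._order), task_id, func, attempt + 1)
                    )
            finally:
                self._queue.task_done()

    async def run_until_drained(self):
        """Start workers, wait for all queued work, then stop the workers."""
        workers = [asyncio.create_task(self._worker()) for _ in range(self._workers)]
        await self._queue.join()
        for w in workers:
            w.cancel()
        await asyncio.gather(*workers, return_exceptions=True)


async def demo():
    q = TaskQueue(workers=3, max_attempts=3, base_delay=0.001)
    calls = {"flaky": 0}

    async def ok():
        return "done"

    async def flaky():
        calls["flaky"] += 1
        if calls["flaky"] < 2:
            raise RuntimeError("transient")
        return "recovered"

    async def broken():
        raise ValueError("permanent")

    q.submit("a", ok, priority=1)
    duplicate = q.submit("a", ok)        # same id again: rejected
    q.submit("b", flaky)
    q.submit("c", broken)
    await q.run_until_drained()
    return q, duplicate


queue, duplicate_accepted = asyncio.run(demo())
print(queue.results, queue.dead_letters, duplicate_accepted)
```

In this sketch a retrying worker sleeps through its own backoff, which keeps the code short but ties up that worker; a fuller design would likely schedule the retry separately.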

Winner: Grok 4.20 (openrouter)
Winner score: 7.96 (matrix avg: 6.12)
10×10 Judgment Matrix · 88 judgments
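| Judge ↓ / Respondent → | GPT-OSS-120B | GPT-5.4 | Claude Opus 4.6 | Claude Sonnet 4.6 | Gemini 3.1 Pro | Grok 4.20 | DeepSeek V4 | Gemini 3 | MiniMax M2.5 | MiMo-V2-Flash |
|---|---|---|---|---|---|---|---|---|---|---|
| GPT-OSS-120B |  | 3.4 | 4.7 | 5.8 | 1.4 | 8.2 | 7.5 | 8.8 | 3.0 | 7.0 |
| GPT-5.4 | 3.6 |  | 2.0 | 4.8 | 0.3 | 5.5 | 6.3 | 6.8 | 0.7 | 4.4 |
| Claude Opus 4.6 | 4.5 | 5.5 |  | 4.7 | · | 7.2 | 6.6 | 7.2 | 1.6 | 6.3 |
| Claude Sonnet 4.6 | 8.0 | 6.8 | 6.7 |  | 1.0 | 8.3 | 7.4 | 8.2 | 1.2 | 8.0 |
| Gemini 3.1 Pro | 6.2 | 5.6 | 6.3 | 5.5 |  | 7.0 | 6.3 | 6.3 | 0.4 | 4.6 |
| Grok 4.20 | 6.0 | 6.0 | 7.9 | 8.6 | 1.9 |  | 6.6 | 6.2 | 2.9 | 6.4 |
| DeepSeek V4 | 9.2 | 8.6 | 8.6 | 9.2 | 6.0 | 8.6 |  | 8.8 | · | 8.8 |
| Gemini 3 | 9.0 | 8.6 | 9.0 | 9.6 | 1.4 | 9.6 | 9.3 |  | 3.0 | 9.2 |
| MiniMax M2.5 | 6.5 | 6.2 | 7.3 | 5.8 | 1.2 | 8.6 | 7.8 | 8.2 |  | 7.8 |
| MiMo-V2-Flash | 8.6 | 9.0 | 8.6 | 8.8 | 2.0 | 8.6 | 8.6 | 8.6 | 4.3 |  |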
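
The page does not say how its headline numbers are derived. The sketch below tests one reading of the matrix above: the winner score is the mean of the scores a respondent received (its column), and the matrix average is the mean of the ten per-respondent means. Under that assumption both published figures, 7.96 and 6.12, are reproduced. `MODELS`, `MATRIX`, and the helper names are illustrative, and `None` stands for cells with no score (the diagonal and the two cells shown as `·`).

```python
from statistics import mean

MODELS = [
    "GPT-OSS-120B", "GPT-5.4", "Claude Opus 4.6", "Claude Sonnet 4.6",
    "Gemini 3.1 Pro", "Grok 4.20", "DeepSeek V4", "Gemini 3",
    "MiniMax M2.5", "MiMo-V2-Flash",
]

N = None  # no score: self-judgment (diagonal) or a cell shown as "·"

# Rows are judges, columns are respondents, in the order of MODELS.
MATRIX = [
    [N,   3.4, 4.7, 5.8, 1.4, 8.2, 7.5, 8.8, 3.0, 7.0],
    [3.6, N,   2.0, 4.8, 0.3, 5.5, 6.3, 6.8, 0.7, 4.4],
    [4.5, 5.5, N,   4.7, N,   7.2, 6.6, 7.2, 1.6, 6.3],
    [8.0, 6.8, 6.7, N,   1.0, 8.3, 7.4, 8.2, 1.2, 8.0],
    [6.2, 5.6, 6.3, 5.5, N,   7.0, 6.3, 6.3, 0.4, 4.6],
    [6.0, 6.0, 7.9, 8.6, 1.9, N,   6.6, 6.2, 2.9, 6.4],
    [9.2, 8.6, 8.6, 9.2, 6.0, 8.6, N,   8.8, N,   8.8],
    [9.0, 8.6, 9.0, 9.6, 1.4, 9.6, 9.3, N,   3.0, 9.2],
    [6.5, 6.2, 7.3, 5.8, 1.2, 8.6, 7.8, 8.2, N,   7.8],
    [8.6, 9.0, 8.6, 8.8, 2.0, 8.6, 8.6, 8.6, 4.3, N],
]


def column_scores(j):
    """Scores received by respondent j, skipping empty cells."""
    return [row[j] for row in MATRIX if row[j] is not None]


# Mean score each respondent received across its judges.
received = {model: mean(column_scores(j)) for j, model in enumerate(MODELS)}

winner = max(received, key=received.get)
matrix_avg = mean(list(received.values()))  # mean of per-respondent means
judgments = sum(v is not None for row in MATRIX for v in row)

print(winner, round(received[winner], 2))  # Grok 4.20 7.96
print(round(matrix_avg, 2))                # 6.12
print(judgments)                           # 88
```

Averaging all 88 cells directly gives about 6.22 instead, which is why the mean-of-means reading is used here.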