Evaluation EVAL-20260402-131157
code · CODE-012 · Apr 02, 2026

Implement a production-ready circuit breaker pattern in Python. It should support three states (closed, open, half-open), configurable failure thresholds, automatic recovery with exponential backoff, proper async support, and metrics collection. Include a usage example wrapping an HTTP client.
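The prompt's requirements map onto a small state machine. The sketch below is illustrative only. It is not any respondent's answer and not the winning submission. The names `CircuitBreaker`, `CircuitOpenError`, `Metrics`, the parameters `failure_threshold`, `base_timeout`, `max_timeout`, and the injectable `clock` are all hypothetical. It opens after `failure_threshold` consecutive failures and, once the open timeout has elapsed, lets a single half-open trial call through. Each failed trial doubles the timeout, up to `max_timeout`.

```python
import asyncio
import time
from dataclasses import dataclass, field
from enum import Enum


class State(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"


class CircuitOpenError(RuntimeError):
    """Raised when the breaker rejects a call without running it."""


@dataclass
class Metrics:
    successes: int = 0
    failures: int = 0
    rejections: int = 0
    transitions: list = field(default_factory=list)  # (from_state, to_state) pairs


class CircuitBreaker:
    """Hypothetical async circuit breaker sketch.

    State changes happen between awaits, so they are atomic on a single
    asyncio event loop; this sketch is not thread-safe.
    """

    def __init__(self, failure_threshold=5, base_timeout=1.0, max_timeout=60.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.base_timeout = base_timeout
        self.max_timeout = max_timeout
        self._clock = clock      # injectable so tests can control time
        self.state = State.CLOSED
        self.metrics = Metrics()
        self._failures = 0       # consecutive failures while CLOSED
        self._trips = 0          # consecutive trips without a successful recovery
        self._opened_at = 0.0

    def open_timeout(self):
        # Exponential backoff: base * 2**(trips - 1), capped at max_timeout.
        return min(self.base_timeout * 2 ** max(self._trips - 1, 0), self.max_timeout)

    def _move(self, new_state):
        if new_state is not self.state:
            self.metrics.transitions.append((self.state.value, new_state.value))
            self.state = new_state

    def _trip(self):
        self._trips += 1
        self._opened_at = self._clock()
        self._move(State.OPEN)

    async def call(self, func, *args, **kwargs):
        """Await func(*args, **kwargs); func must be a coroutine function."""
        if self.state is State.OPEN and self._clock() - self._opened_at >= self.open_timeout():
            self._move(State.HALF_OPEN)  # this caller becomes the single trial call
        elif self.state is not State.CLOSED:
            # OPEN and still cooling down, or HALF_OPEN with a trial already in flight.
            self.metrics.rejections += 1
            raise CircuitOpenError(f"circuit is {self.state.value}")
        try:
            result = await func(*args, **kwargs)
        except asyncio.CancelledError:
            if self.state is State.HALF_OPEN:
                self._move(State.OPEN)  # trial abandoned: let a later call retry
            raise
        except Exception:
            self.metrics.failures += 1
            if self.state is State.HALF_OPEN:
                self._trip()  # trial failed: reopen with a longer timeout
            elif self.state is State.CLOSED:
                self._failures += 1
                if self._failures >= self.failure_threshold:
                    self._failures = 0
                    self._trip()
            raise
        self.metrics.successes += 1
        if self.state is State.HALF_OPEN:
            self._trips = 0  # recovered: reset the backoff
            self._move(State.CLOSED)
        if self.state is State.CLOSED:
            self._failures = 0
        return result
```

To wrap an HTTP client, one option (an assumption, since the prompt names no library) is an `httpx.AsyncClient`, called as `await breaker.call(client.get, url)`. httpx does not raise on 4xx/5xx responses by default, so a small wrapper that calls `response.raise_for_status()` would be needed for server errors to count as failures.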

Winner: Grok 4.20 (openrouter) · winner score 7.44 · matrix average 6.10
Data files: results.json · report.md · full dataset (CSV)
10×10 Judgment Matrix · 87 judgments
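Rows are judges and columns are respondents. Self-judgments are not scored (shown as "self"), and · marks a missing judgment.

| Judge ↓ / Respondent → | Gemini 3 | GPT-5.4 | Claude Opus 4.6 | Gemini 3.1 Pro | Claude Sonnet 4.6 | Grok 4.20 | DeepSeek V4 | GPT-OSS-120B | MiniMax M2.5 | MiMo-V2-Flash |
|---|---|---|---|---|---|---|---|---|---|---|
| Gemini 3 | self | 8.2 | 9.6 | 3.3 | 7.3 | 9.2 | 8.6 | 7.8 | 6.3 | 9.2 |
| GPT-5.4 | 5.3 | self | 3.0 | 0.7 | 3.0 | 5.0 | 3.8 | 2.8 | 1.9 | 4.2 |
| Claude Opus 4.6 | 6.0 | 6.9 | self | 1.6 | 5.5 | 6.8 | 5.6 | 5.3 | 2.9 | 5.4 |
| Gemini 3.1 Pro | 5.4 | 5.3 | 5.7 | self | 4.5 | 6.7 | 4.5 | 4.8 | 2.0 | 5.4 |
| Claude Sonnet 4.6 | 7.0 | 7.0 | 7.5 | 1.9 | self | 7.2 | 6.8 | 7.2 | 5.3 | 6.6 |
| Grok 4.20 | 7.0 | · | 7.9 | 3.6 | 6.4 | self | 6.8 | 6.0 | 5.8 | 6.2 |
| DeepSeek V4 | 9.6 | 9.0 | 8.8 | 8.4 | 8.6 | 9.0 | self | · | 7.8 | 8.8 |
| GPT-OSS-120B | 7.5 | 3.6 | 4.2 | 2.0 | 3.4 | 6.8 | 6.2 | self | 4.7 | 6.5 |
| MiniMax M2.5 | 6.8 | 7.3 | 6.0 | · | 6.8 | 8.2 | 5.8 | 6.3 | self | 7.2 |
| MiMo-V2-Flash | 8.6 | 8.6 | 8.6 | 4.5 | 7.0 | 8.2 | 8.0 | 7.0 | 7.7 | self |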
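The page does not say how its headline numbers are computed. The sketch below recomputes candidate figures from the rounded cells above. It assumes that rows are judges, that columns are respondents, and that `None` marks the omitted diagonal or a missing (·) cell. The names `MODELS`, `MATRIX`, and `respondent_means` are illustrative. Averaging each respondent's received scores and then averaging those ten means gives about 6.10, which matches the reported matrix average. Pooling all 87 cells instead gives about 6.13. The per-respondent means also rank Grok 4.20 first at about 7.46. That is close to, but not exactly, the reported 7.44, possibly because the displayed cells are rounded.

```python
from statistics import mean

MODELS = ["Gemini 3", "GPT-5.4", "Claude Opus 4.6", "Gemini 3.1 Pro",
          "Claude Sonnet 4.6", "Grok 4.20", "DeepSeek V4", "GPT-OSS-120B",
          "MiniMax M2.5", "MiMo-V2-Flash"]

# Rows = judges, columns = respondents (same order as MODELS).
# N (None) marks a self-judgment on the diagonal or a missing judgment (·).
N = None
MATRIX = [
    [N,   8.2, 9.6, 3.3, 7.3, 9.2, 8.6, 7.8, 6.3, 9.2],
    [5.3, N,   3.0, 0.7, 3.0, 5.0, 3.8, 2.8, 1.9, 4.2],
    [6.0, 6.9, N,   1.6, 5.5, 6.8, 5.6, 5.3, 2.9, 5.4],
    [5.4, 5.3, 5.7, N,   4.5, 6.7, 4.5, 4.8, 2.0, 5.4],
    [7.0, 7.0, 7.5, 1.9, N,   7.2, 6.8, 7.2, 5.3, 6.6],
    [7.0, N,   7.9, 3.6, 6.4, N,   6.8, 6.0, 5.8, 6.2],
    [9.6, 9.0, 8.8, 8.4, 8.6, 9.0, N,   N,   7.8, 8.8],
    [7.5, 3.6, 4.2, 2.0, 3.4, 6.8, 6.2, N,   4.7, 6.5],
    [6.8, 7.3, 6.0, N,   6.8, 8.2, 5.8, 6.3, N,   7.2],
    [8.6, 8.6, 8.6, 4.5, 7.0, 8.2, 8.0, 7.0, 7.7, N],
]


def respondent_means(matrix):
    """Average each column over the judges that actually scored it."""
    return {name: mean(row[j] for row in matrix if row[j] is not None)
            for j, name in enumerate(MODELS)}


judgments = sum(v is not None for row in MATRIX for v in row)
means = respondent_means(MATRIX)
winner = max(means, key=means.get)
matrix_avg = mean(means.values())  # mean of per-respondent means
pooled = mean(v for row in MATRIX for v in row if v is not None)  # mean of all cells
print(judgments, winner, round(means[winner], 2), round(matrix_avg, 2), round(pooled, 2))
```

Averaging per respondent first gives each model equal weight even when it received fewer judgments, as GPT-5.4, Gemini 3.1 Pro, and GPT-OSS-120B did here.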