Evaluation EVAL-20260402-144911
code
Apr 02, 2026 · CODE-026

Design and implement health check endpoints for a microservice that depends on a database, Redis cache, and an external API. Include: liveness probe (is the process alive?), readiness probe (can it serve traffic?), and startup probe (is initialization complete?). Handle cascading failures — if Redis is down, should the service report unhealthy?
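One way to read the prompt is sketched below. It is a minimal illustration, not any respondent's answer. It assumes the database is a hard dependency, while Redis and the external API are soft dependencies whose failure degrades the service without taking it out of rotation. The names `Dependency` and `HealthChecker`, the status strings, and the HTTP codes are all hypothetical choices made for this sketch.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Dependency:
    name: str
    check: Callable[[], bool]  # returns True when the dependency responds
    critical: bool             # assumption: only critical deps fail readiness


@dataclass
class HealthChecker:
    dependencies: List[Dependency]
    startup_complete: bool = False  # set to True once initialization finishes

    def liveness(self) -> Tuple[int, Dict]:
        # Liveness never touches dependencies: a Redis outage must not
        # cause the orchestrator to restart an otherwise healthy process.
        return 200, {"status": "alive"}

    def startup(self) -> Tuple[int, Dict]:
        if self.startup_complete:
            return 200, {"status": "started"}
        return 503, {"status": "starting"}

    def readiness(self) -> Tuple[int, Dict]:
        if not self.startup_complete:
            return 503, {"status": "starting", "checks": {}}

        checks: Dict[str, str] = {}
        critical_down = False
        degraded = False
        for dep in self.dependencies:
            try:
                ok = bool(dep.check())
            except Exception:
                # A check that raises is treated the same as a failed check.
                ok = False
            checks[dep.name] = "up" if ok else "down"
            if not ok:
                if dep.critical:
                    critical_down = True
                else:
                    degraded = True

        if critical_down:
            return 503, {"status": "unhealthy", "checks": checks}
        # Soft dependency down: keep serving, but report the degradation.
        status = "degraded" if degraded else "ready"
        return 200, {"status": status, "checks": checks}
```

Under these assumptions, a Redis outage yields a 200 readiness response with status `degraded`, so the service keeps receiving traffic and falls back to the database. If Redis held data the service cannot work without (sessions, locks), marking it `critical=True` would flip that answer. In a real service each `check` would also need a short timeout so a hanging dependency cannot stall the probe.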

Winner: GPT-5.4 (openrouter)
Winner score: 9.12
Matrix avg: 7.66
10×10 Judgment Matrix · 88 judgments
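| Judge ↓ / Respondent → | GPT-5.4 | Claude Opus 4.6 | Gemini 3.1 Pro | Claude Sonnet 4.6 | Grok 4.20 | DeepSeek V4 | GPT-OSS-120B | Gemini 3 | MiniMax M2.5 | MiMo-V2-Flash |
|---|---|---|---|---|---|---|---|---|---|---|
| GPT-5.4 | | 6.5 | 3.3 | 4.0 | 7.2 | 8.6 | 5.0 | 8.3 | 1.6 | 8.2 |
| Claude Opus 4.6 | 9.2 | | 5.7 | 7.7 | 8.3 | 7.6 | 7.0 | 9.2 | 1.7 | 8.0 |
| Gemini 3.1 Pro | 9.3 | 7.3 | | 6.4 | 9.6 | 9.8 | 5.2 | 9.6 | 0.2 | 7.6 |
| Claude Sonnet 4.6 | 8.8 | 7.3 | · | | 8.6 | 7.8 | 8.0 | 9.0 | 5.0 | 8.3 |
| Grok 4.20 | 9.0 | 8.7 | 6.0 | 8.7 | | 8.6 | 8.4 | 8.8 | · | 8.4 |
| DeepSeek V4 | 9.6 | 9.0 | 8.6 | 9.0 | 9.6 | | 9.6 | 9.3 | 7.0 | 9.6 |
| GPT-OSS-120B | 8.4 | 6.8 | 3.8 | 5.1 | 8.8 | 8.6 | | 8.8 | 4.7 | 7.8 |
| Gemini 3 | 9.8 | 8.3 | 8.6 | 9.6 | 9.3 | 9.8 | 9.6 | | 2.6 | 9.6 |
| MiniMax M2.5 | 8.6 | 7.9 | 5.8 | 7.3 | 8.8 | 7.8 | 7.5 | 8.3 | | 7.8 |
| MiMo-V2-Flash | 9.3 | 8.6 | 7.2 | 8.6 | 9.3 | 8.6 | 8.6 | 9.2 | 7.8 | |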