code · Mar 15, 2026 · EVAL-20260315-043801

Implement an LRU cache with per-key TTL...
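The prompt is truncated here, so its exact requirements (method names, clock source, eviction rules) are not visible. For reference only, here is a minimal Python sketch of what "an LRU cache with per-key TTL" usually means. It assumes a fixed capacity, an optional TTL in seconds on each `put`, lazy expiry on `get`, and that expired entries are purged before any live entry is evicted. The class `TTLLRUCache` and its methods are hypothetical; this is not any respondent's submission.

```python
import time
from collections import OrderedDict
from typing import Any, Callable, Hashable, Optional


class TTLLRUCache:
    """Hypothetical LRU cache in which each entry may carry its own TTL."""

    def __init__(self, capacity: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._clock = clock  # injectable so tests can control time
        # key -> (value, expires_at or None); iteration order = oldest use first
        self._data: OrderedDict = OrderedDict()

    def _expired(self, expires_at: Optional[float]) -> bool:
        return expires_at is not None and self._clock() >= expires_at

    def get(self, key: Hashable, default: Any = None) -> Any:
        if key not in self._data:
            return default
        value, expires_at = self._data[key]
        if self._expired(expires_at):
            del self._data[key]  # lazy expiry: stale entries go when touched
            return default
        self._data.move_to_end(key)  # mark as most recently used
        return value

    def put(self, key: Hashable, value: Any, ttl: Optional[float] = None) -> None:
        expires_at = None if ttl is None else self._clock() + ttl
        self._data[key] = (value, expires_at)
        self._data.move_to_end(key)  # new or updated key is most recent
        if len(self._data) > self.capacity:
            self._evict()

    def _evict(self) -> None:
        # First purge every expired entry (a linear scan) ...
        for k in [k for k, (_, exp) in self._data.items() if self._expired(exp)]:
            del self._data[k]
        # ... then, if still over capacity, drop the least recently used.
        while len(self._data) > self.capacity:
            self._data.popitem(last=False)

    def __len__(self) -> int:
        return len(self._data)
```

`OrderedDict` keeps lookups and recency updates O(1). The purge in `_evict` is O(n); a min-heap keyed on expiry time could replace it if many keys carry TTLs.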

Winner: Gemma 3 27B (openrouter)
Winner score: 9.06 · matrix avg: 8.65
results.json · report.md · Full dataset (CSV)
10×10 Judgment Matrix · 69 judgments
OPEN DATA
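Judge ↓ / Respondent →  Qwen 3 32B  Devstral Small  Gemma 3 27B  Llama 4 Scout  Phi-4 14B  Granite 4.0 Micro  Qwen 3 8B  Mistral Nemo 12B  Llama 3.1 8B
Qwen 3 32B              -           8.3             8.2          8.8            7.6        8.0                8.8        7.2               5.2
Devstral Small          9.6         -               9.6          8.6            9.6        9.1                9.6        8.2               7.4
Gemma 3 27B             9.2         8.8             -            8.3            9.4        8.8                9.4        8.8               8.4
Llama 4 Scout           8.8         ·               8.8          -              9.6        8.8                9.4        8.4               8.2
Phi-4 14B               8.3         8.6             9.4          8.3            -          9.1                8.3        7.8               8.3
Granite 4.0 Micro       8.8         8.6             8.8          8.8            8.8        -                  8.8        8.6               8.8
Qwen 3 8B               9.4         8.6             9.3          9.1            9.4        8.6                -          8.1               8.1
Mistral Nemo 12B        9.1         ·               9.1          ·              8.3        8.3                9.1        -                 8.3
Llama 3.1 8B            8.8         9.1             9.4          8.4            8.6        8.6                9.1        8.2               -

Rows are judges and columns are respondents. A hyphen (-) marks the diagonal, where a model would be judging itself and no score is given; a dot (·) marks a missing judgment. The 69 scored cells match the judgment count.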
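To make the matrix easier to reuse, here is a small Python sketch that transcribes it and computes the plain mean score each respondent received. It assumes, as the count of 69 judgments suggests, that each row skips the judge's own column, so `None` stands for both that diagonal and the missing "·" cells. The names `MATRIX`, `NA` and `respondent_means` are hypothetical. The page does not say how its winner score or matrix avg are calculated, and the cells are rounded to one decimal, so these simple means (9.075 for Gemma 3 27B, about 8.667 over all 69 cells) are close to, but not identical with, the 9.06 and 8.65 shown.

```python
from statistics import mean

# Judges and respondents, in the order used by the matrix on this page.
MODELS = [
    "Qwen 3 32B", "Devstral Small", "Gemma 3 27B", "Llama 4 Scout",
    "Phi-4 14B", "Granite 4.0 Micro", "Qwen 3 8B", "Mistral Nemo 12B",
    "Llama 3.1 8B",
]

NA = None  # diagonal (self-judgment) or a missing "·" judgment

# MATRIX[judge][respondent], transcribed from the rounded values shown.
MATRIX = [
    [NA,  8.3, 8.2, 8.8, 7.6, 8.0, 8.8, 7.2, 5.2],
    [9.6, NA,  9.6, 8.6, 9.6, 9.1, 9.6, 8.2, 7.4],
    [9.2, 8.8, NA,  8.3, 9.4, 8.8, 9.4, 8.8, 8.4],
    [8.8, NA,  8.8, NA,  9.6, 8.8, 9.4, 8.4, 8.2],
    [8.3, 8.6, 9.4, 8.3, NA,  9.1, 8.3, 7.8, 8.3],
    [8.8, 8.6, 8.8, 8.8, 8.8, NA,  8.8, 8.6, 8.8],
    [9.4, 8.6, 9.3, 9.1, 9.4, 8.6, NA,  8.1, 8.1],
    [9.1, NA,  9.1, NA,  8.3, 8.3, 9.1, NA,  8.3],
    [8.8, 9.1, 9.4, 8.4, 8.6, 8.6, 9.1, 8.2, NA],
]


def respondent_means(matrix):
    """Plain mean of the scores each respondent received, skipping NA cells."""
    return {
        name: mean(row[col] for row in matrix if row[col] is not NA)
        for col, name in enumerate(MODELS)
    }


cells = [v for row in MATRIX for v in row if v is not NA]
means = respondent_means(MATRIX)
ranking = sorted(means, key=means.get, reverse=True)

print(f"{len(cells)} scored cells, overall mean {mean(cells):.3f}")
for name in ranking:
    print(f"{name:<18} {means[name]:.3f}")
```

Averaging per column treats every judge equally but lets respondents with missing cells (Devstral Small, Llama 4 Scout) be averaged over fewer judges.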