code
Mar 15, 2026 · EVAL-20260315-054647

Implement an LRU cache with per-key TTL (time-to-live) support. Requirements: O(1) get/put, thread-safe, lazy expiration (don't use background threads), configurable max size, eviction callback, and cache hit/miss statistics. Include comprehensive tests.
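
For orientation, here is a minimal sketch of the kind of structure the task describes. It is not any respondent's submission; the class name `TTLLRUCache`, the `on_evict(key, value, reason)` callback signature, and the injectable `clock` parameter are all assumptions made for illustration. It relies on `collections.OrderedDict` for O(1) recency updates, a single lock for thread safety, and checks expiry only when a key is read.

```python
import threading
import time
from collections import OrderedDict


class TTLLRUCache:
    """LRU cache with per-key TTL and lazy expiration (illustrative sketch)."""

    def __init__(self, max_size, default_ttl=None, on_evict=None, clock=time.monotonic):
        if max_size <= 0:
            raise ValueError("max_size must be positive")
        self._max_size = max_size
        self._default_ttl = default_ttl      # seconds, or None for "never expires"
        self._on_evict = on_evict            # hypothetical signature: (key, value, reason)
        self._clock = clock                  # injectable so tests need not sleep
        self._data = OrderedDict()           # key -> (value, expires_at or None)
        self._lock = threading.RLock()
        self.hits = 0
        self.misses = 0

    def get(self, key, default=None):
        with self._lock:
            entry = self._data.get(key)
            if entry is not None:
                value, expires_at = entry
                if expires_at is None or self._clock() < expires_at:
                    self._data.move_to_end(key)   # mark as most recently used
                    self.hits += 1
                    return value
                del self._data[key]               # lazy expiration on read
            self.misses += 1
            expired = None if entry is None else (key, entry[0], "expired")
        self._notify(expired)                     # callback runs outside the lock
        return default

    def put(self, key, value, ttl=None):
        ttl = self._default_ttl if ttl is None else ttl
        expires_at = None if ttl is None else self._clock() + ttl
        evicted = None
        with self._lock:
            if key in self._data:
                self._data.move_to_end(key)
            self._data[key] = (value, expires_at)
            if len(self._data) > self._max_size:
                # Evict the least recently used entry (front of the ordering).
                old_key, (old_value, _) = self._data.popitem(last=False)
                evicted = (old_key, old_value, "capacity")
        self._notify(evicted)

    def _notify(self, evicted):
        if evicted is not None and self._on_evict is not None:
            self._on_evict(*evicted)

    def stats(self):
        with self._lock:
            total = self.hits + self.misses
            return {"hits": self.hits, "misses": self.misses, "size": len(self._data),
                    "hit_rate": self.hits / total if total else 0.0}


# Usage with a fake clock, so expiry can be exercised deterministically.
now = [0.0]
log = []
cache = TTLLRUCache(max_size=2, clock=lambda: now[0],
                    on_evict=lambda k, v, why: log.append((k, why)))
cache.put("a", 1, ttl=5)
cache.put("b", 2)            # no TTL: never expires
first = cache.get("a")       # hit; "a" becomes most recently used
cache.put("c", 3)            # over capacity: "b" is least recently used
now[0] = 10.0
second = cache.get("a")      # "a" expired at t=5 and is dropped on this read
```

One design choice worth noting: on overflow this sketch evicts the least recently used entry even if some other entry has already expired, which keeps `put` at O(1). Preferring expired entries first would need an extra index ordered by expiry time.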

Winner: Qwen 3 8B (openrouter)
Winner score: 9.23
Matrix avg: 8.17
10×10 Judgment Matrix · 66 judgments
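| Judge ↓ / Respondent → | Qwen 3 32B | Kimi K2.5 | Devstral Small | Gemma 3 27B | Llama 4 Scout | Phi-4 14B | Granite 4.0 Micro | Qwen 3 8B | Mistral Nemo 12B | Llama 3.1 8B |
|---|---|---|---|---|---|---|---|---|---|---|
| Qwen 3 32B | | · | 8.1 | 8.3 | 8.4 | 8.3 | 7.0 | 8.4 | 3.8 | 4.8 |
| Kimi K2.5 | · | | · | · | · | 5.4 | · | · | · | · |
| Devstral Small | · | · | | 9.6 | 8.2 | 8.6 | 8.6 | 10.0 | 5.8 | 7.4 |
| Gemma 3 27B | · | · | 8.3 | | 8.6 | 8.4 | 8.8 | 9.4 | 6.6 | 8.4 |
| Llama 4 Scout | · | · | 8.4 | 9.6 | | 8.8 | 8.8 | 9.6 | 5.6 | 8.6 |
| Phi-4 14B | · | · | 8.3 | 9.6 | 8.4 | | 9.0 | 9.8 | 6.6 | 8.1 |
| Granite 4.0 Micro | 8.2 | 8.2 | 8.8 | 8.8 | · | 8.8 | | 8.8 | 8.8 | 8.8 |
| Qwen 3 8B | · | · | 8.3 | 9.2 | 8.5 | 7.2 | 6.8 | | 3.6 | 5.2 |
| Mistral Nemo 12B | · | · | 7.7 | 8.7 | 8.3 | 8.3 | 8.6 | 8.3 | | 8.4 |
| Llama 3.1 8B | · | · | 8.6 | 9.3 | 9.6 | 9.6 | 9.6 | 9.6 | 8.6 | |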
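
The headline figures can be cross-checked against the matrix. The sketch below rests on three assumptions that the page does not state: each judge row lists nine cells because the judge's own column is omitted, the winner score is the unweighted mean of the Qwen 3 8B column, and both averages are truncated (not rounded) to two decimals. The helper names `full_matrix` and `truncate2` are illustrative only.

```python
import math

MODELS = ["Qwen 3 32B", "Kimi K2.5", "Devstral Small", "Gemma 3 27B", "Llama 4 Scout",
          "Phi-4 14B", "Granite 4.0 Micro", "Qwen 3 8B", "Mistral Nemo 12B", "Llama 3.1 8B"]

# Nine cells per judge row, in page order; None stands for "·" (no judgment).
ROWS = [
    [None, 8.1, 8.3, 8.4, 8.3, 7.0, 8.4, 3.8, 4.8],
    [None, None, None, None, 5.4, None, None, None, None],
    [None, None, 9.6, 8.2, 8.6, 8.6, 10.0, 5.8, 7.4],
    [None, None, 8.3, 8.6, 8.4, 8.8, 9.4, 6.6, 8.4],
    [None, None, 8.4, 9.6, 8.8, 8.8, 9.6, 5.6, 8.6],
    [None, None, 8.3, 9.6, 8.4, 9.0, 9.8, 6.6, 8.1],
    [8.2, 8.2, 8.8, 8.8, None, 8.8, 8.8, 8.8, 8.8],
    [None, None, 8.3, 9.2, 8.5, 7.2, 6.8, 3.6, 5.2],
    [None, None, 7.7, 8.7, 8.3, 8.3, 8.6, 8.3, 8.4],
    [None, None, 8.6, 9.3, 9.6, 9.6, 9.6, 9.6, 8.6],
]


def full_matrix(rows):
    """Re-insert an empty self-judgment cell on the diagonal (assumption)."""
    matrix = []
    for i, cells in enumerate(rows):
        row = list(cells)
        row.insert(i, None)
        matrix.append(row)
    return matrix


def truncate2(x):
    """Truncate to two decimals; assumed to match how the page displays averages."""
    return math.floor(x * 100) / 100


matrix = full_matrix(ROWS)
col = MODELS.index("Qwen 3 8B")
winner_scores = [row[col] for row in matrix if row[col] is not None]
all_scores = [v for row in matrix for v in row if v is not None]

judgment_count = len(all_scores)
winner_avg = truncate2(sum(winner_scores) / len(winner_scores))
matrix_avg = truncate2(sum(all_scores) / len(all_scores))
print(judgment_count, winner_avg, matrix_avg)
```

Under these assumptions the script reproduces the 66 judgments, the 9.23 winner score, and the 8.17 matrix average reported on the page, which supports reading each row with the judge's own column left out.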