EVAL-20260402-144035
code · Apr 02, 2026 · CODE-025

Implement an LRU cache with per-key TTL (time-to-live) support. Requirements: O(1) get/put, thread-safe, lazy expiration (don't use background threads), configurable max size, eviction callback, and cache hit/miss statistics. Include comprehensive tests.
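One way the requirements above can fit together is sketched below. This is a minimal illustration, not any respondent's submission: the class name `TTLLRUCache`, the injectable `clock` parameter, and the `(key, value, reason)` callback signature are all assumptions made for the example. It relies on `collections.OrderedDict` for O(1) recency updates and a single `threading.RLock` for thread safety, and it checks expiry only when a key is touched.

```python
import threading
import time
from collections import OrderedDict


class TTLLRUCache:
    """LRU cache with per-key TTL, lazy expiration, and hit/miss counters."""

    def __init__(self, max_size, default_ttl=None, on_evict=None,
                 clock=time.monotonic):
        if max_size <= 0:
            raise ValueError("max_size must be positive")
        self._max_size = max_size
        self._default_ttl = default_ttl   # seconds, or None for "never expires"
        self._on_evict = on_evict         # hypothetical signature: (key, value, reason)
        self._clock = clock               # injectable so tests can control time
        self._data = OrderedDict()        # key -> (value, expires_at or None)
        self._lock = threading.RLock()
        self.hits = 0
        self.misses = 0

    def get(self, key, default=None):
        with self._lock:
            entry = self._data.get(key)
            if entry is None:
                self.misses += 1
                return default
            value, expires_at = entry
            if expires_at is not None and self._clock() >= expires_at:
                # Lazy expiration: a stale entry is dropped only when touched.
                del self._data[key]
                self.misses += 1
                self._notify(key, value, "expired")
                return default
            self._data.move_to_end(key)   # mark as most recently used
            self.hits += 1
            return value

    def put(self, key, value, ttl=None):
        ttl = self._default_ttl if ttl is None else ttl
        with self._lock:
            expires_at = None if ttl is None else self._clock() + ttl
            self._data[key] = (value, expires_at)
            self._data.move_to_end(key)
            if len(self._data) > self._max_size:
                # Evict the least recently used entry (front of the dict).
                old_key, (old_value, _) = self._data.popitem(last=False)
                self._notify(old_key, old_value, "evicted")

    def stats(self):
        with self._lock:
            return {"hits": self.hits, "misses": self.misses,
                    "size": len(self._data)}

    def _notify(self, key, value, reason):
        # Called while the lock is held; RLock lets the callback re-enter
        # the cache from the same thread.
        if self._on_evict is not None:
            self._on_evict(key, value, reason)
```

Two simplifications are worth noting. When the cache is full, this sketch evicts the least recently used entry even if some other entry has already expired; a fuller implementation might prefer expired entries first. The callback also runs under the lock, which keeps ordering simple but means a slow callback blocks other threads.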

Winner: Gemini 3 Flash Preview (Google)
Winner score: 8.34 (matrix avg: 6.43)
10×10 Judgment Matrix · 78 judgments
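| Judge ↓ / Respondent → | DeepSeek V4 | GPT-OSS-120B | MiMo-V2-Flash | GPT-5.4 | Claude Opus 4.6 | Gemini 3.1 Pro | Claude Sonnet 4.6 | Grok 4.20 | Gemini 3 | MiniMax M2.5 |
|---|---|---|---|---|---|---|---|---|---|---|
| DeepSeek V4 |  | 9.2 | 9.6 | 9.6 | 8.8 | · | 8.8 | 9.2 | 9.6 | · |
| GPT-OSS-120B | 7.5 |  | 6.5 | 7.7 | 4.8 | 2.9 | 3.2 | 7.0 | 8.8 | · |
| MiMo-V2-Flash | 7.8 | 8.6 |  | 8.6 | 8.6 | · | 7.6 | 8.2 | 9.0 | · |
| GPT-5.4 | 4.3 | 2.0 | 4.3 |  | 3.5 | 1.6 | 3.0 | 4.4 | 6.8 | · |
| Claude Opus 4.6 | 6.0 | 4.9 | 6.5 | 6.5 |  | 1.2 | 5.5 | 5.6 | 7.8 | · |
| Gemini 3.1 Pro | 6.2 | 5.2 | 6.2 | 5.5 | 5.8 |  | 6.5 | 4.5 | 9.0 | · |
| Claude Sonnet 4.6 | 6.8 | 6.0 | 7.0 | 8.2 | 6.7 | 1.9 |  | 6.8 | 7.5 | · |
| Grok 4.20 | 6.0 | 5.8 | 6.2 | 8.6 | 6.0 | 3.6 | 6.4 |  | 8.0 | · |
| Gemini 3 | 8.0 | 7.8 | 9.0 | 9.6 | 9.0 | 4.3 | 8.4 | 8.6 |  | · |
| MiniMax M2.5 | · | 4.7 | 6.9 | 6.8 | 4.5 | 3.6 | 5.7 | 7.5 | 8.6 |  |

Diagonal cells (a model judging its own response) are blank; · marks a judgment missing from the matrix.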
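The page does not say how the winner score and matrix average are derived. The short check below is a sketch under two assumptions: that a respondent's score is the mean of its column, and that the matrix average is the mean of those per-respondent means over respondents with at least one judgment. It also assumes the "Gemini 3" column is the winning Gemini 3 Flash Preview entry. Under those assumptions the transcribed scores reproduce the reported 78 judgments, 8.34, and 6.43.

```python
# Scores transcribed from the judgment matrix above.
# N marks a blank (self-judgment) or missing (·) cell.
N = None
RESPONDENTS = ["DeepSeek V4", "GPT-OSS-120B", "MiMo-V2-Flash", "GPT-5.4",
               "Claude Opus 4.6", "Gemini 3.1 Pro", "Claude Sonnet 4.6",
               "Grok 4.20", "Gemini 3", "MiniMax M2.5"]
MATRIX = {  # judge -> scores given to each respondent, in RESPONDENTS order
    "DeepSeek V4":       [N,   9.2, 9.6, 9.6, 8.8, N,   8.8, 9.2, 9.6, N],
    "GPT-OSS-120B":      [7.5, N,   6.5, 7.7, 4.8, 2.9, 3.2, 7.0, 8.8, N],
    "MiMo-V2-Flash":     [7.8, 8.6, N,   8.6, 8.6, N,   7.6, 8.2, 9.0, N],
    "GPT-5.4":           [4.3, 2.0, 4.3, N,   3.5, 1.6, 3.0, 4.4, 6.8, N],
    "Claude Opus 4.6":   [6.0, 4.9, 6.5, 6.5, N,   1.2, 5.5, 5.6, 7.8, N],
    "Gemini 3.1 Pro":    [6.2, 5.2, 6.2, 5.5, 5.8, N,   6.5, 4.5, 9.0, N],
    "Claude Sonnet 4.6": [6.8, 6.0, 7.0, 8.2, 6.7, 1.9, N,   6.8, 7.5, N],
    "Grok 4.20":         [6.0, 5.8, 6.2, 8.6, 6.0, 3.6, 6.4, N,   8.0, N],
    "Gemini 3":          [8.0, 7.8, 9.0, 9.6, 9.0, 4.3, 8.4, 8.6, N,   N],
    "MiniMax M2.5":      [N,   4.7, 6.9, 6.8, 4.5, 3.6, 5.7, 7.5, 8.6, N],
}


def column_means(matrix):
    """Mean score received by each respondent, skipping empty cells."""
    means = {}
    for col, name in enumerate(RESPONDENTS):
        scores = [row[col] for row in matrix.values() if row[col] is not None]
        if scores:  # a respondent with no judgments gets no mean
            means[name] = sum(scores) / len(scores)
    return means


means = column_means(MATRIX)
judgments = sum(v is not None for row in MATRIX.values() for v in row)
winner = max(means, key=means.get)
matrix_avg = sum(means.values()) / len(means)

print(judgments)                  # 78
print(winner, f"{means[winner]:.2f}")  # Gemini 3 8.34
print(f"{matrix_avg:.2f}")        # 6.43
```

Note that the plain mean of all 78 cells is about 6.52, not 6.43, which is why the sketch averages the per-respondent means instead. The MiniMax M2.5 column has no judgments and is excluded from that average.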