{
  "evaluation_id": "EVAL-20260318-162358",
  "question_id": "EVAL-20260318-162358",
  "question_text": "Implement an LRU cache with per-key TTL (time-to-live) support. Requirements: O(1) get/put, thread-safe, lazy expiration (don't use background threads), configurable max size, eviction callback, and cache hit/miss statistics. Include comprehensive tests.",
  "category": "code",
  "timestamp": "2026-03-18T16:23:58.804Z",
  "display_date": "Mar 18, 2026",
  "winner": {
    "name": "MiniMax-01",
    "provider": "openrouter",
    "score": 6.97
  },
  "avg_score": 5.352,
  "matrix_size": 33,
  "models_used": [
    {
      "id": "minimax_m1",
      "name": "MiniMax M1",
      "provider": "openrouter"
    },
    {
      "id": "minimax_01",
      "name": "MiniMax-01",
      "provider": "openrouter"
    },
    {
      "id": "judge_claude_sonnet",
      "name": "Claude Sonnet 4.6",
      "provider": "openrouter"
    },
    {
      "id": "judge_gpt54",
      "name": "GPT-5.4",
      "provider": "openrouter"
    },
    {
      "id": "minimax_m25",
      "name": "MiniMax M2.5",
      "provider": "openrouter"
    }
  ],
  "rankings": {
    "minimax_01": {
      "display_name": "MiniMax-01",
      "provider": "openrouter",
      "average_score": 6.97,
      "score_count": 7,
      "min_score": 4.4,
      "max_score": 7.75,
      "rank": 1
    },
    "judge_gpt54": {
      "display_name": "GPT-5.4",
      "provider": "openrouter",
      "average_score": 6.63,
      "score_count": 7,
      "min_score": 5.6,
      "max_score": 8.6,
      "rank": 2
    },
    "minimax_m1": {
      "display_name": "MiniMax M1",
      "provider": "openrouter",
      "average_score": 5.8,
      "score_count": 7,
      "min_score": 2.25,
      "max_score": 8.6,
      "rank": 3
    },
    "judge_claude_sonnet": {
      "display_name": "Claude Sonnet 4.6",
      "provider": "openrouter",
      "average_score": 4.9,
      "score_count": 7,
      "min_score": 2.05,
      "max_score": 8.6,
      "rank": 4
    },
    "minimax_m25": {
      "display_name": "MiniMax M2.5",
      "provider": "openrouter",
      "average_score": 2.46,
      "score_count": 5,
      "min_score": 0.6,
      "max_score": 8.2,
      "rank": 5
    }
  },
  "judgments": [
    {
      "judge_key": "minimax_m27",
      "judge_name": "MiniMax M2.7",
      "respondent_key": "minimax_m1",
      "respondent_name": "MiniMax M1",
      "weighted_score": 5.9,
      "brief_justification": "The implementation shows strong design with O(1) operations, proper thread-safety via shared_mutex, and correct lazy expiration logic. However, the response is incomplete (cuts off mid-implementation) and crucially lacks the promised comprehensive tests. The move constructor also has issues with const member assignment after moving."
    },
    {
      "judge_key": "minimax_m27",
      "judge_name": "MiniMax M2.7",
      "respondent_key": "minimax_01",
      "respondent_name": "MiniMax-01",
      "weighted_score": 7.75,
      "brief_justification": "Implementation is mostly correct with O(1) operations via OrderedDict and proper thread-safety, but has a minor bug in put() where move_to_end() is called redundantly on existing keys, and test coverage is incomplete as the TTL test is cut off and lacks thread-safety tests."
    },
    {
      "judge_key": "minimax_m27",
      "judge_name": "MiniMax M2.7",
      "respondent_key": "judge_claude_sonnet",
      "respondent_name": "Claude Sonnet 4.6",
      "weighted_score": 5.65,
      "brief_justification": "The implementation shows excellent architecture (RWLock, doubly-linked list, proper abstractions) but is severely incomplete - it cuts off mid-method during the get() implementation with no put(), tests, or finalization. Cannot be used as-is."
    },
    {
      "judge_key": "minimax_m27",
      "judge_name": "MiniMax M2.7",
      "respondent_key": "judge_gpt54",
      "respondent_name": "GPT-5.4",
      "weighted_score": 7.05,
      "brief_justification": "Implementation is well-structured with correct O(1) LRU mechanics and thread-safety, but the keys() method is incomplete (starts iterating from self instead of self._head.next), and critically, no test code was provided despite being explicitly required."
    },
    {
      "judge_key": "minimax_m25",
      "judge_name": "MiniMax M2.5",
      "respondent_key": "minimax_m1",
      "respondent_name": "MiniMax M1",
      "weighted_score": 6.15,
      "brief_justification": "Implementation shows solid design with O(1) operations, proper thread safety using shared_mutex, and lazy expiration. However, the response is incomplete - it cuts off mid-implementation during the remove() method and lacks the requested comprehensive tests."
    },
    {
      "judge_key": "minimax_m25",
      "judge_name": "MiniMax M2.5",
      "respondent_key": "minimax_01",
      "respondent_name": "MiniMax-01",
      "weighted_score": 7.75,
      "brief_justification": "Implementation is mostly correct with good structure and clear explanations, but has a bug in the put() method (move_to_end called twice) and the test code is incomplete/truncated. OrderedDict provides amortized O(1) but not guaranteed O(1) like a custom doubly-linked list would."
    },
    {
      "judge_key": "minimax_m25",
      "judge_name": "MiniMax M2.5",
      "respondent_key": "judge_claude_sonnet",
      "respondent_name": "Claude Sonnet 4.6",
      "weighted_score": 4.4,
      "brief_justification": "Implementation is severely incomplete - the get() method is cut off mid-line, there is no put() method, delete/clear methods, or any tests. The partial code shows good design (doubly-linked list, RWLock, lazy expiration) but cannot function as a usable cache."
    },
    {
      "judge_key": "minimax_m25",
      "judge_name": "MiniMax M2.5",
      "respondent_key": "judge_gpt54",
      "respondent_name": "GPT-5.4",
      "weighted_score": 5.6,
      "brief_justification": "Implementation is well-structured and mostly correct, but the response is severely incomplete: it cuts off mid-code (the keys() method is unfinished) and contains NO tests despite the explicit requirement for comprehensive tests."
    },
    {
      "judge_key": "minimax_m21",
      "judge_name": "MiniMax M2.1",
      "respondent_key": "minimax_m25",
      "respondent_name": "MiniMax M2.5",
      "weighted_score": 1.65,
      "brief_justification": "Response is severely incomplete - cuts off mid-sentence during imports with no actual implementation, tests, or meaningful content provided."
    },
    {
      "judge_key": "minimax_m21",
      "judge_name": "MiniMax M2.1",
      "respondent_key": "minimax_m1",
      "respondent_name": "MiniMax M1",
      "weighted_score": 5.95,
      "brief_justification": "Implementation shows correct design patterns but is severely incomplete - the remove() method is cut off mid-implementation and no tests were provided despite the requirement. The partial code demonstrates proper O(1) operations, thread-safety via shared_mutex, and lazy expiration, but fails to meet completeness and usefulness requirements."
    },
    {
      "judge_key": "minimax_m21",
      "judge_name": "MiniMax M2.1",
      "respondent_key": "minimax_01",
      "respondent_name": "MiniMax-01",
      "weighted_score": 7.55,
      "brief_justification": "The implementation is mostly correct with O(1) operations, thread safety, and lazy expiration working properly. However, the response is incomplete - the test cases are cut off mid-implementation and lack coverage for thread-safety and edge cases. Additionally, there's a logic issue where _evict_lru() is called before move_to_end() in put(), and no explicit test for statistics functionality."
    },
    {
      "judge_key": "minimax_m21",
      "judge_name": "MiniMax M2.1",
      "respondent_key": "judge_claude_sonnet",
      "respondent_name": "Claude Sonnet 4.6",
      "weighted_score": 5.2,
      "brief_justification": "Response is incomplete - the get() method cuts off mid-line, no put() method, eviction logic, or tests are provided. The partial implementation shows good structure with RWLock, doubly-linked list, and statistics, but falls far short of requirements."
    },
    {
      "judge_key": "minimax_m21",
      "judge_name": "MiniMax M2.1",
      "respondent_key": "judge_gpt54",
      "respondent_name": "GPT-5.4",
      "weighted_score": 5.95,
      "brief_justification": "Implementation appears sound but is incomplete - the keys() method is cut off mid-implementation and critically, NO tests are provided despite explicitly promising 'comprehensive tests' in the introduction."
    },
    {
      "judge_key": "minimax_m2",
      "judge_name": "MiniMax M2",
      "respondent_key": "minimax_m1",
      "respondent_name": "MiniMax M1",
      "weighted_score": 6.1,
      "brief_justification": "Implementation demonstrates solid design with correct O(1) LRU+TTL logic, proper locking, and all core features. However, response is severely incomplete - cuts off mid-implementation (remove method unfinished) and entirely lacks the requested comprehensive tests."
    },
    {
      "judge_key": "minimax_m2",
      "judge_name": "MiniMax M2",
      "respondent_key": "minimax_01",
      "respondent_name": "MiniMax-01",
      "weighted_score": 7,
      "brief_justification": "Implementation covers most requirements but has a bug in put() where existing entry's expiration isn't checked before updating, and TTL is overwritten without validation. Tests are incomplete - test_expiration is cut off mid-method and concurrent access tests are missing."
    },
    {
      "judge_key": "minimax_m2",
      "judge_name": "MiniMax M2",
      "respondent_key": "judge_claude_sonnet",
      "respondent_name": "Claude Sonnet 4.6",
      "weighted_score": 4.6,
      "brief_justification": "Code structure and documentation are excellent, demonstrating proper O(1) LRU design with RWLock and lazy expiration. However, the response is severely incomplete - it cuts off mid-implementation in the get() method, missing put(), delete(), clear(), and entirely lacks any tests."
    },
    {
      "judge_key": "minimax_m2",
      "judge_name": "MiniMax M2",
      "respondent_key": "judge_gpt54",
      "respondent_name": "GPT-5.4",
      "weighted_score": 6.05,
      "brief_justification": "Implementation is well-structured with correct O(1) LRU + hashmap design and proper thread-safety, but the response is incomplete - the keys() method is cut off mid-implementation and critically, there are NO tests included despite the explicit requirement for comprehensive tests."
    },
    {
      "judge_key": "minimax_m1",
      "judge_name": "MiniMax M1",
      "respondent_key": "minimax_m25",
      "respondent_name": "MiniMax M2.5",
      "weighted_score": 0.6,
      "brief_justification": "Response is severely incomplete - it cuts off mid-import statement with no actual implementation, tests, or functional code provided. The partial description of the approach (Hash Map + Doubly Linked List) is the only content, but the requested implementation is entirely missing."
    },
    {
      "judge_key": "minimax_m1",
      "judge_name": "MiniMax M1",
      "respondent_key": "minimax_01",
      "respondent_name": "MiniMax-01",
      "weighted_score": 7.55,
      "brief_justification": "Implementation covers all requirements correctly using OrderedDict for O(1) operations and threading.Lock for thread-safety. However, tests are incomplete (last test is truncated) and the implementation has minor issues: put() has redundant move_to_end() calls and doesn't proactively clean expired items during eviction checks."
    },
    {
      "judge_key": "minimax_m1",
      "judge_name": "MiniMax M1",
      "respondent_key": "judge_claude_sonnet",
      "respondent_name": "Claude Sonnet 4.6",
      "weighted_score": 3.8,
      "brief_justification": "Response is severely incomplete - it ends abruptly mid-implementation with the get() method unfinished, missing put() method entirely, and no tests included. The partial code shows correct design patterns but cannot be used as-is."
    },
    {
      "judge_key": "minimax_m1",
      "judge_name": "MiniMax M1",
      "respondent_key": "judge_gpt54",
      "respondent_name": "GPT-5.4",
      "weighted_score": 6.3,
      "brief_justification": "The LRU cache implementation is well-designed with correct O(1) operations, thread safety via RLock, lazy expiration, and proper eviction callbacks. However, the response is incomplete - it lacks the promised comprehensive tests and the keys() method appears truncated (has 'node = self.'). The implementation itself is solid but the incomplete response significantly reduces its practical value."
    },
    {
      "judge_key": "minimax_01",
      "judge_name": "MiniMax-01",
      "respondent_key": "minimax_m25",
      "respondent_name": "MiniMax M2.5",
      "weighted_score": 8.2,
      "brief_justification": "The response provides a solid foundation for an LRU cache with TTL support, but the incomplete code snippet and lack of comprehensive tests reduce its overall completeness and clarity."
    },
    {
      "judge_key": "minimax_01",
      "judge_name": "MiniMax-01",
      "respondent_key": "minimax_m1",
      "respondent_name": "MiniMax M1",
      "weighted_score": 8.6,
      "brief_justification": "The implementation is mostly correct and clear, with a strong foundation for thread safety and lazy expiration, but the incomplete remove function and lack of full test coverage slightly impact completeness and depth."
    },
    {
      "judge_key": "minimax_01",
      "judge_name": "MiniMax-01",
      "respondent_key": "judge_claude_sonnet",
      "respondent_name": "Claude Sonnet 4.6",
      "weighted_score": 8.6,
      "brief_justification": "The implementation is technically sound, with clear logic for LRU eviction, TTL handling, and thread-safety, but the code is incomplete (e.g., the 'misses' increment is unfinished), and some aspects like eviction callbacks and statistics could benefit from more detailed explanations."
    },
    {
      "judge_key": "minimax_01",
      "judge_name": "MiniMax-01",
      "respondent_key": "judge_gpt54",
      "respondent_name": "GPT-5.4",
      "weighted_score": 8.6,
      "brief_justification": "The implementation is mostly correct, comprehensive, and includes O(1) operations, thread-safety, and lazy expiration. However, the code provided is incomplete (e.g., the `keys` method is cut off), and minor edge cases might not be fully handled, slightly reducing clarity and depth."
    },
    {
      "judge_key": "judge_claude_sonnet",
      "judge_name": "Claude Sonnet 4.6",
      "respondent_key": "minimax_m25",
      "respondent_name": "MiniMax M2.5",
      "weighted_score": 1.2,
      "brief_justification": "The response is severely incomplete - it cuts off mid-implementation with 'import' left dangling, providing no actual working code, no tests, and none of the required features (TTL, thread-safety, eviction callback, statistics). The description of the approach is correct but the implementation was never delivered."
    },
    {
      "judge_key": "judge_claude_sonnet",
      "judge_name": "Claude Sonnet 4.6",
      "respondent_key": "minimax_m1",
      "respondent_name": "MiniMax M1",
      "weighted_score": 5.65,
      "brief_justification": "The implementation shows good design with proper data structures (doubly-linked list + hash map for O(1) ops), shared_mutex for read/write separation, and lazy expiration logic. However, the response is incomplete - it cuts off mid-implementation (missing remove(), clear(), stats(), and all tests), the move constructor is broken (can't move const members), and there's a subtle bug in get() where splice() is called while holding a shared_lock (which doesn't allow mutation of the list)."
    },
    {
      "judge_key": "judge_claude_sonnet",
      "judge_name": "Claude Sonnet 4.6",
      "respondent_key": "minimax_01",
      "respondent_name": "MiniMax-01",
      "weighted_score": 6.8,
      "brief_justification": "The implementation is mostly correct with proper O(1) operations via OrderedDict, thread safety, lazy expiration, and eviction callbacks. However, there's a bug in `put()` where `move_to_end` is called after `_evict_lru`, potentially moving a newly inserted key that was already evicted; also the tests are incomplete (cut off mid-test), and the eviction callback is called outside the lock which could cause issues. The `_check_expiration` method uses `> ttl` instead of `>= ttl` which is a minor ed"
    },
    {
      "judge_key": "judge_claude_sonnet",
      "judge_name": "Claude Sonnet 4.6",
      "respondent_key": "judge_gpt54",
      "respondent_name": "GPT-5.4",
      "weighted_score": 6.85,
      "brief_justification": "The implementation is largely correct with proper O(1) operations, thread safety via RLock, lazy expiration, eviction callbacks, and statistics. However, the response is cut off mid-method (the `keys()` method is incomplete), and the promised 'comprehensive tests' are entirely missing, which are explicitly required in the prompt."
    },
    {
      "judge_key": "judge_gpt54",
      "judge_name": "GPT-5.4",
      "respondent_key": "minimax_m25",
      "respondent_name": "MiniMax M2.5",
      "weighted_score": 0.65,
      "brief_justification": "The response is incomplete and contains a syntax error (`import` with no module), so it does not provide a working implementation. It only briefly mentions an approach and fails to satisfy the requested requirements or include tests."
    },
    {
      "judge_key": "judge_gpt54",
      "judge_name": "GPT-5.4",
      "respondent_key": "minimax_m1",
      "respondent_name": "MiniMax M1",
      "weighted_score": 2.25,
      "brief_justification": "The response is incomplete and the provided C++ code is truncated mid-function, so it does not satisfy the implementation request or include tests. It also contains likely correctness issues, such as mutating LRU order and stats under a shared lock and an invalid/incomplete move constructor."
    },
    {
      "judge_key": "judge_gpt54",
      "judge_name": "GPT-5.4",
      "respondent_key": "minimax_01",
      "respondent_name": "MiniMax-01",
      "weighted_score": 4.4,
      "brief_justification": "The response is readable and covers the main ideas, but it does not fully satisfy the stated requirements: the tests are incomplete/truncated, O(1) is not rigorously implemented as requested, and expiration/eviction behavior is underspecified (e.g., callback not invoked on expired removals, expired entries can still occupy capacity until touched). It is a decent partial solution, but not a comprehensive, production-ready implementation with comprehensive tests."
    },
    {
      "judge_key": "judge_gpt54",
      "judge_name": "GPT-5.4",
      "respondent_key": "judge_claude_sonnet",
      "respondent_name": "Claude Sonnet 4.6",
      "weighted_score": 2.05,
      "brief_justification": "The response is truncated mid-implementation, so it is not runnable and fails to satisfy the requirement to include comprehensive tests. While the presented design is somewhat clear, key claims like O(1) behavior and thread-safety are unproven/incomplete, and important required functionality is missing."
    }
  ],
  "meta": {
    "source": "The Multivac (app.themultivac.com)",
    "methodology": "10x10 blind peer matrix evaluation",
    "criteria": "correctness, completeness, clarity, depth, usefulness",
    "self_judgments": "excluded from rankings",
    "license": "Open data — cite as: The Multivac (2026)"
  }
}