Evaluation EVAL-20260402-123136 · code · Feb 17, 2026 · CODE-006

Write comprehensive unit tests for this function. Cover all edge cases, including boundary conditions, error cases, and typical usage.

```python
def merge_sorted_streams(*streams, max_items=None):
    """
    Merge multiple sorted iterables into a single sorted output.

    Args:
        *streams: Variable number of sorted iterables
        max_items: Optional limit on total items to yield

    Yields:
        Items from all streams in sorted order

    Raises:
        ValueError: If any stream is not sorted
    """
    import heapq

    heap = []
    iterators = [iter(s) for s in streams]

    # Initialize heap with first item from each stream
    for i, it in enumerate(iterators):
        try:
            item = next(it)
            heapq.heappush(heap, (item, i))
        except StopIteration:
            pass

    count = 0
    prev = None
    while heap and (max_items is None or count < max_items):
        item, stream_idx = heapq.heappop(heap)

        # Validate sorting
        if prev is not None and item < prev:
            raise ValueError(f"Stream {stream_idx} is not sorted")

        yield item
        prev = item
        count += 1

        # Get next item from same stream
        try:
            next_item = next(iterators[stream_idx])
            heapq.heappush(heap, (next_item, stream_idx))
        except StopIteration:
            pass
```

Use pytest. Include parametrized tests where appropriate.
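Below is a minimal sketch of the kind of suite the prompt asks for. It is not any model's graded answer. It covers typical merges, empty inputs, the `max_items` boundaries, one-shot and infinite iterators, and the `ValueError` path. So that it runs standalone, it repeats the function's logic (docstring and comments trimmed) and does not import pytest. Pytest still collects the plain `test_*` functions. In a real suite, `MERGE_CASES` would feed `@pytest.mark.parametrize` and the try/except would become `pytest.raises`. The case table and test names are hypothetical.

```python
import itertools
import random


# Same logic as merge_sorted_streams in the prompt (docstring and comments
# trimmed) so this file runs on its own; a real suite would import it.
def merge_sorted_streams(*streams, max_items=None):
    import heapq

    heap = []
    iterators = [iter(s) for s in streams]
    for i, it in enumerate(iterators):
        try:
            item = next(it)
            heapq.heappush(heap, (item, i))
        except StopIteration:
            pass
    count = 0
    prev = None
    while heap and (max_items is None or count < max_items):
        item, stream_idx = heapq.heappop(heap)
        if prev is not None and item < prev:
            raise ValueError(f"Stream {stream_idx} is not sorted")
        yield item
        prev = item
        count += 1
        try:
            next_item = next(iterators[stream_idx])
            heapq.heappush(heap, (next_item, stream_idx))
        except StopIteration:
            pass


# (streams, max_items, expected): hypothetical case table; with pytest this
# would be the argument list of @pytest.mark.parametrize.
MERGE_CASES = [
    ((), None, []),                                      # no streams at all
    (([],), None, []),                                   # one empty stream
    (([], [1, 2], []), None, [1, 2]),                    # empties mixed in
    (([1, 3, 5], [2, 4, 6]), None, [1, 2, 3, 4, 5, 6]),  # interleaving
    (([1, 1], [1]), None, [1, 1, 1]),                    # duplicates kept
    (([1, 2, 3], [4, 5]), 0, []),                        # max_items=0
    (([1, 2, 3], [4, 5]), 2, [1, 2]),                    # limit below total
    (([1, 2], [3]), 10, [1, 2, 3]),                      # limit above total
]


def test_merge_cases():
    for streams, max_items, expected in MERGE_CASES:
        assert list(merge_sorted_streams(*streams, max_items=max_items)) == expected


def test_accepts_one_shot_iterators():
    gen = (x for x in range(0, 6, 2))  # yields 0, 2, 4
    assert list(merge_sorted_streams(iter([1, 4]), gen)) == [0, 1, 2, 4, 4]


def test_max_items_bounds_infinite_streams():
    evens, odds = itertools.count(0, 2), itertools.count(1, 2)
    assert list(merge_sorted_streams(evens, odds, max_items=5)) == [0, 1, 2, 3, 4]


def test_matches_sorted_oracle():
    rng = random.Random(0)  # fixed seed keeps the test deterministic
    for _ in range(50):
        streams = [
            sorted(rng.randint(-5, 5) for _ in range(rng.randint(0, 6)))
            for _ in range(rng.randint(0, 4))
        ]
        expected = sorted(itertools.chain.from_iterable(streams))
        assert list(merge_sorted_streams(*streams)) == expected


def test_unsorted_stream_raises():
    for streams, bad_idx in [(([3, 1],), 0), (([1, 2], [5, 3]), 1)]:
        try:
            list(merge_sorted_streams(*streams))
        except ValueError as exc:
            assert f"Stream {bad_idx}" in str(exc)
        else:
            raise AssertionError("expected ValueError")


def test_unsorted_tail_beyond_limit_is_not_detected():
    # The trailing 0 is read into the heap but never popped, so it is never
    # compared against the previous item.
    assert list(merge_sorted_streams([1, 0], max_items=1)) == [1]
```

The last test pins down a boundary worth knowing about. Sortedness is only checked on items that are actually popped and yielded, so an unsorted tail beyond `max_items` is never reported.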

Winner: GPT-5.4 (openrouter) · winner score 9.08 · matrix average 7.20
10×10 Judgment Matrix · 86 judgments
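Rows are judges and columns are respondents. Diagonal cells (self-judgments) are empty, and `·` marks a pair with no recorded judgment (90 off-diagonal cells minus 4 gaps gives 86 judgments).

| Judge ↓ / Respondent → | GPT-5.4 | Claude Opus 4.6 | Gemini 3.1 Pro | Claude Sonnet 4.6 | Grok 4.20 | DeepSeek V4 | GPT-OSS-120B | Gemini 3 | MiniMax M2.5 | MiMo-V2-Flash |
|---|---|---|---|---|---|---|---|---|---|---|
| GPT-5.4 |  | 3.8 | 0.7 | 2.6 | 5.3 | 7.3 | 2.9 | 7.1 | 2.6 | 4.5 |
| Claude Opus 4.6 | 9.0 |  | 1.6 | 7.7 | 7.6 | 8.0 | 6.8 | 7.5 | 5.3 | 7.7 |
| Gemini 3.1 Pro | 9.6 | 6.4 |  | 4.8 | 5.5 | 9.3 | 6.0 | 9.4 | 5.3 | 7.8 |
| Claude Sonnet 4.6 | 9.0 | 8.0 | 1.2 |  | 8.0 | 8.3 | 7.3 | 8.2 | 7.0 | 8.0 |
| Grok 4.20 | 8.6 | 6.8 | 3.6 | · |  | 6.6 | 8.3 | 8.6 | 7.7 | 7.5 |
| DeepSeek V4 | 9.6 | 8.6 | 8.2 | 8.6 | 9.3 |  | 9.1 | 9.3 | 9.6 | 8.8 |
| GPT-OSS-120B | 8.8 | 6.2 | 2.2 | · | 7.5 | 8.8 |  | 8.8 | 6.5 | 7.5 |
| Gemini 3 | 10.0 | · | 2.9 | 7.8 | 9.6 | 9.8 | 9.0 |  | 6.8 | 9.2 |
| MiniMax M2.5 | 8.8 | 7.5 | 1.2 | 7.5 | 8.2 | 8.6 | 7.3 | · |  | 7.2 |
| MiMo-V2-Flash | 8.3 | 7.8 | 9.0 | 8.0 | 8.8 | 8.6 | 8.6 | 7.8 | 8.6 |  |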
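The page does not say how the headline figures are aggregated. The sketch below rests on two assumptions. The winner score is taken to be the plain mean of the scores a respondent received from the other judges. The matrix average is taken to be the mean of all recorded judgments, with self-judgments and `·` cells excluded. Under those assumptions, the sketch reproduces the 86 judgments, GPT-5.4's 9.08, and the 7.20 average shown above. `MODELS`, `MATRIX`, `NA`, and `respondent_score` are illustrative names, not part of any published tooling.

```python
from statistics import mean

MODELS = [
    "GPT-5.4", "Claude Opus 4.6", "Gemini 3.1 Pro", "Claude Sonnet 4.6",
    "Grok 4.20", "DeepSeek V4", "GPT-OSS-120B", "Gemini 3",
    "MiniMax M2.5", "MiMo-V2-Flash",
]
NA = None  # diagonal (self-judgment) or missing (·) cell

# MATRIX[judge][respondent], rows and columns in MODELS order,
# copied from the judgment matrix above.
MATRIX = [
    [NA, 3.8, 0.7, 2.6, 5.3, 7.3, 2.9, 7.1, 2.6, 4.5],
    [9.0, NA, 1.6, 7.7, 7.6, 8.0, 6.8, 7.5, 5.3, 7.7],
    [9.6, 6.4, NA, 4.8, 5.5, 9.3, 6.0, 9.4, 5.3, 7.8],
    [9.0, 8.0, 1.2, NA, 8.0, 8.3, 7.3, 8.2, 7.0, 8.0],
    [8.6, 6.8, 3.6, NA, NA, 6.6, 8.3, 8.6, 7.7, 7.5],
    [9.6, 8.6, 8.2, 8.6, 9.3, NA, 9.1, 9.3, 9.6, 8.8],
    [8.8, 6.2, 2.2, NA, 7.5, 8.8, NA, 8.8, 6.5, 7.5],
    [10.0, NA, 2.9, 7.8, 9.6, 9.8, 9.0, NA, 6.8, 9.2],
    [8.8, 7.5, 1.2, 7.5, 8.2, 8.6, 7.3, NA, NA, 7.2],
    [8.3, 7.8, 9.0, 8.0, 8.8, 8.6, 8.6, 7.8, 8.6, NA],
]


def respondent_score(col):
    """Assumed rule: mean of the scores a respondent received (one column)."""
    return mean([row[col] for row in MATRIX if row[col] is not None])


judgments = [s for row in MATRIX for s in row if s is not None]
scores = {name: respondent_score(i) for i, name in enumerate(MODELS)}
winner = max(scores, key=scores.get)

print(winner, round(scores[winner], 2))           # GPT-5.4 9.08
print(len(judgments), round(mean(judgments), 2))  # 86 7.2
```

A plain column mean gives every judge equal weight. This means a single harsh judge, such as the GPT-5.4 row, pulls each respondent's score down by the same share.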