Evaluation EVAL-20260207-141537 · code · CODE-006 · Feb 17, 2026

Write comprehensive unit tests for this function. Cover all edge cases, including boundary conditions, error cases, and typical usage.

```python
def merge_sorted_streams(*streams, max_items=None):
    """
    Merge multiple sorted iterables into a single sorted output.

    Args:
        *streams: Variable number of sorted iterables
        max_items: Optional limit on total items to yield

    Yields:
        Items from all streams in sorted order

    Raises:
        ValueError: If any stream is not sorted
    """
    import heapq

    heap = []
    iterators = [iter(s) for s in streams]

    # Initialize heap with first item from each stream
    for i, it in enumerate(iterators):
        try:
            item = next(it)
            heapq.heappush(heap, (item, i))
        except StopIteration:
            pass

    count = 0
    prev = None

    while heap and (max_items is None or count < max_items):
        item, stream_idx = heapq.heappop(heap)

        # Validate sorting
        if prev is not None and item < prev:
            raise ValueError(f"Stream {stream_idx} is not sorted")

        yield item
        prev = item
        count += 1

        # Get next item from same stream
        try:
            next_item = next(iterators[stream_idx])
            heapq.heappush(heap, (next_item, stream_idx))
        except StopIteration:
            pass
```

Use pytest. Include parametrized tests where appropriate.
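For orientation, here is a minimal sketch of the kind of pytest module the prompt asks for. It is not any model's submission and not a grader reference. Assumptions: the function is inlined (condensed, same logic) so the file runs on its own, where a real suite would import it from its module; the test names are illustrative. Several tests pin down behavior of the code as written rather than what the docstring promises: errors surface lazily during iteration, a negative `max_items` yields nothing, truncation by `max_items` can hide out-of-order input, and mutually incomparable items raise `TypeError` when the heap compares them.

```python
import heapq
import itertools

import pytest


def merge_sorted_streams(*streams, max_items=None):
    """Function under test, condensed from the prompt (same logic)."""
    heap = []
    iterators = [iter(s) for s in streams]
    for i, it in enumerate(iterators):  # seed the heap with each stream's head
        try:
            heapq.heappush(heap, (next(it), i))
        except StopIteration:
            pass
    count = 0
    prev = None
    while heap and (max_items is None or count < max_items):
        item, stream_idx = heapq.heappop(heap)
        if prev is not None and item < prev:
            raise ValueError(f"Stream {stream_idx} is not sorted")
        yield item
        prev = item
        count += 1
        try:  # refill from the stream that just produced `item`
            heapq.heappush(heap, (next(iterators[stream_idx]), stream_idx))
        except StopIteration:
            pass


@pytest.mark.parametrize(
    "streams, expected",
    [
        ((), []),                                  # no streams
        (([], [], []), []),                        # only empty streams
        (([1, 2], []), [1, 2]),                    # empty mixed with non-empty
        (([1, 4, 7], [2, 5, 8], [3, 6, 9]), list(range(1, 10))),
        (([1, 1, 2], [1, 3]), [1, 1, 1, 2, 3]),    # duplicates within and across
        (([-3, 0], [-5, 10]), [-5, -3, 0, 10]),    # negatives
        ((["a", "c"], ["b"]), ["a", "b", "c"]),    # any mutually orderable type
    ],
)
def test_merges_in_sorted_order(streams, expected):
    assert list(merge_sorted_streams(*streams)) == expected


@pytest.mark.parametrize(
    "max_items, expected",
    [
        (None, [1, 2, 3, 4]),
        (0, []),
        (1, [1]),
        (4, [1, 2, 3, 4]),   # exactly the total
        (10, [1, 2, 3, 4]),  # more than the total
        (-1, []),            # as written, a negative limit yields nothing
    ],
)
def test_max_items(max_items, expected):
    assert list(merge_sorted_streams([1, 3], [2, 4], max_items=max_items)) == expected


def test_max_items_bounds_infinite_streams():
    merged = merge_sorted_streams(itertools.count(0, 2), itertools.count(1, 2), max_items=5)
    assert list(merged) == [0, 1, 2, 3, 4]


def test_unsorted_stream_raises_lazily():
    gen = merge_sorted_streams([1, 3, 2])  # nothing runs until iteration starts
    assert next(gen) == 1
    assert next(gen) == 3
    with pytest.raises(ValueError, match="Stream 0 is not sorted"):
        next(gen)


def test_error_names_offending_stream():
    with pytest.raises(ValueError, match="Stream 1"):
        list(merge_sorted_streams([1, 2, 3], [5, 0]))


def test_zero_prev_is_still_validated():
    # prev is compared with `is not None`, so a falsy 0 does not skip the check
    with pytest.raises(ValueError):
        list(merge_sorted_streams([0, -1]))


def test_max_items_can_stop_before_disorder_is_seen():
    # Validation only covers emitted items, so this unsorted stream is not caught.
    assert list(merge_sorted_streams([1, 5, 2], max_items=2)) == [1, 5]


def test_incomparable_types_raise_type_error():
    with pytest.raises(TypeError):
        list(merge_sorted_streams([1], ["a"]))
```

Run it with `pytest -q`; each parametrize row is reported as its own test case. If the intended contract differs (for example, if a negative `max_items` should raise), the behavior-pinning tests are the ones to change.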

Winner: Grok Code Fast (xAI)
Winner score: 9.12 · Matrix avg: 6.84
10×10 Judgment Matrix · 100 judgments
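| Judge ↓ / Respondent → | Grok Code Fast | Gemini 3 | GLM-4-7 | Claude Opus 4.5 | Gemini 3 | Claude Sonnet 4.5 | MiniMax M2 | DeepSeek V3.2 | GPT-5.2-Codex | Grok 3 (Direct) |
|---|---|---|---|---|---|---|---|---|---|---|
| Grok Code Fast | — | 2.9 | 2.0 | 8.7 | 9.8 | 8.6 | 7.8 | 9.8 | 5.5 | 9.1 |
| Gemini 3 | 0.0 | — | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.5 | 0.0 |
| GLM-4-7 | 9.3 | 1.0 | — | 6.3 | 0.0 | 6.8 | 0.0 | 0.0 | 2.5 | 8.8 |
| Claude Opus 4.5 | 8.0 | 1.7 | 0.7 | — | 8.6 | 7.5 | 7.7 | 7.1 | 0.0 | 7.4 |
| Gemini 3 | 9.8 | 2.5 | 9.8 | 9.3 | — | 8.4 | 8.6 | 7.6 | 3.3 | 9.8 |
| Claude Sonnet 4.5 | 8.8 | 1.6 | 3.5 | 8.3 | 9.0 | — | 4.5 | 8.0 | 2.6 | 7.8 |
| MiniMax M2 | 9.0 | 1.0 | 8.8 | 5.5 | 0.0 | 5.7 | — | 6.2 | 3.6 | 8.6 |
| DeepSeek V3.2 | 9.6 | 5.8 | 9.3 | 9.2 | 9.6 | 8.3 | 9.4 | — | 7.7 | 8.6 |
| GPT-5.2-Codex | 0.0 | 1.0 | 0.0 | 4.5 | 8.8 | 5.8 | 3.0 | 3.8 | — | 7.8 |
| Grok 3 (Direct) | 9.3 | 4.5 | 8.6 | 8.6 | 8.6 | 8.6 | 8.6 | 8.6 | 4.5 | — |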