EVAL-20260403-125023
code
Apr 03, 2026 · CODE-016

Given these hex dumps of network packets and their known meanings, reverse-engineer the binary protocol format and write a parser.

Packet 1 (Login): 4d 56 01 00 0c 68 65 6c 6c 6f 5f 77 6f 72 6c 64 00 00 00 05 61 64 6d 69 6e
Known: username="hello_world", password="admin"

Packet 2 (Login): 4d 56 01 00 08 74 65 73 74 75 73 65 72 00 00 00 04 70 61 73 73
Known: username="testuser", password="pass"

Packet 3 (Message): 4d 56 02 00 05 68 65 6c 6c 6f 00 00 00 01
Known: message="hello", room_id=1

Write the protocol specification and a Python parser/serializer.
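One plausible reading of the three dumps, offered as a sketch rather than a reference answer: a 2-byte magic `MV`, a 1-byte packet type (0x01 login, 0x02 message), a big-endian 16-bit length followed by the first string, then either a big-endian 32-bit length plus password (login) or a big-endian 32-bit room_id (message). The field widths and the names `parse_packet`, `build_login`, and `build_message` are assumptions, not part of the prompt. Under this reading, Packet 1 declares a length of 0x0c (12) for the 11-byte username "hello_world", so a strict parser rejects it, while Packets 2 and 3 round-trip exactly.

```python
import struct

MAGIC = b"MV"
TYPE_LOGIN, TYPE_MESSAGE = 0x01, 0x02  # assumed meaning of the third byte


def parse_packet(data: bytes) -> dict:
    """Strict parser for the assumed layout; raises ValueError on any mismatch."""
    if len(data) < 5 or data[:2] != MAGIC:
        raise ValueError("missing 'MV' magic or truncated header")
    # Assumed header: 1-byte type, then big-endian 16-bit length of the first string.
    msg_type, first_len = struct.unpack_from(">BH", data, 2)
    first, rest = data[5:5 + first_len], data[5 + first_len:]
    if len(first) != first_len or len(rest) < 4:
        raise ValueError("truncated packet")
    # Both packet types continue with a big-endian 32-bit integer.
    (tail,) = struct.unpack_from(">I", rest)
    if msg_type == TYPE_LOGIN:
        password = rest[4:]
        if len(password) != tail:
            raise ValueError("password length field does not match payload")
        return {"type": "login",
                "username": first.decode("utf-8"),
                "password": password.decode("utf-8")}
    if msg_type == TYPE_MESSAGE:
        if len(rest) != 4:
            raise ValueError("unexpected trailing bytes after room_id")
        return {"type": "message", "message": first.decode("utf-8"), "room_id": tail}
    raise ValueError(f"unknown packet type {msg_type:#04x}")


def build_login(username: str, password: str) -> bytes:
    user, pw = username.encode("utf-8"), password.encode("utf-8")
    return (MAGIC + struct.pack(">BH", TYPE_LOGIN, len(user)) + user
            + struct.pack(">I", len(pw)) + pw)


def build_message(message: str, room_id: int) -> bytes:
    body = message.encode("utf-8")
    return (MAGIC + struct.pack(">BH", TYPE_MESSAGE, len(body)) + body
            + struct.pack(">I", room_id))


PACKET_1 = bytes.fromhex(
    "4d 56 01 00 0c 68 65 6c 6c 6f 5f 77 6f 72 6c 64 00 00 00 05 61 64 6d 69 6e")
PACKET_2 = bytes.fromhex(
    "4d 56 01 00 08 74 65 73 74 75 73 65 72 00 00 00 04 70 61 73 73")
PACKET_3 = bytes.fromhex("4d 56 02 00 05 68 65 6c 6c 6f 00 00 00 01")

if __name__ == "__main__":
    print(parse_packet(PACKET_2))
    print(parse_packet(PACKET_3))
    try:
        parse_packet(PACKET_1)
    except ValueError as exc:
        print("Packet 1 rejected:", exc)
```

A tolerant parser could instead trust the known field values and treat Packet 1's length byte as a typo in the dump; which behaviour is preferable depends on how the anomaly is meant to be interpreted.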

Winner: GPT-OSS-120B (OpenAI)
Winner score: 9.38 (matrix avg: 6.19)
10×10 Judgment Matrix · 76 judgments
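| Judge ↓ / Respondent → | GPT-5.4 | Claude Opus 4.6 | Gemini 3.1 Pro | Claude Sonnet 4.6 | Grok 4.20 | DeepSeek V4 | GPT-OSS-120B | Gemini 3 | MiniMax M2.5 | MiMo-V2-Flash |
|---|---|---|---|---|---|---|---|---|---|---|
| GPT-5.4 |  | 1.9 | 0.5 | 2.3 | 3.0 | 3.8 | 9.0 | 4.2 | · | 4.2 |
| Claude Opus 4.6 | 8.9 |  | 0.7 | 4.5 | 5.2 | 5.0 | 9.6 | 6.2 | · | 5.3 |
| Gemini 3.1 Pro | 9.0 | 2.5 |  | 4.0 | 5.2 | 3.8 | 9.8 | 7.5 | · | 6.3 |
| Claude Sonnet 4.6 | 8.8 | 4.3 | 1.0 |  | 8.8 | 5.3 | 9.6 | 9.0 | · | 6.3 |
| Grok 4.20 | 8.8 | · | 2.0 | 4.3 |  | 5.5 | 7.5 | 8.8 | · | 5.5 |
| DeepSeek V4 | 9.6 | 6.3 | 1.9 | 5.8 | 9.6 |  | 9.8 | 8.6 | · | 9.6 |
| GPT-OSS-120B | 8.8 | 4.3 | 0.7 | 2.9 | · | 4.3 |  | 4.6 | · | 4.3 |
| Gemini 3 | 10.0 | · | 1.6 | 6.8 | 10.0 | 7.3 | 10.0 |  | · | 10.0 |
| MiniMax M2.5 | 8.8 | 4.2 | 1.0 | 5.2 | · | · | 9.6 | 8.8 |  | 9.6 |
| MiMo-V2-Flash | 9.0 | 6.8 | 2.9 | 5.8 | 9.3 | 8.6 | 9.6 | 8.6 | · |  |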
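
The summary figures can be approximately reproduced from the matrix. The sketch below assumes (the page does not state this) that each judge's own column is left blank, that a respondent's score is the mean of the judgments it received, and that "matrix avg" is the mean of those per-respondent scores. The names `MODELS`, `MATRIX`, and `respondent_averages` are illustrative. Because the displayed cells are rounded to one decimal, the recomputed values can differ from the displayed 9.38 and 6.19 in the last digit.

```python
from statistics import mean

MODELS = ["GPT-5.4", "Claude Opus 4.6", "Gemini 3.1 Pro", "Claude Sonnet 4.6",
          "Grok 4.20", "DeepSeek V4", "GPT-OSS-120B", "Gemini 3",
          "MiniMax M2.5", "MiMo-V2-Flash"]

NA = None  # a missing judgment ("·") or the judge's own column (assumed blank)

# Rows are judges, columns are respondents, both in MODELS order.
MATRIX = [
    [NA,   1.9, 0.5, 2.3, 3.0,  3.8, 9.0,  4.2, NA, 4.2],
    [8.9,  NA,  0.7, 4.5, 5.2,  5.0, 9.6,  6.2, NA, 5.3],
    [9.0,  2.5, NA,  4.0, 5.2,  3.8, 9.8,  7.5, NA, 6.3],
    [8.8,  4.3, 1.0, NA,  8.8,  5.3, 9.6,  9.0, NA, 6.3],
    [8.8,  NA,  2.0, 4.3, NA,   5.5, 7.5,  8.8, NA, 5.5],
    [9.6,  6.3, 1.9, 5.8, 9.6,  NA,  9.8,  8.6, NA, 9.6],
    [8.8,  4.3, 0.7, 2.9, NA,   4.3, NA,   4.6, NA, 4.3],
    [10.0, NA,  1.6, 6.8, 10.0, 7.3, 10.0, NA,  NA, 10.0],
    [8.8,  4.2, 1.0, 5.2, NA,   NA,  9.6,  8.8, NA, 9.6],
    [9.0,  6.8, 2.9, 5.8, 9.3,  8.6, 9.6,  8.6, NA, NA],
]


def respondent_averages(matrix):
    """Mean received score per respondent, skipping respondents with no scores."""
    averages = {}
    for col, name in enumerate(MODELS):
        scores = [row[col] for row in matrix if row[col] is not None]
        if scores:
            averages[name] = mean(scores)
    return averages


averages = respondent_averages(MATRIX)
winner = max(averages, key=averages.get)
judgments = sum(value is not None for row in MATRIX for value in row)
matrix_avg = mean(averages.values())

if __name__ == "__main__":
    print(f"{judgments} judgments")
    print(f"winner: {winner} ({averages[winner]:.2f})")
    print(f"matrix avg: {matrix_avg:.2f}")
```

Under these assumptions MiniMax M2.5 receives no judgments at all, so it is excluded from the average rather than counted as zero.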