← Evaluations/EVAL-20260402-133517
code
Apr 02, 2026 · CODE-016

Given these hex dumps of network packets and their known meanings, reverse-engineer the binary protocol format and write a parser.

Packet 1 (Login): 4d 56 01 00 0c 68 65 6c 6c 6f 5f 77 6f 72 6c 64 00 00 00 05 61 64 6d 69 6e
Known: username="hello_world", password="admin"

Packet 2 (Login): 4d 56 01 00 08 74 65 73 74 75 73 65 72 00 00 00 04 70 61 73 73
Known: username="testuser", password="pass"

Packet 3 (Message): 4d 56 02 00 05 68 65 6c 6c 6f 00 00 00 01
Known: message="hello", room_id=1

Write the protocol specification and a Python parser/serializer.
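
For orientation, here is a minimal sketch of one layout that fits the dumps. It is an assumption, not a reference answer: a 2-byte magic "MV", a 1-byte packet type (0x01 login, 0x02 message), a 2-byte big-endian string length followed by the string, then either a 4-byte big-endian password length plus password (login) or a 4-byte big-endian room_id (message). The names `parse_packet`, `build_login` and `build_message` are hypothetical. Note that Packet 1 declares a length of 0x0c (12) while "hello_world" is 11 bytes, so this layout round-trips Packets 2 and 3 only and rejects Packet 1 as written.

```python
import struct

MAGIC = b"MV"                 # 0x4d 0x56, seen at the start of all three dumps
TYPE_LOGIN = 0x01             # assumed from Packets 1 and 2
TYPE_MESSAGE = 0x02           # assumed from Packet 3

PACKET_1 = bytes.fromhex("4d 56 01 00 0c 68 65 6c 6c 6f 5f 77 6f 72 6c 64 00 00 00 05 61 64 6d 69 6e")
PACKET_2 = bytes.fromhex("4d 56 01 00 08 74 65 73 74 75 73 65 72 00 00 00 04 70 61 73 73")
PACKET_3 = bytes.fromhex("4d 56 02 00 05 68 65 6c 6c 6f 00 00 00 01")


def _take(data: bytes, offset: int, size: int):
    """Return (chunk, new_offset), failing loudly if the packet is too short."""
    end = offset + size
    if end > len(data):
        raise ValueError("truncated packet")
    return data[offset:end], end


def parse_packet(data: bytes) -> dict:
    """Parse one packet under the assumed layout described above."""
    header, pos = _take(data, 0, 5)
    magic, packet_type, text_len = struct.unpack(">2sBH", header)
    if magic != MAGIC:
        raise ValueError("bad magic")
    if packet_type not in (TYPE_LOGIN, TYPE_MESSAGE):
        raise ValueError(f"unknown packet type {packet_type:#04x}")

    text, pos = _take(data, pos, text_len)
    tail, pos = _take(data, pos, 4)
    (value,) = struct.unpack(">I", tail)

    if packet_type == TYPE_LOGIN:
        # For logins the 4-byte field is read as the password length.
        password, pos = _take(data, pos, value)
        result = {"type": "login",
                  "username": text.decode("utf-8"),
                  "password": password.decode("utf-8")}
    else:
        # For messages the 4-byte field is read as the room id itself.
        result = {"type": "message",
                  "message": text.decode("utf-8"),
                  "room_id": value}

    if pos != len(data):
        raise ValueError("trailing bytes after packet")
    return result


def build_login(username: str, password: str) -> bytes:
    """Serialize a login packet under the same assumed layout."""
    user, pwd = username.encode("utf-8"), password.encode("utf-8")
    return (struct.pack(">2sBH", MAGIC, TYPE_LOGIN, len(user)) + user
            + struct.pack(">I", len(pwd)) + pwd)


def build_message(message: str, room_id: int) -> bytes:
    """Serialize a message packet under the same assumed layout."""
    body = message.encode("utf-8")
    return (struct.pack(">2sBH", MAGIC, TYPE_MESSAGE, len(body)) + body
            + struct.pack(">I", room_id))
```

Other readings fit the same bytes, for example a reserved byte followed by a 1-byte length, so a full answer should state which interpretation it chose and how it handles the Packet 1 length mismatch.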

Winner: GPT-5.4 (openrouter)
Winner score: 9.19
Matrix avg: 6.03
10×10 Judgment Matrix · 71 judgments
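| Judge ↓ / Respondent → | Gemini 3.1 Pro | DeepSeek V4 | GPT-5.4 | Claude Opus 4.6 | Claude Sonnet 4.6 | Grok 4.20 | GPT-OSS-120B | Gemini 3 | MiniMax M2.5 | MiMo-V2-Flash |
|---|---|---|---|---|---|---|---|---|---|---|
| Gemini 3.1 Pro |  | 5.8 | 9.8 | 4.0 | · | 5.3 | · | 6.3 | 4.3 | 5.2 |
| DeepSeek V4 | 5.0 |  | 9.6 | 7.3 | 5.5 | 9.3 | · | 8.8 | 8.6 | 8.8 |
| GPT-5.4 | 0.7 | 4.3 |  | 2.3 | 1.4 | 3.4 | · | 5.2 | 3.3 | 4.0 |
| Claude Opus 4.6 | · | 5.8 | 8.3 |  | 2.5 | 6.3 | · | 6.0 | 6.9 | 6.0 |
| Claude Sonnet 4.6 | 1.4 | 6.0 | 9.0 | 5.6 |  | 5.5 | · | 4.8 | 7.0 | 8.1 |
| Grok 4.20 | 3.3 | 5.3 | 8.8 | 4.8 | 4.5 |  | · | · | 4.8 | 8.4 |
| GPT-OSS-120B | 2.9 | · | 9.1 | 3.9 | · | 5.2 |  | 4.2 | 4.6 | 4.2 |
| Gemini 3 | · | 10.0 | 10.0 | 4.0 | 3.5 | 9.6 | · |  | 8.9 | 9.0 |
| MiniMax M2.5 | 1.9 | 8.8 | · | · | · | · | · | 8.6 |  | 8.1 |
| MiMo-V2-Flash | 4.3 | 8.6 | 9.0 | 6.8 | 8.2 | 8.6 | · | 8.6 | 8.6 |  |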
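
The summary figures can be cross-checked against the matrix with a short sketch. It assumes each row lists nine scores with the judge's own column left out, and that "·" means no judgment was recorded; under that reading the matrix holds exactly the 71 judgments the heading reports. How the page actually derives its numbers is not stated, so treat the averaging below as an inference. The displayed scores are rounded to one decimal, so small differences from the published figures are expected. The names `MODELS`, `MATRIX` and `respondent_averages` are hypothetical.

```python
MODELS = ["Gemini 3.1 Pro", "DeepSeek V4", "GPT-5.4", "Claude Opus 4.6",
          "Claude Sonnet 4.6", "Grok 4.20", "GPT-OSS-120B", "Gemini 3",
          "MiniMax M2.5", "MiMo-V2-Flash"]

# One row per judge, one column per respondent (same order as MODELS).
# None stands for "·" or for the judge's own column (assumed omitted).
MATRIX = {
    "Gemini 3.1 Pro":    [None, 5.8, 9.8, 4.0, None, 5.3, None, 6.3, 4.3, 5.2],
    "DeepSeek V4":       [5.0, None, 9.6, 7.3, 5.5, 9.3, None, 8.8, 8.6, 8.8],
    "GPT-5.4":           [0.7, 4.3, None, 2.3, 1.4, 3.4, None, 5.2, 3.3, 4.0],
    "Claude Opus 4.6":   [None, 5.8, 8.3, None, 2.5, 6.3, None, 6.0, 6.9, 6.0],
    "Claude Sonnet 4.6": [1.4, 6.0, 9.0, 5.6, None, 5.5, None, 4.8, 7.0, 8.1],
    "Grok 4.20":         [3.3, 5.3, 8.8, 4.8, 4.5, None, None, None, 4.8, 8.4],
    "GPT-OSS-120B":      [2.9, None, 9.1, 3.9, None, 5.2, None, 4.2, 4.6, 4.2],
    "Gemini 3":          [None, 10.0, 10.0, 4.0, 3.5, 9.6, None, None, 8.9, 9.0],
    "MiniMax M2.5":      [1.9, 8.8, None, None, None, None, None, 8.6, None, 8.1],
    "MiMo-V2-Flash":     [4.3, 8.6, 9.0, 6.8, 8.2, 8.6, None, 8.6, 8.6, None],
}


def respondent_averages(matrix: dict, models: list) -> dict:
    """Mean score each respondent received, skipping missing judgments."""
    averages = {}
    for col, name in enumerate(models):
        scores = [row[col] for row in matrix.values() if row[col] is not None]
        if scores:  # respondents with no scores at all are left out
            averages[name] = sum(scores) / len(scores)
    return averages


averages = respondent_averages(MATRIX, MODELS)
judgment_count = sum(v is not None for row in MATRIX.values() for v in row)
# Mean of the per-respondent means, not the mean of all individual scores.
matrix_avg = sum(averages.values()) / len(averages)
best = max(averages, key=averages.get)

print(judgment_count, best, round(averages[best], 2), round(matrix_avg, 2))
```

With the rounded scores shown here, GPT-5.4's column average comes out at about 9.2 and the mean of the per-respondent averages at about 6.04, close to the reported 9.19 and 6.03. GPT-OSS-120B received no scores under this reading and so does not contribute to the average.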