Evaluation EVAL-20260402-132144
code
Apr 02, 2026 · CODE-014

Implement a Last-Writer-Wins Element Set (LWW-Element-Set) CRDT in Python. It should support add, remove, lookup, and merge operations. Include a proof that merge is commutative, associative, and idempotent. Write tests demonstrating conflict resolution between two divergent replicas.
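For orientation, here is a minimal sketch of the kind of structure the prompt describes. It is not any respondent's answer. The class name `LWWElementSet`, the explicit `ts` argument (callers supply comparable timestamps), and the add-biased tie-break are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class LWWElementSet:
    """Illustrative LWW-Element-Set; names and tie-break rule are assumptions."""

    # element -> latest timestamp at which it was added / removed
    adds: dict = field(default_factory=dict)
    removes: dict = field(default_factory=dict)

    def add(self, element, ts):
        # Keep only the newest add timestamp per element.
        self.adds[element] = max(ts, self.adds.get(element, ts))

    def remove(self, element, ts):
        # Keep only the newest remove timestamp per element.
        self.removes[element] = max(ts, self.removes.get(element, ts))

    def lookup(self, element):
        if element not in self.adds:
            return False
        # Assumption: on equal timestamps the add wins (add-biased).
        return self.adds[element] >= self.removes.get(element, float("-inf"))

    def merge(self, other):
        # Pointwise max of timestamps; returns a new replica state.
        merged = LWWElementSet(dict(self.adds), dict(self.removes))
        for element, ts in other.adds.items():
            merged.add(element, ts)
        for element, ts in other.removes.items():
            merged.remove(element, ts)
        return merged


# Two divergent replicas: a adds "x" and "y"; b removes both at ts=2.
a, b = LWWElementSet(), LWWElementSet()
a.add("x", 1)
a.add("y", 3)
b.remove("x", 2)
b.remove("y", 2)
m = a.merge(b)
```

In this sketch `merge` takes the per-element maximum of each timestamp map, and `max` is itself commutative, associative, and idempotent, which is the usual starting point for the proof the prompt requests. After the merge, `"x"` is absent (its remove is newer) and `"y"` is present (its add is newer).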

10×10 Judgment Matrix · 38 judgments
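| Judge ↓ / Respondent → | MiMo-V2-Flash | Gemini 3.1 Pro | Claude Sonnet 4.6 | Grok 4.20 | DeepSeek V4 | GPT-OSS-120B | MiniMax M2.5 | Gemini 3 | GPT-5.4 | Claude Opus 4.6 |
|---|---|---|---|---|---|---|---|---|---|---|
| MiMo-V2-Flash | | 1.6 | 7.6 | · | 8.8 | 8.6 | · | 8.6 | 8.8 | · |
| Gemini 3.1 Pro | 7.9 | | 6.0 | · | 8.9 | 6.8 | · | 9.8 | · | · |
| Claude Sonnet 4.6 | 8.0 | 5.1 | | · | · | 8.8 | · | 8.8 | 8.7 | 7.7 |
| Grok 4.20 | 7.7 | · | 8.6 | | 7.4 | · | · | 8.6 | 8.6 | 8.4 |
| DeepSeek V4 | 8.8 | 7.6 | 8.6 | · | | 9.6 | · | 9.6 | 9.4 | 8.8 |
| GPT-OSS-120B | · | 2.4 | 6.6 | · | 8.8 | | · | · | 8.3 | 4.7 |
| MiniMax M2.5 | 7.7 | · | · | · | · | 7.1 | | 8.4 | · | · |
| Gemini 3 | · | · | · | · | · | · | · | | · | · |
| GPT-5.4 | · | · | · | · | · | · | · | · | | · |
| Claude Opus 4.6 | · | · | · | · | · | · | · | · | · | |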