Evaluation EVAL-20260402-135144
code
Apr 02, 2026 · CODE-018

Implement a minimal but correct event sourcing system in Python. Include: an Event base class, an EventStore (in-memory is fine), an Aggregate base class that rebuilds state from events, a BankAccount aggregate with deposit/withdraw/transfer commands, and snapshot support for performance. Demonstrate eventual consistency between two projections.

Winner: Grok 4.20 (openrouter)
Winner score: 7.72
Matrix avg: 6.26
10×10 Judgment Matrix · 90 judgments
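| Judge ↓ / Respondent → | GPT-OSS-120B | GPT-5.4 | Claude Opus 4.6 | Claude Sonnet 4.6 | Gemini 3.1 Pro | Grok 4.20 | DeepSeek V4 | Gemini 3 | MiniMax M2.5 | MiMo-V2-Flash |
|---|---|---|---|---|---|---|---|---|---|---|
| GPT-OSS-120B | - | 4.8 | 4.1 | 5.0 | 2.4 | 7.2 | 4.9 | 7.5 | 3.2 | 5.5 |
| GPT-5.4 | 3.5 | - | 3.6 | 3.3 | 0.8 | 6.4 | 4.2 | 6.2 | 1.9 | 2.0 |
| Claude Opus 4.6 | 5.8 | 6.8 | - | 6.3 | 0.8 | 6.8 | 5.4 | 6.5 | 1.9 | 5.3 |
| Claude Sonnet 4.6 | 8.0 | 7.2 | 8.8 | - | 1.9 | 8.0 | 5.8 | 7.8 | 3.0 | 7.0 |
| Gemini 3.1 Pro | 5.5 | 4.6 | 5.9 | 6.0 | - | 6.7 | 5.5 | 6.0 | 1.6 | 5.2 |
| Grok 4.20 | 8.6 | 6.4 | 8.4 | 7.9 | 3.3 | - | 6.0 | 6.6 | 3.6 | 6.2 |
| DeepSeek V4 | 9.4 | 8.6 | 9.6 | 8.6 | 8.1 | 9.4 | - | 9.6 | 7.8 | 9.6 |
| Gemini 3 | 8.6 | 8.4 | 9.6 | 9.6 | 2.3 | 9.0 | 8.6 | - | 5.0 | 8.6 |
| MiniMax M2.5 | 7.9 | 6.1 | 6.2 | 5.6 | 1.0 | 7.6 | 7.3 | 7.5 | - | 5.8 |
| MiMo-V2-Flash | 8.6 | 8.6 | 8.8 | 8.6 | 8.8 | 8.3 | 8.0 | 9.0 | 5.8 | - |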
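The page does not say how the winner score and matrix average are derived from the judgments. The sketch below assumes the simplest reading: each row lists the nine scores a judge gave to the other models (the self-judgment cell is empty), a respondent's score is the mean of its column, and the matrix average is the mean of all 90 cells. The names `MODELS`, `SCORES`, `respondent_means`, and `matrix_average` are illustrative only and are not taken from the published `results.json`.

```python
# Scores copied from the judgment matrix: one row per judge, one column per
# respondent, None on the diagonal (assumption: no self-judgment is recorded).
MODELS = [
    "GPT-OSS-120B", "GPT-5.4", "Claude Opus 4.6", "Claude Sonnet 4.6",
    "Gemini 3.1 Pro", "Grok 4.20", "DeepSeek V4", "Gemini 3",
    "MiniMax M2.5", "MiMo-V2-Flash",
]
SCORES = [
    [None, 4.8, 4.1, 5.0, 2.4, 7.2, 4.9, 7.5, 3.2, 5.5],
    [3.5, None, 3.6, 3.3, 0.8, 6.4, 4.2, 6.2, 1.9, 2.0],
    [5.8, 6.8, None, 6.3, 0.8, 6.8, 5.4, 6.5, 1.9, 5.3],
    [8.0, 7.2, 8.8, None, 1.9, 8.0, 5.8, 7.8, 3.0, 7.0],
    [5.5, 4.6, 5.9, 6.0, None, 6.7, 5.5, 6.0, 1.6, 5.2],
    [8.6, 6.4, 8.4, 7.9, 3.3, None, 6.0, 6.6, 3.6, 6.2],
    [9.4, 8.6, 9.6, 8.6, 8.1, 9.4, None, 9.6, 7.8, 9.6],
    [8.6, 8.4, 9.6, 9.6, 2.3, 9.0, 8.6, None, 5.0, 8.6],
    [7.9, 6.1, 6.2, 5.6, 1.0, 7.6, 7.3, 7.5, None, 5.8],
    [8.6, 8.6, 8.8, 8.6, 8.8, 8.3, 8.0, 9.0, 5.8, None],
]


def respondent_means(scores):
    """Mean score each respondent received, skipping the empty diagonal."""
    means = []
    for col in range(len(scores)):
        received = [row[col] for row in scores if row[col] is not None]
        means.append(sum(received) / len(received))
    return means


def matrix_average(scores):
    """Mean over every filled cell of the matrix."""
    values = [v for row in scores for v in row if v is not None]
    return sum(values) / len(values)


means = respondent_means(SCORES)
winner, winner_mean = max(zip(MODELS, means), key=lambda pair: pair[1])

print(f"winner: {winner} ({winner_mean:.2f})")            # winner: Grok 4.20 (7.71)
print(f"matrix avg: {matrix_average(SCORES):.2f}")        # matrix avg: 6.26
```

Under these assumptions the recomputed matrix average matches the reported 6.26 and Grok 4.20 has the highest column mean. The recomputed winner mean is 7.71 rather than the reported 7.72; a plausible explanation is that the displayed cells are rounded to one decimal, but the page does not confirm this.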