The Multivac — Ask any model, routed by evaluation

Evaluation EVAL-20260402-135609
Category: code
Apr 02, 2026 · CODE-019
Implement a Bloom filter from scratch (no libraries) with the following: configurable false positive rate, optimal hash function count calculation, serialization/deserialization, a counting variant that supports deletion, and memory usage statistics. Include mathematical proof of your false positive rate formula.
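
For orientation, the sizing part of this prompt can be sketched with the standard Bloom filter formulas: for n expected items and target false positive rate p, the bit count is m = -n ln p / (ln 2)^2 and the hash count is k = (m / n) ln 2, with the expected rate approximated by (1 - e^(-kn/m))^k. This is a minimal sketch, not any respondent's answer; the function names `bloom_parameters` and `expected_fp_rate` are illustrative assumptions.

```python
import math


def bloom_parameters(n_items: int, fp_rate: float) -> tuple[int, int]:
    """Return (m_bits, k_hashes) for a standard Bloom filter.

    Hypothetical helper: uses the textbook formulas
    m = -n * ln(p) / (ln 2)^2 and k = (m / n) * ln 2.
    """
    if n_items <= 0 or not 0.0 < fp_rate < 1.0:
        raise ValueError("need n_items > 0 and 0 < fp_rate < 1")
    m_bits = math.ceil(-n_items * math.log(fp_rate) / (math.log(2) ** 2))
    # k must be a whole number of hash functions, and at least one.
    k_hashes = max(1, round(m_bits / n_items * math.log(2)))
    return m_bits, k_hashes


def expected_fp_rate(m_bits: int, k_hashes: int, n_items: int) -> float:
    """Approximate false positive rate after n_items insertions."""
    return (1.0 - math.exp(-k_hashes * n_items / m_bits)) ** k_hashes


m, k = bloom_parameters(1000, 0.01)
print(m, k, expected_fp_rate(m, k, 1000))
```

Rounding k to an integer means the achieved rate only approximates the target, which is why the sketch reports the expected rate separately instead of assuming it equals p.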
Winner: Grok 4.20 (openrouter)
Winner score: 8.68
Matrix avg: 6.81
10×10 Judgment Matrix · 80 judgments
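| Judge ↓ / Respondent → | MiMo-V2-Flash | GPT-OSS-120B | GPT-5.4 | Claude Opus 4.6 | Gemini 3.1 Pro | Claude Sonnet 4.6 | Grok 4.20 | DeepSeek V4 | Gemini 3 | MiniMax M2.5 |
|---|---|---|---|---|---|---|---|---|---|---|
| MiMo-V2-Flash | — | 9.0 | 8.6 | 8.7 | 2.8 | 7.7 | 8.6 | 6.2 | 8.6 | · |
| GPT-OSS-120B | 7.0 | — | 5.8 | 6.0 | 3.4 | 5.2 | 9.0 | 6.3 | 8.8 | · |
| GPT-5.4 | 4.2 | 4.0 | — | 3.9 | 0.7 | 4.6 | 7.3 | 4.2 | 6.8 | · |
| Claude Opus 4.6 | 6.8 | 6.5 | 6.5 | — | 1.2 | 7.5 | 7.2 | 4.5 | 7.4 | · |
| Gemini 3.1 Pro | 6.4 | 5.7 | 6.0 | 5.8 | — | 6.0 | 9.4 | 4.2 | 8.4 | · |
| Claude Sonnet 4.6 | 6.8 | 7.0 | 7.8 | 8.3 | 1.2 | — | 8.6 | 5.8 | 7.8 | · |
| Grok 4.20 | 7.2 | 8.7 | 8.1 | 8.7 | 3.3 | 7.9 | — | 6.0 | 7.8 | · |
| DeepSeek V4 | 9.4 | 8.8 | 8.8 | 8.4 | 5.8 | 8.6 | 9.4 | — | 8.8 | · |
| Gemini 3 | 8.6 | 9.4 | 8.8 | 9.6 | 2.6 | 8.6 | 9.8 | 6.3 | — | · |
| MiniMax M2.5 | 7.8 | 7.5 | 7.5 | 6.5 | · | 6.8 | 8.8 | 6.6 | 8.0 | — |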
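
The page does not say how the winner score or matrix average are derived. The reported figures are consistent with one reading: each respondent's score is the mean of its column (self-judgments and missing cells skipped), and the matrix average is the mean of those per-respondent scores. The sketch below works under that assumption; the names `MODELS`, `MATRIX`, and `respondent_means` are illustrative, and `None` stands for both the "—" and "·" cells.

```python
MODELS = [
    "MiMo-V2-Flash", "GPT-OSS-120B", "GPT-5.4", "Claude Opus 4.6",
    "Gemini 3.1 Pro", "Claude Sonnet 4.6", "Grok 4.20", "DeepSeek V4",
    "Gemini 3", "MiniMax M2.5",
]

X = None  # self-judgment ("—") or missing judgment ("·")

# Rows are judges, columns are respondents, both in MODELS order.
MATRIX = [
    [X,   9.0, 8.6, 8.7, 2.8, 7.7, 8.6, 6.2, 8.6, X],
    [7.0, X,   5.8, 6.0, 3.4, 5.2, 9.0, 6.3, 8.8, X],
    [4.2, 4.0, X,   3.9, 0.7, 4.6, 7.3, 4.2, 6.8, X],
    [6.8, 6.5, 6.5, X,   1.2, 7.5, 7.2, 4.5, 7.4, X],
    [6.4, 5.7, 6.0, 5.8, X,   6.0, 9.4, 4.2, 8.4, X],
    [6.8, 7.0, 7.8, 8.3, 1.2, X,   8.6, 5.8, 7.8, X],
    [7.2, 8.7, 8.1, 8.7, 3.3, 7.9, X,   6.0, 7.8, X],
    [9.4, 8.8, 8.8, 8.4, 5.8, 8.6, 9.4, X,   8.8, X],
    [8.6, 9.4, 8.8, 9.6, 2.6, 8.6, 9.8, 6.3, X,   X],
    [7.8, 7.5, 7.5, 6.5, X,   6.8, 8.8, 6.6, 8.0, X],
]


def respondent_means(matrix):
    """Mean score each respondent received, skipping empty cells."""
    means = {}
    for col, name in enumerate(MODELS):
        scores = [row[col] for row in matrix if row[col] is not None]
        if scores:  # respondents with no judgments get no score
            means[name] = sum(scores) / len(scores)
    return means


means = respondent_means(MATRIX)
winner = max(means, key=means.get)
matrix_avg = sum(means.values()) / len(means)
judgments = sum(cell is not None for row in MATRIX for cell in row)

print(winner, round(means[winner], 2), round(matrix_avg, 2), judgments)
```

Under this reading the sketch reproduces the reported winner (Grok 4.20 at 8.68), the matrix average of 6.81, and the count of 80 judgments. MiniMax M2.5 has no received scores, so it is left out of the average.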