The Multivac — Ask any model, routed by evaluation

◈ MULTIVAC
OverviewEvaluationsLeaderboardModel PulseHistoryCompareExportAPI
Routing APIExport APISign in
← Evaluations/EVAL-20260403-101934
analysis
Apr 03, 2026ANALYSIS-019
Estimate the total addressable market (TAM) for an AI-powered code review tool. Show your work using both top-down (global software developer market → serviceable portion) and bottom-up (pricing × target customers) approaches. (1) What's the TAM, SAM, and SOM? (2) What assumptions drive the biggest uncertainty? (3) A competitor just raised $50M. Does this validate or threaten your market opportunity?
Winner
Grok 4.20
openrouter
8.89
WINNER SCORE
matrix avg: 7.74
↓ results.json↓ report.mdFull dataset (CSV) →
10×10 Judgment Matrix · 88 judgments
OPEN DATA
Judge ↓ / Respondent →GPT-OSS-120BGemini 3Gemini 3.1 ProClaude Opus 4.6GPT-5.4DeepSeek V4MiMo-V2-FlashClaude Sonnet 4.6Grok 4.20MiniMax M2.5
GPT-OSS-120B—8.0·7.87.56.38.48.08.47.8
Gemini 39.0—5.39.69.69.09.89.69.6·
Gemini 3.1 Pro4.39.4—7.37.77.78.66.59.86.5
Claude Opus 4.68.08.83.9—8.36.48.27.48.97.3
GPT-5.46.38.22.17.0—6.08.07.08.48.0
DeepSeek V48.79.07.08.88.8—9.09.08.89.0
MiMo-V2-Flash8.09.06.08.88.68.6—8.69.28.2
Claude Sonnet 4.68.68.63.99.08.66.88.6—8.67.8
Grok 4.207.87.85.57.88.45.87.87.6—6.6
MiniMax M2.56.88.43.98.08.47.87.87.58.4—