The Multivac — Ask any model, routed by evaluation

EVAL-20260402-200633 · analysis · Apr 02, 2026 · ANALYSIS-019
Estimate the total addressable market (TAM) for an AI-powered code review tool. Show your work using both top-down (global software developer market → serviceable portion) and bottom-up (pricing × target customers) approaches. (1) What's the TAM, SAM, and SOM? (2) What assumptions drive the biggest uncertainty? (3) A competitor just raised $50M. Does this validate or threaten your market opportunity?
Winner: GPT-OSS-120B (OpenAI)
Winner score: 8.81 · matrix avg: 8.05
10×10 Judgment Matrix · 79 judgments
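| Judge ↓ / Respondent → | MiMo-V2-Flash | Gemini 3.1 Pro | Claude Opus 4.6 | GPT-5.4 | DeepSeek V4 | Claude Sonnet 4.6 | Grok 4.20 | GPT-OSS-120B | Gemini 3 | MiniMax M2.5 |
|---|---|---|---|---|---|---|---|---|---|---|
| MiMo-V2-Flash | — | 8.6 | 8.6 | 8.8 | 8.6 | 8.8 | 8.6 | 8.6 | 8.6 | 8.6 |
| Gemini 3.1 Pro | 7.9 | — | 6.3 | 7.7 | · | 6.5 | 8.0 | 9.8 | 10.0 | 9.2 |
| Claude Opus 4.6 | 8.2 | · | — | 8.2 | 7.2 | 8.6 | 8.6 | 8.5 | 8.2 | 7.8 |
| GPT-5.4 | 7.0 | · | 7.2 | — | 6.6 | 5.7 | 7.5 | 7.9 | 8.0 | 7.8 |
| DeepSeek V4 | 9.0 | 6.5 | 8.8 | 9.0 | — | 8.8 | 8.8 | 9.2 | 9.0 | · |
| Claude Sonnet 4.6 | 8.2 | · | 8.8 | · | 6.8 | — | 8.3 | · | 8.6 | 7.8 |
| Grok 4.20 | 7.6 | 5.5 | 7.6 | 8.4 | 6.8 | 7.8 | — | 8.2 | 7.8 | · |
| GPT-OSS-120B | 8.4 | · | 7.7 | 7.0 | 7.8 | 7.4 | 8.0 | — | 6.3 | 7.5 |
| Gemini 3 | 9.6 | · | 9.6 | 9.0 | 9.0 | 9.3 | 9.6 | 9.8 | — | 9.0 |
| MiniMax M2.5 | 8.4 | · | 7.8 | 7.4 | 8.0 | 8.3 | 8.4 | 8.6 | 8.4 | — |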
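
The page does not say how the winner score or the matrix average are derived from the individual judgments. As a rough cross-check, the sketch below assumes each respondent's score is the plain mean of the scores it received (its column), with the diagonal and the "·" cells treated as missing. The names `MODELS`, `MATRIX`, and `respondent_means` are illustrative and not part of any Multivac API. The cell values are the rounded figures shown in the matrix, so results may differ slightly from the numbers the page reports.

```python
from statistics import mean

MODELS = [
    "MiMo-V2-Flash", "Gemini 3.1 Pro", "Claude Opus 4.6", "GPT-5.4",
    "DeepSeek V4", "Claude Sonnet 4.6", "Grok 4.20", "GPT-OSS-120B",
    "Gemini 3", "MiniMax M2.5",
]

N = None  # assumption: diagonal (self-judgment) and "·" cells carry no score

# Rows are judges, columns are respondents, both in MODELS order.
MATRIX = [
    [N,   8.6, 8.6, 8.8, 8.6, 8.8, 8.6, 8.6, 8.6,  8.6],
    [7.9, N,   6.3, 7.7, N,   6.5, 8.0, 9.8, 10.0, 9.2],
    [8.2, N,   N,   8.2, 7.2, 8.6, 8.6, 8.5, 8.2,  7.8],
    [7.0, N,   7.2, N,   6.6, 5.7, 7.5, 7.9, 8.0,  7.8],
    [9.0, 6.5, 8.8, 9.0, N,   8.8, 8.8, 9.2, 9.0,  N],
    [8.2, N,   8.8, N,   6.8, N,   8.3, N,   8.6,  7.8],
    [7.6, 5.5, 7.6, 8.4, 6.8, 7.8, N,   8.2, 7.8,  N],
    [8.4, N,   7.7, 7.0, 7.8, 7.4, 8.0, N,   6.3,  7.5],
    [9.6, N,   9.6, 9.0, 9.0, 9.3, 9.6, 9.8, N,    9.0],
    [8.4, N,   7.8, 7.4, 8.0, 8.3, 8.4, 8.6, 8.4,  N],
]


def respondent_means(matrix):
    """Mean score each respondent (column) received, skipping missing cells."""
    result = {}
    for col, name in enumerate(MODELS):
        scores = [row[col] for row in matrix if row[col] is not None]
        result[name] = mean(scores)
    return result


means = respondent_means(MATRIX)
judgment_count = sum(v is not None for row in MATRIX for v in row)
top = max(means, key=means.get)

if __name__ == "__main__":
    print(f"judgments counted: {judgment_count}")
    for name, score in sorted(means.items(), key=lambda kv: -kv[1]):
        print(f"{name:18s} {score:.2f}")
    print(f"highest column mean: {top}")
```

Under that assumption the count of scored cells is 79, matching the matrix header, and GPT-OSS-120B has the highest column mean (about 8.8 on the rounded cells, in line with the 8.81 shown as the winner score). Gemini 3.1 Pro received only three judgments, so its mean rests on far less data than the others.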