The Multivac — Ask any model, routed by evaluation

Evaluation: EVAL-20260402-222202
Category: communication
Feb 20, 2026 · COMM-006
A junior developer submitted this pull request. Write code review comments that are:
- Technically accurate
- Educational (helps them learn, not just tells them what's wrong)
- Kind but honest
- Actionable

```python
# PR: Add user authentication

def login(user, pw):
    # get user from db
    u = db.query(f"SELECT * FROM users WHERE name='{user}'")
    if u == None:
        return False
    # check pw
    if u.password == pw:
        session['user'] = u.name
        session['admin'] = True  # give admin access
        return True
    return False

def is_admin(user):
    return session.get('admin', False)
```
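For readers following along, the snippet in the prompt contains several issues a review would be expected to flag: the query is built with an f-string (SQL injection), the password is compared in plaintext, every successful login is granted admin, `u == None` is used instead of `is None`, and `is_admin` ignores its `user` argument. The sketch below shows one possible shape of a fix. It is not part of the prompt or of any model's response, and it assumes things the prompt does not specify: a `sqlite3` database, a hypothetical `users` table with `salt`, `pw_hash`, and `is_admin` columns, PBKDF2 for hashing, and a plain `dict` standing in for the session.

```python
import hashlib
import hmac
import os
import sqlite3


def hash_password(password: str, salt: bytes) -> bytes:
    """Derive a password hash with PBKDF2-HMAC-SHA256 (stdlib only)."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)


def login(conn: sqlite3.Connection, session: dict, username: str, password: str) -> bool:
    # Parameterized query: the driver binds `username` as data, so it is
    # never interpolated into the SQL text.
    row = conn.execute(
        "SELECT name, salt, pw_hash, is_admin FROM users WHERE name = ?",
        (username,),
    ).fetchone()
    if row is None:
        return False
    name, salt, pw_hash, admin_flag = row
    # Compare derived hashes in constant time instead of plaintext passwords.
    if not hmac.compare_digest(hash_password(password, salt), pw_hash):
        return False
    session["user"] = name
    # Admin status comes from the stored record, not from the act of logging in.
    session["admin"] = bool(admin_flag)
    return True


def is_admin(session: dict) -> bool:
    return session.get("admin", False)


# Hypothetical in-memory database for demonstration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (name TEXT PRIMARY KEY, salt BLOB, pw_hash BLOB, is_admin INTEGER)"
)
salt = os.urandom(16)
conn.execute(
    "INSERT INTO users VALUES (?, ?, ?, ?)",
    ("alice", salt, hash_password("s3cret", salt), 0),
)
```

In production a dedicated password-hashing library (for example one implementing Argon2 or bcrypt) is usually preferred over hand-rolled PBKDF2 parameters. The stdlib version is used here only to keep the sketch self-contained.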
Winner: GPT-OSS-120B (OpenAI)
Winner score: 9.64
Matrix avg: 9.03
10×10 Judgment Matrix · 84 judgments
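| Judge ↓ / Respondent → | Claude Opus 4.6 | GPT-5.4 | Grok 4.20 | Claude Sonnet 4.6 | Gemini 3.1 Pro | DeepSeek V4 | GPT-OSS-120B | MiMo-V2-Flash | Mistral Small | Seed 1.6 Flash |
|---|---|---|---|---|---|---|---|---|---|---|
| Claude Opus 4.6 | — | 9.8 | 9.3 | 9.6 | 9.6 | 9.2 | 10.0 | 9.3 | 9.6 | · |
| GPT-5.4 | 9.0 | — | 9.0 | 9.3 | 7.0 | 8.2 | 9.3 | 8.0 | 8.4 | · |
| Grok 4.20 | 9.0 | 8.8 | — | 9.3 | 8.6 | 8.8 | 9.2 | 8.8 | 9.2 | 8.4 |
| Claude Sonnet 4.6 | 9.6 | 9.8 | 9.3 | — | 9.2 | 8.8 | 9.8 | 9.0 | 9.6 | 8.4 |
| Gemini 3.1 Pro | 10.0 | · | 10.0 | 10.0 | — | 9.3 | 10.0 | 7.8 | 9.6 | · |
| DeepSeek V4 | 9.6 | 9.6 | 8.8 | 9.2 | 9.6 | — | 9.8 | 9.1 | 9.8 | 7.1 |
| GPT-OSS-120B | 8.6 | 8.6 | 8.6 | 9.3 | 8.3 | · | — | 8.4 | 9.0 | · |
| MiMo-V2-Flash | 9.2 | 9.2 | 9.3 | 9.3 | 8.9 | 9.0 | 9.8 | — | 9.6 | 4.5 |
| Mistral Small | 10.0 | 10.0 | 10.0 | 10.0 | 10.0 | 9.6 | 10.0 | 9.6 | — | 8.6 |
| Seed 1.6 Flash | 8.8 | 8.8 | 9.1 | 8.8 | 8.4 | 8.8 | 8.8 | 8.6 | 9.6 | — |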
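The page does not say how the winner score or the matrix average is derived from the matrix. The sketch below tests one plausible reading: each respondent's score is the mean of the judge scores in its column, skipping self-judgments and missing cells, and the matrix average is the mean of those ten column means. The names `MODELS`, `MATRIX`, and `respondent_means` are illustrative, and the cell values are the rounded ones shown on the page.

```python
from statistics import mean

MODELS = [
    "Claude Opus 4.6", "GPT-5.4", "Grok 4.20", "Claude Sonnet 4.6",
    "Gemini 3.1 Pro", "DeepSeek V4", "GPT-OSS-120B", "MiMo-V2-Flash",
    "Mistral Small", "Seed 1.6 Flash",
]

N = None  # self-judgment (diagonal) or missing judgment
# Rows are judges, columns are respondents, both in MODELS order.
MATRIX = [
    [N,    9.8,  9.3,  9.6,  9.6,  9.2,  10.0, 9.3, 9.6, N],
    [9.0,  N,    9.0,  9.3,  7.0,  8.2,  9.3,  8.0, 8.4, N],
    [9.0,  8.8,  N,    9.3,  8.6,  8.8,  9.2,  8.8, 9.2, 8.4],
    [9.6,  9.8,  9.3,  N,    9.2,  8.8,  9.8,  9.0, 9.6, 8.4],
    [10.0, N,    10.0, 10.0, N,    9.3,  10.0, 7.8, 9.6, N],
    [9.6,  9.6,  8.8,  9.2,  9.6,  N,    9.8,  9.1, 9.8, 7.1],
    [8.6,  8.6,  8.6,  9.3,  8.3,  N,    N,    8.4, 9.0, N],
    [9.2,  9.2,  9.3,  9.3,  8.9,  9.0,  9.8,  N,   9.6, 4.5],
    [10.0, 10.0, 10.0, 10.0, 10.0, 9.6,  10.0, 9.6, N,   8.6],
    [8.8,  8.8,  9.1,  8.8,  8.4,  8.8,  8.8,  8.6, 9.6, N],
]


def respondent_means(matrix):
    """Mean score each respondent received, ignoring empty cells."""
    result = {}
    for col, name in enumerate(MODELS):
        scores = [row[col] for row in matrix if row[col] is not None]
        result[name] = mean(scores)
    return result


col_means = respondent_means(MATRIX)
winner = max(col_means, key=col_means.get)
matrix_avg = mean(list(col_means.values()))
n_judgments = sum(v is not None for row in MATRIX for v in row)

print(f"judgments: {n_judgments}")
print(f"winner: {winner} ({col_means[winner]:.2f})")
print(f"mean of respondent means: {matrix_avg:.2f}")
```

Under this reading the count comes to 84 and the top column is GPT-OSS-120B at about 9.63, against the page's 9.64. The small gap is consistent with the page computing from unrounded scores. The mean of column means rounds to 9.03, matching the reported matrix average, whereas a flat mean over all 84 cells comes out near 9.10.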