The Multivac — Ask any model, routed by evaluation

Evaluation: EVAL-20260403-110530
communication
Feb 20, 2026 · COMM-006
A junior developer submitted this pull request. Write code review comments that are:
- Technically accurate
- Educational (helps them learn, not just tells them what's wrong)
- Kind but honest
- Actionable

```python
# PR: Add user authentication

def login(user, pw):
    # get user from db
    u = db.query(f"SELECT * FROM users WHERE name='{user}'")
    if u == None:
        return False
    # check pw
    if u.password == pw:
        session['user'] = u.name
        session['admin'] = True  # give admin access
        return True
    return False

def is_admin(user):
    return session.get('admin', False)
```
Winner: GPT-OSS-120B (OpenAI)
Winner score: 9.58 (matrix avg: 9.31)
10×10 Judgment Matrix · 89 judgments
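| Judge ↓ / Respondent → | Claude Opus 4.6 | Grok 4.20 | GPT-5.4 | Claude Sonnet 4.6 | Gemini 3.1 Pro | DeepSeek V4 | GPT-OSS-120B | MiMo-V2-Flash | Mistral Small | Seed 1.6 Flash |
|---|---|---|---|---|---|---|---|---|---|---|
| Claude Opus 4.6 | — | 9.3 | 9.8 | 9.8 | 9.6 | 9.0 | 9.8 | 9.3 | 9.6 | 9.6 |
| Grok 4.20 | 9.0 | — | 8.8 | 9.2 | 9.0 | 8.8 | 9.3 | 9.0 | 9.0 | 9.2 |
| GPT-5.4 | 9.6 | 9.6 | — | 9.6 | 8.4 | 8.6 | 9.3 | 8.3 | 9.2 | 8.6 |
| Claude Sonnet 4.6 | 9.8 | 9.3 | 9.8 | — | 9.2 | 8.8 | 9.8 | 8.6 | 9.6 | 9.3 |
| Gemini 3.1 Pro | 10.0 | 10.0 | 10.0 | 10.0 | — | 9.4 | 10.0 | 7.7 | 10.0 | 9.8 |
| DeepSeek V4 | 9.6 | 9.6 | 9.6 | 9.8 | 9.6 | — | 9.6 | 9.4 | 9.8 | 9.8 |
| GPT-OSS-120B | 8.6 | 8.4 | 8.6 | 8.6 | 8.6 | 8.1 | — | 8.6 | 8.8 | 8.6 |
| MiMo-V2-Flash | 10.0 | 9.3 | · | 10.0 | 9.3 | 8.6 | 9.6 | — | 10.0 | 10.0 |
| Mistral Small | 10.0 | 10.0 | 10.0 | 10.0 | 9.8 | 9.8 | 10.0 | 9.8 | — | 10.0 |
| Seed 1.6 Flash | 8.8 | 8.8 | 8.8 | 8.8 | 8.8 | 8.6 | 8.8 | 8.6 | 8.8 | — |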
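
The page does not say how the winner score is derived from the judgment matrix above. As a rough check, the sketch below assumes each respondent's score is the plain mean of the scores it received from the other judges, with the diagonal (self-judgment) and the single `·` cell treated as missing. The names `MODELS`, `SCORES`, and `respondent_means` are illustrative and not part of any Multivac API. Under that assumption the one-decimal cells reproduce the reported 9.58 for GPT-OSS-120B, and the mean of the ten per-respondent means comes out near the reported matrix avg of 9.31.

```python
from statistics import mean

MODELS = [
    "Claude Opus 4.6", "Grok 4.20", "GPT-5.4", "Claude Sonnet 4.6",
    "Gemini 3.1 Pro", "DeepSeek V4", "GPT-OSS-120B", "MiMo-V2-Flash",
    "Mistral Small", "Seed 1.6 Flash",
]

# Rows are judges, columns are respondents (same order as MODELS).
# None marks a self-judgment or the one missing cell (assumed meaning of "·").
N = None
SCORES = [
    [N,    9.3,  9.8,  9.8,  9.6,  9.0,  9.8,  9.3,  9.6,  9.6],
    [9.0,  N,    8.8,  9.2,  9.0,  8.8,  9.3,  9.0,  9.0,  9.2],
    [9.6,  9.6,  N,    9.6,  8.4,  8.6,  9.3,  8.3,  9.2,  8.6],
    [9.8,  9.3,  9.8,  N,    9.2,  8.8,  9.8,  8.6,  9.6,  9.3],
    [10.0, 10.0, 10.0, 10.0, N,    9.4,  10.0, 7.7,  10.0, 9.8],
    [9.6,  9.6,  9.6,  9.8,  9.6,  N,    9.6,  9.4,  9.8,  9.8],
    [8.6,  8.4,  8.6,  8.6,  8.6,  8.1,  N,    8.6,  8.8,  8.6],
    [10.0, 9.3,  N,    10.0, 9.3,  8.6,  9.6,  N,    10.0, 10.0],
    [10.0, 10.0, 10.0, 10.0, 9.8,  9.8,  10.0, 9.8,  N,    10.0],
    [8.8,  8.8,  8.8,  8.8,  8.8,  8.6,  8.8,  8.6,  8.8,  N],
]


def respondent_means(scores):
    """Average each column over the judges that actually scored it."""
    result = []
    for col in range(len(scores[0])):
        received = [row[col] for row in scores if row[col] is not None]
        result.append(mean(received))
    return result


means = dict(zip(MODELS, respondent_means(SCORES)))
judgment_count = sum(cell is not None for row in SCORES for cell in row)
winner = max(means, key=means.get)
matrix_avg = mean(list(means.values()))

print(f"{judgment_count} judgments")
print(f"winner: {winner} ({means[winner]:.2f})")
print(f"mean of respondent means: {matrix_avg:.2f}")
```

Because the cells are already rounded to one decimal, small differences from the site's own figures would not be surprising; pooling all 89 cells instead of averaging the column means gives a slightly lower value (about 9.30).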