# The Multivac — Evaluation Report

**Evaluation ID:** EVAL-20260318-164546
**Date:** Mar 18, 2026
**Category:** code
**Question ID:** EVAL-20260318-164546

---

## Question

A startup has 3 engineers, $50,000 monthly budget, and 90 days to launch an MVP. They need to build: user authentication, a REST API with 12 endpoints, a React dashboard, a PostgreSQL database, CI/CD pipeline, and basic monitoring.

Round 1 — Initial Plan: Create a sprint plan. Allocate engineers to tasks. Estimate completion dates. Identify the critical path.

Round 2 — Day 30 Reality Check: Engineer #2 quits. The authentication system took twice as long as estimated. 4 of 12 API endpoints are done. The React dashboard has not started. You now have 2 engineers and 60 days. Revise the entire plan. What do you cut? What do you keep? What changes architecturally?

Round 3 — Day 60 Optimization: A competitor launches a similar product. Your CEO wants to launch in 15 days, not 30. You have 2 engineers, $20,000 remaining budget, and a half-finished product. The API has 8/12 endpoints, the dashboard is 40% done, CI/CD is not set up. Design the fastest possible path to a launchable MVP. What is the minimum viable version of each component? What can be faked, deferred, or replaced with a third-party service?

After all 3 rounds: What meta-principle did you use to make cuts in each round? How did your optimization strategy change as constraints tightened? What would a fourth round look like if the timeline compressed to 7 days?

---

## Winner

**MiniMax M2.7** (openrouter)
- Winner Score: 7.44
- Matrix Average: 6.81
- Total Judgments: 49

---

## Rankings

| Rank | Model | Provider | Avg Score | Judgments |
|------|-------|----------|-----------|----------|
| 1 | MiniMax M2.7 | openrouter | 7.44 | 7 |
| 2 | GPT-5.4 | openrouter | 7.44 | 7 |
| 3 | MiniMax M1 | openrouter | 7.31 | 7 |
| 4 | Claude Sonnet 4.6 | openrouter | 6.76 | 7 |
| 5 | MiniMax-01 | openrouter | 6.56 | 7 |
| 6 | MiniMax M2 | MiniMax | 6.19 | 7 |
| 7 | MiniMax M2.1 | openrouter | 5.99 | 7 |

---

## 10×10 Judgment Matrix

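Rows = Judge, Columns = Respondent. Self-judgments are excluded (—); · marks a cell with no recorded score. Eight models appear in this run's matrix. MiniMax M2.5 has no recorded scores as a respondent, so it does not appear in the rankings.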

| Judge ↓ / Resp → | MiniMax M2.7 | MiniMax M2.5 | MiniMax M2.1 | MiniMax M2 | MiniMax M1 | MiniMax-01 | Claude Sonnet | GPT-5.4 |
|---|---|---|---|---|---|---|---|---|
| MiniMax M2.7 | — | · | 5.3 | 6.3 | 9.0 | 6.8 | 7.5 | 6.7 |
| MiniMax M2.5 | 8.0 | — | 5.2 | 4.8 | 8.1 | 7.2 | 5.9 | 6.7 |
| MiniMax M2.1 | 8.9 | · | — | 5.9 | 5.8 | 6.7 | 6.5 | 9.6 |
| MiniMax M2 | 8.0 | · | 6.8 | — | 7.7 | 6.4 | 6.7 | 5.5 |
| MiniMax M1 | 5.9 | · | 5.8 | 6.5 | — | 6.0 | 8.0 | 5.5 |
| MiniMax-01 | 8.8 | · | 7.8 | 8.6 | 8.6 | — | 9.0 | 9.2 |
| Claude Sonnet | 8.2 | · | 7.7 | 7.5 | 7.6 | 6.7 | — | 8.8 |
| GPT-5.4 | 4.4 | · | 3.5 | 3.9 | 4.4 | 6.2 | 3.8 | — |
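
The averages in the Winner and Rankings sections can be approximately reproduced from the matrix above. The following is a minimal sketch, assuming that a respondent's average is the plain mean of its column and that the matrix average is the mean of all scored cells; the names `RESPONDENTS`, `MATRIX`, `respondent_averages`, and `matrix_summary` are illustrative and not part of The Multivac's tooling. The results land within a few hundredths of the reported figures, presumably because the published cells are rounded to one decimal.

```python
from statistics import mean

# Column order of the matrix above (respondents).
RESPONDENTS = [
    "MiniMax M2.7", "MiniMax M2.5", "MiniMax M2.1", "MiniMax M2",
    "MiniMax M1", "MiniMax-01", "Claude Sonnet", "GPT-5.4",
]

# Rows keyed by judge. None stands for a self-judgment (—) or a missing score (·).
MATRIX = {
    "MiniMax M2.7":  [None, None, 5.3, 6.3, 9.0, 6.8, 7.5, 6.7],
    "MiniMax M2.5":  [8.0, None, 5.2, 4.8, 8.1, 7.2, 5.9, 6.7],
    "MiniMax M2.1":  [8.9, None, None, 5.9, 5.8, 6.7, 6.5, 9.6],
    "MiniMax M2":    [8.0, None, 6.8, None, 7.7, 6.4, 6.7, 5.5],
    "MiniMax M1":    [5.9, None, 5.8, 6.5, None, 6.0, 8.0, 5.5],
    "MiniMax-01":    [8.8, None, 7.8, 8.6, 8.6, None, 9.0, 9.2],
    "Claude Sonnet": [8.2, None, 7.7, 7.5, 7.6, 6.7, None, 8.8],
    "GPT-5.4":       [4.4, None, 3.5, 3.9, 4.4, 6.2, 3.8, None],
}


def respondent_averages(matrix, respondents):
    """Mean score per respondent (column), skipping empty cells.

    Respondents with no scores at all are left out of the result.
    """
    averages = {}
    for col, name in enumerate(respondents):
        scores = [row[col] for row in matrix.values() if row[col] is not None]
        if scores:
            averages[name] = mean(scores)
    return averages


def matrix_summary(matrix):
    """Return (number of scored cells, mean of all scored cells)."""
    scores = [s for row in matrix.values() for s in row if s is not None]
    return len(scores), mean(scores)


if __name__ == "__main__":
    count, overall = matrix_summary(MATRIX)
    print(f"judgments: {count}, matrix average: {overall:.2f}")
    ranked = sorted(respondent_averages(MATRIX, RESPONDENTS).items(),
                    key=lambda item: item[1], reverse=True)
    for name, avg in ranked:
        print(f"{name}: {avg:.2f}")
```

Counting the scored cells this way gives 49, which matches the reported Total Judgments.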

---

## Methodology

- **10×10 Blind Peer Matrix:** All models answer the same question, then each model judges the other models' responses.
- **5 Criteria:** Correctness, completeness, clarity, depth, usefulness (each scored 1–10).
- **Self-judgments excluded:** Models do not judge their own responses.
- **Weighted Score:** Composite of all 5 criteria.
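
The report does not state how the five criteria are weighted. As an illustration only, the composite could be computed as a weighted mean; the sketch below assumes equal weights by default, and the function name `weighted_score` and the sample scores are hypothetical rather than taken from The Multivac's implementation.

```python
# The five criteria named in the methodology.
CRITERIA = ("correctness", "completeness", "clarity", "depth", "usefulness")


def weighted_score(scores, weights=None):
    """Weighted mean of the five criterion scores (each expected in 1..10).

    ASSUMPTION: equal weights when none are given. The real weights used
    by the evaluation are not documented in this report.
    """
    if weights is None:
        weights = {criterion: 1.0 for criterion in CRITERIA}

    missing = [c for c in CRITERIA if c not in scores]
    if missing:
        raise ValueError(f"missing criteria: {missing}")

    for criterion in CRITERIA:
        if not 1 <= scores[criterion] <= 10:
            raise ValueError(f"{criterion} score must be between 1 and 10")

    total_weight = sum(weights[c] for c in CRITERIA)
    return sum(scores[c] * weights[c] for c in CRITERIA) / total_weight


if __name__ == "__main__":
    # Hypothetical judgment, not drawn from the dataset.
    sample = {"correctness": 9, "completeness": 8, "clarity": 7,
              "depth": 6, "usefulness": 5}
    print(weighted_score(sample))
```

Passing a `weights` dictionary changes the emphasis, for example doubling the weight of `correctness`, without altering the rest of the calculation.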

---

## Citation

The Multivac (2026). Blind Peer Evaluation: EVAL-20260318-164546. app.themultivac.com

## License

Open data. Free to use, share, and build upon. Please cite The Multivac when using this data.

Download raw JSON: https://app.themultivac.com/api/evaluations/EVAL-20260318-164546/results
Full dataset: https://app.themultivac.com/dashboard/export
