code
Mar 18, 2026
EVAL-20260318-164546

A startup has 3 engineers, $50,000 monthly budget, and 90 days to launch an MVP. They need to build: user authentication, a REST API with 12 endpoints, a React dashboard, a PostgreSQL database, CI/CD pipeline, and basic monitoring.

Round 1 — Initial Plan: Create a sprint plan. Allocate engineers to tasks. Estimate completion dates. Identify the critical path.

Round 2 — Day 30 Reality Check: Engineer #2 quits. The authentication system took twice as long as estimated. 4 of 12 API endpoints are done. The React dashboard has not started. You now have 2 engineers and 60 days. Revise the entire plan. What do you cut? What do you keep? What changes architecturally?

Round 3 — Day 60 Optimization: A competitor launches a similar product. Your CEO wants to launch in 15 days, not 30. You have 2 engineers, $20,000 remaining budget, and a half-finished product. The API has 8/12 endpoints, the dashboard is 40% done, CI/CD is not set up. Design the fastest possible path to a launchable MVP. What is the minimum viable version of each component? What can be faked, deferred, or replaced with a third-party service?

After all 3 rounds: What meta-principle did you use to make cuts in each round? How did your optimization strategy change as constraints tightened? What would a fourth round look like if the timeline compressed to 7 days?
Winner: MiniMax M2.7 (openrouter)
Winner score: 7.44
Matrix avg: 6.81
10×10 Judgment Matrix · 49 judgments
| Judge ↓ / Respondent → | MiniMax M2.7 | MiniMax M2.5 | MiniMax M2.1 | MiniMax M2 | MiniMax M1 | MiniMax-01 | Claude Sonnet 4.6 | GPT-5.4 |
|---|---|---|---|---|---|---|---|---|
| MiniMax M2.7 | — | · | 5.3 | 6.3 | 9.0 | 6.8 | 7.5 | 6.7 |
| MiniMax M2.5 | 8.0 | — | 5.2 | 4.8 | 8.1 | 7.2 | 5.9 | 6.7 |
| MiniMax M2.1 | 8.9 | · | — | 5.9 | 5.8 | 6.7 | 6.5 | 9.6 |
| MiniMax M2 | 8.0 | · | 6.8 | — | 7.7 | 6.4 | 6.7 | 5.5 |
| MiniMax M1 | 5.9 | · | 5.8 | 6.5 | — | 6.0 | 8.0 | 5.5 |
| MiniMax-01 | 8.8 | · | 7.8 | 8.6 | 8.6 | — | 9.0 | 9.2 |
| Claude Sonnet 4.6 | 8.2 | · | 7.7 | 7.5 | 7.6 | 6.7 | — | 8.8 |
| GPT-5.4 | 4.4 | · | 3.5 | 3.9 | 4.4 | 6.2 | 3.8 | — |
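
The page does not say how the winner score and the matrix average are derived from the table. The sketch below is a check under two stated assumptions: that a respondent's score is the mean of the scores it received from the other judges, and that the matrix average is the mean of all filled cells. The names `MODELS`, `SCORES`, `received_scores`, and `respondent_mean` are illustrative, not part of the evaluation tooling. Working from the rounded values shown in the table, this yields roughly 7.46 for MiniMax M2.7 and 6.82 for the matrix, close to but not identical with the reported 7.44 and 6.81; the gap may come from rounding of the displayed cells or from a different aggregation rule.

```python
from statistics import mean

MODELS = [
    "MiniMax M2.7", "MiniMax M2.5", "MiniMax M2.1", "MiniMax M2",
    "MiniMax M1", "MiniMax-01", "Claude Sonnet 4.6", "GPT-5.4",
]

# Rows are judges, columns are respondents, in the order of MODELS.
# None marks a cell with no score (shown as a dash or a dot in the table).
SCORES = [
    [None, None, 5.3, 6.3, 9.0, 6.8, 7.5, 6.7],
    [8.0, None, 5.2, 4.8, 8.1, 7.2, 5.9, 6.7],
    [8.9, None, None, 5.9, 5.8, 6.7, 6.5, 9.6],
    [8.0, None, 6.8, None, 7.7, 6.4, 6.7, 5.5],
    [5.9, None, 5.8, 6.5, None, 6.0, 8.0, 5.5],
    [8.8, None, 7.8, 8.6, 8.6, None, 9.0, 9.2],
    [8.2, None, 7.7, 7.5, 7.6, 6.7, None, 8.8],
    [4.4, None, 3.5, 3.9, 4.4, 6.2, 3.8, None],
]


def received_scores(respondent):
    """Return every score given to `respondent` by the other judges."""
    col = MODELS.index(respondent)
    return [row[col] for row in SCORES if row[col] is not None]


def respondent_mean(respondent):
    """Mean received score, or None if the respondent was never scored."""
    scores = received_scores(respondent)
    return mean(scores) if scores else None


# Flatten the matrix to count judgments and average over all filled cells.
all_scores = [s for row in SCORES for s in row if s is not None]
judgment_count = len(all_scores)
matrix_mean = mean(all_scores)

# Rank respondents by their assumed score, skipping unscored ones.
means = {m: respondent_mean(m) for m in MODELS}
top = max((m for m in MODELS if means[m] is not None), key=lambda m: means[m])

print(f"judgments: {judgment_count}")
print(f"matrix mean: {matrix_mean:.2f}")
print(f"top respondent: {top} ({means[top]:.2f})")
```

The count of filled cells comes out to 49, matching the "49 judgments" label above the table, and the MiniMax M2.5 column contains no scores, so that respondent is excluded from the ranking.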