code
Mar 18, 2026 · EVAL-20260318-164546

A startup has 3 engineers, $50,000 monthly budget, and 90 days to launch an MVP. They need to build: user authentication, a REST API with 12 endpoints, a React dashboard, a PostgreSQL database, CI/CD pipeline, and basic monitoring.

Round 1 (Initial Plan): Create a sprint plan. Allocate engineers to tasks. Estimate completion dates. Identify the critical path.

Round 2 (Day 30 Reality Check): Engineer #2 quits. The authentication system took twice as long as estimated. 4 of 12 API endpoints are done. The React dashboard has not started. You now have 2 engineers and 60 days. Revise the entire plan. What do you cut? What do you keep? What changes architecturally?

Round 3 (Day 60 Optimization): A competitor launches a similar product. Your CEO wants to launch in 15 days, not 30. You have 2 engineers, $20,000 remaining budget, and a half-finished product. The API has 8/12 endpoints, the dashboard is 40% done, CI/CD is not set up. Design the fastest possible path to a launchable MVP. What is the minimum viable version of each component? What can be faked, deferred, or replaced with a third-party service?

After all 3 rounds: What meta-principle did you use to make cuts in each round? How did your optimization strategy change as constraints tightened? What would a fourth round look like if the timeline compressed to 7 days?

Winner: MiniMax M2.7 (openrouter)
Winner score: 7.44
Matrix avg: 6.81
10×10 Judgment Matrix · 49 judgments
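| Judge ↓ / Respondent → | MiniMax M2.7 | MiniMax M2.5 | MiniMax M2.1 | MiniMax M2 | MiniMax M1 | MiniMax-01 | Claude Sonnet 4.6 | GPT-5.4 |
|---|---|---|---|---|---|---|---|---|
| MiniMax M2.7 |  | · | 5.3 | 6.3 | 9.0 | 6.8 | 7.5 | 6.7 |
| MiniMax M2.5 | 8.0 |  | 5.2 | 4.8 | 8.1 | 7.2 | 5.9 | 6.7 |
| MiniMax M2.1 | 8.9 | · |  | 5.9 | 5.8 | 6.7 | 6.5 | 9.6 |
| MiniMax M2 | 8.0 | · | 6.8 |  | 7.7 | 6.4 | 6.7 | 5.5 |
| MiniMax M1 | 5.9 | · | 5.8 | 6.5 |  | 6.0 | 8.0 | 5.5 |
| MiniMax-01 | 8.8 | · | 7.8 | 8.6 | 8.6 |  | 9.0 | 9.2 |
| Claude Sonnet 4.6 | 8.2 | · | 7.7 | 7.5 | 7.6 | 6.7 |  | 8.8 |
| GPT-5.4 | 4.4 | · | 3.5 | 3.9 | 4.4 | 6.2 | 3.8 |  |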
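The page does not say how the winner score or the matrix average is derived. As a rough cross-check, the sketch below assumes both are plain means of the displayed scores: a respondent's score is the mean of its column, and the matrix average is the mean of all 49 cells. It also assumes that a blank cell is a judge's own column and that "·" marks the unscored MiniMax M2.5 column. The names `respondents`, `matrix`, `column_means`, and `matrix_mean` are illustrative, not part of the published dataset.

```python
from statistics import mean

respondents = [
    "MiniMax M2.7", "MiniMax M2.5", "MiniMax M2.1", "MiniMax M2",
    "MiniMax M1", "MiniMax-01", "Claude Sonnet 4.6", "GPT-5.4",
]

# Rows are judges, in the same order as `respondents`.
# None marks a cell with no score (assumed: self-judgment or a "·" cell).
matrix = [
    [None, None, 5.3, 6.3, 9.0, 6.8, 7.5, 6.7],
    [8.0, None, 5.2, 4.8, 8.1, 7.2, 5.9, 6.7],
    [8.9, None, None, 5.9, 5.8, 6.7, 6.5, 9.6],
    [8.0, None, 6.8, None, 7.7, 6.4, 6.7, 5.5],
    [5.9, None, 5.8, 6.5, None, 6.0, 8.0, 5.5],
    [8.8, None, 7.8, 8.6, 8.6, None, 9.0, 9.2],
    [8.2, None, 7.7, 7.5, 7.6, 6.7, None, 8.8],
    [4.4, None, 3.5, 3.9, 4.4, 6.2, 3.8, None],
]

# Flatten the scored cells; this should match the "49 judgments" count.
scores = [s for row in matrix for s in row if s is not None]
judgment_count = len(scores)
matrix_mean = mean(scores)

# Mean score received by each respondent (column-wise), skipping empty columns.
column_means = {}
for col, name in enumerate(respondents):
    received = [row[col] for row in matrix if row[col] is not None]
    if received:
        column_means[name] = mean(received)

best = max(column_means, key=column_means.get)

if __name__ == "__main__":
    print(f"judgments: {judgment_count}")
    print(f"matrix mean: {matrix_mean:.2f}")
    for name, value in sorted(column_means.items(), key=lambda kv: -kv[1]):
        print(f"{name}: {value:.2f}")
    print(f"highest column mean: {best}")
```

Under these assumptions the displayed values give a column mean of roughly 7.46 for MiniMax M2.7 and a matrix mean of roughly 6.82, close to but not identical to the reported 7.44 and 6.81. The gap may come from rounding in the displayed cells or from a weighting the page does not describe.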