reasoning
Feb 18, 2026
REASON-006

Schedule a one-day conference with these constraints:

TALKS: A (90min), B (60min), C (45min), D (30min), E (30min), F (45min)
ROOMS: Main Hall (capacity 500), Room 2 (capacity 100), Room 3 (capacity 50)
TIME: 9:00 AM - 5:00 PM, with mandatory lunch break 12:00-1:00 PM

CONSTRAINTS:
1. Talk A must be in Main Hall (expected attendance: 400)
2. Talk B and C cannot overlap (same speaker)
3. Talk D must be before Talk E (E builds on D's content)
4. Talk F requires Room 2's AV equipment
5. No room can have more than 3 talks total
6. At least one talk must be running at all times (except lunch)
7. Talk A cannot start before 10:00 AM (speaker arriving late)
8. Talk E must end by 3:00 PM (speaker leaving early)

Find a valid schedule or prove none exists.
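
One quick way to probe the prompt is a necessary-condition check on constraint 6. The sketch below is not taken from any respondent's answer. It assumes each talk is given exactly once, and that "at all times" covers the whole 9:00-12:00 and 1:00-5:00 window; the constant and variable names are illustrative.

```python
# Talk durations in minutes, copied from the prompt.
TALK_MINUTES = {"A": 90, "B": 60, "C": 45, "D": 30, "E": 30, "F": 45}

# Day boundaries as minutes after midnight.
DAY_START, LUNCH_START, LUNCH_END, DAY_END = 9 * 60, 12 * 60, 13 * 60, 17 * 60

# Minutes during which constraint 6 demands at least one running talk
# (assumption: the full day minus the lunch break).
required_coverage = (LUNCH_START - DAY_START) + (DAY_END - LUNCH_END)

# Upper bound on the minutes that can be covered: even with no two talks
# overlapping, the talks cannot cover more than their summed durations.
total_talk_time = sum(TALK_MINUTES.values())

coverage_possible = total_talk_time >= required_coverage

print(f"required: {required_coverage} min, available: {total_talk_time} min")
print("constraint 6 satisfiable:", coverage_possible)
```

Under these assumptions the talks supply 300 minutes against 420 required, so constraint 6 cannot be met and no valid schedule exists, whatever the room assignments. If constraint 6 is read more loosely (for example, only between the first and last talk), this argument no longer applies and the remaining constraints would need to be checked directly.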
Winner: GPT-5.4 (openrouter)
Winner score: 8.32
Matrix avg: 5.13
10×10 Judgment Matrix · 72 judgments
| Judge ↓ / Respondent → | DeepSeek V4 | GPT-OSS-120B | Gemini 3.1 Pro | Claude Opus 4.6 | GPT-5.4 | Grok 4.20 | Claude Sonnet 4.6 | MiMo-V2-Flash | Gemini 2.5 Flash | MiniMax M2.5 |
|---|---|---|---|---|---|---|---|---|---|---|
| DeepSeek V4 | — | · | 2.9 | 9.6 | 9.6 | 9.3 | 8.6 | 9.3 | 8.1 | · |
| GPT-OSS-120B | 4.4 | — | 1.9 | 3.0 | · | 7.2 | 9.1 | 3.5 | 3.4 | · |
| Gemini 3.1 Pro | 3.4 | 0.7 | — | 3.9 | 9.4 | 3.8 | 4.0 | 3.8 | 1.8 | 0.5 |
| Claude Opus 4.6 | 6.5 | · | 0.2 | — | 4.0 | 5.8 | 7.4 | 5.2 | 4.5 | · |
| GPT-5.4 | 2.6 | · | 0.7 | 3.8 | — | 6.0 | 8.8 | 3.8 | 2.6 | 0.2 |
| Grok 4.20 | 6.0 | 10.0 | 1.9 | 4.5 | 8.3 | — | 4.5 | 4.4 | 5.5 | · |
| Claude Sonnet 4.6 | 8.0 | · | 1.0 | 5.0 | 7.0 | 8.3 | — | 7.3 | 4.8 | · |
| MiMo-V2-Flash | 7.8 | · | 2.2 | 8.2 | 8.7 | 9.0 | 6.0 | — | 6.0 | · |
| Gemini 2.5 Flash | 6.8 | · | · | 7.6 | 10.0 | 9.8 | 7.0 | 4.4 | — | · |
| MiniMax M2.5 | 5.8 | · | 1.4 | 7.0 | 9.6 | · | · | 6.5 | 4.8 | — |
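
The page does not say how the headline numbers are derived from the matrix. The sketch below tests one plausible reading: a respondent's score is the unweighted mean of the scores in its column, and the matrix average is the mean of those per-respondent means. The scores are copied from the table; the aggregation rule and all names (`MATRIX`, `received_scores`, and so on) are assumptions for illustration.

```python
MODELS = [
    "DeepSeek V4", "GPT-OSS-120B", "Gemini 3.1 Pro", "Claude Opus 4.6", "GPT-5.4",
    "Grok 4.20", "Claude Sonnet 4.6", "MiMo-V2-Flash", "Gemini 2.5 Flash", "MiniMax M2.5",
]

# Rows are judges, columns are respondents, in the order of MODELS.
# None marks a cell with no score (missing judgment or the diagonal).
N = None
MATRIX = [
    [N,   N,    2.9, 9.6, 9.6,  9.3, 8.6, 9.3, 8.1, N],
    [4.4, N,    1.9, 3.0, N,    7.2, 9.1, 3.5, 3.4, N],
    [3.4, 0.7,  N,   3.9, 9.4,  3.8, 4.0, 3.8, 1.8, 0.5],
    [6.5, N,    0.2, N,   4.0,  5.8, 7.4, 5.2, 4.5, N],
    [2.6, N,    0.7, 3.8, N,    6.0, 8.8, 3.8, 2.6, 0.2],
    [6.0, 10.0, 1.9, 4.5, 8.3,  N,   4.5, 4.4, 5.5, N],
    [8.0, N,    1.0, 5.0, 7.0,  8.3, N,   7.3, 4.8, N],
    [7.8, N,    2.2, 8.2, 8.7,  9.0, 6.0, N,   6.0, N],
    [6.8, N,    N,   7.6, 10.0, 9.8, 7.0, 4.4, N,   N],
    [5.8, N,    1.4, 7.0, 9.6,  N,   N,   6.5, 4.8, N],
]


def received_scores(respondent):
    """Return every score the given respondent received from its judges."""
    col = MODELS.index(respondent)
    return [row[col] for row in MATRIX if row[col] is not None]


def mean(values):
    return sum(values) / len(values)


# Per-respondent mean of received scores (assumed aggregation rule).
respondent_means = {m: mean(received_scores(m)) for m in MODELS}

judgment_count = sum(len(received_scores(m)) for m in MODELS)
winner = max(respondent_means, key=respondent_means.get)
mean_of_means = mean(list(respondent_means.values()))

print("judgments:", judgment_count)
print("top respondent:", winner, f"{respondent_means[winner]:.3f}")
print("mean of respondent means:", f"{mean_of_means:.3f}")
```

Under this reading the matrix holds 72 scores, GPT-5.4 has the highest column mean (about 8.325 over 8 judgments), and the mean of the ten column means is about 5.139. Both are close to the displayed 8.32 and 5.13, which would match if the page truncates to two decimals rather than rounding. A plain mean over all 72 cells gives about 5.53, so that reading does not fit the displayed average.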