reasoning
Jan 21, 2026 · REASON-002

Five people (Alice, Bob, Carol, Dave, Eve) need to schedule meetings. Use these clues to determine who meets with whom on which day:

1. Each person has exactly one meeting per day (Mon-Fri)
2. Each meeting involves exactly two people
3. No person meets with the same person twice during the week
4. Alice meets with Bob before she meets with Carol
5. Dave's meeting with Eve is exactly two days after Bob's meeting with Carol
6. Carol doesn't have any meetings on Monday or Friday
7. Eve meets with Alice on Wednesday
8. Bob's meeting with Dave is the day after Alice's meeting with Dave
9. The Monday meeting involves neither Dave nor Eve

Create a complete schedule showing all meetings for the week.
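As a starting point for reasoning about the clues, the search space can be enumerated directly. The sketch below is only an illustration: the names `PEOPLE`, `PAIRS`, `day_pairings`, and `no_repeat_pairs` are hypothetical, and the code neither solves the puzzle nor assumes that a schedule satisfying all nine clues exists. It lists the possible two-person meetings, the sets of meetings that can share a day without double-booking anyone, and a check for clue 3.

```python
from itertools import combinations

PEOPLE = ["Alice", "Bob", "Carol", "Dave", "Eve"]

# Every possible two-person meeting (clue 2), as unordered pairs.
PAIRS = [frozenset(p) for p in combinations(PEOPLE, 2)]


def day_pairings(pairs):
    """Return every non-empty set of meetings that can share one day.

    Meetings on the same day must not share a person. With five people,
    at most two disjoint pairs fit, so only sizes 1 and 2 are tried.
    """
    result = []
    for k in (1, 2):
        for combo in combinations(pairs, k):
            people = [person for pair in combo for person in pair]
            if len(people) == len(set(people)):
                result.append(combo)
    return result


def no_repeat_pairs(schedule):
    """Clue 3: no pair meets twice. `schedule` maps day -> list of pairs."""
    seen = [pair for day in schedule.values() for pair in day]
    return len(seen) == len(set(seen))


if __name__ == "__main__":
    print(len(PAIRS), "possible meetings")
    print(len(day_pairings(PAIRS)), "conflict-free single-day pairings")
```

A candidate answer can then be encoded as a mapping from day to a list of pairs and passed through `no_repeat_pairs`, with the remaining clues added as further predicate functions in the same style.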
Winner: GPT-5.4 (openrouter)
Winner score: 8.91 · matrix avg: 4.26
10×10 Judgment Matrix · 68 judgments
| Judge ↓ / Respondent → | Gemini 3.1 Pro | DeepSeek V4 | Claude Opus 4.6 | GPT-5.4 | Grok 4.20 | Claude Sonnet 4.6 | MiMo-V2-Flash | GPT-OSS-120B | Gemini 2.5 Flash | MiniMax M2.5 |
|---|---|---|---|---|---|---|---|---|---|---|
| Gemini 3.1 Pro | — | 4.0 | 4.5 | 10.0 | 4.0 | 4.1 | 2.4 | · | 2.8 | · |
| DeepSeek V4 | 2.8 | — | 6.0 | 9.3 | 8.4 | 5.8 | 6.8 | · | 6.8 | · |
| Claude Opus 4.6 | 1.3 | 1.1 | — | 7.5 | 5.6 | 1.6 | 3.0 | · | 2.6 | · |
| GPT-5.4 | 4.3 | 1.8 | 2.6 | — | 5.2 | 1.8 | 1.8 | · | 2.2 | · |
| Grok 4.20 | 4.0 | 2.4 | 2.6 | 9.3 | — | 3.5 | 3.6 | · | 4.4 | · |
| Claude Sonnet 4.6 | 1.9 | 2.6 | 2.8 | 7.3 | 3.8 | — | 3.6 | · | 3.3 | · |
| MiMo-V2-Flash | 1.6 | 2.8 | 3.8 | 9.8 | 4.0 | 4.5 | — | · | 3.0 | · |
| GPT-OSS-120B | 1.6 | 2.4 | 2.6 | 8.6 | 3.5 | 2.4 | · | — | 4.0 | · |
| Gemini 2.5 Flash | · | · | 2.2 | 10.0 | 10.0 | 3.0 | 3.0 | · | — | · |
| MiniMax M2.5 | 2.4 | 3.3 | 5.2 | 8.5 | 6.8 | 4.8 | · | · | 6.3 | — |
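
The page does not state how the winner score is aggregated from the matrix. As a minimal sketch, assuming each respondent's score is the plain mean of the judgments it received (its column), the table can be tallied as follows. `NA`, `received_scores`, and `respondent_means` are hypothetical names; both the diagonal and the missing cells are treated as absent.

```python
from statistics import mean

MODELS = [
    "Gemini 3.1 Pro", "DeepSeek V4", "Claude Opus 4.6", "GPT-5.4",
    "Grok 4.20", "Claude Sonnet 4.6", "MiMo-V2-Flash", "GPT-OSS-120B",
    "Gemini 2.5 Flash", "MiniMax M2.5",
]

NA = None  # a missing judgment or the self-judgment diagonal

# Rows are judges, columns are respondents, in MODELS order.
MATRIX = [
    [NA, 4.0, 4.5, 10.0, 4.0, 4.1, 2.4, NA, 2.8, NA],
    [2.8, NA, 6.0, 9.3, 8.4, 5.8, 6.8, NA, 6.8, NA],
    [1.3, 1.1, NA, 7.5, 5.6, 1.6, 3.0, NA, 2.6, NA],
    [4.3, 1.8, 2.6, NA, 5.2, 1.8, 1.8, NA, 2.2, NA],
    [4.0, 2.4, 2.6, 9.3, NA, 3.5, 3.6, NA, 4.4, NA],
    [1.9, 2.6, 2.8, 7.3, 3.8, NA, 3.6, NA, 3.3, NA],
    [1.6, 2.8, 3.8, 9.8, 4.0, 4.5, NA, NA, 3.0, NA],
    [1.6, 2.4, 2.6, 8.6, 3.5, 2.4, NA, NA, 4.0, NA],
    [NA, NA, 2.2, 10.0, 10.0, 3.0, 3.0, NA, NA, NA],
    [2.4, 3.3, 5.2, 8.5, 6.8, 4.8, NA, NA, 6.3, NA],
]


def received_scores(matrix, col):
    """Scores a respondent received: the non-missing cells of its column."""
    return [row[col] for row in matrix if row[col] is not None]


def respondent_means(models, matrix):
    """Mean received score per respondent; unjudged respondents are skipped."""
    means = {}
    for col, name in enumerate(models):
        scores = received_scores(matrix, col)
        if scores:
            means[name] = mean(scores)
    return means


means = respondent_means(MODELS, MATRIX)
total_judgments = sum(len(received_scores(MATRIX, c)) for c in range(len(MODELS)))
best = max(means, key=means.get)

if __name__ == "__main__":
    print(total_judgments, best, round(means[best], 2))
```

On the displayed cells this counts 68 judgments and ranks GPT-5.4 first, with a column mean close to the reported winner score. Any small difference is presumably due to the cells being rounded for display, though that is an assumption.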