code
Mar 18, 2026 · EVAL-20260318-161727

This Go code processes orders concurrently but occasionally produces incorrect totals. Find and fix all concurrency issues.

```go
package main

import (
	"fmt"
	"sync"
)

type OrderProcessor struct {
	totalRevenue float64
	orderCount   int
	errors       []string
}

func (op *OrderProcessor) ProcessOrder(amount float64, wg *sync.WaitGroup) {
	defer wg.Done()
	if amount <= 0 {
		op.errors = append(op.errors, fmt.Sprintf("invalid amount: %.2f", amount))
		return
	}
	op.totalRevenue += amount
	op.orderCount++
}

func main() {
	op := &OrderProcessor{}
	var wg sync.WaitGroup
	orders := []float64{99.99, 149.50, -10.00, 299.99, 49.99, 0, 199.99}
	for _, amount := range orders {
		wg.Add(1)
		go op.ProcessOrder(amount, &wg)
	}
	wg.Wait()
	fmt.Printf("Total: $%.2f from %d orders\n", op.totalRevenue, op.orderCount)
}
```
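For orientation, here is a minimal sketch of one possible fix, not any respondent's actual answer. It assumes a single `sync.Mutex` guarding all three shared fields is acceptable. The `mu` field, the `Summary` method, and the `run` helper are hypothetical names introduced for illustration. The unsynchronized writes to `totalRevenue`, `orderCount`, and `errors` in the prompt are data races, and the sketch serializes them.

```go
package main

import (
	"fmt"
	"sync"
)

// OrderProcessor protects its shared fields with a single mutex
// (the mu field is an addition, not part of the original prompt).
type OrderProcessor struct {
	mu           sync.Mutex
	totalRevenue float64
	orderCount   int
	errors       []string
}

// ProcessOrder records one order and is safe to call from many goroutines.
func (op *OrderProcessor) ProcessOrder(amount float64) {
	op.mu.Lock()
	defer op.mu.Unlock()
	if amount <= 0 {
		op.errors = append(op.errors, fmt.Sprintf("invalid amount: %.2f", amount))
		return
	}
	op.totalRevenue += amount
	op.orderCount++
}

// Summary (hypothetical helper) returns a consistent copy of the results.
func (op *OrderProcessor) Summary() (total float64, count int, errs []string) {
	op.mu.Lock()
	defer op.mu.Unlock()
	return op.totalRevenue, op.orderCount, append([]string(nil), op.errors...)
}

// run (hypothetical helper) processes all orders concurrently and waits
// for every goroutine to finish before returning.
func run(orders []float64) *OrderProcessor {
	op := &OrderProcessor{}
	var wg sync.WaitGroup
	for _, amount := range orders {
		wg.Add(1)
		go func(a float64) {
			defer wg.Done()
			op.ProcessOrder(a)
		}(amount)
	}
	wg.Wait()
	return op
}

func main() {
	orders := []float64{99.99, 149.50, -10.00, 299.99, 49.99, 0, 199.99}
	total, count, errs := run(orders).Summary()
	fmt.Printf("Total: $%.2f from %d orders (%d rejected)\n", total, count, len(errs))
}
```

Running the original program with `go run -race` should report the races. With the mutex in place, the five valid orders are counted on every run. Because floating-point addition order can vary between runs, a production version might store amounts as integer cents instead of `float64`.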

Winner: GPT-5.4 (openrouter)
Winner score: 9.91 · matrix avg: 9.52
10×10 Judgment Matrix · 55 judgments
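| Judge ↓ / Respondent → | MiniMax M2.7 | MiniMax M2.5 | MiniMax M2.1 | MiniMax M2 | MiniMax M1 | MiniMax-01 | Claude Sonnet 4.6 | GPT-5.4 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| MiniMax M2.7 | | 9.3 | 8.8 | 9.6 | 9.8 | 10.0 | 9.8 | 10.0 |
| MiniMax M2.5 | 10.0 | | 9.8 | 10.0 | 9.8 | 10.0 | 9.6 | 9.8 |
| MiniMax M2.1 | 10.0 | 9.8 | | 10.0 | 9.8 | 9.8 | 10.0 | 10.0 |
| MiniMax M2 | 10.0 | 10.0 | · | | 10.0 | 9.0 | 10.0 | 10.0 |
| MiniMax M1 | 9.8 | 9.6 | 8.6 | 9.6 | | 9.3 | 10.0 | 10.0 |
| MiniMax-01 | 9.6 | 9.6 | 9.6 | 9.0 | 9.6 | | 9.8 | 9.8 |
| Claude Sonnet 4.6 | 9.6 | 9.0 | 8.4 | 9.6 | 9.8 | 8.8 | | 9.8 |
| GPT-5.4 | 9.0 | 8.1 | 8.6 | 8.4 | 9.0 | 8.4 | 9.4 | |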
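As a sanity check on the matrix, the sketch below averages each respondent's column. It rests on an assumption the page does not state: that a respondent's score is the plain mean of the judgments it received, with self-judgments and the one missing cell (`·`) skipped. The names `scores`, `respondents`, and `columnMeans` are hypothetical. Under that assumption the GPT-5.4 column averages to roughly 9.91, which is consistent with the reported winner score.

```go
package main

import (
	"fmt"
	"math"
)

// na marks cells with no judgment: the diagonal and the one "·" cell.
var na = math.NaN()

var respondents = []string{
	"MiniMax M2.7", "MiniMax M2.5", "MiniMax M2.1", "MiniMax M2",
	"MiniMax M1", "MiniMax-01", "Claude Sonnet 4.6", "GPT-5.4",
}

// scores[judge][respondent], in the same order as the table.
var scores = [][]float64{
	{na, 9.3, 8.8, 9.6, 9.8, 10.0, 9.8, 10.0},
	{10.0, na, 9.8, 10.0, 9.8, 10.0, 9.6, 9.8},
	{10.0, 9.8, na, 10.0, 9.8, 9.8, 10.0, 10.0},
	{10.0, 10.0, na, na, 10.0, 9.0, 10.0, 10.0},
	{9.8, 9.6, 8.6, 9.6, na, 9.3, 10.0, 10.0},
	{9.6, 9.6, 9.6, 9.0, 9.6, na, 9.8, 9.8},
	{9.6, 9.0, 8.4, 9.6, 9.8, 8.8, na, 9.8},
	{9.0, 8.1, 8.6, 8.4, 9.0, 8.4, 9.4, na},
}

// columnMeans averages each column, ignoring NaN cells.
// A column with no valid cells yields NaN.
func columnMeans(m [][]float64) []float64 {
	means := make([]float64, len(m[0]))
	for c := range means {
		sum, n := 0.0, 0
		for r := range m {
			if v := m[r][c]; !math.IsNaN(v) {
				sum += v
				n++
			}
		}
		if n == 0 {
			means[c] = math.NaN()
			continue
		}
		means[c] = sum / float64(n)
	}
	return means
}

func main() {
	for i, mean := range columnMeans(scores) {
		fmt.Printf("%-18s %.2f\n", respondents[i], mean)
	}
}
```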