code
Mar 15, 2026 · EVAL-20260315-033810

This Go code processes orders concurrently but occasionally produces incorrect totals. Find and fix all concurrency issues.

```go
package main

import (
	"fmt"
	"sync"
)

type OrderProcessor struct {
	totalRevenue float64
	orderCount   int
	errors       []string
}

func (op *OrderProcessor) ProcessOrder(amount float64, wg *sync.WaitGroup) {
	defer wg.Done()
	if amount <= 0 {
		op.errors = append(op.errors, fmt.Sprintf("invalid amount: %.2f", amount))
		return
	}
	op.totalRevenue += amount
	op.orderCount++
}

func main() {
	op := &OrderProcessor{}
	var wg sync.WaitGroup
	orders := []float64{99.99, 149.50, -10.00, 299.99, 49.99, 0, 199.99}
	for _, amount := range orders {
		wg.Add(1)
		go op.ProcessOrder(amount, &wg)
	}
	wg.Wait()
	fmt.Printf("Total: $%.2f from %d orders\n", op.totalRevenue, op.orderCount)
}
```

Winner: Qwen 3 8B (openrouter)
Winner score: 9.65 (matrix avg: 9.35)
10×10 Judgment Matrix · 82 judgments
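| Judge ↓ / Respondent → | Qwen 3 32B | Kimi K2.5 | Devstral Small | Gemma 3 27B | Llama 4 Scout | Phi-4 14B | Granite 4.0 Micro | Qwen 3 8B | Mistral Nemo 12B | Llama 3.1 8B |
|---|---|---|---|---|---|---|---|---|---|---|
| Qwen 3 32B |  | 9.8 | 9.4 | 10.0 | 9.8 | 10.0 | 10.0 | 10.0 | 9.8 | 5.6 |
| Kimi K2.5 | · |  | · | · | · | · | · | · | 9.0 | · |
| Devstral Small | 9.2 | 10.0 |  | 10.0 | 10.0 | 10.0 | 9.8 | 9.8 | 9.6 | 9.6 |
| Gemma 3 27B | 9.6 | 9.6 | 9.6 |  | 9.8 | 9.6 | 9.6 | 9.6 | 9.4 | 9.6 |
| Llama 4 Scout | 9.4 | 9.6 | 9.4 | 9.6 |  | 9.6 | 9.4 | 9.6 | 9.4 | 9.6 |
| Phi-4 14B | 9.6 | 9.2 | 9.6 | 9.6 | 10.0 |  | 9.6 | 10.0 | 9.6 | 10.0 |
| Granite 4.0 Micro | 8.8 | 8.8 | 8.8 | 9.2 | 8.8 | 8.8 |  | 9.2 | 8.8 | 8.8 |
| Qwen 3 8B | 8.0 | 9.8 | 9.4 | 9.8 | 9.8 | 10.0 | 8.8 |  | 9.4 | 7.4 |
| Mistral Nemo 12B | 8.7 | 8.6 | 8.2 | 9.4 | 9.3 | 9.3 | 8.7 | 9.4 |  | 8.4 |
| Llama 3.1 8B | 9.3 | 9.1 | 9.3 | 9.4 | 9.6 | 9.3 | 9.6 | 9.6 | 9.3 |  |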