Evaluations / EVAL-20260402-134400
code
Apr 02, 2026 · CODE-017

This Go code processes orders concurrently but occasionally produces incorrect totals. Find and fix all concurrency issues.

```go
package main

import (
	"fmt"
	"sync"
)

type OrderProcessor struct {
	totalRevenue float64
	orderCount   int
	errors       []string
}

func (op *OrderProcessor) ProcessOrder(amount float64, wg *sync.WaitGroup) {
	defer wg.Done()
	if amount <= 0 {
		op.errors = append(op.errors, fmt.Sprintf("invalid amount: %.2f", amount))
		return
	}
	op.totalRevenue += amount
	op.orderCount++
}

func main() {
	op := &OrderProcessor{}
	var wg sync.WaitGroup
	orders := []float64{99.99, 149.50, -10.00, 299.99, 49.99, 0, 199.99}
	for _, amount := range orders {
		wg.Add(1)
		go op.ProcessOrder(amount, &wg)
	}
	wg.Wait()
	fmt.Printf("Total: $%.2f from %d orders\n", op.totalRevenue, op.orderCount)
}
```

Winner: GPT-OSS-120B (OpenAI)
Winner score: 9.75
Matrix avg: 9.02
10×10 Judgment Matrix · 89 judgments
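| Judge ↓ / Respondent → | MiMo-V2-Flash | GPT-5.4 | Claude Opus 4.6 | Gemini 3.1 Pro | Claude Sonnet 4.6 | Gemini 3 | Grok 4.20 | DeepSeek V4 | GPT-OSS-120B | MiniMax M2.5 |
|---|---|---|---|---|---|---|---|---|---|---|
| MiMo-V2-Flash | | 9.0 | 9.6 | 7.6 | 9.3 | 8.6 | 8.8 | 8.6 | 10.0 | 9.0 |
| GPT-5.4 | 8.8 | | 9.0 | 4.0 | 9.0 | 8.4 | 8.6 | 8.6 | 9.8 | 8.4 |
| Claude Opus 4.6 | 9.4 | 9.8 | | 8.6 | 9.8 | 9.0 | 9.2 | 8.6 | 9.8 | 8.6 |
| Gemini 3.1 Pro | 9.8 | 9.8 | 10.0 | | 9.6 | 9.8 | 9.6 | 9.8 | 10.0 | 9.4 |
| Claude Sonnet 4.6 | 8.8 | 9.8 | 9.8 | 7.8 | | 8.6 | 8.8 | 8.6 | 9.6 | 8.4 |
| Gemini 3 | 9.8 | 10.0 | 10.0 | 9.6 | 10.0 | | 10.0 | 9.8 | 10.0 | 9.8 |
| Grok 4.20 | 8.8 | 8.8 | 8.8 | 6.2 | 9.0 | 8.6 | | 8.6 | 9.0 | 8.6 |
| DeepSeek V4 | 9.2 | 9.6 | · | 9.1 | 9.6 | 9.3 | 9.6 | | 9.8 | 9.6 |
| GPT-OSS-120B | 8.6 | 8.8 | 9.3 | 6.6 | 9.1 | 9.1 | 8.8 | 8.6 | | 8.8 |
| MiniMax M2.5 | 8.0 | 9.8 | 9.3 | 6.1 | 9.3 | 8.8 | 8.8 | 8.0 | 9.8 | |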