Evaluation EVAL-20260402-122104
code
Jan 27, 2026 · CODE-003

Review this Flask API endpoint for security vulnerabilities. Identify ALL security issues and explain the fix for each.

```python
from flask import Flask, request, jsonify
import sqlite3
import pickle
import os

app = Flask(__name__)

@app.route('/api/user/<user_id>')
def get_user(user_id):
    conn = sqlite3.connect('users.db')
    cursor = conn.cursor()
    query = f"SELECT * FROM users WHERE id = {user_id}"
    cursor.execute(query)
    user = cursor.fetchone()
    return jsonify({"user": user})

@app.route('/api/upload', methods=['POST'])
def upload_file():
    file = request.files['file']
    filename = file.filename
    file.save(os.path.join('/uploads', filename))
    return jsonify({"status": "uploaded", "path": f"/uploads/{filename}"})

@app.route('/api/settings', methods=['POST'])
def update_settings():
    data = pickle.loads(request.data)
    # Process settings...
    return jsonify({"status": "updated"})

@app.route('/api/redirect')
def redirect_user():
    url = request.args.get('url')
    return f'<meta http-equiv="refresh" content="0;url={url}">'
```

Winner: Claude Opus 4.6 (openrouter)
Winner score: 9.57
Matrix avg: 9.03
10×10 Judgment Matrix · 90 judgments
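| Judge ↓ / Respondent → | GPT-5.4 | Claude Opus 4.6 | Gemini 3.1 Pro | Claude Sonnet 4.6 | Grok 4.20 | DeepSeek V4 | GPT-OSS-120B | Gemini 3 | MiniMax M2.5 | MiMo-V2-Flash |
|---|---|---|---|---|---|---|---|---|---|---|
| GPT-5.4 | – | 8.6 | 6.5 | 5.0 | 8.2 | 8.2 | 8.2 | 8.2 | 7.5 | 7.8 |
| Claude Opus 4.6 | 9.2 | – | 8.8 | 9.0 | 9.3 | 9.2 | 9.0 | 9.3 | 9.0 | 9.2 |
| Gemini 3.1 Pro | 9.3 | 10.0 | – | 7.9 | 10.0 | 9.3 | 8.1 | 10.0 | 8.8 | 7.8 |
| Claude Sonnet 4.6 | 9.8 | 9.8 | 9.6 | – | 9.3 | 8.8 | 9.2 | 9.0 | 8.6 | 9.0 |
| Grok 4.20 | 9.2 | 9.2 | 8.6 | 9.0 | – | 8.8 | 9.0 | 8.8 | 8.8 | 8.8 |
| DeepSeek V4 | 9.6 | 9.8 | 9.6 | 9.8 | 9.6 | – | 9.4 | 9.6 | 9.6 | 9.6 |
| GPT-OSS-120B | 8.6 | 9.0 | 8.1 | 7.5 | 8.6 | 8.6 | – | 8.6 | 8.6 | 9.0 |
| Gemini 3 | 10.0 | 10.0 | 9.8 | 9.6 | 9.8 | 9.8 | 10.0 | – | 9.8 | 9.8 |
| MiniMax M2.5 | 10.0 | 9.8 | 8.6 | 7.7 | 9.0 | 9.6 | 9.0 | 9.6 | – | 9.4 |
| MiMo-V2-Flash | 9.3 | 10.0 | 8.6 | 9.0 | 9.3 | 8.8 | 9.3 | 10.0 | 9.0 | – |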
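
The summary figures appear to be plain means over the matrix. The sketch below assumes that a respondent's score is the average of the nine scores it received from the other judges, and that the matrix average is the mean of all 90 cells. The page does not state this aggregation rule, so treat it as an assumption. The names `MODELS`, `SCORES`, `respondent_averages`, and `matrix_average` are illustrative, and `None` marks the empty self-judgment cell.

```python
from statistics import mean

MODELS = [
    "GPT-5.4", "Claude Opus 4.6", "Gemini 3.1 Pro", "Claude Sonnet 4.6",
    "Grok 4.20", "DeepSeek V4", "GPT-OSS-120B", "Gemini 3",
    "MiniMax M2.5", "MiMo-V2-Flash",
]

# Rows are judges, columns are respondents, in the order of MODELS.
# None marks the diagonal, where a model would be judging itself.
SCORES = [
    [None, 8.6, 6.5, 5.0, 8.2, 8.2, 8.2, 8.2, 7.5, 7.8],
    [9.2, None, 8.8, 9.0, 9.3, 9.2, 9.0, 9.3, 9.0, 9.2],
    [9.3, 10.0, None, 7.9, 10.0, 9.3, 8.1, 10.0, 8.8, 7.8],
    [9.8, 9.8, 9.6, None, 9.3, 8.8, 9.2, 9.0, 8.6, 9.0],
    [9.2, 9.2, 8.6, 9.0, None, 8.8, 9.0, 8.8, 8.8, 8.8],
    [9.6, 9.8, 9.6, 9.8, 9.6, None, 9.4, 9.6, 9.6, 9.6],
    [8.6, 9.0, 8.1, 7.5, 8.6, 8.6, None, 8.6, 8.6, 9.0],
    [10.0, 10.0, 9.8, 9.6, 9.8, 9.8, 10.0, None, 9.8, 9.8],
    [10.0, 9.8, 8.6, 7.7, 9.0, 9.6, 9.0, 9.6, None, 9.4],
    [9.3, 10.0, 8.6, 9.0, 9.3, 8.8, 9.3, 10.0, 9.0, None],
]


def respondent_averages(models, scores):
    """Average the scores each respondent received (one column per respondent)."""
    averages = {}
    for col, name in enumerate(models):
        received = [row[col] for row in scores if row[col] is not None]
        averages[name] = mean(received)
    return averages


def matrix_average(scores):
    """Average every filled cell of the matrix, skipping the diagonal."""
    return mean(cell for row in scores for cell in row if cell is not None)


averages = respondent_averages(MODELS, SCORES)
winner = max(averages, key=averages.get)

if __name__ == "__main__":
    print(f"winner: {winner} ({averages[winner]:.2f})")
    print(f"matrix avg: {matrix_average(SCORES):.2f}")
```

Under these assumptions the sketch picks Claude Opus 4.6 with a mean of about 9.58 and gives a matrix average of 9.03. The reported winner score is 9.57; the small gap may come from truncation or from the page averaging unrounded scores, since the cells shown are rounded to one decimal.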