Evaluation EVAL-20260402-141544
code
Apr 02, 2026 · CODE-022

Implement the OAuth 2.0 Authorization Code flow with PKCE (Proof Key for Code Exchange) from scratch in Python. Include: code verifier/challenge generation, authorization URL construction, token exchange, token refresh, and secure token storage. Explain why PKCE prevents authorization code interception attacks.
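For readers unfamiliar with the task, the verifier/challenge step named in the prompt can be sketched as follows. This is a minimal illustration based on the S256 method in RFC 7636, not taken from any of the evaluated responses; the function names `make_code_verifier` and `make_code_challenge` are hypothetical. The client keeps the verifier secret and sends only the challenge with the authorization request. An attacker who intercepts the authorization code cannot redeem it, because the token endpoint requires the original verifier and checks that its SHA-256 hash matches the challenge.

```python
import base64
import hashlib
import secrets


def make_code_verifier(n_bytes: int = 32) -> str:
    """Return a random, URL-safe code verifier (43 characters for 32 bytes)."""
    raw = secrets.token_bytes(n_bytes)
    # base64url without '=' padding, as PKCE requires.
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def make_code_challenge(verifier: str) -> str:
    """Derive the S256 challenge: base64url(SHA-256(verifier)), unpadded."""
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


if __name__ == "__main__":
    verifier = make_code_verifier()
    challenge = make_code_challenge(verifier)
    print("verifier :", verifier)
    print("challenge:", challenge)
```

The remaining parts of the prompt (authorization URL construction, token exchange, refresh, and storage) depend on the specific authorization server and are left out of this sketch.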

Winner: Grok 4.20 (openrouter)
Winner score: 8.90 · matrix avg: 7.42
10×10 Judgment Matrix · 89 judgments
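| Judge ↓ / Respondent → | Gemini 3.1 Pro | GPT-5.4 | Claude Opus 4.6 | Claude Sonnet 4.6 | Grok 4.20 | DeepSeek V4 | GPT-OSS-120B | Gemini 3 | MiniMax M2.5 | MiMo-V2-Flash |
|---|---|---|---|---|---|---|---|---|---|---|
| Gemini 3.1 Pro |  | 5.3 | 6.3 | 6.5 | 9.8 | 6.5 | 6.4 | 7.5 | 6.3 | 6.3 |
| GPT-5.4 | 2.3 |  | 2.9 | 4.1 | 8.0 | 6.8 | 5.5 | 7.8 | 3.9 | 4.8 |
| Claude Opus 4.6 | 4.3 | 7.8 |  | 6.5 | 8.2 | 6.6 | 7.0 | 8.4 | 7.7 | 6.5 |
| Claude Sonnet 4.6 | 4.0 | 8.0 | 7.3 |  | · | 7.6 | 8.0 | 8.4 | 7.5 | 7.6 |
| Grok 4.20 | 5.8 | 8.4 | 6.0 | 7.7 |  | 7.5 | 8.4 | 8.4 | 8.4 | 8.6 |
| DeepSeek V4 | 7.0 | 8.8 | 8.6 | 9.0 | 9.6 |  | 8.6 | 8.8 | 8.7 | 9.3 |
| GPT-OSS-120B | 5.9 | 7.0 | 4.5 | 5.8 | 8.8 | 7.8 |  | 8.6 | 7.5 | 7.4 |
| Gemini 3 | 7.7 | 8.8 | 9.3 | 9.0 | 9.8 | 9.4 | 8.6 |  | 9.4 | 9.4 |
| MiniMax M2.5 | 3.4 | 7.0 | 6.0 | 7.1 | 8.0 | 7.5 | 7.0 | 7.6 |  | 6.6 |
| MiMo-V2-Flash | 8.8 | 8.6 | 8.8 | 8.6 | 9.0 | 8.6 | 8.6 | 8.6 | 8.6 |  |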
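The page does not say how the headline numbers are derived from the matrix. The sketch below shows one reading that is consistent with the displayed values: each respondent's score is the mean of the judgments it received (self-judgments excluded, the single missing judgment skipped), and the matrix average is the mean of those per-respondent scores. The aggregation rule is an assumption inferred from the figures, and the names `MODELS`, `MATRIX`, and `respondent_means` are hypothetical.

```python
from statistics import mean

MODELS = [
    "Gemini 3.1 Pro", "GPT-5.4", "Claude Opus 4.6", "Claude Sonnet 4.6",
    "Grok 4.20", "DeepSeek V4", "GPT-OSS-120B", "Gemini 3",
    "MiniMax M2.5", "MiMo-V2-Flash",
]

N = None  # self-judgment (diagonal) or the one missing judgment
# Rows are judges, columns are respondents, both in MODELS order.
MATRIX = [
    [N,   5.3, 6.3, 6.5, 9.8, 6.5, 6.4, 7.5, 6.3, 6.3],
    [2.3, N,   2.9, 4.1, 8.0, 6.8, 5.5, 7.8, 3.9, 4.8],
    [4.3, 7.8, N,   6.5, 8.2, 6.6, 7.0, 8.4, 7.7, 6.5],
    [4.0, 8.0, 7.3, N,   N,   7.6, 8.0, 8.4, 7.5, 7.6],
    [5.8, 8.4, 6.0, 7.7, N,   7.5, 8.4, 8.4, 8.4, 8.6],
    [7.0, 8.8, 8.6, 9.0, 9.6, N,   8.6, 8.8, 8.7, 9.3],
    [5.9, 7.0, 4.5, 5.8, 8.8, 7.8, N,   8.6, 7.5, 7.4],
    [7.7, 8.8, 9.3, 9.0, 9.8, 9.4, 8.6, N,   9.4, 9.4],
    [3.4, 7.0, 6.0, 7.1, 8.0, 7.5, 7.0, 7.6, N,   6.6],
    [8.8, 8.6, 8.8, 8.6, 9.0, 8.6, 8.6, 8.6, 8.6, N],
]


def respondent_means(matrix):
    """Mean score received by each respondent (column), skipping empty cells."""
    n_cols = len(matrix[0])
    return [
        mean(row[col] for row in matrix if row[col] is not None)
        for col in range(n_cols)
    ]


scores = respondent_means(MATRIX)
judgment_count = sum(cell is not None for row in MATRIX for cell in row)
winner_score, winner = max(zip(scores, MODELS))
matrix_avg = mean(scores)

if __name__ == "__main__":
    print(f"judgments: {judgment_count}")
    print(f"winner: {winner} ({winner_score:.2f})")
    print(f"matrix avg: {matrix_avg:.2f}")
```

Because the table shows scores rounded to one decimal place, results recomputed this way may differ slightly from figures derived from the unrounded data in the full dataset.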