Evaluations/EVAL-20260402-201915
analysis
Apr 02, 2026 · ANALYSIS-020

Estimate the total energy cost and carbon footprint of training a frontier AI model (like GPT-5). Include: GPU hours, electricity cost, cooling overhead, water usage, and embodied carbon of hardware. (1) Compare to: one year of Netflix streaming for all users, one transatlantic flight, and one Bitcoin transaction. (2) Inference costs are growing faster than training costs. Why? (3) What changes would reduce AI's environmental impact by 10x?

Winner: Grok 4.20 · openrouter
Winner score: 8.62
Matrix avg: 6.99
10×10 Judgment Matrix · 89 judgments
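| Judge ↓ / Respondent → | Gemini 3.1 Pro | Claude Opus 4.6 | GPT-5.4 | DeepSeek V4 | MiMo-V2-Flash | Claude Sonnet 4.6 | GPT-OSS-120B | Grok 4.20 | Gemini 3 | MiniMax M2.5 |
|---|---|---|---|---|---|---|---|---|---|---|
| Gemini 3.1 Pro |  | · | 5.8 | 5.0 | 8.7 | 6.3 | 2.6 | 8.9 | 9.7 | 7.9 |
| Claude Opus 4.6 | 1.6 |  | 7.6 | 5.7 | 7.5 | 6.8 | 4.8 | 8.2 | 7.7 | 7.9 |
| GPT-5.4 | 0.7 | 5.0 |  | 5.3 | 7.0 | 4.5 | 3.5 | 8.2 | 7.5 | 6.5 |
| DeepSeek V4 | 6.2 | 8.3 | 8.7 |  | 9.0 | 8.4 | 8.4 | 9.0 | 9.0 | 8.8 |
| MiMo-V2-Flash | 5.8 | 7.4 | 8.2 | 8.0 |  | 7.5 | 8.6 | 8.6 | 8.4 | 8.6 |
| Claude Sonnet 4.6 | 3.3 | 8.0 | 8.4 | 7.2 | 8.0 |  | 6.3 | 8.6 | 8.0 | 8.2 |
| GPT-OSS-120B | 1.4 | 5.9 | 5.9 | 6.7 | 8.0 | 5.9 |  | 8.2 | 7.5 | 7.5 |
| Grok 4.20 | 3.6 | 6.8 | 7.8 | 4.5 | 7.0 | 7.8 | 6.0 |  | 7.6 | 6.8 |
| Gemini 3 | 2.0 | 9.0 | 9.2 | 8.3 | 9.0 | 8.3 | 7.3 | 9.6 |  | 9.0 |
| MiniMax M2.5 | 1.6 | 7.3 | 6.6 | 7.3 | 8.2 | 6.8 | 6.7 | 8.6 | 8.8 |  |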
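
The summary figures can be roughly cross-checked against the matrix. The sketch below is not the site's scoring code. It assumes that rows are judges and columns are respondents, that self-judgments are excluded, that the lone "·" in the Gemini 3.1 Pro row is the one missing judgment (placed here in the Claude Opus 4.6 column), and that a respondent's score is the plain mean of its column. The names `MODELS`, `MATRIX`, `judgments`, and `respondent_means` are made up for illustration.

```python
# Scores as displayed on the page (rounded to one decimal).
# None marks a cell with no score: the self-judgment diagonal and the one "·" cell.
MODELS = [
    "Gemini 3.1 Pro", "Claude Opus 4.6", "GPT-5.4", "DeepSeek V4", "MiMo-V2-Flash",
    "Claude Sonnet 4.6", "GPT-OSS-120B", "Grok 4.20", "Gemini 3", "MiniMax M2.5",
]
N = None
MATRIX = [  # rows = judges, columns = respondents (assumed orientation)
    [N,   N,   5.8, 5.0, 8.7, 6.3, 2.6, 8.9, 9.7, 7.9],
    [1.6, N,   7.6, 5.7, 7.5, 6.8, 4.8, 8.2, 7.7, 7.9],
    [0.7, 5.0, N,   5.3, 7.0, 4.5, 3.5, 8.2, 7.5, 6.5],
    [6.2, 8.3, 8.7, N,   9.0, 8.4, 8.4, 9.0, 9.0, 8.8],
    [5.8, 7.4, 8.2, 8.0, N,   7.5, 8.6, 8.6, 8.4, 8.6],
    [3.3, 8.0, 8.4, 7.2, 8.0, N,   6.3, 8.6, 8.0, 8.2],
    [1.4, 5.9, 5.9, 6.7, 8.0, 5.9, N,   8.2, 7.5, 7.5],
    [3.6, 6.8, 7.8, 4.5, 7.0, 7.8, 6.0, N,   7.6, 6.8],
    [2.0, 9.0, 9.2, 8.3, 9.0, 8.3, 7.3, 9.6, N,   9.0],
    [1.6, 7.3, 6.6, 7.3, 8.2, 6.8, 6.7, 8.6, 8.8, N],
]


def judgments(matrix):
    """Flatten the matrix into the list of scores that are actually present."""
    return [s for row in matrix for s in row if s is not None]


def respondent_means(matrix, models):
    """Mean score each respondent received, skipping empty cells."""
    means = {}
    for col, name in enumerate(models):
        scores = [row[col] for row in matrix if row[col] is not None]
        means[name] = sum(scores) / len(scores)
    return means


scores = judgments(MATRIX)
matrix_avg = sum(scores) / len(scores)
means = respondent_means(MATRIX, MODELS)
top = max(means, key=means.get)

print(len(scores), round(matrix_avg, 2), top, round(means[top], 2))
# prints: 89 6.99 Grok 4.20 8.66
```

Under these assumptions the judgment count (89) and the matrix average (6.99) match the page. The column mean for Grok 4.20 comes out near 8.66 rather than the reported 8.62, so the winner score is probably computed from unrounded scores or with a weighting not shown here.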