Evaluation EVAL-20260403-151843
code
Apr 03, 2026 · SLM-001

Summarize this 500-word passage in exactly 50 words while retaining all key claims: [Passage about climate change policy]. This tests whether small models can do precise length-constrained summarization.
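
The "exactly 50 words" constraint in the prompt above can be checked mechanically. The following is a minimal sketch, not the evaluation's own scoring code. It assumes a word is any whitespace-separated token containing at least one word character; the function names `word_count` and `meets_length_constraint` are hypothetical.

```python
import re


def word_count(text: str) -> int:
    """Count whitespace-separated tokens that contain at least one word character.

    Stand-alone punctuation (for example a lone "-") is not counted as a word.
    This counting rule is an assumption; the evaluation may count differently.
    """
    return sum(1 for token in text.split() if re.search(r"\w", token))


def meets_length_constraint(summary: str, target: int = 50) -> bool:
    """Return True only if the summary has exactly `target` words."""
    return word_count(summary) == target
```

A judge could call `meets_length_constraint(summary)` before scoring content, so that a 49-word or 51-word summary is flagged regardless of how well it retains the key claims.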

Winner: Qwen 3 32B (openrouter)
Winner score: 7.87 (matrix avg: 5.81)
10×10 Judgment Matrix · 56 judgments
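| Judge ↓ / Respondent → | Qwen 3 32B | Nemotron 3 Super | Devstral Small | Gemma 3 27B | Llama 4 Scout | Granite 4.0 Micro | Gemma 3n 4B | Qwen 3 8B | Kimi K2.5 |
|---|---|---|---|---|---|---|---|---|---|
| Qwen 3 32B |  | 4.8 | 8.1 | 2.0 | 8.3 | 4.5 | · | 8.3 | 7.0 |
| Nemotron 3 Super | 5.2 |  | 5.3 | 1.8 | 3.2 | 1.8 | 0.4 | 7.8 | 5.2 |
| Devstral Small | 9.3 | 2.0 |  | · | 7.8 | 7.5 | · | 9.3 | 2.0 |
| Gemma 3 27B | 8.3 | 6.0 | 8.1 |  | 8.1 | 4.8 | 2.4 | 8.4 | 7.0 |
| Llama 4 Scout | 8.3 | 6.0 | 7.7 | 1.0 |  | 5.3 | · | 8.3 | 1.0 |
| Granite 4.0 Micro | 8.1 | 8.1 | 8.3 | 7.5 | 8.1 |  | 8.3 | 8.3 | 5.0 |
| Gemma 3n 4B | 8.1 | 4.5 | 7.3 | · | 8.1 | 3.4 |  | · | 4.5 |
| Qwen 3 8B | 8.1 | 6.5 | 7.8 | · | 7.8 | 2.9 | · |  | 7.0 |
| Kimi K2.5 | · | · | · | · | · | · | · | · |  |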
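
The page does not state how the winner score is derived from the matrix. One plausible reading, sketched here as an assumption, is the mean of the scores a respondent received, skipping missing judgments (shown as `·`) and the model's own diagonal cell. The values are the Qwen 3 32B column as read from the matrix above, assuming each row omits its self-judgment cell; `respondent_mean` is a hypothetical helper.

```python
from statistics import mean

# Scores given to Qwen 3 32B by each judge, read from the matrix.
# None marks a missing judgment ("·"); the self-judgment is excluded.
# The column alignment is an assumption about the flattened table.
qwen3_32b_scores = {
    "Nemotron 3 Super": 5.2,
    "Devstral Small": 9.3,
    "Gemma 3 27B": 8.3,
    "Llama 4 Scout": 8.3,
    "Granite 4.0 Micro": 8.1,
    "Gemma 3n 4B": 8.1,
    "Qwen 3 8B": 8.1,
    "Kimi K2.5": None,
}


def respondent_mean(scores):
    """Average the scores that are present; return None if there are none."""
    present = [s for s in scores.values() if s is not None]
    return mean(present) if present else None


print(round(respondent_mean(qwen3_32b_scores), 2))
```

From the rounded cells this gives roughly 7.91, close to but not identical to the reported 7.87. The gap may come from rounding in the displayed cells or from a different aggregation rule.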