You will write a function, then critique it, then improve it. Three rounds. Each round must be strictly better than the last, and you must explain exactly what improved and why.

Task: Write a Python function that finds the k most frequent words in a text, handling: Unicode, punctuation stripping, case normalization, ties (alphabetical), stopword filtering, and streaming input (the text may be too large for memory).

Round 1: Write your first-draft implementation. Do not overthink it. Write what comes naturally.

Round 2: Now critique Round 1 ruthlessly. Identify every weakness: performance bottlenecks, edge cases missed, code style issues, memory problems with large input. Then write an improved version that fixes every issue you identified.

Round 3: Critique Round 2 with the same rigor. Find the remaining weaknesses. Write the final version. It must handle 10GB+ text files with constant memory usage.

After all 3 rounds: Score each version 1-10 on correctness, performance, and robustness. Explain what changed between each round and what principle drove the improvement. What would Round 4 improve if you had one more iteration?
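For orientation, here is a minimal sketch of the kind of function the task asks for. It is not any submitted answer. The name `top_k_words`, the tiny `STOPWORDS` set, and the letters-only tokenizer are assumptions made for illustration. The sketch reads the input one line at a time, so memory grows with the number of distinct words rather than with the size of the input. It does not achieve strictly constant memory, since exact counts require one counter per distinct word.

```python
import heapq
import re
import unicodedata
from collections import Counter
from typing import Iterable, List, Tuple

# Hypothetical stopword list, for illustration only.
STOPWORDS = frozenset({"the", "a", "an", "and", "of", "to", "in", "is"})

# Runs of Unicode letters; digits, underscores, and punctuation act as
# separators. An assumed tokenizer: "don't" splits into "don" and "t".
WORD_RE = re.compile(r"[^\W\d_]+")


def top_k_words(
    lines: Iterable[str], k: int, stopwords=STOPWORDS
) -> List[Tuple[str, int]]:
    """Return the k most frequent non-stopwords as (word, count) pairs.

    Ties in count are broken alphabetically. `lines` may be any iterable
    of strings, such as an open text file, and is consumed lazily.
    """
    if k <= 0:
        return []
    counts = Counter()
    for line in lines:
        # Normalize Unicode, then fold case so "École" and "école" match.
        text = unicodedata.normalize("NFKC", line).casefold()
        for word in WORD_RE.findall(text):
            if word not in stopwords:
                counts[word] += 1
    # Smallest by (-count, word) means highest count first, then A-Z.
    return heapq.nsmallest(
        k, counts.items(), key=lambda item: (-item[1], item[0])
    )
```

Passing an open file object, for example one returned by `open(path, encoding="utf-8")`, iterates it line by line, so the full text is never held in memory at once. A single extremely long line would still be read whole, which is one of the weaknesses the later rounds of the task would be expected to address.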

Winner: GPT-5.4 (openrouter)

Winner score: 7.06 (matrix avg: 6.35)

10×10 Judgment Matrix · 49 judgments
