Chatbots Are Cheating on Their Benchmark Tests
Generative-AI companies have been selling a narrative of unprecedented, endless progress. Just last week, OpenAI introduced GPT-4.5 as its “largest and best model for chat yet.” Earlier in February, Google called its latest version of Gemini “the world’s best AI model.” And in January, the Chinese company DeepSeek touted its R1 model as being just as powerful as OpenAI’s o1 model, which Sam Altman had called “the smartest model in the world” the previous month.