Will Smith eating spaghetti and other weird AI benchmarks that took off in 2024

When a company releases a new AI video generator, it’s not long before someone uses it to make a video of actor Will Smith eating spaghetti.

It’s become something of a meme as well as a benchmark: Seeing whether a new video generator can realistically render Smith slurping down a bowl of noodles. Smith himself parodied the trend in an Instagram post in February.

Will Smith and pasta is but one of several bizarre “unofficial” benchmarks to take the AI community by storm in 2024. A 16-year-old developer built an app that gives AI control over Minecraft and tests its ability to design structures. Elsewhere, a British programmer created a platform where AI plays games like Pictionary and Connect 4 against each other.

It’s not like there aren’t more academic tests of an AI’s performance. So why did the weirder ones blow up?

LLM Pictionary (Image Credits: Paul Calcraft)

For one, many of the industry-standard AI benchmarks don’t tell the average person very much. Companies often cite their AI’s ability to answer questions on Math Olympiad exams, or figure out plausible solutions to PhD-level problems. Yet most people — yours truly included — use chatbots for things like responding to emails and basic research.

Crowdsourced industry measures aren’t necessarily better or more informative.

Take, for example, Chatbot Arena, a public benchmark many AI enthusiasts and developers follow obsessively. Chatbot Arena lets anyone on the web rate how well AI performs on particular tasks, like creating a web app or generating an image. But raters tend not to be representative — most come from AI and tech industry circles — and cast their votes based on personal, hard-to-pin-down preferences.

The Chatbot Arena interface (Image Credits: LMSYS)

Ethan Mollick, a professor of management at Wharton, recently pointed out in a post on X another problem with many AI industry benchmarks: they don’t compare a system’s performance to that of the average person.

“The fact that there are not 30 different benchmarks from different organizations in medicine, in law, in advice quality, and so on is a real shame, as people are using systems for these things, regardless,” Mollick wrote.

Weird AI benchmarks like Connect 4, Minecraft, and Will Smith eating spaghetti are most certainly not empirical — or even all that generalizable. Just because an AI nails the Will Smith test doesn’t mean it’ll generate, say, a burger well.

Note the typo; there’s no such model as Claude 3.6 Sonnet. (Image Credits: Adonis Singh)

One expert I spoke to about AI benchmarks suggested that the AI community focus on the downstream impacts of AI instead of its ability in narrow domains. That’s sensible. But I have a feeling that weird benchmarks aren’t going away anytime soon. Not only are they entertaining (who doesn’t like watching AI build Minecraft castles?), but they’re also easy to understand. And as my colleague Max Zeff wrote recently, the industry continues to grapple with distilling a technology as complex as AI into digestible marketing.

The only question in my mind is, which odd new benchmarks will go viral in 2025?
