Google, OpenAI, and Anthropic are competing to see whose AI can play Pokémon the best: Twitch streams of the beloved RPG test the models' true might

While innumerable benchmarks exist to gauge the capabilities of AI, one more obscure test appears to be making waves in the AI community. According to a Wall Street Journal report, companies like Google, OpenAI, and Anthropic are now having their models play old-school Pokémon to evaluate performance.

“The thing that has made Pokémon fun and that has captured the [machine learning] community’s interest is that it’s a lot less constrained than Pong or some of the other games that people have historically done this on. It’s a pretty hard problem for a computer program to be able to do,” Anthropic applied AI lead David Hershey told the outlet.

Visual explainer on how Claude plays Pokémon (Image credit: ClaudePlaysPokémon on Twitch)

It all started last year when Hershey put Claude, Anthropic’s frontier LLM, on a Twitch stream dubbed “Claude Plays Pokémon.” Hershey is the applied AI lead at Anthropic, meaning his job is to help customers deploy the AI, so the stream is another way of testing the models. Claude’s gaming efforts have inspired freelance developers to put up similar “Gemini Plays Pokémon” and “GPT Plays Pokémon” streams, too.

These projects have received official recognition from Google and OpenAI, with the labs sometimes even stepping in to tweak the models. That attention has helped both Gemini and GPT beat Pokémon Blue already, so they have moved on to the sequels, but no version of Claude has finished the game yet. The latest model, Opus 4.5, is currently tackling the challenge on stream.

Claude playing Pokémon on Twitch, with chat cheering it on alongside (Image credit: ClaudePlaysPokémon on Twitch)

Hershey says that using Pokémon to test these AI models is useful because “it provides [us] with, like, this great way to just see how a model is doing and to evaluate it in a quantitative way.” In the game, you have to capture new Pokémon, train and level up your roster, and defeat gym leaders to progress. It’s not a simple linear progression, but one that requires judgment.


Source: www.tomshardware.com…
