Friday, March 7, 2025

Anthropic’s Newest AI Wants to Be a Pokémon Master. Here’s Why That’s a Big Deal

On Monday, Anthropic released Claude 3.7 Sonnet, the company’s most capable AI model yet. It also revealed a new capability with implications for the business world: Claude can now play Pokémon, and it’s pretty good at it, at least for an AI.

In a blog post detailing the new Claude model, Anthropic wrote that a small internal team had created an interface that enabled Claude to play Pokémon Red, the original Pokémon game released on the Nintendo Game Boy way back in 1996.

So, why teach an AI model to play Pokémon? David Hershey, a member of Anthropic’s technical team, tells Inc. that staffers were inspired by a YouTube video in which an original reinforcement learning model was trained to play Pokémon, so they created a virtual environment in which Claude could attempt to play the game. Eventually, around June 2024, Hershey (a self-proclaimed Pokémon fan) took up the idea as a side project, first using it to test the capabilities of Claude 3.5 Sonnet, the new model at the time.

He found that while earlier versions of Claude would immediately get stuck, Claude 3.5 could progress further, successfully catching a Pokémon and leaving the starting area of Pallet Town. For the uninitiated, the goal of Pokémon Red is to catch adorable creatures, train them by battling against non-playable characters, and win badges from powerful enemies called Gym Leaders. Pokémon is, of course, wildly popular, and is considered the highest-grossing media franchise of all time.

To keep his coworkers, many of whom are also Pokémon fans, up-to-date on his efforts, Hershey started a dedicated Slack channel in which Anthropic employees could monitor Claude’s Pokémon journey. Every time Claude successfully caught a Pokémon or won a battle, more and more people would join the Slack channel, says Anthropic product research lead Diane Penn. The project developed a cult following within the company.

Video games have long been used as a method for gauging an AI model’s ability.
In the early days of OpenAI, staffers spent years training models to play the online multiplayer game Dota 2. “Defining success is hard,” says Hershey. “But video games happen to be structured in a way where progress is often measurable and linear.”

Other games have been important over the decades as AI, or just thinking machines in general, developed. In 2016, Google DeepMind’s AlphaGo beat one of the world’s highest-ranked players of the ancient game Go, and in 2011, IBM’s Watson system beat Jeopardy! champions Ken Jennings and Brad Rutter at their own game. And in 1997, IBM’s Deep Blue beat Garry Kasparov at chess.

When Anthropic released an updated version of Claude 3.5 Sonnet in September, the model saw slight improvement, but the real breakthrough came in the form of Claude 3.7 Sonnet, the new model released this week. While the previous model got stuck in Viridian Forest, an early area in the game, 3.7 Sonnet was able to go further, collecting three badges from Pokémon Gym Leaders.

So how is Claude able to improve itself? Hershey says that Claude’s new Pokémon skills are the direct result of a new feature called “extended thinking,” which enables the model to take additional time to “think” through how to solve a problem, instead of immediately generating a response.

Hershey says a common complaint he’s heard from Anthropic’s customers is that earlier versions of Claude would make a false assumption and then struggle to reverse course. But because of its improved thought process, the new Claude is able to more effectively pivot and try new strategies, meaning it doesn’t get stuck nearly as often as earlier versions.

According to Penn, Anthropic decided to include Hershey’s Pokémon benchmark in Claude 3.7 Sonnet’s announcement because the company is slowly moving away from traditional benchmarks in favor of more “accessible” tests that can be understood by a larger group of people.
“We’re at a point where evaluations don’t tell the full story of how much more capable each version of these models are,” Penn says. She says the benchmark demonstrates Claude’s ability to intelligently make a plan and adapt with new strategies when it runs into a problem. For companies looking to use AI on complex tasks like conducting high-quality research or complex financial analysis, the benchmark is proof that models can improve their performance by using reasoning capabilities.

By using Pokémon progression as a benchmark, Anthropic is able to educate an entirely new audience about Claude’s capabilities. After learning that the benchmark would be included in the announcement, Hershey and a small team hustled to quickly create an ongoing livestream on Twitch, in which anyone can watch Claude attempt to catch ’em all. Some users have even said in the livestream’s chat that they were inspired to subscribe to Anthropic’s $18 per month Claude Pro service.

Even with its enhanced capabilities, the model is still far from becoming a Pokémon Master, say Penn and Hershey. As of Thursday afternoon, the model had been stuck in Mt. Moon, an early game area that’s notoriously tricky for kids, for over 27 hours.

Viewers of “Claude Plays Pokémon” were especially delighted when Claude named its rival character Waclaud, a reference to Super Mario Bros. villains Wario and Waluigi, but that actually wasn’t a decision made by the model. “It was in the system prompt,” admits Hershey. Before the livestream was launched, he says, “we ran an internal poll on what we should name the rival, so Waclaud is just a small easter egg from our internal culture at Anthropic.”

Could the project’s popularity result in Pokémon becoming a new standardized benchmark for AI? Hershey isn’t quite sure, but based on the response his project has received online, he wouldn’t be surprised to see other AI labs use video games as benchmarks more often.
“It’s just a great way to see progress over a long period of time,” he says, “and we’re definitely not the only people who think that’s important.”

BY BEN SHERRY @BENLUCASSHERRY

Wednesday, March 5, 2025

Can AI Startups Dethrone Google Chrome in the Web Browser Wars?

A new report from the research firm Gartner has some unsettling news for search engine giants like Google and Microsoft’s Bing. It predicts that as everyday net users become more comfortable with AI tech and incorporate it into their general net habits, chatbots and other agents will lead to a drop of 25 percent in “traditional search engine volume.” The search giants will then simply be “losing market share to AI chatbots and other virtual agents.”

One reason to care about this news is that the search engine giants are really marketing giants. Search engines are useful, but Google makes money by selling ads that leverage data from its search engine. These ads are designed to convert to profits for the companies whose wares are being promoted. Plus, placing Google ads on a website is a revenue source that many other companies rely on, most visibly media firms.

If AI upends search, it will similarly upend current marketing practices. And disrupted marketing norms mean that how you think about using online systems to market your company’s products will have to change too.

AI already plays a role in marketing. Chatbots are touted as having copy-generating skills that can boost small companies’ public relations efforts, but the tech is also having an effect inside the marketing process itself. An example of this is Shopify’s recent AI-powered Semantic Search system, which uses AI to sniff through the text and image data of a manufacturer’s products and then dream up better search-matching terms so that the products don’t miss out on matching customers searching for a particular phrase. But this is simply using AI to improve current search-based marketing systems.

AI: smart enough to steal traffic

More important is the notion that AI chatbots can “steal” search engine traffic.
Think of how many of the queries that you usually direct at Google (from basic stuff like “what’s 200 Fahrenheit in Celsius?” to more complex matters like “what’s the most recent games console made by Sony?”) could be answered by a chatbot instead. Typing those queries into ChatGPT or a system like Microsoft’s Copilot could mean they aren’t directed through Google’s labyrinthine search engine systems.

There’s also a hint that future web surfing won’t be as search-centric as it is now, thanks to the novel Arc app. Arc leverages search engine results as part of its answers to user queries, but the app promises to do the boring bits of web searching for you, neatly curating the answers above more traditional search engine results.

AI “agents” are another emergent form of the tech that could impact search: AI systems that are able to go off and perform a complex sequence of tasks for you, like searching for some data and analyzing it automatically.

Google, of course, is savvy regarding these trends, and last year launched its own AI search push, with its Search Generative Experience. This is an effort to add in some of the clever summarizing abilities of generative AI systems to Google’s traditional search system, saving users time they’d otherwise have spent trawling through a handful of the top search results in order to learn the actual answer to the queries they typed in.

But as AI use expands, and firms like Microsoft double and triple down on their efforts to incorporate AI into everyone’s digital lives, the question of the role of traditional search compared to AI chatbots and similar tech remains an open one.

AI will soon impact how you think about marketing your company’s products, and search engine optimization to bolster traffic to your website may even stop being such an important factor.
So if you’re building a long-term marketing strategy right now, it might be worth examining how you can leverage AI products to market your wares alongside more traditional search systems. It’s always smart to skate to where the puck is going to be versus where it currently is.

BY KIT EATON @KITEATON

Monday, March 3, 2025

Slack Imagines a Future Workplace Where You Chat More With AIs Than With Your Colleagues

The folks behind the messaging app Slack know a thing or two about how workers communicate with one another and their bosses. At least 750,000 organizations around the world rely on it for their workplace communications. So when the company’s chief marketing officer makes a prediction about the future of workplace comms, it’s worth paying attention. And, boy, does Ryan Gavin have a doozy of an idea.

In conversation with news outlet Axios, Gavin predicted that the rise of AI agents will transform workplaces, and that staff may soon talk to AIs more than to their human co-workers.

AI agents have been hailed by many experts as the first truly useful tools that AI may provide, and possibly the next big thing in this technology revolution. Even OpenAI’s CEO Sam Altman is on board with this notion, and his company’s upcoming agent Operator may even prove to be the first AI gizmo that transforms the average workplace. Agents are more powerful than the ask-then-answer model that AI chatbots use because they can actually perform actions in a digital environment, like filling in forms on a website automatically, or even taking control of your computer’s mouse and using apps on the desktop.

Axios reminds us that Salesforce, which owns Slack, has been promoting its own AI agent systems, which are apparently already capable of acting like sales reps. But instead of being innovations looming in a far-off future, Gavin said he anticipates these agent systems achieving everyday use in many workplaces sooner rather than later.
“I think that right now people are underestimating just how much the world of work is about to change,” he told Axios, putting a timeline on the transformation brought by AI agents as just “three or four or five years.” By then he said he imagined he could be talking to agents “as much, if not more than I’m talking to my human colleagues today.”

This projection may unsettle AI critics who worry the tech will seriously disturb the way that humans interact with each other in the office, possibly contributing to worker burnout or the erosion of human relations, and even displace people from their jobs. But Gavin’s prediction aligns with numerous other expert views that suggest that AIs will augment workers’ office skills, rather than replace them outright.

Picture the scene if “every single employee had a human resources agent that sat right alongside them in Slack,” Gavin said. As Axios noted, AI co-workers like this have the added benefit that they are easier to train than people, they may be cheaper to “employ,” they don’t ask for raises, and they won’t strike or quit.

Gavin’s words brush over the obvious issue that workers often have an existential dislike of office human resources departments. Couple that with the notion that a computer-based company representative is digitally watching over your shoulder as you work, and the idea may worry people who already think that workplace surveillance solutions and worker time and task tracking are far too Orwellian.

That said, it’s easy to imagine an AI agent co-worker that would be very useful: it could serve up people’s contact info automatically when you’re planning a task, and it could even fill in timesheets for you or look up specific company information, like financial data, when you’re putting together a presentation.

How this will actually play out in the typical workplace of tomorrow is anyone’s guess, of course.
Gavin’s point of view is merely one of many diverse perspectives, and seems centered more around digital messaging chats than actually talking to co-workers in person; no one is suggesting that office water cooler gossip will go away. But Gavin’s prediction of ubiquitous AI co-workers even aligns with recent data showing that employees now shun deep and lasting friendships with co-workers, since it suggests a digital colleague may fill some of this void. Supporters of AI use will also point out that some research suggests letting staff use AI in the workplace can actually boost their happiness.

BY KIT EATON @KITEATON