Sunday, April 27, 2025

OpenAI’s New Models Could Be Its Smartest and Most Powerful, Thanks to This Feature

On Wednesday, OpenAI introduced two new artificial intelligence models. The models, named o3 and o4-mini, are both part of the Sam Altman-led company’s “O” series of reasoning models, which means they’re capable of taking time to “think” through how best to answer a query. OpenAI says o3 and o4-mini are “our smartest and most capable models to date,” thanks to one special feature: tool use.

In the AI industry, “tools” usually refers to special abilities that can be given to an AI model, like the ability to write and run code, search the internet, use a web browser, and parse internal databases. These abilities are what transform AI models into AI agents, and o3 and o4-mini are OpenAI’s first reasoning models with access to these tools.

When you ask o3 or o4-mini a question, the model spends a short amount of time thinking through which tools would be most useful for completing the task. It then starts a multi-step process to answer the question. For example, when asked to predict how Donald Trump’s proposed tariffs will affect the burgeoning American AI industry, o3 thought for 25 seconds and then delivered a report sourced from recent articles by Time, Reuters, Axios, and Forbes. The report found, in part, that tariffs would make the hardware that powers AI “noticeably pricier … Expect higher upfront costs, a squeeze on smaller AI outfits, and a brand‑new round of supply‑chain gymnastics.”

However, there’s a catch: Because the model’s training data only goes up to June 2024, the model assumed the tariffs in question were Trump’s earlier suggestion of 10 percent tariffs across the board and a 60 percent tariff on China. Of course, American tariffs on Chinese imports are now up to 245 percent.

This process isn’t entirely new, and should be familiar to anyone who has used ChatGPT’s Deep Research feature, which similarly turns the platform into an AI agent that can scour the internet and create a lengthy report on the topic of your choosing. Deep Research was actually built on an earlier version of the o3 model, but the current version of o3 is much faster, although it may return less thorough answers.

While o3 is designed to be a research whiz, o4-mini was created to serve as a coding companion. The model has set new benchmarks across several software engineering tests, and “performs especially strongly at visual tasks like analyzing images, charts, and graphics.” In examples, OpenAI researchers showed that both models will double-check answers to math questions and explain how they came to their conclusions.

On X, Sam Altman opined on the new models’ capabilities, writing that “the ability of the new models to effectively use tools together has somehow really surprised me. Intellectually i knew this was going to happen but it hits different to see it.”

Entrepreneurs are only just getting their hands on the new models, but the models could be helpful for, say, automating internal workflows that were previously too complicated.

BY BEN SHERRY @BENLUCASSHERRY
