Friday, December 19, 2025
OpenAI’s Latest Model Is Scarily Good at These Important Work Functions
If you thought 2025 had a lot of AI-related job displacement, just wait until next year.
OpenAI’s latest AI model, GPT-5.2, set a new record on GDPval, an evaluation the company created to track how well AI models perform on economically valuable, real-world tasks.
An AI model being evaluated through GDPval is directed to complete 1,320 tasks traditionally done by humans across 44 occupations in eight sectors: real estate, government, manufacturing, professional services, healthcare, finance, trade, and information. A panel of human judges then decides whether the model’s work matches or exceeds the output of a skilled human worker.
With thinking mode enabled, GPT-5.2 matched or exceeded “top industry professionals” on about 71 percent of the tasks, a huge leap from GPT-5’s roughly 40 percent score. The new model took the top spot from Claude Opus 4.5, currently Anthropic’s most advanced AI model, which scored about 60 percent, and finished well ahead of Google’s Gemini 3 Pro, which scored about 54 percent. OpenAI says GPT-5.2 is “our first model that performs at or above a human expert level.”
GPT-5.2 Pro, a larger and more expensive version of the model, fared even better with a 74.1 percent GDPval score.
OpenAI wrote that GPT-5.2 completed the GDPval tasks 11 times faster than expert humans at just 1 percent of the cost, “suggesting that when paired with human oversight, GPT-5.2 can help with professional work.”
But the model hasn’t crushed all business-focused evaluations. It placed third on Vending-Bench 2, a benchmark that measures AI models’ ability to run a vending machine for a simulated year and scores them based on how much they can grow their cash balance from an initial $500.
Across five simulated years, GPT-5.2 ended with an average balance of $3,952, far below Claude Opus 4.5’s $4,967 average and leader Gemini 3 Pro’s $5,478. Still, the model was a marked improvement over GPT-5.1, which sits in fifth place with an average balance of $1,473.
BY BEN SHERRY @BENLUCASSHERRY