Monday, January 27, 2025

Why OpenAI’s Agent Tool May Be the First AI Gizmo to Improve Your Workplace

Many of us have by now chatted with one of the current generation of smart AI chatbots, like OpenAI’s market-leading ChatGPT, either for fun or for genuine help at work. Office uses include assistance with a tricky coding task, or getting the wording just right on that all-important PowerPoint briefing that the CEO wants. The notable thing about all these interactions is that they’re reactive: the AI waits for users to query it before responding. Tech luminaries insist that next-gen “agentic” AIs are different and can actually act with a degree of autonomy on their user’s behalf.
Now rumors say that OpenAI’s agent tool, dubbed Operator, may be ready for imminent release. It could be a game changer. The news comes from Tibor Blaho, a software engineer who news site TechCrunch says has a “reputation for accurately leaking upcoming AI products.” Blaho says he’s found evidence of Operator inside the desktop version of OpenAI’s ChatGPT app, along with information on OpenAI’s website that is not publicly visible, including data comparing Operator’s performance to other AI systems.
AI agents are snippets of AI-powered code that can be given the ability to “act” in digital environments. That can mean giving an agent control of a user’s computer, for example, so that it can fill in information on a webform, or even write code. According to OpenAI’s CEO Sam Altman, agents are the next big thing in AI, and they could totally change the way many office workers spend their day.
Different AI companies have already tried releasing agent-based tools: Google’s system, for example, is designed to let retailers “operate more efficiently and create more personalized shopping experiences to meet the demands of the AI era,” and Salesforce’s “Agentforce” tool can act like a sales rep. OpenAI’s entry to the agent marketplace could be far more transformational.
That’s because if an agent can fill in webforms, it could be trusted with some necessary but highly mundane office tasks that eat into workers’ daily hours and potentially impact their ability to make their employers more money. For example, remember when your company fired Steve from accounts (the really useful guy who handled your business travel requests) in the name of efficiency? Yup, it meant you and all the other staff had to spend hours wrestling with confusing forms instead of actually working. An AI agent might be able to do most, if not all, of that form-wrangling for you.
The one question hovering over OpenAI’s plans is how well Operator will actually work, which will affect how much time it may be able to save the average office cubicle dweller. The performance numbers Blaho unearthed on OpenAI’s website suggest Operator isn’t totally reliable yet, depending on the task it’s been asked to do. When tasked with signing up to a cloud services provider and launching a virtual machine (a software-emulated computer hosted on the provider’s servers), Operator succeeded only 60 percent of the time, the data say. When asked to create a Bitcoin wallet, it succeeded only 10 percent of the time.
These are preliminary numbers, and they may change when OpenAI actually does release Operator, which TechCrunch says could happen this month. But they’re an important reminder that, as with other generative AI systems that your office may be trying out, AI just can’t be trusted right now. Before you make decisive choices based on an AI’s advice, or use any other form of AI output, it’s worth running a fact-checking process to make sure the information is genuine and not “hallucinated.” This advice may be doubly relevant when it comes to letting AI agents actually interact with your company’s computers.
BY KIT EATON @KITEATON
