Monday, March 17, 2025

If You Use AI Search at Work, Be Careful: A New Study Finds It Lies

With market leader Google leaning harder into AI-generated search results (after a stuttering start last year) and peer companies like OpenAI also trying out the tech, it really does seem like AI is the future of online search. That will have implications for workers at pretty much any company, no matter the industry, because searching for information is such a fundamental part of the internet experience. But a new study from Columbia University’s Tow Center for Digital Journalism, featured in the Columbia Journalism Review, highlights that your staff needs to be really careful, at least for now, because AI search tools from several major makers have serious accuracy problems.

The study concentrated on eight different AI search tools, including ChatGPT, Perplexity, Google’s Gemini, Microsoft’s Copilot, and the industry-upending Chinese tool DeepSeek, and it centered on the accuracy of their answers when each AI was quizzed about a news story, tech news site Ars Technica reported. The big takeaway is that all of the AIs demonstrated stunningly bad accuracy, answering 60 percent of the queries incorrectly. Not all of them performed equally badly, though: Perplexity was incorrect about 37 percent of the time, while ChatGPT had a 67 percent error rate. Elon Musk’s Grok 3 model scored the worst, being incorrect 94 percent of the time, perhaps to no one’s surprise, given that Musk has touted the model as being limited by fewer safety constraints than rival AIs. (The billionaire also has a somewhat freewheeling attitude to facts and free speech.) Worse still, the researchers noted that premium, paid-for versions of these search tools sometimes fared worse than their free alternatives.

It’s worth noting that AI search is slightly different from using an AI chatbot, which is more of a conversation. With AI search, the engine tries to do the search for you after you type in your query, summarizing what it thinks are the important details from what it has found online, so you don’t have to go and read the original article the data comes from.

But the problem here centers on the fact that, just like that one overconfident colleague who always seems to know the truth no matter what’s being discussed, these AI models just don’t like to admit they don’t know the answer to a query. The study’s authors noted that instead of saying “no” when they couldn’t find reliable information to answer a query about a news story, the tools frequently served up made-up, plausible-seeming, but actually incorrect answers.

Another wrinkle detected by the study: even when these AI search tools delivered citations alongside their results (ostensibly so that users can visit the source sites to double-check any details, or verify whether the data is true), the citation links often led to syndicated versions of the content rather than the original publishers’ versions. Sometimes the links simply led to web addresses that didn’t exist; Gemini and Grok 3 did this for more than half of their citations.

Why should you care about this? The experiment was a bit niche, since it was based on news articles, and the researchers didn’t look deeply into the accuracy of AI search results for other kinds of content found online. Instead, they fed excerpts from real news pieces into the AI tools and then asked them to identify information such as the headline and other details. You should care for one simple reason.
We know that AI can speed up some humdrum office tasks and boost employee efficiency. And it seems like AI search may become the norm, replacing traditional web searching, which can sometimes be laborious. But if your team is, for example, looking for background information to include in a piece of content you plan to publish, or gathering resources online before starting a new project, they need to be super careful about trusting the results that AI search tools deliver. Imagine if you published something on your company’s site only to learn it had been invented by an AI search tool that wasn’t prepared to admit it didn’t know the answer. This is another version of the well-known AI hallucination problem, and it’s yet more proof that if you’re using AI tools to boost your company’s efforts, you definitely need savvy humans in the loop to check the AI’s output. Part of that review can even be automated, as the sketch below shows.
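To illustrate, here’s a minimal first-pass check in Python. To be clear, this is a hedged sketch, not anything from the study itself: it simply confirms that each citation link an AI tool hands back actually resolves, which would catch the dead-link failure mode the researchers flagged. The URLs are placeholders to swap for real citations, and the limits matter: a link that loads can still be a syndicated copy, or say something different from what the AI claimed, so a human still needs to read the source.

```python
# First-pass sanity check for citation links returned by an AI search tool.
# Illustrative sketch only: the URLs below are placeholders you would swap
# for the links the AI actually cited. A 200 response only proves the page
# exists, not that it supports the AI's claim.
from urllib.request import Request, urlopen
from urllib.error import HTTPError, URLError


def check_citation(url: str, timeout: float = 10.0) -> str:
    """Return a short status string for a single citation URL."""
    req = Request(url, method="HEAD", headers={"User-Agent": "citation-check/0.1"})
    try:
        with urlopen(req, timeout=timeout) as resp:
            return f"OK ({resp.status})"
    except HTTPError as err:
        return f"BROKEN (HTTP {err.code})"    # e.g. 404: the page doesn't exist
    except URLError as err:
        return f"UNREACHABLE ({err.reason})"  # bad domain, DNS failure, etc.


if __name__ == "__main__":
    # Placeholder citations; replace with the links from the AI's answer.
    citations = [
        "https://example.com/real-article",
        "https://example.com/made-up-path-that-404s",
    ]
    for url in citations:
        print(f"{check_citation(url):<24} {url}")
```

Something this simple would have flagged the more-than-half of Gemini and Grok 3 citations that pointed nowhere, before anyone on your team built on them.

BY KIT EATON @KITEATON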
