Thursday, October 24, 2024

This Expert Says AI Efficiency Soars When It Uses Video to Crunch Numbers

The newest feature of the smartest AI systems is called multimodality, a fancy term meaning that AIs can respond to prompts other than typed text. You could, for example, combine a query like “make up a logo for my fab new vintage furniture refurbishment company” with an image of your favorite shade of yellow and a picture of a shell or some other inspiration. AI researcher Simon Willison recently found a clever way to use this capability to solve a tedious math problem, and the way he did it shines a light on how we may use AI chatbots differently in the near future.

Willison was working on one of those everyday accounting tasks that sounds simple but inevitably ends up being time-consuming: he wanted to tally all the charges he’d incurred for using a cloud company’s services. But, as news site Ars Technica notes, Willison’s data was scattered across lots of different emails and other sources, so finding it all and manually extracting the numbers would have been one of those soul-destroying office jobs.

Then inspiration struck. Willison turned on his computer’s screen-recording feature, which creates a video of everything you do on the desktop, and navigated between the various emails and other sources of the numbers he needed, simply scrolling past the relevant data along with everything else in each message. He then uploaded that video to Google’s AI Studio, which, as Ars explains, lets users try out several versions of Google’s Gemini 1.5 Pro and Gemini 1.5 Flash AI models. Willison prompted the AI to look at the video, pull out any relevant numbers it could see, and put them into a specially formatted file that could easily be loaded into a spreadsheet, including specifics such as dates and exact price amounts. The task took moments, was effectively free because of AI Studio’s experimental status, and apparently delivered accurate data that Willison was able to verify, saving him a lot of time. So far, so very nerdy.
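The article doesn't say exactly which file format Willison asked for, so the sketch below simply assumes the model was told to return a JSON list of objects with `date` and `amount` fields. It shows how such an answer could be sanity-checked, turned into CSV that a spreadsheet can import, and tallied. The sample values and the function names `parse_charges`, `to_csv` and `total` are hypothetical placeholders, and the step of sending the video to Gemini is not shown.

```python
import csv
import io
import json
from datetime import date
from decimal import Decimal

# Hypothetical example of the JSON a model might return when asked to list
# every charge it saw in the screen recording. Placeholder values only,
# not Willison's actual data.
MODEL_OUTPUT = """
[
  {"date": "2024-01-03", "amount": "0.12"},
  {"date": "2024-02-03", "amount": "1.50"},
  {"date": "2024-03-03", "amount": "0.38"}
]
"""


def parse_charges(raw_json: str) -> list[dict]:
    """Parse the model's JSON answer, validating each date and amount."""
    charges = []
    for row in json.loads(raw_json):
        # fromisoformat raises ValueError on a malformed date, so a
        # misread value fails loudly instead of slipping into the sheet.
        day = date.fromisoformat(str(row["date"]))
        # Decimal keeps money exact; str() also accepts numeric JSON values.
        amount = Decimal(str(row["amount"]))
        charges.append({"date": day.isoformat(), "amount": amount})
    return charges


def to_csv(charges: list[dict]) -> str:
    """Render charges as CSV text that a spreadsheet can import."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["date", "amount"])
    writer.writeheader()
    for c in charges:
        writer.writerow({"date": c["date"], "amount": f"{c['amount']:.2f}"})
    return buf.getvalue()


def total(charges: list[dict]) -> Decimal:
    """Sum all amounts exactly, starting from Decimal zero."""
    return sum((c["amount"] for c in charges), Decimal("0"))


if __name__ == "__main__":
    charges = parse_charges(MODEL_OUTPUT)
    print(to_csv(charges), end="")
    print("Total:", total(charges))
```

Validating every row before totaling matters here because the numbers come from a model reading video frames; as Willison did, the extracted figures should still be checked against the source.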
But why should you care about this feat, beyond admiring Willison’s lateral thinking? Because, as Willison noted, screen recording means there’s no real limit to where the data you feed an AI can come from. There’s “no level of website authentication or anti-scraping technology that can stop me from recording a video of my screen while I manually click around inside a web application,” he wrote in a blog post. In other words, any user could record themselves scrolling through a website, flicking through a complex Excel spreadsheet, or even scanning proprietary company emails.

This may soon be how we all use AIs, at work and elsewhere. When OpenAI revealed its next-generation ChatGPT model in May, it showed how its computer apps could “watch” what users are doing on screen, acting as a kind of angel on your shoulder. You could then ask the AI to process what it had seen without the tedious task of typing in lots of words or numbers, much like Willison’s screen recording. In an office, this means you could get AI help with, say, a complex financial analysis of your company’s revenue data merely by showing it to the AI.

There’s an obvious security risk here: data uploaded to an AI may be used to train its algorithms, which could lead to sensitive information “leaking” from the AI to other users later on. Microsoft’s somewhat similar, and somewhat eerier, AI Recall feature sparked controversy and criticism this summer for the same reasons. But AI systems like this are on the way, and they are being woven ever deeper into our work PCs and smartphones. OpenAI, for example, just revealed its ChatGPT desktop app for PCs, which will likely be able to do the “watching your screen” trick. And Apple is preparing to release its first integration of ChatGPT into Siri, its famous voice assistant. All of this is a reminder that learning how to use AIs is not a one-and-done task: you’ll need to keep training your staff to stay up to date on the latest tricks.
And you’ll also have to keep reminding them to be wary of showing an AI the wrong kind of sensitive company data.

BY KIT EATON @KITEATON
