Wednesday, February 21, 2024

SORA, OPENAI's MIND-BLOWING GENERATIVE VIDEO AI MODEL, IS A REMINDER THAT EXPECTATIONS ARE EVERYTHING

On Thursday, Sam Altman, the CEO of OpenAI, announced that the company has a new product called Sora that can generate up to 60 seconds of photorealistic video from a text prompt. The company released a series of videos it says were created with Sora, and they are impressive.

Until now, most of the way people interact with generative AI has been through experiences like ChatGPT or Google's Gemini, which allow users to enter text into a prompt and get back a text response. Or perhaps you've used a tool like Dall-E or Midjourney to create images. 

[Embedded video: a Sora-generated clip of a snowy Tokyo street, created from the prompt quoted below]
Video is--for obvious reasons--a much harder problem to solve. Still, what OpenAI has shown of Sora so far is pretty wild. The video above was created from the prompt "Beautiful, snowy Tokyo city is bustling. The camera moves through the bustling city street, following several people enjoying the beautiful snowy weather and shopping at nearby stalls. Gorgeous sakura petals are flying through the wind along with snowflakes."

In addition to the videos the company included in its announcement, Altman was taking prompts on X and sharing the results all afternoon. The idea that you can type words into a browser window and a computer will give you back a 60-second photorealistic video is mind-blowing. 

But--and this is an important point--these are highly controlled demos. No one outside of OpenAI can type in prompts to test its abilities. Journalists who were briefed on the demos were not able to try it for themselves, probably because Sora isn't yet at a point where it consistently produces good results. Even Altman's examples from user prompts were curated. 

There are really two very different things here, and both are worth considering. The first is that this is an entirely new front for AI. The fact that Sora can produce these results--even occasionally--is both compelling and concerning. OpenAI says it is allowing safety researchers to use Sora in order to create boundaries and limits--presumably because we're about to enter what is surely going to be the most contentious presidential election season ever, and Altman does not want to be sitting before Congress explaining why it isn't OpenAI's fault that its product was used to influence the outcome.

The other is that I have a lot of questions. How long do these videos take to generate? Hours? How consistent is Sora at producing the kind of results the company is showing off as demos? And, maybe more importantly, how will OpenAI prevent Sora from being used for the inevitable: misinformation and abuse?

I guess there's another obvious question: What is OpenAI using to train Sora? In a statement to Wired, Bill Peebles, who co-led the project, said, "The training data is from content we've licensed and also publicly available content." He did not say what "publicly available content" means, or whether that includes copyrighted material that happens to be on the internet. 

While most people will, rightfully, be focused on the questions of copyright infringement and misinformation, I actually think the question of consistency is more interesting--at least from the perspective of the promise OpenAI is making. Generative AI is notoriously bad at getting the details right. That's true if you're typing things into ChatGPT or Google's Gemini, both of which will just make things up with no relationship to reality. It's also true for photos--and now videos.

One of the examples shown was of an older woman and a cake, presumably for her birthday. Around her are other people clapping. Except, one of the women in the video isn't actually clapping, and her hands move in ways that would only be possible if she had no bones in her fingers. You don't notice it at first, but once you do, you cannot unsee it. 

I'm not criticizing Sora for not being perfect. It isn't even available as a product yet, and when you think about what it's doing, the results so far really are incredible. I do think, however, that OpenAI is creating an expectation, and it'll be very interesting to see whether it can live up to the promise it is making. 

That's just a thing that happens with almost every new product. The company that makes it will roll out very controlled demos to show off its best capabilities. That just makes sense, but it risks a letdown if the finished product doesn't live up to those expectations every time.

Apple's Vision Pro headset is a great example. I was one of the people who had a chance to use it back in June at Apple's developers conference, WWDC, and I was blown away by how good it was. That impression came after a 30-minute demo, however. I only saw what Apple wanted me to see. 

Now that the product is available to anyone, people are poking at its limits and finding out where it doesn't live up to expectations. Sure, it's still a pretty incredible piece of technology, but the story has gone from how mind-blowing it is to how people are returning theirs en masse because they can't see themselves using the device.

Ultimately, the lesson here is simple: Be careful with the expectations you set. It's not surprising that OpenAI wants to show off how impressive Sora is, but once it's in the hands of 100 million users, their experience is going to be wildly different. If that experience doesn't live up to the promise, that's a lot of disappointed users. 


EXPERT OPINION BY JASON ATEN, TECH COLUMNIST @JASONATEN
