If you haven’t yet watched yesterday’s OpenAI event, I highly recommend doing so. The headline news was that the latest GPT-4o model works seamlessly with any combination of text, audio, and video.
That includes the ability to ‘show’ the GPT-4o app a live screen recording of another app – and it’s this capability the company showed off with a pretty insane iPad AI tutor demo …
GPT-4o
OpenAI said that the ‘o’ stands for ‘omni.’
GPT-4o (“o” for “omni”) is a step towards much more natural human-computer interaction—it accepts as input any combination of text, audio, and image and generates any combination of text, audio, and image outputs.
It can respond to audio inputs in as little as 232 milliseconds, with an average of 320 milliseconds, which is similar to human response time in a conversation […] GPT-4o is especially better at vision and audio understanding compared to existing models.
Even the voice aspect of this is a big deal. Previously, ChatGPT could accept voice input, but it converted it to text before working with it. GPT-4o, in contrast, actually understands speech, so it skips the conversion stage entirely.
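For readers curious what this multimodal input looks like from the developer side, here is a minimal sketch of combining text and an image in a single GPT-4o request, assuming the OpenAI Python SDK's chat-completions message format; the image URL and prompt are placeholders, not taken from the demo.

```python
# A hedged sketch of a multimodal GPT-4o message payload, based on the
# OpenAI chat-completions format: one user turn mixing text and an image.
def build_multimodal_message(prompt: str, image_url: str) -> list:
    """Return a messages list pairing a text prompt with an image,
    the kind of mixed input GPT-4o accepts natively."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url", "image_url": {"url": image_url}},
            ],
        }
    ]

messages = build_multimodal_message(
    "Help me work through this math problem step by step.",
    "https://example.com/worksheet.png",  # placeholder screenshot URL
)

# Actually sending it requires an API key, so this part is not run here:
# from openai import OpenAI
# client = OpenAI()
# resp = client.chat.completions.create(model="gpt-4o", messages=messages)
```

The point is that text and image arrive in the same message rather than through separate pipelines – mirroring how the model handles speech without a transcription step.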
As we noted yesterday, free users also get a lot of features previously limited to paying subscribers.
AI iPad tutor demo
One of the capabilities OpenAI demonstrated was the ability of GPT-4o to watch what you’re doing on your iPad screen (in split-screen mode).
The example shows the AI tutoring a student through a math problem. You can hear that, initially, GPT-4o understood the problem and wanted to solve it immediately. But the new model can be interrupted, and in this case it was asked to help the student work it out himself.
Another capability seen here is emotion: OpenAI says the model can detect emotion in speech, and can also express emotions itself. For my tastes, this was rather overdone in the demo version, and that’s reflected here – the AI is maybe a bit on the condescending side. But that’s all tuneable.
Effectively, every student in the world could have a private tutor with this kind of capability.
How much of this will Apple incorporate?
We know that AI is the primary focus of iOS 18, and that Apple is finalizing a deal to bring OpenAI features to Apple devices. While at the time that was described as being for ChatGPT, it now seems pretty likely that the actual deal is for access to GPT-4o.
But we also know that Apple has been working on its own AI models, with its own data centers running its own chips. For example, Apple has been working on its own way to allow Siri to make sense of app screens.
So we don’t know exactly which GPT-4o capabilities the company will bring to its devices, but this one seems so perfectly Apple that I have to believe it will be included. This is truly using technology to empower people.
Image: OpenAI. Benjamin Mayo contributed to this report.