Apple AI researchers boast useful on-device model that ‘substantially outperforms’ GPT-4

Siri has recently been attempting to describe images received in Messages when using CarPlay or the announce notifications feature. In typical Siri fashion, the feature works inconsistently, with mixed results.

Nevertheless, Apple forges ahead with the promise of AI. In a newly published research paper, Apple’s AI gurus describe a system in which Siri can do much more than try to recognize what’s in an image. The best part? Apple thinks one of its models for doing this benchmarks better than OpenAI’s GPT-4.

In the paper (ReALM: Reference Resolution As Language Modeling), Apple describes something that could give a large language model-enhanced voice assistant a usefulness boost. ReALM takes into account both what’s on your screen and what tasks are active. Here’s a snippet from the paper that describes the job:

1. On-screen Entities: These are entities that are currently displayed on a user’s screen

2. Conversational Entities: These are entities relevant to the conversation. These entities might come from a previous turn for the user (for example, when the user says “Call Mom”, the contact for Mom would be the relevant entity in question), or from the virtual assistant (for example, when the agent provides a user a list of places or alarms to choose from).

3. Background Entities: These are relevant entities that come from background processes that might not necessarily be a direct part of what the user sees on their screen or their interaction with the virtual agent; for example, an alarm that starts ringing or music that is playing in the background.
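
To make the idea concrete, here is a minimal, hypothetical sketch in Python of the setup the paper describes: each candidate entity carries one of the three category labels above, and everything, including on-screen content, is flattened into plain text so a language model can pick the referent. The names (`Entity`, `build_prompt`, and so on) are illustrative, not from the paper, and ReALM’s actual textual encoding of the screen is more elaborate than this.

```python
from dataclasses import dataclass
from enum import Enum

class EntityCategory(Enum):
    ON_SCREEN = "on-screen"            # visible on the user's display
    CONVERSATIONAL = "conversational"  # surfaced in a prior dialogue turn
    BACKGROUND = "background"          # e.g. a ringing alarm, playing music

@dataclass
class Entity:
    label: str  # human-readable name, e.g. "Mom (contact)"
    category: EntityCategory

def build_prompt(utterance: str, candidates: list[Entity]) -> str:
    """Flatten candidate entities into a numbered textual list and ask
    the language model which one(s) the utterance refers to."""
    lines = [
        f"{i}. [{e.category.value}] {e.label}"
        for i, e in enumerate(candidates, start=1)
    ]
    return (
        "Candidate entities:\n" + "\n".join(lines) + "\n\n"
        f"User request: {utterance!r}\n"
        "Answer with the numbers of the entities the request refers to."
    )

# Hypothetical usage
candidates = [
    Entity("555-0123 (phone number shown in Messages)", EntityCategory.ON_SCREEN),
    Entity("Mom (contact from the previous turn)", EntityCategory.CONVERSATIONAL),
    Entity("Morning alarm (currently ringing)", EntityCategory.BACKGROUND),
]
print(build_prompt("call that number", candidates))
```

The design point this illustrates is the paper’s central trick: once the screen and the surrounding context are serialized into text, reference resolution becomes an ordinary language-modeling task rather than a bespoke pipeline.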

If it works well, that sounds like a recipe for a smarter and more useful Siri. Apple also sounds confident in its ability to complete such a task with impressive speed. The paper benchmarks against OpenAI’s GPT-3.5 and GPT-4:

As another baseline, we run the GPT-3.5 (Brown et al., 2020; Ouyang et al., 2022) and GPT-4 (Achiam et al., 2023) variants of ChatGPT, as available on January 24, 2024, with in-context learning. As in our setup, we aim to get both variants to predict a list of entities from a set that is available. In the case of GPT-3.5, which only accepts text, our input consists of the prompt alone; however, in the case of GPT-4, which also has the ability to contextualize on images, we provide the system with a screenshot for the task of on-screen reference resolution, which we find helps substantially improve performance.
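
The paper does not spell out the plumbing, but a plausible reconstruction of that baseline setup, using OpenAI’s public Chat Completions API, might look like the following. The model names, file path, and placeholder prompt are assumptions for illustration, not details from the paper.

```python
import base64
from openai import OpenAI  # pip install openai

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def encode_screenshot(path: str) -> str:
    """Base64-encode a screenshot for an image-capable GPT-4 endpoint."""
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode()

# Placeholder: the real prompt would contain in-context examples and
# the list of candidate entities, as in the earlier sketch.
prompt = "Which candidate entity does 'call that number' refer to?"

# Text-only baseline: GPT-3.5 accepts only text, so the prompt stands alone.
text_only = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": prompt}],
)

# Multimodal baseline: the same prompt plus a screenshot, for on-screen
# reference resolution ('screen.png' is an assumed local file).
with_image = client.chat.completions.create(
    model="gpt-4-vision-preview",  # assumed; any image-capable GPT-4 variant
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {
                "url": f"data:image/png;base64,{encode_screenshot('screen.png')}"
            }},
        ],
    }],
)
print(with_image.choices[0].message.content)
```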

So how does Apple’s model do?

We demonstrate large improvements over an existing system with similar functionality across different types of references, with our smallest model obtaining absolute gains of over 5% for on-screen references. We also benchmark against GPT-3.5 and GPT-4, with our smallest model achieving performance comparable to that of GPT-4, and our larger models substantially outperforming it.

Substantially outperforming it, you say? The paper concludes in part as follows:

We show that ReaLM outperforms previous approaches, and performs roughly as well as the state-of-the-art LLM today, GPT-4, despite consisting of far fewer parameters, even for onscreen references despite being purely in the textual domain. It also outperforms GPT-4 for domain-specific user utterances, thus making ReaLM an ideal choice for a practical reference resolution system that can exist on-device without compromising on performance.

On-device without compromising on performance seems key for Apple. The next few years of platform development should be interesting, hopefully starting with iOS 18 at WWDC 2024 on June 10.

