
A new research paper from Apple details a technique that speeds up large language model responses while preserving output quality. Here are the details.
The nerdy bits
Traditionally, LLMs generate text one token at a time. This is slow because each step depends on all the previous ones to keep the output coherent and accurate.
If the model is writing a sentence like “The cat is black”, it predicts each token in sequence. After writing “The cat is”, it looks at everything generated so far before predicting the final token, “black”.
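
To make the one-token-at-a-time loop concrete, here is a minimal Python sketch of autoregressive decoding. The `next_token` lookup table is a hypothetical stand-in for a real model's forward pass; the point is that each step conditions on the entire prefix generated so far:

```python
# Toy "model": maps the full prefix seen so far to the next token.
# A real LLM would run a forward pass here instead of a table lookup.
def next_token(prefix: list[str]) -> str | None:
    table = {
        ("The",): "cat",
        ("The", "cat"): "is",
        ("The", "cat", "is"): "black",
    }
    return table.get(tuple(prefix))

def generate(prompt: list[str], max_tokens: int = 10) -> list[str]:
    tokens = list(prompt)
    for _ in range(max_tokens):
        tok = next_token(tokens)  # each step depends on ALL previous tokens
        if tok is None:           # nothing more to predict
            break
        tokens.append(tok)
    return tokens

print(" ".join(generate(["The"])))  # -> "The cat is black"
```

Because every new token requires a full pass over the prefix, the sequential loop itself is the bottleneck the paper targets.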
