Apple recently published a paper asking large reasoning models (LRMs) to solve simple but lengthy algorithmic challenges, such as the Towers of Hanoi disc-sorting puzzle. The models failed dramatically: they could solve Towers of Hanoi (in which discs are moved between pegs according to simple rules) with three discs but failed at eight or more. The paper showed that the models guess at the outcome of applying the rules, even when the algorithm is provided.
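For context, the puzzle itself has a trivial recursive solution. The sketch below (my own illustration, not Apple's evaluation setup) shows why disc count matters so much: solving n discs takes exactly 2**n − 1 moves, so going from three discs to eight is a jump from 7 moves to 255, an increase in sequence length rather than rule complexity.

```python
# Classic recursive Tower of Hanoi solver. Illustrative sketch only:
# it shows the exponential growth in solution length that the models
# failed to track, not the paper's test harness.
def hanoi(n, src="A", aux="B", dst="C", moves=None):
    """Return the list of (disc, from_peg, to_peg) moves for n discs."""
    if moves is None:
        moves = []
    if n == 0:
        return moves
    hanoi(n - 1, src, dst, aux, moves)   # move n-1 smaller discs aside
    moves.append((n, src, dst))          # move the largest disc
    hanoi(n - 1, aux, src, dst, moves)   # stack the n-1 discs on top
    return moves

print(len(hanoi(3)))  # 7 moves
print(len(hanoi(8)))  # 255 moves
```

The rules stay identical at every size; only the length of the required move sequence grows, which is precisely the regime where the paper found the models collapse.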
Apple’s findings aren’t unique. In a paper titled “Mind The…