Apple AI Paper Questions Current Models

Apple has published a new research paper titled "The Illusion of Thinking," which argues that current AI models simulate reasoning without true structured recursion. The paper suggests that today's large language models can generate plausible reasoning traces but lack deeper, symbolic reasoning capabilities. The finding matters to leaders in AI and machine learning because it points to a key gap on the path to more advanced artificial intelligence.

- The research tested Large Reasoning Models (LRMs) like OpenAI's o3, Google's Gemini Thinking, and Anthropic's Claude 3.7 Sonnet-Thinking using classic logic puzzles with controllable complexity, such as the Tower of Hanoi and River Crossing.
- A key finding was the concept of "accuracy collapse": as the complexity of a puzzle increased (e.g., adding more discs to the Tower of Hanoi), the models' performance would sharply drop to zero accuracy.
- Researchers observed that when problems became too difficult, the models would counterintuitively *reduce* their reasoning effort, using fewer computational tokens instead of more, suggesting they were effectively giving up.
- The paper identifies three performance regimes: standard LLMs outperform LRMs on low-complexity tasks, LRMs have an advantage in medium-complexity scenarios, and both model types completely fail on high-complexity problems.
- One of the paper's authors is Samy Bengio, a prominent figure in the field who serves as the director of AI and Machine Learning Research at Apple.
- The findings have been met with significant debate, including a rebuttal paper titled "The Illusion of the Illusion of Thinking," which argues that the performance collapse was not a failure of reasoning but a result of flawed experimental design, such as models hitting their output token limits.
- This research provides context for Apple's more cautious AI strategy and its perceived lag behind competitors, as the company has been seen as less focused on large-scale generative AI and more on other areas like visual intelligence.
- The paper's publication shortly before Apple's WWDC event was seen as a deliberate move to frame the conversation around the limitations of current AI approaches, just as the company faces pressure to deliver on its own AI promises for features like Siri.
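
To make "controllable complexity" concrete, here is a minimal Python sketch of the Tower of Hanoi. It is an illustration only, not the paper's evaluation code. The function names `hanoi_moves` and `is_valid_solution` and the peg labels "A", "B", and "C" are assumptions made for this example. It shows that the shortest solution for n discs takes 2^n - 1 moves, so each added disc roughly doubles the length of a complete answer.

```python
def hanoi_moves(n, source="A", target="C", spare="B"):
    """Return the shortest move list for n discs as (disc, from_peg, to_peg) tuples."""
    if n == 0:
        return []
    return (
        hanoi_moves(n - 1, source, spare, target)    # clear the n-1 smaller discs
        + [(n, source, target)]                      # move the largest disc
        + hanoi_moves(n - 1, spare, target, source)  # restack the smaller discs
    )


def is_valid_solution(n, moves):
    """Replay the moves and check every rule (hypothetical checker for this sketch)."""
    pegs = {"A": list(range(n, 0, -1)), "B": [], "C": []}
    for disc, src, dst in moves:
        # The moved disc must be on top of its source peg.
        if not pegs[src] or pegs[src][-1] != disc:
            return False
        # A disc may never be placed on a smaller disc.
        if pegs[dst] and pegs[dst][-1] < disc:
            return False
        pegs[dst].append(pegs[src].pop())
    # Solved only if every disc ends up on peg C in order.
    return pegs["C"] == list(range(n, 0, -1)) and not pegs["A"] and not pegs["B"]


if __name__ == "__main__":
    for n in (3, 7, 10):
        moves = hanoi_moves(n)
        print(n, len(moves), is_valid_solution(n, moves))
    # prints:
    # 3 7 True
    # 7 127 True
    # 10 1023 True
```

Because the move count grows exponentially with the number of discs, a model asked to write out every move needs a much longer output at higher disc counts. That growth is the background for the rebuttal's argument about output token limits.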
