Apple’s latest study proves that AI can’t even solve basic grade-school math problems


Several Apple researchers have confirmed what many had already suspected about AI: there are serious logical faults in its reasoning, especially when it comes to basic grade-school math.

According to a recently published paper from six Apple researchers, 'GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models', the mathematical “reasoning” that advanced large language models (LLMs) supposedly employ can be extremely inaccurate and fragile when the problems they are given are changed even slightly.

The researchers started with GSM8K, a standardized set of more than 8,000 grade-school-level mathematics word problems that is a common benchmark for testing LLMs. They then altered surface details such as the names and numbers in each problem without changing the problem logic, and dubbed the result the GSM-Symbolic test.
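To make the idea concrete, here is a minimal sketch of how a word problem could be turned into a template whose surface details vary while the logic stays fixed. The template text, the names, and the number ranges are all invented for illustration; they are not taken from the paper or from GSM8K.

```python
import random

# Hypothetical template: the wording is invented for illustration only.
TEMPLATE = (
    "{name} has {start} apples. {name} gives {given} apples to a friend "
    "and then buys {bought} more. How many apples does {name} have now?"
)

NAMES = ["Sophie", "Liam", "Aisha", "Mateo"]


def make_variant(rng):
    """Return (question, answer) with fresh surface details, same logic."""
    name = rng.choice(NAMES)
    start = rng.randint(10, 50)
    given = rng.randint(1, start)  # never give away more than the starting amount
    bought = rng.randint(1, 20)
    question = TEMPLATE.format(name=name, start=start, given=given, bought=bought)
    answer = start - given + bought  # the underlying logic never changes
    return question, answer


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    for _ in range(3):
        question, answer = make_variant(rng)
        print(question, "->", answer)
```

Every variant is solved by the same subtraction and addition, so a system that genuinely reasons about the problem should score the same on all of them.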

On GSM-Symbolic, the models saw a performance drop of between 0.3 percent and 9.2 percent. In contrast, a second test, which the paper calls GSM-NoOp and which added a red herring statement that had no bearing on the answer, saw "catastrophic performance drops" of between 17.5 percent and a massive 65.7 percent.
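The red herring test can be sketched in the same spirit. The snippet below uses an invented problem and appends a clause that mentions a number but has no bearing on the answer, so the correct result is identical for both versions. The `naive_solver` function is a deliberately crude, hypothetical stand-in for surface pattern matching; it is not a claim about how any real LLM works internally.

```python
import re

# Invented example problem, not taken from the paper.
BASE = "Mia picks 30 pears on Monday and 12 pears on Tuesday."
# Irrelevant detail: the size of the pears does not change how many there are.
RED_HERRING = " 5 of the pears are a bit smaller than average."
QUESTION = " How many pears does Mia have?"

CORRECT_ANSWER = 30 + 12  # the same for both versions of the problem


def naive_solver(problem):
    """Toy pattern matcher: add the first two numbers, subtract any others."""
    numbers = [int(n) for n in re.findall(r"\d+", problem)]
    return sum(numbers[:2]) - sum(numbers[2:])


original = BASE + QUESTION
distracted = BASE + RED_HERRING + QUESTION

print(naive_solver(original))    # 42
print(naive_solver(distracted))  # 37
```

The toy solver gets the original right and the distracted version wrong, even though nothing about the underlying arithmetic has changed.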

What does this mean for AI?

It doesn’t take a scientist to understand how alarming these numbers are, as they clearly show that LLMs don’t properly solve problems but instead use simple "pattern matching" to "convert statements to operations without truly understanding their meaning." And if you slightly change the information found in those problems, it majorly interferes with the LLMs’ ability to recognize those patterns.

The main premise behind current LLMs is that they’re actually performing operations similar to how a human would, but this study and others like it show otherwise: there are critical limitations to how they function. They’re supposed to employ high-level reasoning, but there’s no model of the logic or the world behind it, which severely cripples their actual potential.

And when an AI cannot perform simple math because the words are essentially too confusing and don’t follow the exact same pattern, what’s the point? Were computers not created to perform math at rates that humans normally cannot? At this point, you might as well close down the AI chatbot and take out your calculator instead.

It’s rather disappointing that the current LLMs found in recent AI chatbots all function on this same faulty programming. They’re completely reliant on the sheer amount of data they hoard and then process to give the illusion of logical reasoning, while never coming close to clearing the next true step in AI capability: symbol manipulation, through the use of the abstract knowledge used in algebra and computer programming.

Until then, what are we really doing with AI? What’s the purpose of its catastrophic drain on natural resources if it’s not even capable of what it has been peddled to do by every corporation that pushes its own version of it? Having so many papers, especially this one, confirming this bitter truth makes the whole endeavor truly feel like a waste of time.

Allisa James
Computing Staff Writer

Named by the CTA as a CES 2023 Media Trailblazer, Allisa is a Computing Staff Writer who covers breaking news and rumors in the computing industry, as well as reviews, hands-on previews, featured articles, and the latest deals and trends. In her spare time you can find her chatting it up on her two podcasts, Megaten Marathon and Combo Chain, as well as playing any JRPGs she can get her hands on.
