Calculators Beat LLMs at Math
Skeptics keep revealing how limited AI and LLM systems are at even some simple tasks
Some of us remember seeing electronic calculators for the first time, with their LCD or LED numbers, limited functionality, and elegant simplicity. They were a wonder for parents and students alike, culminating in the calculator watch, the Apple Watch of its day.
In any form factor, calculators performed as promised, adding, subtracting, dividing, and multiplying numbers accurately, whether you were tallying finances, running sums for a test, or dividing a recipe.
Surely modern computer technologies wouldn’t be beaten by these simple, anachronistic technology dinosaurs — calculating tech now buried in basic and boring installed apps on our phones.
In an intriguing preprint on arXiv, Apple technologists examined the ability of state-of-the-art LLMs to multiply big numbers and not be misled by irrelevant changes.
Long story short, when it comes to giving accurate answers to requests to multiply two multi-digit numbers, LLMs proved highly unreliable:
You can see the dropoff in accuracy better in this chart:
This is very poor performance, and it reminds us that LLMs are based on language. After all, how many examples can they have found in texts for 15 x 20?
As Gary Marcus writes, “Compare [the accuracy] with a calculator which would be at 100%.”
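To make the calculator side of that comparison concrete, here is a minimal Python sketch of the schoolbook multiplication routine a calculator-style program might run, checked against Python's built-in exact integers. The names `long_multiply` and `accuracy`, the trial count, and the 20-digit operand size are illustrative assumptions, not anything taken from the preprint.

```python
import random


def long_multiply(a: int, b: int) -> int:
    """Multiply two integers digit by digit, the way it is taught in school."""
    sign = -1 if (a < 0) != (b < 0) else 1
    # Store digits least-significant first so index i means the 10**i place.
    xs = [int(d) for d in reversed(str(abs(a)))]
    ys = [int(d) for d in reversed(str(abs(b)))]
    result = [0] * (len(xs) + len(ys))
    for i, x in enumerate(xs):
        carry = 0
        for j, y in enumerate(ys):
            total = result[i + j] + x * y + carry
            result[i + j] = total % 10
            carry = total // 10
        # Whatever carry is left lands one place past the last digit touched.
        result[i + len(ys)] += carry
    return sign * int("".join(str(d) for d in reversed(result)))


def accuracy(trials: int = 1000, digits: int = 20, seed: int = 0) -> float:
    """Fraction of random digits-by-digits products that match exact arithmetic."""
    rng = random.Random(seed)
    low, high = 10 ** (digits - 1), 10 ** digits
    correct = 0
    for _ in range(trials):
        a = rng.randrange(low, high)
        b = rng.randrange(low, high)
        # Python ints are arbitrary precision, so a * b is the exact reference.
        if long_multiply(a, b) == a * b:
            correct += 1
    return correct / trials


if __name__ == "__main__":
    print(long_multiply(15, 20))
    print(accuracy())
```

The procedure is deterministic, so its accuracy does not fall off as the operands get longer. That is the bar the LLMs fail to clear.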
This isn’t the only failing — not by a long shot.