Why Do Computers Suck at Math?

May 13, 2009

You've probably seen this old chestnut by now.

[Image: Google Calculator showing an incorrect result]

Insert your own joke here. Google can't be wrong -- math is! But Google is hardly alone; this is just another example in a long and storied history of obscure little computer math errors that go way back, such as this bug report from Windows 3.0.

  1. Start Calculator.
  2. Input the largest number to subtract first (for example, 12.52).
  3. Press the MINUS SIGN (-) key on the numeric keypad.
  4. Input the smaller number that is one unit lower in the decimal portion (for example, 12.51).
  5. Press the EQUAL SIGN (=) key on the numeric keypad.

On my virtual machine, 12.52 - 12.51 on Ye Olde Windows Calculator indeed results in 0.00.

[Image: Windows 3.11 Calculator showing an incorrect result]

And then there was the famous Excel bug.

If you have Excel 2007 installed, try this: Multiply 850 by 77.1 in Excel.

One way to do this is to type "=850*77.1" (without the quotes) into a cell. The correct answer is 65,535. However, Excel 2007 displays a result of 100,000.

At this point, you might be a little perplexed, as computers are supposed to be pretty good at this math stuff. What gives? How is it possible to produce such blatantly incorrect results from seemingly trivial calculations? Should we even be trusting our computers to do math at all?

Well, numbers are harder to represent on computers than you might think:

A standard floating point number has roughly 16 decimal places of precision and a maximum value on the order of 10^308, a 1 followed by 308 zeros. (According to IEEE standard 754, the typical floating point implementation.)

Sixteen decimal places is a lot. Hardly any measured quantity is known to anywhere near that much precision. For example, the constant in Newton's Law of Gravity is only known to four significant figures. The charge of an electron is known to 11 significant figures, much more precision than Newton's gravitational constant, but still less than a floating point number. So when are 16 figures not enough? One problem area is subtraction. The other elementary operations -- addition, multiplication, division -- are very accurate. As long as you don't overflow or underflow, these operations often produce results that are correct to the last bit. But subtraction can be anywhere from exact to completely inaccurate. If two numbers agree to n figures, you can lose up to n figures of precision in their subtraction. This problem can show up unexpectedly in the middle of other calculations.
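The cancellation effect described above can be sketched in a few lines of Python. The specific values here are illustrative assumptions, not taken from the quoted text: two numbers that agree in their leading digits are subtracted, and the tiny rounding error in one operand dominates the result.

```python
# Catastrophic cancellation: subtracting two nearly equal doubles.
a = 1.0 + 1e-15   # intended value 1.000000000000001, rounded when stored
b = 1.0

diff = a - b      # the intended answer is 1e-15
rel_error = abs(diff - 1e-15) / 1e-15

print(diff)       # 1.1102230246251565e-15
print(rel_error)  # about 0.11, i.e. roughly 11% off
```

Each operand is accurate to about 16 digits, yet the difference is only good to about one digit, because the 15 digits they share cancel out.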

Number precision is a funny thing; did you know that the infinitely repeating decimal 0.999... is equal to one?

In mathematics, the repeating decimal 0.999... denotes a real number equal to one. In other words: the notations 0.999... and 1 actually represent the same real number.

[Image: 0.999 infinitely repeating]

This equality has long been accepted by professional mathematicians and taught in textbooks. Proofs have been formulated with varying degrees of mathematical rigour, taking into account preferred development of the real numbers, background assumptions, historical context, and target audience.

Computers are awesome, yes, but they aren't infinite... yet. So any prospects of storing an infinitely repeating number on them are dim at best. The best we can do is work with approximations at varying levels of precision that are "good enough", where "good enough" depends on what you're doing, and how you're doing it. And it's complicated to get right.

Which brings me to What Every Computer Scientist Should Know About Floating-Point Arithmetic.

Squeezing infinitely many real numbers into a finite number of bits requires an approximate representation. Although there are infinitely many integers, in most programs the result of integer computations can be stored in 32 bits. In contrast, given any fixed number of bits, most calculations with real numbers will produce quantities that cannot be exactly represented using that many bits. Therefore the result of a floating-point calculation must often be rounded in order to fit back into its finite representation. This rounding error is the characteristic feature of floating-point computation.

What do the Google, Windows, and Excel (pdf) math errors have in common? They're all related to number precision approximation issues. Google doesn't think it's important enough to fix. They're probably right. But some mathematical rounding errors can be a bit more serious.

Interestingly, the launch failure of the Ariane 5 rocket, which exploded 37 seconds after liftoff on June 4, 1996, occurred because of a software error that resulted from converting a 64-bit floating point number to a 16-bit integer. The value of the floating point number happened to be larger than could be represented by a 16-bit integer. The overflow wasn't handled properly, and in response, the computer cleared its memory. The memory dump was interpreted by the rocket as instructions to its rocket nozzles, and an explosion resulted.
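The kind of conversion described in that quote can be sketched in Python. This is only an illustration of a range-checked float-to-16-bit conversion, not a reconstruction of the Ariane flight software; the function name `to_int16` and the sample values are hypothetical.

```python
INT16_MIN, INT16_MAX = -32768, 32767

def to_int16(value: float) -> int:
    """Convert a float to a 16-bit signed integer, refusing out-of-range input."""
    truncated = int(value)  # drops the fractional part
    if not INT16_MIN <= truncated <= INT16_MAX:
        raise OverflowError(f"{value!r} does not fit in a 16-bit signed integer")
    return truncated

print(to_int16(1234.56))   # 1234
try:
    to_int16(40000.0)      # larger than 32767
except OverflowError as exc:
    print("conversion rejected:", exc)
```

The interesting design question is what the caller does once the conversion is rejected; detecting the overflow is the easy part.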

I'm starting to believe that it's not the computers that suck at math, but the people programming those computers. I know I'm living proof of that.

Posted by Jeff Atwood
251 Comments

$ python
Python 2.5.2 (r252:60911, Sep 20 2008, 22:32:52)
[GCC 4.2.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> 399999999999999-399999999999998
1

Python WIN!!!!

thebigh on May 13, 2009 11:09 AM

Reminds me of the Pentium Floating Point Division Bug:

http://www.cs.niu.edu/other/pentium.html

David on May 13, 2009 11:20 AM

There are languages that handle arithmetic well.
I know two examples (I am sure there are more):
Python: has built-in infinite precision integer numbers
Lisp: has built-in infinite precision integer and rational numbers and has arithmetic operations defined the same way as in math (e.g., (mod -1 10) = 9, not -1).
So it is clearly a solved problem; not sure why not all language designers use infinite precision integers and rationals.

dmitry_vk on May 13, 2009 11:27 AM

"In mathematics, the repeating decimal 0.999... denotes a real number equal to one.", and "This equality has long been accepted by professional mathematicians and taught in textbooks."

Not the textbooks I read (or did when I was in university - it's not a favorite pastime of mine...). The correct mathematical explanation is that 0.99999... (zero point nine recurring) approaches 1. Any pure mathematician (probably not many read Coding Horror...) would surely agree.

Good post, though!

Darren on May 13, 2009 11:47 AM

PHP also calculates the example perfectly.

Jan Hancic on May 13, 2009 11:50 AM

In case it hasn't bitten you yet, you might be surprised that a nice large, simple decimal like 0.1 is an infinitely repeating value in binary: .0001100110011...

You have to be very careful doing floating point math[1]. Especially when it comes to testing for equality[2].

[1] http://stackoverflow.com/questions/752738/why-is-the-result-of-this-explicit-cast-different-from-the-implicit-one

[2] http://stackoverflow.com/search?q=floating+point+equality
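Both points can be checked from a Python prompt. This is a minimal sketch; the tolerance used by `math.isclose` is its default, which is an assumption that may not suit every application.

```python
import math
from decimal import Decimal

# Decimal(float) shows the exact value of the double closest to 0.1.
print(Decimal(0.1))

total = 0.1 + 0.2
print(total == 0.3)               # False
print(math.isclose(total, 0.3))   # True
```

Comparing with a tolerance instead of `==` is the usual workaround, though picking the right tolerance depends on the magnitude of the values involved.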

Dennis on May 13, 2009 11:53 AM

Here we go again (thanks, Jeff, for re-opening a can of worms).

@Darren: Nope, sorry, 0.999... EQUALS EXACTLY 1.0 no ifs ands or buts. Read the Wikipedia article. It includes some very straightforward proofs.

Dennis on May 13, 2009 11:56 AM

Not to mention that typical floating-point numbers can't represent some decimal values accurately. Some programming languages, including C#, have a decimal type that's floating-point but works in base 10 instead of base 2.
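Python has a comparable base-10 type in its standard `decimal` module. A minimal sketch, with made-up sample values:

```python
from decimal import Decimal

binary_sum = 0.10 + 0.20
decimal_sum = Decimal("0.10") + Decimal("0.20")

print(binary_sum)    # 0.30000000000000004
print(decimal_sum)   # 0.30
```

Note that the decimal values are built from strings; constructing them from float literals would carry the binary rounding error along.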

Mr.'; Drop Database -- on May 13, 2009 11:58 AM

The article is a bit misleading (to me, at least). I understand the sentence "A standard floating point number has roughly 16 decimal places of precision" as "If you have a real number with about 16 decimal places, floating points can represent it accurately", which isn't true.

A trivial example of this is the decimal number 0.2, which can never be accurately described with IEEE-754, because the .2 is represented with an infinitely repeating pattern of 1001 (100110011001...).

Henrik Paul on May 13, 2009 12:00 PM

Interestingly, the launch failure of the Ariane 5 rocket, which exploded 37 seconds after liftoff on June 4, 1996, occurred because of a software error that resulted from converting a 64-bit floating point number to a 16-bit integer.

I'm tired of reading such nonsense. That's a total misunderstanding of the reasons that led to the loss of the rocket.

Reusing Ariane-4 software in the larger Ariane-5 (with different constraints) can hardly be called rounding error.

Vinzent Hoefler on May 13, 2009 12:06 PM

Putting the Excel 850*77.1 bug in there is a little misleading, as Excel does in fact calculate the result correctly. The display logic was broken at the assembler level, which isn't exactly Excel doing a wrong calculation, but rather a programmer's oversight.

It's not an artifact of how computers handle numbers in contrast to the other examples you give.

Johannes on May 13, 2009 12:06 PM

Oh no. Yet another place infected with the 0.(9) vs 1.0 debate!

IMil on May 13, 2009 12:08 PM

@Simon: Python and Lisp implementations handle short numbers (fixnums, as they are called in Lisp) at the same speed as arithmetic on machine words (e.g., as in C). Long arithmetic is only used when numbers no longer fit into registers. So on short numbers it makes no difference, and on long numbers a correct but slightly slower operation is preferred to a fast but incorrect one.

@Dennis:
"simple decimal like 0.1 is an infinitely repeating value in binary"
Nobody forces you to represent 0.1 as a binary fraction. 0.1 is a rational number and should usually be treated as such (e.g., when precision is required).

dmitry_vk on May 13, 2009 12:09 PM

@Darren
I am a mathematician, there is no difference between .9(bar) and 1. Just because we can't count to infinity doesn't mean that it doesn't exist. At infinity, it is equal to 1. As Dennis said, see the wikipedia page on it for proofs.
http://en.wikipedia.org/wiki/0.999...

Simon on May 13, 2009 12:14 PM

@Simon Buchan: "Python, Lisp and Haskell get it partially because they are already pretty slow"

"Lisp is slow" is a myth. Most Lisp compilers do _not_ use bytecode. SBCL compiles to fast machine code that can compete with C code (and sometimes outperform it, thanks to high-level facilities), including in numerical computations. SBCL can often prove that arithmetic will not overflow a register and perform various optimizations (including the most interesting cases with bit-twiddling).

"Most languages don't support bignum because of the code bloat inherent. C++ will likely compile these statements like:"

That's a problem with the C++ compiler and with its static typing (i.e., typing variables, not values). A Lisp compiler will compile it into completely different code because it uses tagged values (i.e., in most cases the runtime dispatch is _very_ fast).

There are usually 3 steps in creating software: make it work, make it right, make it fast. Using arithmetic in machine words screws up the second step.

dmitry_vk on May 13, 2009 12:25 PM

Computers are fine with math, it's the programmers that suck.

Jonathan on May 13, 2009 12:31 PM

Raymond Chen wrote about how Calculator got an infinite-precision engine: http://blogs.msdn.com/oldnewthing/archive/2004/05/25/141253.aspx

(In my opinion, the biggest waste of development resources you could think of, but oh well.)

Frederik Slijkerman on May 13, 2009 12:34 PM

In discussion like these, people seem to forget 2 important things:

First:
Computers are only high speed idiots

Second:
People normally calculate in decimal (base 10), whereas computers calculate with floating point (in base 2). Although both use some form of dot notation, using floating point in programs will always give you errors in the real world with, for instance, 'money'.

hgill on May 13, 2009 12:48 PM

using floating point in programs will always give you errors in the real world with, for instance, 'money'.

The conclusion: do not use floating point for such values; use precise rational arithmetic instead.

dmitry_vk on May 13, 2009 12:51 PM

Jeff, why do you insist on including some kind of image/clipart on *every* post you make? Occasionally they help make the post entertaining. More often than not it is simply annoying.

Joe on May 14, 2009 2:00 AM

I'm a mathematician, and it is true that 0.(9)=1, but I hate that result. It's really annoying. If you want to get rid of it, you can use infinitesimals and nonstandard analysis, but they are tricky too.

But that's not really Jeff's point, since computer math doesn't even get close to being able to say that 0.(9)=1, unless you go all symbolic on me.

William on May 14, 2009 2:02 AM

This makes no sense to me, and neither this article nor the linked article attempts to explain it

Simple enough of a problem to understand that you are confused...

1. Computers are not user friendly.
2. Computers can ONLY add...
3. At that, they can only add 1+1... 1+0, or... 0+0. they add only binary integers (not decimals.)
4. Programming, either hardware, software or firmware, seemingly allow computers to do more, but, in essence, they only take the numbers and manipulate them to force the execution of subtraction, multiplication and division.
5. (here is where it gets complex) Subtraction is performed by forcing the number to become a negative value (called the complement), and then the two are added together to get the answer. That is a little hard to understand, but if you look at the equation 5 - 4, you see the same thing stated. You turn a 'positive' four into a negative number (-4) and add them together to end up with a '1' (5 + (-4) = 1).
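Step 5 can be sketched in Python. The 8-bit width and the function names are assumptions chosen for illustration; real hardware does the same thing at its native word size.

```python
BITS = 8
MASK = (1 << BITS) - 1   # 0xFF, keeps results within 8 bits

def twos_complement(n: int) -> int:
    """Return the 8-bit two's complement of n (invert the bits, add one)."""
    return ((n ^ MASK) + 1) & MASK

def subtract(a: int, b: int) -> int:
    """Compute a - b using only addition on 8-bit values."""
    return (a + twos_complement(b)) & MASK

print(twos_complement(4))   # 252, the 8-bit pattern for -4
print(subtract(5, 4))       # 1
```

The carry out of the top bit is simply discarded by the mask, which is why adding 252 behaves like subtracting 4.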

the old orang on May 14, 2009 2:13 AM

(before anyone sounds off...)

I am not a college graduate...
I used to be a very low level language programmer...
I had to read and comprehend a book on 'Higher Unary Mathematics' for a job I had dealing with navigation.
I may not be explaining things well, but, I know what is the essence of what I have said.
I used to have a sign on my desk that said:

Your wisdom is akin to the result of the most complex and detailed problem in higher unary mathematics.

(ALL problems in ALL levels of Unary Mathematics are answered with 0, the only number in Unary Mathematics. The only problem about the answer was, is it negative or positive... and THAT answer took a whole book).

PS: If one understood the sign, it was not directed at them. But considering the environment and the job, you would be surprised how many didn't.

the old rang on May 14, 2009 2:24 AM

One of my favorite rounding stories. During the first Gulf War, Patriot missiles fired by the Americans got less and less reliable over time, and often missed their target. But strangely enough, the ones fired by the Dutch forces didn't.

Turned out, the Americans kept systems going all the time, whereas the Dutch, frugal as they are, switched them off at times when there wasn't any threat. When the Dutch switched them on again, systems reset, looked up the time on the network and were happy. But the Americans never reset, and the rounding errors caused the tracked time to drift away from real time, causing them to aim in the wrong direction (because the clock was used to orient the systems).

Don't know whether the moral is "be frugal" or "have more bytes in your time representation", but it seems to show that even addition (+ 1 second) can lead to problems.
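The drift from repeated addition can be sketched in Python with ordinary doubles. This illustrates accumulated rounding error in general and is not a model of the Patriot's actual clock; the 0.1 second tick and the tick count are assumptions.

```python
import math

TICK = 0.1   # one clock tick in seconds; not exactly representable in binary

elapsed = 0.0
for _ in range(10):
    elapsed += TICK            # each addition rounds a little

print(elapsed)                 # 0.9999999999999999
print(elapsed == 1.0)          # False

# Error-compensated summation avoids the drift for the same inputs.
print(math.fsum([TICK] * 10))  # 1.0
```

Another common approach is to count ticks as an integer and convert to seconds once at the end, so the error never accumulates.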

Martin on May 14, 2009 2:26 AM

It's good that you are acknowledging that it is important to know some mathematics, but I find this post slightly disturbing, in as far as it was felt to be necessary.

When I were a lad, learning my trade, one of the first things you'd learn was how computers represented numbers, and what that meant for precision. The fact that this post is needed suggests there may be a generation of programmers who don't care about what's happening under the hood. Maybe dynamic languages are to blame, or maybe not enough programmers learn C.

In any event, as others have pointed out, the post is slightly misleading in that you don't mention that computers use binary, and the problem is that a fraction might have a low precision decimal expansion, but be infinite in binary.

Steve W on May 14, 2009 2:28 AM

Last time I was bitten by floating point arithmetic was calculating a triangle area with Heron's formula.

http://en.wikipedia.org/wiki/Heron%27s_formula#Numerical_stability

I'd never had to care about those little errors with big numbers, but in a naive implementation of Heron's formula a slightly disproportionate triangle results in a semiperimeter equal to a side and 0.0 as the resulting area.
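A minimal Python sketch of that failure, alongside the numerically stable rearrangement described in the linked Wikipedia section. The function names and the needle-shaped sample triangle are assumptions picked to trigger the problem with 64-bit floats.

```python
import math

def heron_naive(a: float, b: float, c: float) -> float:
    """Textbook Heron's formula using the semiperimeter."""
    s = (a + b + c) / 2.0
    return math.sqrt(s * (s - a) * (s - b) * (s - c))

def heron_stable(a: float, b: float, c: float) -> float:
    """Rearranged formula; sides must be sorted and the parentheses kept."""
    a, b, c = sorted((a, b, c), reverse=True)   # now a >= b >= c
    product = (a + (b + c)) * (c - (a - b)) * (c + (a - b)) * (a + (b - c))
    return 0.25 * math.sqrt(product)

needle = (1.0, 1.0, 1e-17)      # two long sides, one extremely short side
print(heron_naive(*needle))     # 0.0, because s rounds to exactly 1.0
print(heron_stable(*needle))    # close to 5e-18
```

For well-proportioned triangles both functions agree; the difference only shows up when one side is tiny compared with the others.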

901 on May 14, 2009 2:29 AM

JavaScript also passed this test:

<html>
<head>
<title>JavaScript Test</title>

<script language="JavaScript">
function doIt() {
var x = 399999999999999;
var y = 399999999999998;

var z = x - y;

document.write(z);
}
</script>
</head>

<body onLoad="doIt();">

</body>
</html>

Lucas on May 14, 2009 2:33 AM

Hmm I put that little Excel equation into Excel 2007 and got the correct answer of 65,535. Did I do something wrong???

Mike on May 14, 2009 2:36 AM

The only problem I have with the .99999... = 1 is that give me N number of 9's after 0.9 and I can give you an infinite number of number between that number and 1. Not necessarily on a computer but in theory.
One problem with all the discussion, is that .99999...=1 is a mis-statement. It is 'Wrong', or more precisely, a mis-interpretation of what is symbolised.

A simple way (very simplified) is to say that 0.99999... is so close to 1.0, that I wish not to have to print all them darned nines, and for my purposes I will round it off to 1.0, so I don't have to spend all day writing out senseless nines. The precision of avoiding the incredibly small differences is not worth the effort, since each space is 1/10 the size of the previous.

BTW pi to the 10th position is precise enough to negate almost any need to go further, except to make one think pi to the millionth place is a good encryption tool. (and a nifty dandy way to test your abilities at programming decimal precision.)

the old rang on May 14, 2009 2:43 AM

Programmers should definitely be aware that computers do this, and know how to avoid the traps it opens.

For example when increasing a value with a small fraction to predict growth in the far future. Say your daily increase was calculated to d=0,00000000000000000001241224111244. So school math will tell us to take (1 + d)^100 to forecast the value 100 days from now. Using a float with not enough precision can cut off more decimals than expected when adding 1 to the very small number. If you instead add the increase day by day you'll get a different, more accurate, result value.
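The loss when adding 1 to a very small number can be shown in Python. Note that the workaround below uses `math.log1p` and `math.expm1`, a different technique from the day-by-day addition suggested in the comment; the value of `d` is the comment's figure rewritten in scientific notation.

```python
import math

d = 1.241224111244e-20   # daily increase from the example above
days = 100

# 1.0 + d rounds to exactly 1.0, so all information about d is lost.
naive_growth = (1.0 + d) ** days - 1.0

# log1p(d) computes log(1 + d) and expm1(x) computes exp(x) - 1,
# both without ever forming the rounded sum 1 + d.
stable_growth = math.expm1(days * math.log1p(d))

print(naive_growth)    # 0.0
print(stable_growth)   # close to 100 * d
```

Here `stable_growth` is the total relative growth after 100 days, which for such a small `d` is almost exactly `100 * d`.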

Stefan on May 14, 2009 2:48 AM

Google got this wrong:

$ perl -e 'print 399999999999999-399999999999998 . "\n";'
1
$

Python got this wrong:

$ perl -e 'print 38.1 * .198 . "\n";'
7.5438
$

Final Grade:

Google: 50% (F)
Python: 50% (F)
Perl: 50% (A+)

Perl Wins!

David W. on May 14, 2009 2:57 AM

Aren't all numbers (other than crazy mathematical constants) representable by finite fractions ? Like 0.33333... is representable perfectly fine by 1/3, 0.1 by 1/10, etc ? Would be interesting to see if any work is being done on using such a data type to represent 'floats'.

J. Stoever on May 14, 2009 3:19 AM

-[A standard floating point number has roughly 16 decimal places of precision]-

Assuming 'decimal places' means 'significant figures', this is the precision in a 64-bit double. You only get 6 or 7 significant figures from a 32-bit float.
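The difference between the two widths can be seen from Python by round-tripping a 64-bit float through a 32-bit one with the standard `struct` module. The helper name `to_float32` and the sample values are assumptions for illustration.

```python
import struct

def to_float32(x: float) -> float:
    """Round a Python float (64-bit) to the nearest 32-bit float and back."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

value = 0.123456789012345678
print(value)               # what a 64-bit double keeps
print(to_float32(value))   # only the first 7 or so digits still match

# Integers above 2**24 are no longer all representable in 32 bits.
print(to_float32(16777217.0))   # 16777216.0
```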

Chris Johnson on May 14, 2009 3:20 AM

Not sure your calculator and Excel 2007 examples are 100% accurate. On my Win7 machine running Office 2007 the calculator and Excel examples return the correct values. I have emailed you screenshots of both, just so you can see that it appears those examples are no longer good examples. Other than that this was a really good article on how we as programmers need to understand how computers handle math.

Have a great day.

smehaffie on May 14, 2009 3:38 AM

@ J. Stoever: There are very many more [vast understatement] crazy irrational numbers than there are rational ones.

Dennis on May 14, 2009 3:41 AM

@J. Stoever: Only rational numbers. That's why they are called rational. Irrational numbers like pi, sqrt(2), etc are not representable by fractions, unless you round them.

That said, there are in fact libraries and even built-in mechanisms in many languages for handling fractions the way you describe. The only issue is that they are much slower, resulting in a precision-speed trade-off when considering the two.
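Python's standard library is one example: its `fractions` module stores a numerator and a denominator as exact integers. A minimal sketch:

```python
from fractions import Fraction

third = Fraction(1, 3)
tenth = Fraction(1, 10)

print(third * 3 == 1)                             # True
print(tenth + tenth + tenth == Fraction(3, 10))   # True
print(0.1 + 0.1 + 0.1 == 0.3)                     # False with binary floats
```

The trade-off mentioned above is real: numerators and denominators can grow without bound during a long computation, which costs both time and memory.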

Liquid_Fire on May 14, 2009 3:42 AM

When I were a lad, learning my trade, one of the first things you'd learn was how computers represented numbers, and what that meant for precision. The fact that this post is needed suggests there may be a generation of programmers who don't care about what's happening under the hood.

Don't blame the tools/generation, blame the field. And if it's this young slapdash generation, how come f-p bugs have been causing problems since before I was born?
I did Java long before messing with low-level languages, and this issue still arose, because I was doing something that involved floating point arithmetic. You can find this issue on a pocket calculator if you're doing the 'right' sum. But it's worth a reminder that it exists, that it's incredibly prevalent, and what it means for computing.
Or maybe Jeff just needed to fill his blog quota.

Tom on May 14, 2009 3:46 AM

@Darren

0.99999... doesn't approach anything. It's a number. Saying it approaches 1 is like saying 1.1 approaches 1.2, it's clearly nonsense.

0.99999... = 1

noko on May 14, 2009 3:56 AM

@Martin obviously the moral is to always run NTP daemons on your rocket launchers. :-)

@Steve W: number representation and all those basics are part of any decent comp sci program and I'd think you'd find it at a university level software engineering program too. Nowadays most programmers get trained in trade schools though. Not all of them teach the basics.

Though as someone pointed out, floating point arithmetic has its issues in dynamic languages too. Any decent language guide will contain a section on them so it seems to me most programmers would eventually get round to learning about them, one way or another.

wds on May 14, 2009 4:02 AM

@Darren

0.9999... = 1

because

(10*0.999...)-0.999... = (10-1)*0.999... = 9*0.999...

and

(10*0.999...)-0.999... = 9.999... - 0.999... = 9

Thus 9 = 9*0.999...

Arithmetics axioms say if x*y=x then y=1 (1 is the unique neutral element for * operation).


sub on May 14, 2009 4:19 AM

Actually, you're talking about calculation, not math.

Rik on May 14, 2009 4:28 AM

Hi

Try query in google:

1 usd in dkk
RESULT: 1 U.S. dollar = 5.45538856 Danish kroner

and then try:

1 us dollar in dkk
RESULT: 1 U.S. dollar = 5.47321409 Danish kroner

A small difference... But could be more serious working with larger numbers ;-)

// WiredSource

WiredSource on May 14, 2009 4:30 AM

@Pierre Lebeaupin

Well, you think this is trivial the same way Unicode, date/time, CRLF, and typography are. But I am still surprised at how often I stumble on such mistakes in code written by computer experts that I have to rewrite.

It seems recoding a framework suits the so-called experts, while real-life non-expert developers like me have to deal with the so-called trivial issues in their code.

So THIS is a great topic. As much as "don't put clear-text passwords in the database", it should be "try to use ISO, IEEE, and RFC norms for representations".

For the record, I was once told my application (talking to an international web service) had a bug because I did not use uk for the Great Britain country code (there is no such thing as Great Britain in ISO 3166). Another time I was told an should be used for the England country code.

Trivialities are the basis of beginning to know something, while expertise is the art of ignoring so-called trivia.

While experts are said to have a narrow, specialized culture, developers should have a broad culture; that's why I like Coding Horror.

Great topic Jeff, thx.

jul on May 14, 2009 4:32 AM

heheh, the vb guy said dim

mark on May 14, 2009 4:37 AM

Why does anyone need a calculator to see that the answer is 1?

mbhunter on May 14, 2009 4:46 AM

"In mathematics, the repeating decimal 0.999... denotes a real number equal to one.", and "This equality has long been accepted by professional mathematicians and taught in textbooks."

Not the textbooks I read (or did when I was in university - it's not a favorite pastime of mine...). The correct mathematical explanation is that 0.99999... (zero point nine recurring) approaches 1. Any pure mathematician (probably not many read Coding Horror...) would surely agree.

Good post, though!
Darren on May 13, 2009 10:47 PM

Uhh... No. Any pure mathematician would know that a single number doesn't converge to anything.

Certainly, the series {0.9, 0.99, 0.999, 0.9999, ...} converges to 1, but that's just further proof that the number 0.999... is 1.

Consider this: If 1 and 0.999... are not equal, then there are an infinite amount of numbers between the two. I defy you to find one.

Asmor on May 14, 2009 4:52 AM

I normally read CH in an aggregator, but came to comment about the Ariane 5 mention. Given the notoriety of the launch failure and its subsequent use in many undergrad CS/IT courses, it's not surprising that it has its own article: hxxp://en.wikipedia.org/wiki/Ariane_5_Flight_501

@Vinzent however beat me to it...the problem wasn't with the number conversion per se, but the inappropriate re-use of software from a different software project, without checking the design constraints properly and also without appropriate in-situ testing.

Jeff - misuse of previous code is a negligent act that should be talked about more, and with more attention than the Windows 3.1 calculator should ever get 17 years later.

Mark on May 14, 2009 4:58 AM

Goldberg's What Every Computer Scientist Should Know About Floating-Point Arithmetic is a must read and must understand for every software engineer.

Also, it may be useful to read the relevant chapters of the Knuth's The Art of Computer Programming. It may be a bit dense for understanding, but it's good.

It's worth to note that besides the clear difference between the numeric models we learn in school and models employed in our computers, there's yet another one, specific to the design of programming languages...

Let's look at C/C++'s basic problems with math, problems which have plagued an incredible amount of existing software.

1. Signed types get promoted to unsigned where necessary and unnecessary. The following condition is always false (it evaluates to 0):
1U > -1
Yeah, nice, +1 isn't greater than -1. This is totally brain dead.
Although usually it's not cheap to compare signed and unsigned values, it's possible and I see no point in making the programmer do it in convoluted ways, which he very often (usually?) doesn't for one reason or another (laziness or ignorance) and ends up with a code bug.

2. Suppose our int is 16-bit long and our long is longer than that. Then the following condition is always false as well:
32767 + 1 == 32768
Sweet.
That's because the nums on the LHS will be ints, but the num on the RHS will be long per the language design. And per that same simplistic design there's no provision to avoid the overflow on the LHS when moving it into an lvalue (or comparing it with rvalue) that has more bits, where the overflow will become apparent and likely unwanted.

3. The following is false as well, also due to signed-unsigned conversion:
(-3) % 3U == 0
I'd love to see this produce mathematically more expected results at the expense of extra CPU cycles. Too bad it's not so.

4. Assuming 16-bit ints, the following will be false:
32767 * 2 == 65534
But these will be true:
32767L * 2 == 65534
32767U * 2 == 65534
It is obvious that the product of two 16-bit ints is gonna need 32 bits of storage. It is brain-dead to require the programmer to instruct the compiler to produce the full product and not just the least significant half of its bits or do some other trickery. It's easy to make a mistake here and forget an explicit type conversion to long of one of the multiplicands.

5. The following is also false:
-2 + 1U + 1.0 == -2 + 1.0 + 1U
(A+B)+C is no longer the same as (A+C)+B. Now, that's nice, broken commutativity!

6. Again, assuming 16-bit ints, shifts by 16 or more positions left or right are undefined per the design and in practice you get some funny results as if the shifts were done by count % 16. It would be natural to produce 0 with right shifts by 16 or more (assuming, we're talking unsigned ints, it's a different thing with signed ints) and I'd be OK to get a 0 (or anything defined, e.g. UINT_MAX) when doing the same in the other direction.
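Items 1 and 3 can be mimicked in Python by applying the signed-to-unsigned conversion by hand. This is only a sketch of the conversion rule, assuming the 16-bit ints used in the list above; the helper `as_unsigned16` is hypothetical and Python itself has no such implicit conversion.

```python
UINT16_MASK = 0xFFFF

def as_unsigned16(n: int) -> int:
    """Mimic C's conversion of a signed value to a 16-bit unsigned int."""
    return n & UINT16_MASK

# Item 1: in 1U > -1, the -1 is converted to unsigned before comparing.
print(as_unsigned16(-1))           # 65535
print(1 > as_unsigned16(-1))       # False

# Item 3: in (-3) % 3U, the -3 is converted to unsigned before the modulo.
print(as_unsigned16(-3))           # 65533
print(as_unsigned16(-3) % 3 == 0)  # False, since 65533 % 3 is 1
```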

Basically, the problem is, in C/C++ a lot of what you learned about arithmetic is no longer true, not just the rational (AKA floating point) numbers. That is, you can't just use whatever you learned at school. You must learn the way the language does the math and you must write your code accordingly, adapting your ideal-world ideas to the brutality of the real computing. C's arithmetic expressions look familiar and seem to make sense to anybody understanding math, but make no mistake, behind this façade hides a great deception.

It's possible to explain in part why C/C++ is so goddamn math unfriendly. It's basically a generalized assembler language which must be quite primitive so it's easy to compile it and transform into comparable and primitive instructions of the target CPU. For instance, very few CPUs have comparison and division of a signed and unsigned operand. It is uncommon for CPUs to support shift counts larger than the register size in bits. It's possible to construct such operations, but nobody bothered to back when C was being designed and now it's too late.

Every C/C++ programmer has to learn this the hard way and figure out a way to do in C what an ideal (or almost so) calculator would do w/o any surprises.

For a long time I actually found programming in assembly more transparent and giving more expected results than programming in C. In part I attribute it to the difference between the two: when you learn an assembly language you have to open the CPU manual to see what registers and instructions are there and how they work. In C you don't seem to need to learn how + or * or == work, because they're familiar and you intuitively know how to use them. Unfortunately, these are wrong expectations and assumptions, and wrong expectations and assumptions rarely work well in software engineering. What helps in the case of C/C++ is reading and understanding the language standard. The standard isn't an easy reader. Many usually end up buying books on C/C++. These days there're titles that cover the issues well. But I remember the days when C/C++ books covered this poorly or ignored the topic almost entirely. I hated them and only finally got everything straight in my head when I'd made every possible mistake and written a sufficient amount of very portable C code.

Btw, last time I checked, the C standard was available online for the mere $18 (cheaper than a book on C/C++ you'll pick at your local bookstore). Not knowing how to use your tools correctly is not just bad, it's very bad and wrong. At the same time I wish C/C++ never existed in its current form -- it could've been done better.

Anonymous on May 14, 2009 5:05 AM

1 and 0.999... are simply condensed notation for different infinite sequences that converge to the same real number. I think that the controversy about the 1 = 0.999... thing stems from the fact that most people do not think of a decimal expansion as the limit of a convergent sequence. Here's an excellent explanation:

http://en.wikipedia.org/wiki/Decimal_representation

Andrew on May 14, 2009 5:17 AM

I never understand why some people have such an issue with 0.999... = 1, but never with 0.333... = 1/3. And given that, what do they think 3 X 0.333... is?

Steve W on May 14, 2009 5:29 AM

That calculator bug existed for quite a while in windows.
It was still present in 3.11 WfW.

bram

Bram on May 14, 2009 5:31 AM


At www.karenware.com you can get a freeware calculator program for Windows that will do calculations to hundreds of thousands of digits.
The bigger numbers will take a while.
I imagine it won't do infinite numbers of digits.

Teri Greene on May 14, 2009 5:37 AM

The Windows calculator issue went away when MS rewrote it to use infinite precision for basic operations...
http://blogs.msdn.com/oldnewthing/archive/2004/05/25/141253.aspx

Kai Liu on May 14, 2009 5:44 AM

Using real numbers, 0.999... is absolutely one. Limits are involved in the proof, but aren't necessary for the original statement. But, as pointed out by William, there are other number systems (hyperreal, superreal, surreal) that have an infinite number of numbers between 0.999... and 1. The challenge shouldn't be to come up with such numbers (in Hakenstrings, 10(1) followed by any combination of 0s and 1s), but rather to find a use for them in CS.

McBeth on May 14, 2009 5:45 AM

Joe wrote:
-----------
This makes no sense to me, and neither this article nor the linked article attempts to explain it. Why should subtraction be harder than addition, multiplication, or division? I've tried thinking about it from various angles, and don't see why subtraction should introduce this kind of difficulty, and especially why the agreement of the two operands should have an effect on the precision of the results. Can you elaborate?
--------------

Simple: you are right that subtraction is no problem on exact numbers.
But let's imagine you have a slight error from earlier results or from
decimal-to-binary conversion, so that the machine stores

3 as 3.000 000 000 1 (decimal arithmetic) and
2.999 999 999 as 3

The correct result should be 3 - 2.999 999 999 = 1e-9.
But the computer's result is 3.000 000 000 1 - 3 = 1e-10, which gives us a relative
error of 90%! The problem: the erroneous digit, which normally slumbers peacefully far, far back through all operations and can be ignored, slides forward when the preceding digits cancel, and that can happen when you subtract two nearly equal numbers.
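
The same numbers can be checked in a few lines. This is only a sketch in Python using the decimal module so that the arithmetic itself is exact; the variable names and the rel_error helper are made up for illustration.

```python
from decimal import Decimal

# The true operands and the slightly wrong values the machine is assumed to hold.
true_a, true_b = Decimal("3"), Decimal("2.999999999")
stored_a, stored_b = Decimal("3.0000000001"), Decimal("3")

def rel_error(approx, exact):
    """Relative error of approx with respect to exact (hypothetical helper)."""
    return abs(approx - exact) / abs(exact)

true_diff = true_a - true_b            # 1E-9
computed_diff = stored_a - stored_b    # 1E-10

# Each input is off by well under one part in a billion...
input_errors = (rel_error(stored_a, true_a), rel_error(stored_b, true_b))
# ...but the difference is off by 90%.
output_error = rel_error(computed_diff, true_diff)

print(input_errors)
print(output_error)
```

The subtraction itself is exact here; all of the damage comes from the tiny errors already present in the inputs.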

TSK on May 14, 2009 5:57 AM

What did you expect from Windows? On my Linux system, xcalc does the example correctly, even to 5.0002 - 5.0001. Ditto for command-line dc.

/smug

Penguin Pete on May 14, 2009 5:59 AM

In a similar vein this was explained to me by a teacher who also was a priest which made it all the more funny...

A naked woman is standing 10 meters away from a Mathematician and an Engineer. They are allowed to walk halfway towards the woman before stopping, then go half again, and so on and so on...

The Mathematician says you will never reach her.

The Engineer says you will get close enough as it makes no difference.

Paul on May 14, 2009 5:59 AM

When I was a freshman my professor asked in our class to explain the behavior of

for (i = 0.2; i != 10; i++)
...

I was the only one to answer this question right and I got an extra point in my final exam!!!
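
For anyone who missed the extra point, here is a rough Python rendering of that loop. It assumes i is a floating-point variable, which the snippet does not actually say (with an integer i, 0.2 would be truncated to 0 and the loop would stop at 10). The safety cap of 20 steps is an addition for the demo; the original loop has none.

```python
i = 0.2
steps = 0
hit_ten = False

# Safety cap so the demo terminates; the C-style loop would keep going.
while steps < 20:
    if i == 10:          # the "i != 10" exit condition, inverted
        hit_ten = True
        break
    i += 1               # i takes the values 0.2, 1.2, 2.2, ... and skips 10
    steps += 1

print(hit_ten, steps, i)
```

The counter steps from roughly 9.2 straight to roughly 10.2, so an exact inequality test never fires. Comparing with < instead of != makes the loop stop.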

VasandGVD on May 14, 2009 6:09 AM

Interestingly, the launch failure of the Ariane 5 rocket, which
exploded 37 seconds after liftoff on June 4, 1996, occurred because
of a software error that resulted from converting a 64-bit floating
point number to a 16-bit integer. The value of the floating point
number happened to be larger than could be represented by a 16-bit
integer. The overflow wasn't handled properly, and in response, the
computer cleared its memory. The memory dump was interpreted by the
rocket as instructions to its rocket nozzles, and an explosion
resulted.

As Vincent already pointed out, this is nonsense. What happened was that code from the Ariane 4 project was reused. The special thing about that code was that it had been proven mathematically that the input values would never exceed the range, so there was no need to check these bounds.

Ariane 5, which is much bigger than Ariane 4, led to values exceeding this range, which raised an exception in the Ada code and shut down that computer. The secondary system took over, processed the same data, ran into the same exception and shut down as well, leaving Ariane 5 without a working stabilization system. The explosion was initiated remotely because the rocket was becoming uncontrollable.
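
To make the failure mode concrete, here is a small Python sketch of converting a floating-point value to a 16-bit signed integer with and without a guard. It is in no way the flight code (that was Ada); the function names are hypothetical and exist only for this illustration.

```python
INT16_MIN, INT16_MAX = -32768, 32767

def to_int16_checked(value: float) -> int:
    """Hypothetical helper: truncate to an integer, refusing values outside 16 bits."""
    truncated = int(value)
    if not INT16_MIN <= truncated <= INT16_MAX:
        raise OverflowError(f"{value!r} does not fit in a 16-bit signed integer")
    return truncated

def to_int16_saturating(value: float) -> int:
    """Hypothetical helper: clamp out-of-range values to the nearest 16-bit limit."""
    return max(INT16_MIN, min(INT16_MAX, int(value)))

print(to_int16_checked(1234.9))        # in range, truncates toward zero
print(to_int16_saturating(40000.0))    # out of range, clamps to INT16_MAX
```

Whether to raise, clamp, or skip the check entirely is a design decision; the point of the story above is that a range assumption that held for one vehicle was carried over to another where it did not.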

Hope that clears things.


Regards, Lothar

Lothar on May 14, 2009 6:10 AM

Mathcad - by PTC - gets all these values right every time. I suspect Matlab, Maple and others do too.

That's why I use applications specifically designed for calculation when accuracy is required. Excel is a spreadsheet, and while powerful it doesn't have the rigor of a dedicated mathematics engine, nor does it have a symbolic engine.

I love that joke by Paul.

Philip on May 14, 2009 6:36 AM

I think that a lot of the problem comes from 2 places.

1. Trying to fit such a large number of values into such a small space.

2. Using binary floating point rather than decimal floating point.

Firstly, a standard 32-bit floating point number can represent a magnitude as large as about 3x10^38, and as small as about 1x10^-45. That's a really big number. Unless you're dealing with astronomy or particle physics, you will never need numbers that big. 99% of people would probably be fine with a smaller range of numbers if they were more accurate. For the numbers that most computers deal with on a daily basis, 10^16 would probably be more than enough.

Then there's the issue of using binary floating point. This means that we can't even represent numbers such as 0.1. There's a lot of common numbers that can't be represented exactly. If we used a representation of the numbers that more closely mapped to the numbering system we used, then there wouldn't be so much of a problem.
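
A quick way to see this for yourself, sketched in Python and assuming the usual case where a Python float is an IEEE 754 double:

```python
from decimal import Decimal
from fractions import Fraction

# Converting the float 0.1 exposes the value that is actually stored.
exact = Decimal(0.1)        # exact decimal expansion of the nearest double
as_ratio = Fraction(0.1)    # the same value as an integer ratio

print(exact)
print(as_ratio)

# The stored value is a fraction with a power-of-two denominator,
# so it can only approximate one tenth.
print(exact == Decimal("0.1"))
print(0.1 + 0.2 == 0.3)
```

Both comparisons print False: the double closest to 0.1 is slightly larger than one tenth, and the small errors in 0.1, 0.2 and 0.3 do not line up.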

So, I think that we should really be using something more like what is offered in databases in our programming languages. Something like the Decimal DataType. You can declare a field as Decimal(18,4), and you know that you can represent all numbers that are up to 14 digits before the decimal, and 4 digits after the decimal. You know exactly which numbers you can represent, and any person can understand that without getting into the complexities of converting from binary representations to decimal representations.
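
Some languages do ship something along these lines. As a rough sketch, Python's decimal module can imitate the four-decimal-place part of a Decimal(18,4) column; the to_fixed helper and the rounding mode are assumptions for this example, and nothing here enforces the 18-digit limit.

```python
from decimal import Decimal, ROUND_HALF_UP

FOUR_PLACES = Decimal("0.0001")

def to_fixed(value: str) -> Decimal:
    """Hypothetical helper: parse a decimal string and round it to 4 places."""
    return Decimal(value).quantize(FOUR_PLACES, rounding=ROUND_HALF_UP)

# Ten dimes in fixed-point decimal add up to exactly one.
total = Decimal("0")
for _ in range(10):
    total += to_fixed("0.1")

# The same sum in binary floating point drifts.
float_total = 0.0
for _ in range(10):
    float_total += 0.1

print(total, total == Decimal("1"))
print(float_total, float_total == 1.0)
```

The decimal total compares equal to 1 while the float total does not, which is the predictability argument made above.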

Kibbee on May 14, 2009 6:51 AM

0.2 (or any multiple of it that isn't an integer) is the equivalent in binary of 0.3333333 in base 10.

This causes a ton of problems and is why any loop should use <= to stop the loop and not ==.

Also, about .999999999 being equal to 1, consider this.

1/9 = .1111111111...
2/9 = .2222222222...
3/9 = .3333333333...
4/9 = .4444444444...
5/9 = .5555555555...
6/9 = .6666666666...
7/9 = .7777777777...
8/9 = .8888888888...
9/9 = .9999999999... (WAIT 9/9 = 1 doesn't it?)

Tim on May 14, 2009 6:54 AM

Tim's post is compelling.

Zack on May 14, 2009 7:00 AM

*thinks some chum has been thrown in the water as he watches the sharks circle*

AC on May 14, 2009 7:12 AM

[c#]
const float nine = 9;

// Print i/9 for i = 1..9 using single-precision floats.
for (float i = 1; i < 10; i++)
{
    Console.WriteLine("{0}/9 = {1}", i, i / nine);
}

Console.ReadLine();
[/c#]

This outputs:
1/9 = 0.1111111
2/9 = 0.2222222
3/9 = 0.3333333
4/9 = 0.4444444
5/9 = 0.5555556
6/9 = 0.6666667
7/9 = 0.7777778
8/9 = 0.8888889
9/9 = 1

BobbyCannon on May 14, 2009 7:20 AM

I have had this issue. As a research scientist I find errors in math all over the place. My TI-89 is one of the worst offenders. Laboratory results can have huge errors because of this :(

Brandon on May 14, 2009 7:32 AM

This may all be news to those who did not have a computer science education.

Sharks are jumping all over the place.

I also agree that (once again) Jeff misunderstands or over-simplifies - i.e. the Ariane comment.

tim on May 14, 2009 7:43 AM

It can only be attributable... to human error.

Tom on May 14, 2009 7:51 AM

No proof that 0.9999... = 1 is needed because they are equivalent by definition of the reals. :)

Of course this doesn't help most people upon first encountering this conundrum, but it's a reflection of the extent to which mathematics is a creation of humans. (Viz. the famous quote by Kronecker.) Perhaps a limit on the neo-Platonist view of mathematics.

Brian Tung on May 14, 2009 7:53 AM

You FAIL Jeff.

The correct question to ask the Windows Calculator is what is the difference between Windows 3.11 and 3.1?

The answer: nothing.

PRMan on May 14, 2009 7:56 AM

Your post reminds me of a joke that I heard not long ago...
An infinite number of mathematicians walk into a bar. The first one orders a beer. The second orders half a beer. The third, a quarter of a beer. The bartender says "You're all idiots," and pours two beers.

Dejan on May 14, 2009 7:57 AM

@Darren - For most coders we have had just as much math as traditional mathematicians, and I am not sure where your school is but when I attended the University of Michigan it was taught that .999... infinitely repeating is exactly equal to one.

http://answers.yahoo.com/question/index?qid=20070821213704AASMKu0

sharms on May 14, 2009 8:01 AM

I really don't think it has anything to do with the programmers sucking at math, it has to do with them sucking at testing. No one is going to anticipate that the computer will give a mathematical error on any particular line of code. Rather, we need to anticipate that it's going to do something unexpected SOMEWHERE due to a bug in our code or in the runtime we're using. The crime isn't that the Ariane 5 programmers didn't realize the loss of precision with the numeric conversion, it's that they failed to create a unit test approximating real world data that would have revealed the problem before a billion dollar spacecraft blew up.

Jonathan on May 14, 2009 8:08 AM

Hmm, doesn't seem anyone's covered this angle yet... I think the reason there isn't a widespread solution to this problem built in to every programming language yet (as someone else said, programmers aren't clamoring for it) is because the vast majority of programmers don't know this problem exists. If more people knew about it, maybe they would be. Imagine if it were common knowledge among business executives that any result coming out of a spreadsheet is suspect, depending on what numbers you used. It's a dirty little secret amongst us programmers.

It is, after all, ridiculous that we as people are more accurate in storing high-precision numbers than computers are. How do we do it? We use symbols to denote repeating digits, and we write fractions when a division doesn't yield a zero remainder. So when we divide 9 by 2, for example, we as people might write: 4 1/2. Simple, a string of 5 characters including the space. No precision problems. Why in this day and age we feel we must represent this in base 10 or base 2 is beyond me.
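
For what it's worth, a few standard libraries already offer this kind of number. A small sketch in Python with fractions.Fraction, which stores an exact numerator and denominator (the variable names are just for illustration):

```python
from fractions import Fraction

nine_halves = Fraction(9, 2)      # 9 divided by 2, kept exact
third = Fraction(1, 3)            # no repeating decimal needed

# Split into the whole part and the leftover fraction, as a person would.
whole, rest = divmod(nine_halves, 1)
mixed = f"{whole} {rest}"

print(mixed)
print(third * 3 == 1)
```

This prints "4 1/2" and True. The price is speed and memory, since numerators and denominators can grow without bound, which is one reason it is not the default number type.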

Now before you say that strings are slow, yes, today they are. But we're starting to see specialized processors in computers. Sound components, dedicated to that purpose. GPU's, dedicated for visuals. Why can a computer not have a better Math processing unit to make it fast?

Math errors like this are just unacceptable to me. What it comes down to is that you can't just trust the computer - because for your average programmer to just be able to write code under all the pressures we deal with every day, we just shouldn't have to deal with such an obscure problem that can manifest itself in such scary ways, when we least expect it. I wonder how many software bugs (yet to be reproduced or fixed) are as a result of some stupid math error buried so far down the abstraction chain that our puny human brains have zero chance of eradicating?

One last point that scares me about this weakness of computers: it's not that slightly less precision will hurt us very often. Let's face it, infinite precision doesn't make something more practical, necessarily. The bigger problem is in equality tests, which are used for logic switches. What if I press the button (you know which button I'm talking about?) based upon whether A - B = 0 ? (and A or B are one of those wonky values mentioned above)
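
The usual defence for that button is to compare against a tolerance instead of testing for exact equality. A minimal sketch in Python with math.isclose; the tolerance values are assumptions picked for the example and have to be chosen per application.

```python
import math

a = 0.1 + 0.2
b = 0.3

# Exact test: fails even though the values "should" be equal.
exactly_zero = (a - b == 0)

# Relative tolerance: fine when comparing two values of similar size.
close_enough = math.isclose(a, b, rel_tol=1e-9)

# Comparing against zero needs an absolute tolerance, because a
# relative tolerance scales with the operands and zero has no scale.
near_zero_default = math.isclose(a - b, 0.0)
near_zero_abs = math.isclose(a - b, 0.0, abs_tol=1e-12)

print(exactly_zero, close_enough, near_zero_default, near_zero_abs)
```

This prints False True False True. The third result is the trap: the default settings of isclose never consider a nonzero number close to zero.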

Big trouble. It's time EVERY language had a true Number class. If we need to bake it into the hardware to get us there, fine, do it!

Craig Fitzpatrick on May 14, 2009 8:17 AM

There is nothing wrong with math.

Computer hardware is binary. Software tries to cope.

Steve on May 14, 2009 8:19 AM

What scares me is when people code money values as floating-point - I've even had an argument with a consultant running a Java course for Sun over this.

Kerry on May 14, 2009 8:20 AM

Your audience is software developers.

Why are you explaining this?

Michael Reiland on May 14, 2009 8:52 AM

Jeff, did you even READ the article you linked to about the Excel bug? It has NOTHING to do with floating-point error, it's entirely a display bug. The result is computed correctly.

The problem is in the float-to-string conversion code, which was hand-written assembly for maximal speed. The bug appeared when porting it from Win16 to Win32.

With 16-bit instructions, when the AX register goes from 65535 to 65536, it overflows back to 0 and sets the x86's overflow flag (OF) in the FLAGS register. With 32-bit instructions, when the EAX register goes from 65535 to 65536, it does NOT overflow and does NOT set the OF flag in the EFLAGS register.

For some very particular floating-point inputs, the float-to-string function increments the AX/EAX register from 65535 to 65536/0. As a result, a branch that was taken in 16-bit code which predicates on the OF flag was no longer being taken in the 32-bit code, eventually resulting in the bug.
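
The register behaviour is easy to mimic. This Python sketch only models an increment that wraps at a given register width; it is an illustration of the 16-bit versus 32-bit difference and not a reconstruction of Excel's code, and the increment helper is hypothetical.

```python
def increment(value: int, width_bits: int) -> int:
    """Hypothetical helper: add 1 and wrap around at the given register width."""
    mask = (1 << width_bits) - 1
    return (value + 1) & mask

ax = increment(0xFFFF, 16)     # 16-bit register wraps around
eax = increment(0xFFFF, 32)    # 32-bit register has room to spare

print(ax, eax)
```

This prints 0 and 65536, so any later branch that expects the wrapped value behaves differently once the code runs on wider registers.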

Adam Rosenfield on May 14, 2009 8:53 AM

osp70:

What you say is true, but backwards. What you're missing is that we aren't giving you some number N of 9's in 0.999.... We're putting infinitely many 9's there. You cannot give any number with more 9's than infinitely many 9's, so you cannot give a number between that and 1, which means there is no difference.

1/1, 2/2, 5/5, 9/9, multiplicative identity, 1/2 + .5, all mean the same thing, but people get hung up on this one representation of the number 1.

Ens on May 14, 2009 9:02 AM

You know, that 0.9999 would only be accurate if it produced a graphic that would reach the edge of the screen and beyond :-)

tomc on May 14, 2009 9:06 AM

Err, it's not the test of the overflow flag, but rather a comparison against zero that causes the incorrect behavior with the 32-bit code but not with the 16-bit code. My point still stands, though.

Adam Rosenfield on May 14, 2009 9:10 AM

@Anders Sandvig

To be precise, I was referring to fixed point representation:

http://docs.python.org/library/decimal.html
http://msdn.microsoft.com/en-us/library/system.decimal(VS.80).aspx

It really spares a lot of problems when working on e-commerce.

I may have been confus(ing|ed) though :)

I just did not remember that there is no such thing as a language working in BCD, only oooolllllddd computers (IBM).

That's the problem with having graduated last century :)


jul on May 14, 2009 9:12 AM

@the old rang:

I hate to sound off on that, but it isn't a very simplified way of looking at 0.99999... vs. 1, that's a wrong way of looking at 0.99999... vs. 1. Because the point is that they are literally the same number expressed in two different ways. Not so close as makes no difference, but the exact same number.

I'm an engineer. I know intimately close enough as makes no difference.

Anonymous on May 14, 2009 9:15 AM

It doesn't help that the real numbers are uncountably infinite, not merely countably infinite like the integers. At least with integers you can say okay, you get to represent the first N above and/or below zero and just make N large if needed. Any two different real numbers have an infinite number of other reals between them, so it's impossible to represent any nonempty segment of the real number line exactly.

Jim on May 14, 2009 9:20 AM

tomc:

Not true! For the exact same reason that 1 + 1/2 + 1/4 + 1/8... doesn't equal infinity even though you are adding together an infinite number of positive numbers, it also doesn't need to take infinite space to represent the concept given infinitely fine pixel density (which could be approximated procedurally via an infinite zooming algorithm).
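
The series claim can be checked with exact arithmetic. A short Python sketch using fractions.Fraction; the partial_sum helper is made up for this example.

```python
from fractions import Fraction

def partial_sum(n_terms: int) -> Fraction:
    """Hypothetical helper: 1 + 1/2 + 1/4 + ... with n_terms terms, computed exactly."""
    total = Fraction(0)
    for k in range(n_terms):
        total += Fraction(1, 2**k)
    return total

# The gap to 2 halves with every extra term but never goes negative.
for n in (1, 2, 10):
    print(n, partial_sum(n), 2 - partial_sum(n))
```

After 10 terms the gap to 2 is exactly 1/512, and it keeps halving, so the sum stays bounded no matter how many terms are added.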

Alternatively, he could replace the graphic with the numeral 1 :).

Ens on May 14, 2009 9:22 AM

I am afraid these two examples have only very little to do with precision.

In the first case, I suppose, it was an optimization, where all the digits were compared and if they were equal, the result was 0. Obvious. However, the last digit was forgotten. In pseudo C code:

int streq(a,b) { return (strlen(a)==strlen(b)) && (all strlen(a)-1 chars are equal); }

Common mistake, in either direction.

Second error - I guess Excel is keeping big numbers in an array of 16-bit numbers, and when the result somehow (likely this is where precision played a role) happens not to fit in these 16 bits (sign + real(0.1) fraction(1/10) ), the top bit has to be carried over and something is/was screwed up there.

Marian

Marian Csontos on May 14, 2009 9:37 AM

News flash: in our ongoing series, after discovering proper password salting, and after discovering the processor supervisor state/ring 0, Jeff Atwood now discovers floating-point numbers. In other news, programmers everywhere discover the bozo bit can also be flipped on another fellow programmer.

Pierre Lebeaupin on May 14, 2009 9:45 AM

Obligatory Bistromathics reference:

Bistromathics itself is simply a revolutionary new way of understanding the behavior of numbers. Just as Einstein observed that time was not an absolute but depended on the observer's movement in space, and that space was not an absolute, but depended on the observer's movement in time, so it is now realized that numbers are not absolute, but depend on the observer's movement in restaurants.

http://www.tudy.ro/2007/07/10/the-bistromathic-drive/
http://en.wikipedia.org/wiki/Starship_Billion_Year_Bunker

DA on May 14, 2009 9:47 AM

Maybe this is the reason why this cheap calculator I bought a few weeks ago does some calculations completely wrong. 2060/3.8*3.8=1 according to that piece of crap. Even other cheap calculators don't have that problem.

Alex on May 14, 2009 9:55 AM

crap I still remember when stupid int counters would overflow at 32k and unsigned shorts were 256.

Trudy on May 14, 2009 9:57 AM

As Dennis mentioned above, 0.1 is a simple base 10 fraction that has a repeating form in base 2, i.e., 0.0(0011) where the part in parentheses repeats. 0.3 is another example: 0.0(1001). Thus, they cannot be exactly represented in IEEE arithmetic. I've made some base conversion tools for numbers with fractional parts available at

http://www.knowledgedoor.com/1/Base_Conversion/Convert_a_Number_with_a_Fractional_Part.htm

These can be very helpful in illustrating just how easy it is to run into numerical problems when switching between base 10 and base 2 representations.
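
The repeating patterns quoted above can also be reproduced in a few lines. A sketch in Python using exact fractions and the schoolbook "multiply by 2, take the integer part" method; binary_expansion is a made-up name, and it only handles values from 0 up to (not including) 1.

```python
from fractions import Fraction

def binary_expansion(x: Fraction) -> str:
    """Hypothetical helper: base-2 digits of 0 <= x < 1, repeating block in parentheses."""
    digits = []
    seen = {}                      # remainder -> position where it first appeared
    while x != 0 and x not in seen:
        seen[x] = len(digits)
        x *= 2
        bit = int(x)               # integer part is the next binary digit
        digits.append(str(bit))
        x -= bit
    if x == 0:                     # expansion terminates
        return "0." + "".join(digits)
    start = seen[x]                # a remainder came back: digits repeat from here
    return "0." + "".join(digits[:start]) + "(" + "".join(digits[start:]) + ")"

print(binary_expansion(Fraction(1, 10)))
print(binary_expansion(Fraction(3, 10)))
print(binary_expansion(Fraction(1, 4)))
```

This prints 0.0(0011), 0.0(1001) and 0.01. Only fractions whose denominator is a power of two terminate in base 2, which is why 0.25 is exact in binary floating point and 0.1 is not.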

Dan Linder on May 14, 2009 10:08 AM


Computers only suck at math when they don't use IBM COBOL packed decimal (BCD) fields ^^

privatehuff on May 14, 2009 10:16 AM

I hate to sound off on that, but it isn't a very simplified way of looking at 0.99999... vs. 1, that's a wrong way of looking at 0.99999... vs. 1. Because the point is that they are literally the same number expressed in two different ways. Not so close as makes no difference, but the exact same number.

Hmmm... hehehehe... Ok, engineer... Take 1.0. Subtract your number 0.99999. Test for 'zero' and if it passes the test, when you are between the moon and Mars, you are right, and complete your mission. If you are wrong, you find temperatures in the solar range, and you ... should go to the sun when it is night.

The numbers are the same, by convention, NOT mathematics. People did not deal with such long decimals in olden times. (when the rules of mathematics and arithmetic were developed.)

Better still... make a gamble out of it. If the odds are that you have a 0.999999... chance of surviving an ordeal, you might take the bet... But, the odds are not 1.0... not a certainty. Would you still chance the bet? Murphy wrote the laws. The important one was 'at the worst possible time'... If certainty is required, 1.0 is NOT equal to .99999...

the old rang on May 14, 2009 10:16 AM

Supporting precise math and having it built in are two different things. C++ supports arbitrary precision math just fine when using the GNU MP Bignum lib (http://gmplib.org/); it just isn't Java, which has even the kitchen sink built in.

zokier on May 14, 2009 10:36 AM

I do agree that infinite precision math is required in end-user applications (besides maybe matlab/mathcad and alike).

One can easily guess that Jeff is a little biased against Microsoft by not finding in his post this link
http://search.live.com/results.aspx?q=399999999999999-399999999999998mkt=en-US

And this link http://blogs.msdn.com/oldnewthing/archive/2004/05/25/141253.aspx

Ruby did not exist and Google was a very small startup when Microsoft fixed this bug.


Unlike the strange floating point calculations, the 850*77.1 trick in Excel was a real bug. Fortunately, I've just tried that trick in Excel 2007 SP2 and could not reproduce the issue.

Vyacheslav Lanovets on May 14, 2009 10:37 AM

Within the real number system, '0.999...' is, by definition, equal to the real number 'one'.

In other number systems things can be defined differently:
http://en.wikipedia.org/wiki/0.999...#Alternative_number_systems

As far as I know systems are 'best effort' approximations of reality.

Jacco on May 14, 2009 10:55 AM

You are making a 'modern' assumption based on incorrect data.

Unless proper correcting software is included, computers are terrible at math. They can do it in binary, octal and hexadecimal. Unfortunately, they cannot do decimal math. (There is NO octal equivalent to decimal 0.05... period.) Also, there is a major difference between negative zero (-0.00) and positive zero (0.00). Control Data used to manufacture a decimal computer, and these problems were not really a question. Since the Windows takeover of reality, precision decimal math has taken a back seat to 'enhancements' instead of true and accurate function.

In the olde tymes (when BAL and machine language were the only languages, and punched paper tape was high speed input...) computers had to have math functions programmed (oftentimes) since, if you really know the facts, computers can only add.

Writing functions to actually do math correctly is a lost art. After all, computers are great at multiplying and dividing!! Right!!!... er.. ummm...

No. They aren't. Not unless a human, with understanding and knowledge, fixes the basic problem... Computers are binary, and binary, converted to octal, doesn't function in decimal.

Launch a space ship, from earth, to ... Ummm. Mars... The error in the system, unless corrected for (have you ever heard of mid-mission flight corrections??) will miss Mars, and not by a small factor.

Unary mathematics is almost unheard of nowadays... I remember a thick tome that was required reading to do flight plotting. Seems a negative zero will, according to the computer, either be equal or not equal to positive zero, unless Murphy said it ain't, and you better listen.

Mathematically speaking, if the remainder of a subtraction is even 0.000... (999 more zeroes)...01, there is a difference. Unless the difference is zero (positive or negative) there is a difference! If there is a positive or a negative zero, then you have to know why and test to see IF the difference is significant. Not doing so will create errors like you might not believe.
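
On machines that follow IEEE 754 the negative zero question has a defined answer, which a short Python sketch can show (assuming, as is usual, that Python floats are IEEE 754 doubles):

```python
import math

neg_zero = -0.0
pos_zero = 0.0

# The two zeros compare equal...
zeros_equal = (neg_zero == pos_zero)

# ...but the sign bit is still stored and can be inspected.
neg_sign = math.copysign(1.0, neg_zero)
pos_sign = math.copysign(1.0, pos_zero)

# A tiny nonzero remainder is a different matter: it is not equal to zero.
tiny = 1e-300
tiny_is_zero = (tiny == 0.0)

print(zeros_equal, neg_sign, pos_sign, tiny_is_zero)
```

This prints True -1.0 1.0 False. Older hardware with other number formats could behave differently, so this says nothing about the machines described above.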

the old orang on May 14, 2009 11:02 AM

Hmm...

einstein:~ siuying$ irb
irb(main):001:0> 399999999999999-399999999999998
=> 1
irb(main):002:0> 12.51-12.52
=> -0.00999999999999979
irb(main):003:0> quit

einstein:~ siuying$ python
Python 2.5.1 (r251:54863, Jan 13 2009, 10:26:13)
[GCC 4.0.1 (Apple Inc. build 5465)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> 12.51-12.52
-0.0099999999999997868

siuying on May 14, 2009 11:19 AM

Since nobody mentioned Haskell so far: Haskell is another language that handles large numbers and fractions very well. It even has a rational number datatype that represents numbers as the quotient of two integers of arbitrary size.

Kim on May 14, 2009 11:22 AM
