Coding Horror
programming and human factors
by Jeff Atwood

Sep 10, 2007

Gigabyte: Decimal vs. Binary

Everyone who has ever purchased a hard drive finds out the hard way that there are two ways to define a gigabyte.  

When you buy a "500 Gigabyte" hard drive, the vendor defines it using the decimal powers of ten definition of the "Giga" prefix.

500 * 10^9 bytes = 500,000,000,000 bytes = 500 Gigabytes

But the operating system determines the size of the drive using the computer's binary powers of two definition of the "Giga" prefix:

465 * 2^30 bytes = 499,289,948,160 bytes = 465 Gigabytes

If you're wondering where 35 Gigabytes of your 500 Gigabyte drive just disappeared to, you're not alone. It's an old trick perpetuated by hard drive makers: they intentionally use the official SI definitions of the Giga prefix so they can inflate the sizes of their hard drives, at least on paper. This was always an annoyance, but now it's much more difficult to ignore, as it results in large discrepancies with today's enormous hard drives. When is a Terabyte hard drive not a Terabyte? When it's 931 GB.
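The arithmetic above can be checked with a minimal Python sketch. The constant and function names here are illustrative assumptions, not anything a drive vendor or operating system actually exposes:

```python
GB = 10**9    # decimal gigabyte (SI "giga"), as printed on the box
GIB = 2**30   # binary gigabyte, what the OS reports as a "GB"

def advertised_to_os_gb(advertised_gb):
    """Convert a capacity in decimal gigabytes to binary gigabytes."""
    return advertised_gb * GB / GIB

for label, size in [("500 GB", 500), ("1 TB", 1000)]:
    print(f"{label} drive -> {advertised_to_os_gb(size):.1f} binary GB")
```

Truncating the results to whole numbers gives the 465 and 931 figures quoted above.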

As Ned Batchelder notes, the hard drive manufacturers are technically conforming to the letter of the SI prefix definitions. It's us computer science types who are abusing the official prefix designations:

Prefix  Symbol  Year Approved  Official Definition  Informal Meaning  Difference  Prefix Derived From
giga    GB      1960           10^9                 2^30              7%          Greek root for giant
tera    TB      1960           10^12                2^40              10%         Greek root for monster
peta    PB      1975           10^15                2^50              13%         Greek root for five, "penta"
exa     EB      1975           10^18                2^60              15%         Greek root for six, "hexa"
zetta   ZB      1991           10^21                2^70              18%         Latin root for seven, "septum", p dropped, first letter changed to S to avoid confusion with other SI symbols
yotta   YB      1991           10^24                2^80              21%         Greek root for eight, "octo", c dropped, y added to avoid having symbol of zero-like letter O

As the size of the prefix grows, so does the gap between the official and informal meaning of the prefix. And yes, there are larger official SI prefixes beyond these, just in case someone needs more than 1000 yottabytes. Ned noted that one of the SI proposals is for the prefix "luma", representing 10^63.
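To see the gap widen, here is a small Python sketch that compares each binary power with its decimal counterpart. The helper name `gap_percent` is made up for illustration:

```python
PREFIXES = ["giga", "tera", "peta", "exa", "zetta", "yotta"]

def gap_percent(step):
    """Percent by which 2**(10*step) exceeds 10**(3*step)."""
    return (2 ** (10 * step) / 10 ** (3 * step) - 1) * 100

# giga is the third prefix step (kilo = 1, mega = 2, giga = 3, ...)
for step, name in enumerate(PREFIXES, start=3):
    print(f"{name:>5}: {gap_percent(step):.1f}%")
```

Rounded to whole percentages, these values match the Difference column of the table, from 7% for giga up to 21% for yotta.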

Speaking of impossibly large numbers, if you're like most people reading this article, then you probably arrived here through Google. Google is a tragically but forever misspelled version of Googol:

A googol is 10^100, i.e. a 1 followed by 100 zeros. In official SI prefix terms, a googol is approximately a yotta squared, squared. Even larger is the googolplex, which is equal to 10 to the power of a googol (10^googol); this number is about the same size as the number of possible games of chess. Even larger numbers have been defined, such as Skewes' number, Graham's number, and the Moser, which I won't even try to describe.

But I digress. When we use gigabyte to mean 2^30, that's an inaccurate and informal usage. Instead, we're supposed to be using the more accurate and disambiguated IEC prefixes. They were introduced in 1998 and formalized with IEEE 1541 in 2000.

kibibyte  KiB  2^10
mebibyte  MiB  2^20
gibibyte  GiB  2^30
tebibyte  TiB  2^40
pebibyte  PiB  2^50
exbibyte  EiB  2^60
zebibyte  ZiB  2^70
yobibyte  YiB  2^80
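For software that wants to be explicit about which definition it is using, a formatter along these lines is one option. This is only a sketch: the function name `format_bytes` and its `binary` flag are assumptions made for illustration, and repeated floating-point division can misround values sitting right on a unit boundary.

```python
SI_UNITS = ["B", "kB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]
IEC_UNITS = ["B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB", "ZiB", "YiB"]

def format_bytes(n, binary=False):
    """Format a byte count with SI (base 1000) or IEC (base 1024) units."""
    base, units = (1024, IEC_UNITS) if binary else (1000, SI_UNITS)
    value = float(n)
    for unit in units[:-1]:
        if value < base:
            return f"{value:.2f} {unit}"
        value /= base
    # Anything that survives the loop is expressed in the largest unit.
    return f"{value:.2f} {units[-1]}"

print(format_bytes(500 * 10**9))               # decimal label
print(format_bytes(500 * 10**9, binary=True))  # binary label
```

The same byte count then carries an unambiguous label either way, instead of two different numbers both called "GB".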

You occasionally see these more correct prefixes used in software, but adoption has been slow at best. There are several problems:

  1. They sound ridiculous. I hear the metric system used more often in the United States than I hear the words "kibibyte" or "mebibyte" uttered by anyone with a straight face. Which is to say, never.

  2. Hard drive manufacturers won't use them. Drive manufacturers don't care about being correct. What they do care about is consumers buying their drives because they have the largest possible number plastered on the front of the box. If a big lawsuit wasn't enough to get them to mend their ways, I seriously doubt that the recommendation of an international standards body is going to sway them.

  3. Tradition rules. It's hard to give up on the rich binary history of kilobytes, megabytes, and gigabytes, particularly when the alternatives are so questionable.

It's good to keep in mind the discrepancy between the decimal and binary meanings of the SI prefixes. The difference can bite you if you're not careful. But I think we're stuck with contextual, dual-use meanings of the SI prefixes for the foreseeable future. Or perhaps we're all overthinking this, as Alan Green notes:

Whenever I try to discuss [this] with my friends, they say, "Yotta getta life".

Posted by Jeff Atwood
Comments

"If you're wondering where 35 megabytes of your 500 Gigabyte drive just disappeared to"

You mean 35 gigabytes, presumably.

Pete on September 11, 2007 3:25 AM

Yep, I meant 35 Gibibytes! :)

Jeff Atwood on September 11, 2007 3:44 AM

You said it in the middle there: it's us 1024 types who are in the wrong here, and the sooner we give up this particular windmill the better we will all be.

When I'm buying a hard drive, I really don't care that the 1TB drive has 931 "GB" of space. I *do* care that one "1 TB" drive will have significantly more space than another "1 TB" drive, but I can usually find that particular info on the side of the box or the manufacturer's web site after a bit of digging.

I don't know how you go shopping for a new hard drive, but I look at what data I have today, compare that to what I had a year or so ago to come up with a yearly growth factor, then look for hard drives which are about two years' growth larger than my current needs. Then, I look at what's there, decide the wife would have a fit if I spent $1000 on a new drive, and compromise down.

*EVEN IF* that number that I started out with (how much space I'm using today) was in *bibyte values (as typically reported by the OS), the end decision wouldn't change at all, because anything within, say, 25% of the desired size is just lost in the noise.

Still, though, when I look at how much space I am using, I use the byte counts (OS X gives that in parentheses right next to the "*bibyte" "friendly" count, so it's not like this is much more work to come up with; seems like Windows offers a quick route to the actual byte counts too, right?).

All in all, it seems a moot point. So, in the end, I agree with Alan Green's friends.

Tom Dibble on September 11, 2007 3:48 AM

I noticed this disparity first when I bought my first CD-R. Since then, I orient myself on the much more meaningful unit "minutes", usually in CD-Quality audio that can be packed on a CD-R (MPEG-2 compresses movies when it's DVD+/-R(W)).

I've noticed (or maybe this is my imagination), that DVRs use the "minutes of video", too, for their hard drive capacities.

I don't know about you, but I can grasp "30 hours of video" better than "500 GB".

Phillip Gawlowski on September 11, 2007 3:50 AM

You're so right - those binary prefixes sound soo ridiculous. Like teletubbies, if anyone remembers. ;)

Christoph on September 11, 2007 3:51 AM

I don't understand the assertion that hard drive makers are pulling a fast one by using 'real' SI units in order to make the capacity seem higher. The only thing anyone ever does with that number is to compare it to other hard drive capacities, so what's the problem?

Also, I'm hard-pressed to think of a scenario where you really need to keep in mind the discrepancy between the two definitions. When are you ever comparing hard drive GBs vs. something legitimately measured in GiBs?

Chuck on September 11, 2007 3:53 AM

Ha! I've never even heard of a kibibyte.

The annoying thing with the power of 2 vs power of 10 stuff is that it didn't used to be that way. The vendors switched at some point to juice their numbers - extremely weasily.

Kevin Dente on September 11, 2007 4:01 AM

"It's us computer science types who are abusing the official prefix designations".

That's one interpretation. I often prefer to think that we're challenging SI's jurisdiction. Since metric units of measurement predate any common usage of non base 10 numbers by quite some time, my guess is that the decision that kilo was base 10 was pretty arbitrary anyways.

Why not have the unit's suffix determine the base, instead of the prefix? It's not just a bytes issue -- kibimeters is patently ridiculous too. Bytes should always be base 2, meters should always be base 10, and if not, it should be specified explicitly.

If the above diagram is correct (I'm sure it is), when "giga" was specified in 1960, there was already prior base 2 usage. In other words, they broke pre-existing business logic, so it's their fault!

;-)

Sean Reilly on September 11, 2007 4:17 AM

"The annoying thing with the power of 2 vs power of 10 stuff is that it didn't used to be that way. The vendors switched at some point to juice their numbers - extremely weasily."

I'd like to see proof of this. I don't think there ever was a switch; I think it always was like this. The oldest hard drive ad I can find advertises 5 million 7-bit characters. Floppy disks and CDs were in 1000-byte kilo/mega bytes. Except some floppies, which were even weirder with 1024 * 1000 byte megabytes. Can anybody produce a hard drive advertisement with 1024 * 1024 megabytes as the unit?

FigBug on September 11, 2007 4:25 AM

The second calculation has "465" twice. The first instance should be "500".

Tim McCormack on September 11, 2007 4:27 AM

The problem has an easy solution:

The OS needs to start using SI prefixes correctly. Linux already does.

$ dd if=/dev/zero of=test bs=1MB count=10
10+0 records in
10+0 records out
10000000 bytes (10 MB) copied, 0.0261481 seconds, 382 MB/s

There's no reason to use powers of two for displaying file sizes. It's ridiculous and makes it more confusing for the user.

Sean on September 11, 2007 4:38 AM

It should be noted that network speed has ALWAYS been in base10.

Your ancient 10baseT ethernet card? That was 10 million bits/second.
a gigabit card is 1000 Mbits, not 1024.

Sean on September 11, 2007 4:44 AM

"Why not have the unit's suffix determine the base, instead of the prefix? It's not just a bytes issue -- kibimeters is patently ridiculous too. Bytes should always be base 2, meters should always be base 10, and if not, it should be specified explicitly."

(first, it's not base-2; it's base-1024. What would "deka" and "hecto" map to in "base 2"? "8" and "128"?)

Huh? Why would I want a completely different series of multipliers with the same name as the universal (not just meters: EVERY SI unit of measure uses the base-10 system!) standard? Why propagate confusion? Instead of just learning "kilo- means 1000", the rule is "kilo means 1000 when the base is meter, liter, gram, watt, ampere, joule, [list continues for a page or so and needs to get updated with each new "thing" to measure]; it means "1024" when the base is byte"? That's ludicrous!

First: what reason is there for defining a kilobyte as 1024 bytes instead of 1000? Why not measure in 2^8 increments (256, then 65536, then 16777216, etc), since that reflects the number of 8-bit bytes used to store it? Does anyone use 10-bit bytes anymore? Is, in fact, the ONLY reason we use 1024 as the base for file sizes that it APPROXIMATES 1,000 as a power of 2???

Second: why on earth are OS's still so bass-ackwards and reporting these 1024-power numbers instead of a sensible standard that matches what the rest of the world uses?

Tom Dibble on September 11, 2007 4:47 AM

I worked on the system information utility for Windows (msinfo32.exe). There were periodic bugs filed of the form "I have a 500GB drive and msinfo only reports it as 465GB!".

You can't please everyone. If it was changed to powers of ten it wouldn't match the rest of the operating system. And if you changed the whole operating system, it wouldn't match legacy systems. It's hard to turn that barge once it has momentum...

Jim Martin on September 11, 2007 4:51 AM

Jim: you've got to start somewhere. The programs that currently use "KB" to describe 2^10 are wrong. File a bug. Linux is currently in a transitional period. Nautilus has several bugs filed against it for reporting file sizes incorrectly... most GNU programs, like the one I demonstrated above, are correct.

I don't understand why geeks are so set on using powers of two divisions. Jeff's complaint is that he doesn't want to sound stupid by saying "kibibyte"; it never even occurred to him to use powers of ten. WTF?

If you're going to use base-10 numbers, you should be using base-10 prefixes. As soon as you start talking about a 0x1f4 GiB harddrive you can start complaining.

Sean on September 11, 2007 5:09 AM

Speaking of google and bytes, google still thinks 1 kilobyte (1 KB, to them) is equal to 1024 (2^10) bytes: http://www.google.com/search?hl=en&q=1+KB+in+bytes

Jacob on September 11, 2007 5:35 AM

Google's wrong.

http://www.google.com/search?hl=en&q=1+megabit+in+bytes&btnG=Search

That's not correct by anyone's definition.

Sean on September 11, 2007 5:38 AM

Personally I think zetta and yotta are pretty lame prefixes, so I'm glad I won't have to deal with saying them in my lifetime. Maybe I'm just more familiar with the smaller prefixes, even the relatively exotic peta and exa. When was the last time you actually *needed* to express something in terms of petabytes?

My children can deal with the yobibi and zebibi controversy.

Jeff Atwood on September 11, 2007 5:52 AM

Shannon: "The worst was the "1.44MB" disks. These are actually 1044 kilobytes. (1044 * 2 ^ 10 bytes, in case people aren't keeping up.)"

I think you mean 1440 kilobytes (1440 * 1024 = 1,474,560 bytes).

Sebastian on September 11, 2007 5:53 AM

Sean: Yes it is correct, it's just one eighth of "1 megabyte in bytes".

Peter K. on September 11, 2007 6:03 AM

Peter: megabit is 1 million bits. It has never meant 1,048,576 bits.

I still don't get why you are so stuck on measuring file size in powers of two.

Is my 3ghz machine running at 3.2 billion hertz? No. It's running at exactly 3.0 billion hertz.

When you start seeing a file that's 1,500,000 bytes as 1.5 megabytes and not 1.43 megabytes, your brain will feel much better.

Sean on September 11, 2007 6:14 AM

The problem is not when you compare 2 hard drives.

We have a problem when we have a 500GB (1000 based) and you need a real 500GB (1024 based).
Workers in computer (Admins, programmers...) can deal with that.
But ordinary people are very confused about that.

Luc Martineau on September 11, 2007 6:50 AM

Memory is one area where the power of 2 meaning makes complete sense. Since it's all manufactured as 2^n, it would just confuse people to call your 1GB RAM "1.073 GB of memory".

coderprof on September 11, 2007 6:58 AM

Yobibi Zebibi were the names of Sudam's sons right? =P

`Josh on September 11, 2007 7:01 AM

Bah!

If you ask me, the standards organisations should be making sure that our conventions are well documented and standardised. Not running around self-importantly defining and redefining things we've used for years. There *is* a reason we use binary measurements.

Ideally they could have gone out and defined "when is kilo binary and when is it decimal?" - after all - that is often confusing. Heck they could have simply defined the standard to say "kilobyte often means this, but sometimes that".

Then they could have run around adding well-defined "kibi" stuff to their hearts content.

Instead they made the confusion *worse* by actively trying to delete the common definition. A bunch of nerds with no understanding of the social outcome of their actions, if you ask me.

Andrew Russell on September 11, 2007 7:08 AM

Isn't the important thing that, when we start saying "tebibyte", that our theme song is already mostly written? http://youtube.com/watch?v=19MNzKL5Swk

Geno Z Heinlein on September 11, 2007 7:37 AM

"There *is* a reason we use binary measurements."

Andrew: care to fill us in?

Please, just list one benefit for having a file that's 2,500 bytes be represented as 2.44 KiB instead of 2.5 KB.

Sean on September 11, 2007 8:07 AM

Most annoying about the SI prefix are the graph legends....
when you see a graph with a little 'm' next to the y axis, is that milli or mega?
The general rule of lowercase being less than one helps, except that 'k' is kilo, meaning 1000 (or preferably 1024 depending on the actual metric). Shame there is always an exception!

Eric on September 11, 2007 8:10 AM

Let's not kid ourselves: the only reason we all already KNOW that difference is because practically ALL of us got cheated once and then asked someone and learned why we were some MBs short. I'm pretty sure that in a lawsuit it could be reasonably argued that no normal first-time customer can be expected to know the difference, and hence that the labeling is intentionally misleading or something of that sort.

J. Stoever on September 11, 2007 8:46 AM

I work for a networked storage solutions company that does both hardware and software, and I will tell you that the IEEE notation is never used. In fact, I wrote a .NET class to manage storage capacities into the 'yotta' range, and during the work came across much of the research you did. End result: most people don't care since we are talking about 3 orders of magnitude between prefix, even though the discrepancy grows larger.

Kit on September 11, 2007 9:06 AM

Sean ("care to fill us in?"):

Sure. Pretty much anything that involves working with memory involves measuring in binary units. Because that's how memory gets addressed. A 32-bit computer is so named because it uses 32 binary bits to address... wait for it... 4GB (sigh: GiB) of memory!

As a programmer I have no interest in running around talking about 4.294967296 "GB" of memory. Or convoluting my tongue with silly made-up words.

Andrew Russell on September 11, 2007 9:24 AM

@sean

I am a different Andrew, but I will answer your question. Most things are stored, accessed, or otherwise dealt with in powers of two. There is a technical reason why you use powers of two, mostly dealing with the fact that you have a binary state (high voltage vs. low voltage) which is used to compute everything. Memory, for instance, is a series of binary elements indexed by an address defined by a series of binary elements. Wikipedia can give you more information (look up a muxer, for instance) but basically everything is stored in a medium (almost everything, anyway) that in the end is a power of two. As was previously mentioned, 1 gibibyte of RAM is an exact number, because that is how it is built (1024 mebibytes). So it makes sense to advertise it as such, rather than as 1.074 gigabytes. Early geeks just thought it was kind of cool how you could round down the numbers and not lose much, which is what got us to where we are today.

Nowadays, however, the difference between 1000 and 1024 (ambiguous) gigabytes is five movies, or your music library, or your photography collection. There is a practical aspect to the binary notation, it is just unfortunate that there is so much momentum to stick with it.

Hope that helps clear up some confusion about the topic.

Andrew on September 11, 2007 9:27 AM

Many fields of study invent their own terminology for their use by co-opting words from general use. Using these terms differently does not make them "wrong", it just makes them technical jargon specific to the field.

Within the field of computer science, one kilobyte = 1024 bytes. This isn't wrong, in fact, the other view (1 KB = 1000 bytes) IS wrong. It's wrong on several levels.

First, it attempts to use the wrong meaning of an overloaded word, rather than the one that is correct in context. You don't complain when a physicist talks about the color or spin of a fundamental particle, even if you know the particle has no "color" nor is it "spinning". The physicist isn't wrong, he's just using terms correctly in a physics context, where their meaning differs from other contexts. Or is the assertion here that physicists can co-opt words for their own meaning, but computer scientists are "wrong" for doing the same?

Second, it mistakes "byte" for an SI unit. According to SI, a kilometer is 1000 meters. This is very true. However, it's absolutely false that, according to SI, a kilobyte is 1000 bytes. SI has no more to say on how many bytes are in a kilobyte than it has to say on how many feet are in a mile, since neither miles nor bytes are SI units. Incidentally, kilobyte has traditionally been abbreviated KB; note the capital K. The SI "kilo" prefix is abbreviated with a small "k", but since the "kilo" in kilobyte is NOT the SI prefix, this is irrelevant.

GT on September 11, 2007 10:26 AM

andrews: you both explained why memory is sized in powers of two (actually it's multiples of bus width, but that's a power of two).

Neither explained why it's better for explorer and other apps to describe filesize and diskspace in powers of two. Why is the average user exposed to this?

Sean on September 11, 2007 10:30 AM

"Google's wrong.
http://www.google.com/search?hl=en&q=1+megabit+in+bytes&btnG=Search
That's not correct by anyone's definition"

Actually what google says: 1 megabit = 131 072 bytes
is in fact correct.... do the math: 1024*1024/8 = 131072

lubos on September 11, 2007 12:48 PM

I don't know about you, but I can grasp "30 hours of video" better than "500 GB".

Yeah, but then you get to the next Monty Python question - are those 30 hours MPEG-2, DivX, DVD-quality, HD-quality at 720, 1080, or...? Same with the "songs" metric for iPods - bitrate isn't taken into account, just the default length size of a pop song.

As long as we can get 'm to switch after tera, then I'll be happy enough.

Rob Janssen on September 11, 2007 12:49 PM

Karel: "Baud is not bit per second, it is symbol per second."

Indeed. That's why I wrote "naturally encapsulates" rather than "is" and took care to specify a serial line.

gwenhwyfaer on September 12, 2007 2:08 AM

gi-bE-BYE-TUH

bin-ary
or

gi-bye-bye-tuh?

as in bye-nary

apeinago on September 12, 2007 2:20 AM

What I don't see explained, and didn't know, is: how can 1TB drives have different capacities? I thought they always sported 10^12 raw bytes.

You know what is even more of a cheat than the fact above? Some laptops and computers have a hidden "rescue partition" with a copy of the OS installed in it for easy recovery. A friend of mine got so angry at a computer store that sold him a 60 GB laptop that had only 25 GB of free space after a clean installation: apparently Windows XP + all the "apps" took 15 GB *and* there was a 20 GB recovery partition. After a lot of swearing he got an 80 GB laptop of the same brand, though (which still had only 50 GB of free space; here the rescue partition of an otherwise identical install was only 15 GB).

Tijmen Stam/IIVQ on September 12, 2007 2:36 AM

The 1.44 3 1/2" floppy shows the problem best

Unformatted capacity 2MB (as per hard drive so 2,000,000 bytes)

Formatted capacity = 1,474,560 Bytes
or as 1024x1024 = 1.41 MiB
or as 1000x1000 = 1.47 MB
but 1.44M as quoted is the size in units of 1000x1024!

The hardware people love this because it makes their drives look bigger.

The software people (especially OS people) love this because it gives them an excuse for the large difference between the advertised size and the usable capacity. See above: the advertised size is 2MB, but the normal usable capacity is, however you show it, less than 1.5MB. (There were Microsoft's DMF-formatted floppies, which held 1,763,328 bytes (1.68 MiB), but that is still way off the total capacity.)


Jaster on September 12, 2007 2:53 AM
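As a quick check of the floppy figures in the comment above (an editorial sketch, not part of the comment), the three readings of "megabyte" can be compared in Python. The helper name `in_units` is hypothetical:

```python
def in_units(num_bytes, unit_bytes):
    """Express a byte count in units of `unit_bytes`, rounded to 2 places."""
    return round(num_bytes / unit_bytes, 2)

FORMATTED = 1440 * 1024  # 1,474,560 bytes of formatted capacity

print(in_units(FORMATTED, 1024 * 1024), "MiB  (1024 x 1024)")
print(in_units(FORMATTED, 1000 * 1000), "MB   (1000 x 1000)")
print(in_units(FORMATTED, 1000 * 1024), "'MB' (1000 x 1024, the mixed unit)")
```

Only the mixed 1000 x 1024 unit yields the familiar 1.44 figure.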

I'm just waiting for the hard drive manufacturers to define a byte as 10 bits.

That's essentially what this whole argument comes down to. Bytes are arrangements of bits, and kilobytes are arrangements of bytes. Megabytes arrangements of kilobytes, bytes, and bits; and so on. It doesn't make any sense to arrange bytes in decimal values because they are themselves a binary representation, 2^3 bits.

We could, as programmers, abstract it all to decimal for the end user. The question is whether or not we should, just to legitimize the choice of the hard drive manufacturers (one of the very small number of groups that actually distort these values). The end users don't care a bit until they buy a hard drive that turns out to be smaller than they thought it would be.

Another area in which this is often done is with network speeds, especially in the world of dial-up modems. Unfortunately for most people, these devices rarely deliver even the full speed promised in decimal (because of the network they're on, usually), so it's rarely an issue that people face with their newly purchased modem or NIC.

Vizeroth on September 12, 2007 2:53 AM

The worst is Broadband ....

That Lovely 10Meg Broadband connection how fast is it

10MB per second ...?
10MiB per second ...?

No 10,000,000 *Bits* per second .....

Jaster on September 12, 2007 3:04 AM

I don't think it's a "trick" that storage manufacturers use ... it's simply the well-established tradition. And besides, it is correct. Hard to fault them for being right.

I've been writing low-level software that deals with storage (drivers, filesystems, CD/DVD burning, etc) for about ten years now. In order to stay sane, I have become pedantic about it and always specify binary vs decimal KB/KiB, MB/MiB, GB/GiB, etc. In verbal communication I don't bother so much, but you really have to be explicit in code and other written communications.

The only problem I have with the whole thing is that it's not an easy 1:1 mapping. The new prefixes have the virtue of being unambiguous: when someone writes "2 GiB" it's perfectly clear what they mean. But the old prefixes haven't been fixed. When someone writes "2 GB" you have to consider the context and decide whether they mean 2.00 decimal GB or 2.00 GiB = ~2.15 decimal GB. Ugh.

Drew Thaler on September 12, 2007 3:07 AM
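One way to make the "be explicit in code" advice from the comment above concrete is a parser that only accepts unambiguous unit labels. This is an editorial sketch, not Drew's code: the name `parse_size` and the decision to read the old prefixes as decimal are assumptions.

```python
import re

# Assumption: "kB", "MB", "GB", "TB" are read as decimal (SI) units here;
# binary sizes must be spelled with the IEC forms.
MULTIPLIERS = {
    "kB": 10**3, "MB": 10**6, "GB": 10**9, "TB": 10**12,
    "KiB": 2**10, "MiB": 2**20, "GiB": 2**30, "TiB": 2**40,
}

def parse_size(text):
    """Parse strings like '2 GiB' or '2.5 GB' into a byte count."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([A-Za-z]+)\s*", text)
    if not match or match.group(2) not in MULTIPLIERS:
        raise ValueError(f"unrecognized size: {text!r}")
    return round(float(match.group(1)) * MULTIPLIERS[match.group(2)])

print(parse_size("2 GB"))   # decimal reading
print(parse_size("2 GiB"))  # binary reading, about 2.15 decimal GB
```

Rejecting unknown labels such as a bare "G" keeps the ambiguity from creeping back in through the input side.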

Magnetic disks tend to store files in multiples of 512 bytes (sectors). Optical discs tend to use 2048 byte data sectors. Ignoring powers of 2, the capacity is always a multiple of 2048 or at least 512. Any disk that's got a capacity below 1 MB is probably using 2^10 even if it says KB.

It looks like that's about where they switch from 2^10 to 10^3: http://www.buildorbuy.org/floppydisk.html

josh on September 12, 2007 3:08 AM

Although I understood the difference between SI and IEC units,
this still bit me when I was creating disk images for installing
to flash (compact flash, usb keys etc.).
Even though flash is based on powers of 2 logic internally,
manufacturers use _different amounts_ of space for wear leveling etc.
So if you use the max space available for one flash device, the
image may be too big for another flash device.
The only safe thing to do is to only use the SI space.
I.E. if you have a 64MB flash, only use 64000000 bytes of it.

I find the unix units command handy for these conversions:
units -t '500GB' 'GiB'
http://www.pixelbeat.org/cmdline.html#math

Pádraig Brady on September 12, 2007 3:29 AM

jaster, The broadband situation is even worse than that.
Here's a handy calculator so you can switch between MB/s Mib/s ...:
http://www.pixelbeat.org/speeds.html

Pádraig Brady on September 12, 2007 3:36 AM

When I was 10, I remember my dad trying to explain to me the relative capacity of 20 MEGABYTE hard drive he got in a new computer. I asked him if it was possible to ever fill up that much space on a hard drive. He said that, practically, it was not possible.

Then when I went to high school, a friend of mine had a father with an Audio/Video production facility, and he told me he had an external 1 GB drive (he pronounced it "Jigga-byte"). I nearly fainted at the sheer magnitude of drive space.


Matias Nino on September 12, 2007 3:48 AM

Given all the confusion here in the comments section of this blog, I think making a distinction between kilo (=1000) and kibi (= 1024) is very useful.

In our company, we have been doing this for several years and it saved us from a few embarrassing errors in our software.

For those that do not like the word 'kibi': now that is what I call a relevant argument. Get over it.

Karel Thnissen on September 12, 2007 4:02 AM

@Vizeroth

If they could, hard-drive manufacturers would be much more likely to redefine a byte as 5 bits. This gives a bigger number on the front of the box. (/cynicism)

Cog on September 12, 2007 4:36 AM

19"(17.4" Viewable), 500GB (465GB Usable).

Ah yes, excellent example.

The discrepancy on monitor size only existed for CRTs, because the CRT tube itself was always partially obscured by the bezel of the monitor, and not all the tube could be used for display phosphors anyway. So a 19" CRT tube ends up with 17.4" of viewable space after you factored this stuff in.

Now that we've all pretty much switched to LCDs, this is a moot point. LCD monitors don't use tubes; every inch of the flat panel (well, probably 99% of it) is filled with RGB elements visible from edge to edge. Thus, a 19" LCD is by definition, a 19" viewable LCD. :)

Jeff Atwood on September 12, 2007 5:10 AM

@Jaster
Wrong answer!
10 megabits/s = 10 485 760 bits/s

tema on September 12, 2007 5:12 AM

It seems totally absurd to redefine well-established units of measurement in order to create metric units. This is like saying that from now on, a mile will be 1000 yards: get over it.

A more appropriate course of action would have been to define new metric units of measurement.

kidebytes?

fgb on September 12, 2007 5:16 AM

I remember when drives were defined using power-of-2 designations. The manufacturers definitely changed at some point. Problem is, if one changes, they all have to change, or their drives look smaller in comparison.

The problem, for those who ask, is that computer memory is always defined with the power-of-2 system. So, there's a mismatch. Back when disk drives were close to the size of RAM in your machine it mattered more, I guess.

Thanks for the info, Jeff. Hadn't seen the SI power-of-2 system before. Think I'll start using it just to confuse everybody I know.

A. Lloyd Flanagan on September 12, 2007 6:03 AM

Besides, a company that sold good CRT monitors rarely made it even remotely hard (they usually printed it on the box, though generally in smaller print) to find out what the viewable area is on the monitor.

Half the time you have a pretty hard time finding out what the actual formatted size of the hard drive is before you put it in your computer and format it (even if you're using a very common file system).

In my personal use, I rarely run into issues with this sort of thing, except when I have to explain to someone why their new 500GB hard drive can't actually hold 500GB (usually just explaining the powers of 10 vs 2 is enough for them without going into gritty details, since Windows still displays drive space in powers of 2 (ie my primary partition is listed as 89,589,747,712 bytes - 83.4 GB in the drive properties dialog)).

Vizeroth on September 12, 2007 6:14 AM

Hey Now Jeff,
I'm so glad I read this post; I always wondered why there was space missing. Now I know the reason why the drives show up as less space. I discovered your great blog through a shrinkster link on a DNR show, not googol.
Thx,
Catto

Catto on September 12, 2007 6:16 AM

@Tema

No, he is correct. Network speeds have always been in bits per second, and have never used powers of two. Your old 28.8k modem was 28800 bits per second if it could negotiate that rate over a potentially noisy phone line. And those bits were the signaling speed of the line, of which there was usually framing and error protection and detection overhead. Even with RS-232 signaling for example at 9600 you have an overhead for a start bit and a stop bit (assuming 8 bits per word, no parity, 1 stop bit).

Personally, I think software should start to standardize on using the SI meanings to display sizes of things (file sizes, drive sizes, download speeds, et cetera). Those things are almost never an even power of two. The only place it doesn't make sense to use that notation is total size of memory that is inherently a power of two because of how the hardware is made (e.g., CPU cache, system memory). That's the only place I can think of you'd need some kind of qualifier on the spec sheet or on the package.

Brendan Dowling on September 12, 2007 6:18 AM

I can handle the drive-capacity issue, but I wonder why we don't use the SI prefixes for RAM?

Daniel 'Dang' Griffith on September 12, 2007 6:29 AM

Great, now we have to deal with imperial metric measurements. d'oh!

In the longer term having a measurement which is 1024 instead of 1000 will seem as silly as having 12 inches in a foot or 3 feet in a yard.

The simplest solution that I can see is that everyone switches to K=1000. Your average Joe wouldn't notice the difference.

-Andrew

Andrew on September 12, 2007 6:49 AM

Sean: why list file sizes in KiB, rather than KB? Because even though hard disk manufacturers have squatted on the traditional descriptions of size like dogs in mangers, hard disks are still naturally sized in 512-byte units. So all file sizes are rounded up to multiples of at least 2^9 - if not more, when one considers clusters of blocks.
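A minimal sketch of that rounding, assuming 512-byte blocks by default; the 4096-byte cluster in the second call is only an example value, and `allocated_size` is a hypothetical helper, not any file system's API.

```python
def allocated_size(file_bytes, block_size=512):
    """Round a file size up to a whole number of blocks."""
    blocks = -(-file_bytes // block_size)  # ceiling division on integers
    return blocks * block_size

print(allocated_size(1000))             # occupies two 512-byte blocks
print(allocated_size(5_323_123, 4096))  # same idea with a 4096-byte cluster
```

Whatever the file's nominal length, the space it takes on disk always comes out as a multiple of a power of two.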

I think that's what galls me the most - hard drive manufacturers aren't even using the best units for their products in their keenness to pull a fast one on their customers.

Meanwhile, Sean, have you noticed that you're the only person mounting a strident defence of the new way of doing things - almost to the point of telling anyone over 25 that they're brain-damaged...? Methinks thou dost protest too much.

As for the question of Mbits/s, once upon a time there was a unit that naturally encapsulated the "bits per second" measurement; it was called Baud. I remember 300 baud modems; somewhere around the 14.4 era, Kbaud (which, as has correctly been stated, was always a decimal measure, having long predated the era of binary computers) suddenly became Kbps. If one were to call "gigabit Ethernet" gigabaud Ethernet instead, the confusion would go away. (No, it's a complex unit - so is the volt, but nobody talks about joules per coulomb.)

So I suggest that the status quo is just fine:

A megabyte is 2^20 bytes, the natural measurement for memory.
A megabyte per second is the natural measurement for data transfer across parallel buses.
A megabaud is 10^6 bits per second, the natural measurement for data transfer across serial lines.

There. What's the problem? What's ambiguous about it? Why should we change the way things have been done for decades just because hard drive manufacturers are greedy? (They've always been greedy. Remember "unformatted capacity", anyone?)

gwenhwyfaer on September 12, 2007 6:55 AM

Memory sizes weren't originally binary, they were originally decimal, just like hard disk sizes. 8,000 digits of RAM meant 8,000 decimal digits, and if that's what you had, then 8K wasn't even an approximation, it was exact. 20,000,000 digits of disk storage meant 20,000,000 decimal digits, and if that's what you had, then 20M wasn't even an approximation, it was exact.

Of course you can fit more information into memory by storing values in pure binary instead of in BCD. If you have 4 bits plus parity bit plus some other stuff, and only store digits 0 to 9 plus some other stuff, you're wasting resources. Also computations in BCD are far slower than in pure binary. But it was easier for customers and programmers to use decimal, so computer manufacturers delivered BCD.
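To make the storage cost concrete, here is a rough Python sketch comparing the two encodings. It assumes packed BCD on 8-bit bytes (two digits per byte), which is a simplification of the decimal machines described above, and both function names are invented for the example.

```python
def to_packed_bcd(n):
    """Encode a non-negative integer as packed BCD, two decimal digits per byte."""
    digits = str(n)
    if len(digits) % 2:
        digits = "0" + digits  # pad to an even number of digits
    return bytes(int(digits[i]) << 4 | int(digits[i + 1])
                 for i in range(0, len(digits), 2))

def to_pure_binary(n):
    """Encode the same integer as plain big-endian binary."""
    return n.to_bytes(max(1, (n.bit_length() + 7) // 8), "big")

value = 12_345_678_901_234_567_890  # a 20-digit number
print(len(to_packed_bcd(value)), len(to_pure_binary(value)))
```

The 20-digit value needs 10 bytes in packed BCD but only 8 in pure binary, because each 4-bit group in BCD uses just 10 of its 16 possible patterns.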

Then computer manufacturers decided that customers and programmers could understand binary well enough, so in order to maximize storage capacity and speed with the same amount of resources, they started delivering pure binary instead of BCD.

Well, it looks like they were wrong. Customers don't understand binary well enough. Even Jeff Atwood gets confused. Computer manufacturers should have stuck with BCD.

Catherine Zetta-Yones on September 12, 2007 7:02 AM

Rob wrote:
Yeah, but then you get to the next Monty Python question - are those 30
hours MPEG-2, DivX, DVD-quality, HD-quality at 720, 1080, or...? Same
with the "songs" metric for iPods - bitrate isn't taken into account,
just the default length size of a pop song.

Well, usually, it is MPEG-II in movies (since it is the most common format for consumers), and 128kbit/s VBR MP3 (most common, again) for music.

But yes, it still needs qualifiers. Still, it is more "life-like" in a "What can I do with so much space?" kind of way, than the dry, scientific, and very-much-geek-friendly state of bytes.

In a way, it is a usability thing: It puts something in relation to something else familiar. Which is, of course, error prone. No silver bullet, either.

Phillip Gawlowski on September 12, 2007 7:04 AM

Why do I get the feeling you wrote this entire post to deliver the pun at the end?

Yakka on September 12, 2007 7:19 AM

From my understanding IEC prefixes were developed by Mushmouth formerly of Fat Albert and the Cosby kids during his brief employment at the Institute of Electrical and Electronics Engineers.

Mike on September 12, 2007 7:24 AM

Baud is not bit per second, it is symbol per second. That is something completely different if symbols consist of more than one bit.
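As a sketch of the difference, the bit rate is the symbol rate multiplied by the number of bits each symbol carries. The figures below are illustrative values chosen for the example, not a claim about any particular modem standard.

```python
def bits_per_second(symbols_per_second, bits_per_symbol):
    """Bit rate of a line given its symbol rate (baud) and bits per symbol."""
    return symbols_per_second * bits_per_symbol

print(bits_per_second(300, 1))   # one bit per symbol: baud and bit/s coincide
print(bits_per_second(2400, 6))  # six bits per symbol: 2400 baud carries 14400 bit/s
```

Only in the one-bit-per-symbol case can the two units be used interchangeably.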

The confusion is even larger than I thought it was. Our job as developers is to be precise in our specifications, designs and code, so we have to change something. Too bad so many here are not willing to see that there is a problem, or expect that the *rest* of the world should change.

Karel Thnissen on September 12, 2007 7:45 AM

Shannon wrote: "Honestly, it never used to be a problem. SI prefixes have always been powers of two for binary quantities (which is only bytes) and powers of ten for decimal quantities."

I don't think so. If I'm not mistaken, kilobytes/second has always meant 1000 bytes/second, when referring to modem transfer rates.

This issue is that the use of the kilo/mega/giga prefixes is *ambiguous*, even if followed by the word "byte". For anyone who thinks this whole discussion is stupid, then you won't mind if I borrow $1024 from you (a kilobuck), and eventually pay you back $1000 (a kilobuck)?

I work in software development with some very smart people. If I had a file 5,323,123 bytes in size, and someone asked me "How big is that file?", I would be forced to answer "about 5.3 *megabytes*". You know why? Because, in an informal discussion, they would understand "5.3 megabytes" to mean "about 5,300,000 bytes". If I had bothered to do the math and answered (more correctly) "about 5.08 *megabytes*", pretty much no one would think I meant "about 5.08 * 1024 * 1024 bytes". Binary calculations may be easy for computers - not so for human beings. We all know that the proper convention for kilo/mega/gigabyte is binary, but in practice, nobody is shy about using those prefixes in the decimal sense. And nobody says "5.3 million bytes", although you'd think that'd be a reasonably unambiguous alternative.

For those who still say it's not ambiguous - to the average customer it is, for all practical purposes, if they have to remember/understand that:
- Windows/Linux will report memory/file size in binary units
- Hard drive sizes are specified in decimal units
- DVD sizes are specified in decimal units (a 4.7 GB DVD packed full of data will show up as ~4.38 GB in your favourite OS)
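The numbers in this comment can be reproduced with a short Python sketch. The helper names `in_decimal` and `in_binary` are made up for illustration; `power` is 2 for mega and 3 for giga.

```python
def in_decimal(num_bytes, power):
    """Express a byte count in units of 1000**power (kB, MB, GB, ...)."""
    return num_bytes / 1000**power

def in_binary(num_bytes, power):
    """Express a byte count in units of 1024**power (KiB, MiB, GiB, ...)."""
    return num_bytes / 1024**power

file_size = 5_323_123
print(f"file: {in_decimal(file_size, 2):.2f} decimal MB, {in_binary(file_size, 2):.2f} binary MB")

dvd = 4_700_000_000
print(f"DVD:  {in_decimal(dvd, 3):.2f} decimal GB, {in_binary(dvd, 3):.2f} binary GB")

drive = 250 * 10**9
print(f"HDD:  {in_decimal(drive, 3):.0f} decimal GB, {in_binary(drive, 3):.0f} binary GB")
```

The same byte count yields 5.32 or 5.08, 4.70 or 4.38, and 250 or 233, depending only on which divisor the software picks.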

And just try explaining to the average person why this is so. I've seen plenty of people asking the following question on various PC/gaming/tech forums:
"I just bought a 250 GB hard drive. How come it only shows up as 232 GB (or whatever) in Windows?"

Among the misleading answers I've seen:
"Windows 'lies' to you about the disk space"
"The hard drive manufacturer 'lies' to you about the drive size"
"250 GB is the 'unformatted' capacity. After you format the drive, you only have 232 GB left" (*)

Just because the industry has been doing the wrong thing for decades, doesn't mean it's correct or user-friendly to continue doing so.

(*) You may be laughing about how bogus this explanation sounds, but I've heard it from professional IT managers. If someone in the industry cannot be bothered to know/understand that GB has 2 meanings, good luck explaining that to the average joe on the street. This is by no means a criticism of them - it is actually an indictment of the tech industry. It's no wonder that "techies" have a rep for poor communication skills, since we feel the need to redefine well-known prefixes with ambiguous meanings.

Will on September 12, 2007 7:57 AM

They should label the HDDs like monitors. 19"(17.4" Viewable), 500GB (465GB Usable).

Mattkins on September 12, 2007 8:10 AM

I'm not sure a field that has wholeheartedly embraced silly made-up unit names (byte, nybble), in-jokey recursive acronyms (GNU, LAME), stupid, obfuscatory acronyms (PCMCIA), and the brain-melting stupidity of deliberately meaningless non-acronym acronyms (NT, XP, .NET) has any standing to complain that kibibytes and mebibytes "sound ridiculous." We're already soaking in ridiculous. Pissing on "mebi" et al is like pissing into an ocean of piss.

While it may be natural to use ordinary decimal measurements for storage media like optical and hard disks, solid-state storage like RAM and flash are likely to come in powers-of-two-sized chunks for the foreseeable future. That particular opportunity for confusion isn't likely to be cleared up entirely by some simple policy change.

Western Infidels on September 12, 2007 8:16 AM

How is NT a "non-acronym acronym" ? Looks like a regular acronym to me. It even makes sense, which is something I can't say about every acronym.

J. Stoever on September 12, 2007 9:42 AM

As any sys-admin from the 90's knows, NT is an acronym for "Nice Try".

Andrew R on September 12, 2007 10:11 AM

Anyone with a programming history going back at least to the 16-bit machines is likely to think something like:

1 byte = 8 bits
1 KB = 1024 bytes
1 MB = 1024 KB
1 GB = 1024 MB
1 TB = 1024 GB

If GB is 10^9 bytes for hard drives, then consistency dictates the same for memory. So your memory would actually come in chunks of 1.073741824GB.

"You need to upgrade this machine to at least 2.147483648GB of memory"
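Those awkward figures fall straight out of the conversion. A tiny Python sketch, with a hypothetical function name:

```python
def binary_gb_as_decimal_gb(binary_gb):
    """Size of a power-of-two memory module, restated in 10**9-byte gigabytes."""
    return binary_gb * 2**30 / 10**9

print(binary_gb_as_decimal_gb(1))
print(binary_gb_as_decimal_gb(2))
```

This prints 1.073741824 and 2.147483648, the two values quoted above.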

For programmers it makes sense with a notation that uses the 1024 bytes = KB shorthand if you do anything related to memory.

It's just a sad day when it got hijacked by the SI standards crowd.

It's not like there are things like centibytes or anything. Or maybe we should create a "decibyte" which then would be a practical 1/10th of a byte, or in other words 0.8 bits.

Yeah. Now we're talking.

Nibbles, here we come.

Christoffer Lern on September 12, 2007 10:15 AM

Andrew Russell wrote: "This is why Linux won't be a consumer operating system - because it has the attitude that this whole messy rename thing should be exposed to end-users.... I'd say it's a form of elitism."

Why are you saying this like it's a bad thing? We have a consumer operating system, from Microsoft, and it's a poor product for a premium price. And exposing end-users to knowledge? Oh, no, where will this horror end!

We should expect people to raise their standards, not lower ours. Elitism is good. When did striving to be better informed and more capable become some kind of insult?

Geno Z Heinlein on September 12, 2007 10:16 AM

'They should label the HDDs like monitors. 19"(17.4" Viewable), 500GB (465GB Usable).'
---

That isn't the best analogy.

A) For the 2 reported measurements of the CRT monitor, there's a discrepancy in *what's being measured* ('tube size' versus 'viewable size'), but the unit of measurement (inches) is the same.

B) For the 2 reported measurements of the hard drive, what's being measured (storage space) is the same, but the *unit of measurement* (decimal GB vs. binary GB) is different.

To say a hard drive has 500 GB (465 GB usable) of space is misleading to say the least. I could just as easily turn around and say that *500 GB are usable*, as long as you define 1 GB = 1,000,000,000 bytes. And as we all know, that is exactly what hard drive manufacturers do.

Will on September 12, 2007 10:21 AM

J. Stoever: Microsoft originally said NT meant "New Technology." As opposed to that Old Technology they usually push, I guess. Later Microsoft divorced itself from that acronym, insisting that NT didn't stand for anything and adopting it as a simple product-line name. The Windows 2000 boot screen even says "built with NT technology," which clearly makes no sense if NT is an acronym.

Western Infidels on September 12, 2007 10:32 AM

Is there any other example of an ISO standard that redefines accepted usage and makes up something totally new to replace it? That strikes me as exactly what standards bodies are *not* supposed to do.

Chris on September 12, 2007 10:32 AM

Jeff, I think we ought to just forget about making the power-of-two-types use the silly IEC names. It's just not going to happen.

Alternatively, we could go on writing "megabyte", but use the abbreviation "MiB". I wouldn't mind using the abbreviations, but I'm just not going to say "meh-buh-bite". It's just silly.

aggieben on September 12, 2007 10:55 AM

Who can name the bigger number ?

Just a follow-up to Jeff's digression about really big numbers : some of you might have heard about a family of numbers, called "Busy Beavers" (they were introduced by Rado in 1962).

I think these numbers still hold the record for the biggest number series ever imagined, and the way they are defined - which has a lot to do with computer theory - is fascinating :

Imagine a simple Turing machine (http://en.wikipedia.org/wiki/Turing_machine#Informal_description), which would read its instructions on a tape.

Then, for a given N, feed this machine with all possible programs which can be coded with N instructions on the tape. Out of these programs, some will never end, and some will halt at some point. Out of those which halt at some point, let's consider the one which halts after the largest number of steps (which you could see as processor cycles) : this particular program is called the "Busy Beaver", and we can then define BusyBeaver(N) as the number of steps it takes before it halts.

So, how big is BusyBeaver(N) (or BB(N) )?
As a matter of fact, it is big, very big : the values of BB are known for N=1...4, while the other values (for N=5 and higher) are still unknown and may well be out of reach of any human brain and computer processor. It has been shown that BB(5) is at least 47,176,870 and that BB(6) is at least 8,690,333,381,690,951, but they might well be much greater.
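To make the definition concrete, here is a minimal Turing machine simulator in Python, run on the well-known two-state champion. Treating "N instructions" as N states follows Rado's usual formulation and is an assumption about what the comment means; the rule-table layout and the names `run_turing_machine` and `TWO_STATE_CHAMPION` are invented for this sketch.

```python
def run_turing_machine(rules, max_steps=1_000_000):
    """Run a machine on a blank tape.

    Returns (steps, ones_on_tape) if it halts, or None if it is still
    running after max_steps. A None result proves nothing about halting.
    """
    tape = {}  # sparse tape; cells not present read as 0
    head, state, steps = 0, "A", 0
    while state != "HALT":
        if steps >= max_steps:
            return None
        write, move, state = rules[(state, tape.get(head, 0))]
        tape[head] = write
        head += 1 if move == "R" else -1
        steps += 1
    return steps, sum(tape.values())

# (state, symbol read) -> (symbol to write, head move, next state)
TWO_STATE_CHAMPION = {
    ("A", 0): (1, "R", "B"),
    ("A", 1): (1, "L", "B"),
    ("B", 0): (1, "L", "A"),
    ("B", 1): (1, "R", "HALT"),
}

print(run_turing_machine(TWO_STATE_CHAMPION))  # (6, 4)
```

The machine halts after 6 steps with 4 ones on the tape. The `max_steps` cutoff is the weak point of any such search: a machine that has not halted yet may simply need longer, which is why the larger values are so hard to pin down.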

Now, suppose that one particularly gifted programmer writes a complex mathematical library that can handle arbitrarily large numbers and advanced arithmetical operators (such as Knuth's up-arrow notation, http://en.wikipedia.org/wiki/Knuth%27s_up-arrow_notation). Suppose then that this programmer uses this library to write the best possible program, one that would output the largest number anyone ever thought of. To be fair, let's also accept that the execution of this program might require billions of years or more.

Rado proved that his result will be far (by galactic dimensions) below the Busy Beaver values : as a matter of fact, BB(N) eventually outgrows any computable function!

For those who are interested, here is a recommended link for more information about those numbers : http://www.scottaaronson.com/writings/bignumbers.html

And an extract which explains this in details :
"
Turing proved that this problem, called the Halting Problem, is unsolvable by Turing machines. The proof is a beautiful example of self-reference. It formalizes an old argument about why you can never have perfect introspection: because if you could, then you could determine what you were going to do ten seconds from now, and then do something else. Turing imagined that there was a special machine that could solve the Halting Problem. Then he showed how we could have this machine analyze itself, in such a way that it has to halt if it runs forever, and run forever if it halts. Like a hound that finally catches its tail and devours itself, the mythical machine vanishes in a fury of contradiction. (That’s the sort of thing you don’t say in a research paper.)
"


Pascal Thomet on September 12, 2007 11:52 AM

Permalink for that Ned Batchelder post: http://www.nedbatchelder.com/blog/200709.html#e20070909T081225

Kartik Agaram on September 12, 2007 12:38 PM

although "Yotta getta life" dude

Baha on September 12, 2007 1:05 PM

Sean: They're not really exposed to it - simply because "that drive has 120 and that drive 160". 160 of what? they don't really know - except one can hold more illegally downloaded movies.

Frankly, users should be seeing size expressed as either "30 hours of video" so they can understand it, or the commonly accepted industry standard (which, sadly, is decimal for HDDs and binary for everything else) so products can be fairly compared (because you just *know* that one vendor's hour of video will be at a different bitrate to another's).

It's actually unfortunate that end-users could end up having to also deal with this messy "rename", along with software developers.

And if I may trigger a Linux-Windows flamewar for a moment: This is why Linux won't be a consumer operating system - because it has the attitude that this whole messy rename thing should be exposed to end-users. The same applies to Wikipedia. I'd say it's a form of elitism.

Andrew Russell on September 12, 2007 1:08 PM

Yes hard drive manufacturers use SI prefixes correctly, as you observed. So do CPU manufacturers, as Sean observed. Did you complain about the GHz of your recent CPU purchase not being binary GHz?

When you hire someone for $100K, you pay them $102,400, right?

When you measure response time in milliseconds, that's how many times 1/1024 of a second, right? You buy a 47K resistor, that means 47 x 1024 ohms? No ... because you said you don't use metric. In your recent power supply purchase, you measured it in BTUs because you don't use metric. 1K BTU is 1024 BTUs, right?

Don't think about petabytes. You don't want to estimate how much spam there is each day.

Catherine Zetta-Yones on September 12, 2007 1:23 PM

To me, it seems there is a perfectly logical reason why hard drives are measured in derivatives of base-2 - and that is that any and all programs, or any other bits of data, when viewed in their rawest form, are stored in binary on the actual disc. As such, it makes complete and total sense to use base-2, or some derivative of it, to measure the size of the disc.

Oh, and John Pirie? To most people buying hard drives, whether it be by themselves, or as part of a pre-built system, '500GB' means nothing, really, by itself. To them, the only thing it means is that it should be the case that this is how much space Windows says the drive has. Currently, Windows, just like most other operating systems, will report a '500GB' drive as 465GB. In my opinion, drive manufacturers should measure the size of their drives the same way most computers would, simply in the interests of being clear to the end user.

Zmiddy on September 13, 2007 4:38 AM

Damn whippersnappers.

Drives have been measured in powers of ten since the dawn of time, back when 32K (yes, K) was a big expensive disk. There's no conspiracy here. There's simply *NO GOOD REASON* to measure hard drives in powers of two, never was, never will be. If you want to blame someone, try DOS. They probably converted by (binary) shifting.

Memory, on the other hand, always grows by powers of two (add an address line, get double the memory.) There it makes some sense to measure in powers of two.

Next you young turks are going to say you want to measure your broadband speed in powers of two! Arrgh!

Now get offa my lawn!

-hans

Hans on September 13, 2007 5:09 AM

I can't even bring myself to read all the bs in the responses. Here's a quick fix. Look at Western Digital's new drive. It's called a 1TB drive. Here's the physical specs

Physical Specifications
Formatted Capacity 1,000,204 MB
Capacity 1 TB
Interface SATA 3 Gb/s
User Sectors Per Drive 1,953,525,168

see the rest at:
http://www.westerndigital.com/en/products/Products.asp?DriveID=336
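The spec sheet's numbers are consistent with each other, as a quick Python check shows. The 512-byte sector size is an assumption (it is not stated in the excerpt above), and `capacity_from_sectors` is a hypothetical helper.

```python
def capacity_from_sectors(user_sectors, sector_bytes=512):
    """Total bytes on a drive, assuming fixed-size sectors (512 bytes by default)."""
    return user_sectors * sector_bytes

total = capacity_from_sectors(1_953_525_168)
print(total)                    # total bytes
print(total // 10**6)           # decimal megabytes, matching "Formatted Capacity"
print(round(total / 2**30, 1))  # binary gigabytes, as an OS would report it
```

Under that assumption the drive holds 1,000,204,886,016 bytes: 1,000,204 MB in decimal units, and about 931.5 GB in binary units.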

The Scientists see units at 2^10; just the way it is. Consumers see units in base 10. Advertisers used the discrepancy to pad pocket liners. Western Digital was sued over this and has made the change. Now, we cannot make the Scientists change from counting data in binary units (which is how we got here to begin with), so let's just grow up and get over it.

stoogemaster on September 13, 2007 5:31 AM

Aaron G is completely right. The fact is, there is a meaning that most people have been assigning to terms like ‘kilobyte’ for decades. Some standards organization cannot come in and dictate that we all change how we use these terms any more than the French can be told not to use ‘e-mail’. You just cannot dictate human language. It won't change how people use these terms, but it will cause a lot of confusion instead.

I can understand the usability reasons for using decimal notation, but redefining decades-old common terms won't do it. Instead, all it does is cause people either to ignore the “standard” or simply to change to the new funny-sounding binary prefixes. In the end, the only ones using the decimal system are marketers. If they really wanted people to switch to base 10, new terms should have been invented for that instead (such as ‘kidebyte’).

David on September 13, 2007 5:41 AM

One thing that leapt out at me from the table in the article is the etymological derivations of the prefixes -- the higher ones, at least.

peta, exa, zetta, yotta = penta, hexa, septa, octo = 5, 6, 7, 8

Now if those prefixes stand for 2^50, 2^60, 2^70 and 2^80 they make sense, etymologically-speaking. 5, 6, 7, 8, see?

But if they stand for 10^15, 10^18, 10^21 and 10^24 there's no obvious relation between the names and the values -- they seem entirely arbitrary, and that's not good for memorability.

(Actually they're not arbitrary, they're a factor of three out, but that's not really going to help them from a mnemonic point of view.)
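The pattern can be tabulated with a short Python sketch: the n-th prefix means 10^(3n) officially and 2^(10n) informally, which is where the factor of three comes from. The list and the function name `prefix_gap` are illustrative only.

```python
PREFIXES = ["kilo", "mega", "giga", "tera", "peta", "exa", "zetta", "yotta"]

def prefix_gap(n):
    """Percent by which 2**(10*n) exceeds 10**(3*n) for the n-th prefix."""
    return (2**(10 * n) / 10**(3 * n) - 1) * 100

for n, name in enumerate(PREFIXES, start=1):
    print(f"{n} {name:>5}: 10^{3 * n} vs 2^{10 * n}, gap {prefix_gap(n):.1f}%")
```

Rounded to whole percentages, the gaps for giga through yotta come out as 7, 10, 13, 15, 18 and 21, matching the table in the article.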

Considering that not many people who aren't storage nerds use peta- yet, let alone the others, the SI might as well come up with *different* decimal-friendly prefixes and leave the ones that actually refer to the binary powers alone. Surely?

Andrew Clegg on September 13, 2007 7:00 AM

I think it's a bit out of line to characterize adoption of the International System of Units as "an old trick perpetuated by hard drive makers."

To a whole lot of buyers of those drives, 500G just means five hundred billion.

John Pirie on September 13, 2007 10:24 AM

Yobibyte sounds funny in Russian. "Yob" is a root of the Russian equivalent of "f#ck".

Dmitry on September 13, 2007 11:05 AM

Makes me want to pay for the drive in Canadian dollars.

brian on September 13, 2007 12:16 PM

@ Matias When I was 10, I remember my dad trying to explain to me the relative capacity of 20 MEGABYTE hard drive he got in a new computer. I asked him if it was possible to ever fill up that much space on a hard drive. He said that, practically, it was not possible.

Consider the amount of work you have to do to generate simple text (without all the cruft of Word files) - if you have to do it yourself, 20mb means a lot of typing. If that's what your dad had in mind, his point is quite understandable for that time.

I recall a friend of mine getting one of the first Pentiums (at 90MHz, which wasn't spectacular because our 486 ran at 80) with a gigabyte of harddisk space. I thought the same - how to fill all of that? Luckily, games that came on CD wanting to have their content installed from the harddisk because double-speed drives weren't fast enough solved that :).

Anyway, back to the 20mb drive. Go to a more advanced mode of content generation; image editing. Autodesk Animator makes 320 x 240 GIF files in 256 colors - if the computer could handle this in the first place. I think the time of 20 megabyte drives still had amber or greenscreen most of the time (I only had a C64, never an ST or Amiga). Every file is at most 60kb (if you'd make it random noise; if it contains actual art it's probably less). By the time you've made several of these files, you'd use floppies anyway (because you don't want to completely fill up the harddisk).

Now, shoot a few 4 megapixel pictures and you're through those 20 mb.

Rob Janssen on September 13, 2007 1:28 PM

Sextillion sounds funny in English, for the same reason.

Catherine Zetta-Yobs on September 14, 2007 2:14 AM

Yup that happened to me. I bought 2 500 GB drives installed in a RAID configuration and installed my OS expecting to see 1TB and all I got was 931 GB. :(

If the drive manufacturers are going to do that, then they should be selling 534 GB drives instead of 500 GB drives.

I WANT MY "1 TB"!!!!

Tim on September 14, 2007 2:39 AM

...any more than the French can be told not to use ‘e-mail’...

Which is being replaced by a French equivalent not without success. In the same way that they (the French) forced upon us the metric system, the use of family names, a birth register, etc.

Things like these are possible and done all the time.

Karel Thnissen on September 14, 2007 3:02 AM

Instead of forcing these ridiculous kibibytes and gibibytes on us, which is impossible for us macho ADA .Net programmers, and has created even more confusion, why can't the HDD manufacturers stay in SI-land without overloading our KB's, MB's and GB's?

My solution is simple: write 1K B (1000 bytes) instead of 1 KB (1024 bytes for programmers and computers, but 1000 bytes for HDD manufacturers and buyers).

Everybody can save face: HDD manufacturers won't be lying any more, just by subtly moving a space, and programmers won't have to agree on disagreeing or agreeing with the kibibytes whenever they talk about volumes of bytes.

Joost on September 14, 2007 3:34 AM

Let us not forget that screen measurements are on the diagonal.
I still remember taking a tape measure with me to buy a TV, because at that time there was "confusion" over the "proper" way to measure the display's size. IMO, it should always have both dimensions listed, just like the video cards do.

Doofus on September 14, 2007 6:01 AM

I'd rather wait for a price drop in the terabyte drives. I want to fill a T with my 0wn rainbow tables :)

kaylaangel on September 14, 2007 8:24 AM

I know using 1024 as 1k etc. is part of computing history, and it especially makes sense in memory referencing. But why the hell does Windows (and other programs are just as guilty) count files on hard drives this way? Ever tried comparing file sizes when one's in megabytes and one's in gigabytes across, say, two different programs, and you don't know if they're working off 1024 or 1000? You end up having to look at the full number in bytes to be sure you're correct.

Pete on September 14, 2007 9:27 AM

Mainstream computers (this means every PC, Mac or otherwise) are binary computers without exception. We store our data on hard drives that write the data as binary (ones and zeros only please).

The blocks of data that you write to the hard drive are done in blocks which are ALWAYS multiples of 512 bytes. For example, 512 bytes, 1024 bytes, 4096 bytes.

There was an example of using the Linux command "dd", which was:

The problem has an easy solution:

The OS needs to start using SI prefixes correctly. Linux already does.

$ dd if=/dev/zero of=test bs=1MB count=10
10+0 records in
10+0 records out
10000000 bytes (10 MB) copied, 0.0261481 seconds, 382 MB/s

There's no reason to use powers of two for displaying file sizes. It's ridiculous and makes it more confusing for the user.
Sean on September 11, 2007 03:38 PM

This is a horrible example of I/O use. The bs variable should also be one of the example block sizes above (512, 1024, 4096) and specifically should be the block size of the filesystem to which you are writing. I agree that Linux is literal in its interpretation of MB, but you are using the "dd" command poorly, Sean. Linux will do what you are asking, but please don't waste your lovely OS's time with commands like that. :P
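The distinction between the two readings of "1MB" can be sketched in Python. The multiplier table is modelled on the size suffixes documented for GNU dd (MB for 1000*1000, M for 1024*1024) and is deliberately partial; `parse_size` is a hypothetical helper, not part of dd.

```python
import re

# Partial table modelled on the suffixes documented for GNU dd.
SUFFIXES = {
    "": 1,
    "kB": 1000, "K": 1024,
    "MB": 1000**2, "M": 1024**2,
    "GB": 1000**3, "G": 1024**3,
}

def parse_size(text):
    """Turn a string such as '1MB' or '4K' into a number of bytes."""
    match = re.fullmatch(r"(\d+)([A-Za-z]*)", text)
    if not match or match.group(2) not in SUFFIXES:
        raise ValueError(f"unrecognised size: {text!r}")
    return int(match.group(1)) * SUFFIXES[match.group(2)]

print(10 * parse_size("1MB"))  # 10000000, the byte count in the quoted transcript
print(10 * parse_size("1M"))   # 10485760, ten power-of-two megabytes
```

A block size written with the power-of-two suffix stays a multiple of 512, which is the property the paragraph above is asking for.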

Finally, when you are referencing data in the processor you use binary to do so, and we humans read hexadecimal more easily. In hexadecimal, 1024 KiB is 0x400 KiB. 1000 KB is 0x3E8 KB in hexadecimal.

If you want to use metric, be an engineer. Meanwhile, leave my metrics alone. 1024 bytes is 1 KB. The computer knows this. The programmer knows this. Even web developers use hex. Color codes are hexadecimal.

Do mechanical engineers want me to redefine 1 meter as 99.53 cm? No, they don't. Leave your decimal out of my pure binary computer.

overlordofmu on September 14, 2007 10:30 AM

I think the problem is that in CS, we can use whatever word for whatever purpose and it's OK. Now, CS is a large field that has a great tendency to overlap other fields like finance (for example). For a computer scientist, it's OK for a kB to be 1024 bytes. For a manager, it's not. Before, kB were used only by computer scientists; now they are used everywhere.

I personally think that all this is an issue of whether we tell the world about that (like in school when we learn about kilometers) or we change to use the kilo as one thousand meaning.

I also think that 1 kB as 1024 is useful when programming.

loki.jf on September 14, 2007 1:37 PM

Joost wrote: "My solution is simple: write 1K B (1000 bytes) instead of 1 KB (1024 bytes for programmers and computers, but 1000 bytes for HDD manufacturers and buyers)."
---
And how would you pronounce "1K B" so it's distinguishable from "1 KB"? Not all communication is written or electronic. Sometimes you actually have to *talk* to people.

Will on September 16, 2007 4:49 AM

The comments to this entry are closed.

Content (c) 2012 . Logo image used with permission of the author. (c) 1993 Steven C. McConnell. All Rights Reserved.