Everyone who has ever purchased a hard drive finds out the hard way that there are two ways to define a gigabyte.
When you buy a "500 Gigabyte" hard drive, the vendor defines it using the decimal powers of ten definition of the "Giga" prefix.
500 * 10^9 bytes = 500,000,000,000 bytes = 500 Gigabytes
But the operating system determines the size of the drive using the computer's binary powers of two definition of the "Giga" prefix:
465 * 2^30 bytes = 499,289,948,160 bytes = 465 Gigabytes
If you're wondering where 35 Gigabytes of your 500 Gigabyte drive just disappeared to, you're not alone. It's an old trick perpetuated by hard drive makers: they intentionally use the official SI definitions of the Giga prefix so they can inflate the sizes of their hard drives, at least on paper. This was always an annoyance, but now it's much more difficult to ignore, as it results in large discrepancies with today's enormous hard drives. When is a Terabyte hard drive not a Terabyte? When it's 931 GB.
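The arithmetic behind the "missing" space can be sketched in a few lines of Python. This is only an illustration; the constant and function names are made up for the example, and it assumes the operating system divides the raw byte count by 2^30.

```python
GB = 10**9    # decimal (SI) gigabyte, as printed on the box
GIB = 2**30   # binary "gigabyte", as many operating systems count it


def advertised_to_reported(advertised_gb: float) -> float:
    """Convert an advertised decimal-GB capacity to binary units of 2^30 bytes."""
    return advertised_gb * GB / GIB


for label_gb in (500, 1000):
    reported = advertised_to_reported(label_gb)
    print(f"{label_gb} GB on the box -> about {reported:.1f} 'GB' in the OS")
```

Run against 500 and 1000, this lands on roughly 465 and 931, the same figures quoted above.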
As Ned Batchelder notes, the hard drive manufacturers are technically conforming to the letter of the SI prefix definitions. It's us computer science types who are abusing the official prefix designations:
| Prefix | Symbol | Year Approved | Official Definition | Informal Meaning | Difference | Prefix Derived From |
| giga | GB | 1960 | 10^9 | 2^30 | 7% | Greek root for giant |
| tera | TB | 1960 | 10^12 | 2^40 | 10% | Greek root for monster |
| peta | PB | 1975 | 10^15 | 2^50 | 13% | Greek root for five, "penta" |
| exa | EB | 1975 | 10^18 | 2^60 | 15% | Greek root for six, "hexa" |
| zetta | ZB | 1991 | 10^21 | 2^70 | 18% | Latin root for seven, "septem", p dropped, first letter changed from s to z to avoid confusion with other SI symbols |
| yotta | YB | 1991 | 10^24 | 2^80 | 21% | Greek root for eight, "octo", c dropped, y added to avoid having symbol of zero-like letter O |
As the size of the prefix grows, so does the gap between the official and informal meaning of the prefix. And yes, there are larger official SI prefixes beyond these, just in case someone needs more than 1000 yottabytes. Ned noted that one of the SI proposals is for the prefix "luma", representing 10^63.
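The "Difference" column can be reproduced with a short calculation. The sketch below assumes the gap is defined as how much the binary value exceeds the decimal one, which matches the rounded percentages in the table; the names are invented for the example.

```python
PREFIXES = ["giga", "tera", "peta", "exa", "zetta", "yotta"]


def gap_percent(n: int) -> float:
    """Percent by which 2^(10n) exceeds 10^(3n); n = 3 for giga, 4 for tera, ..."""
    return (2 ** (10 * n) / 10 ** (3 * n) - 1) * 100


for n, name in enumerate(PREFIXES, start=3):
    print(f"{name:>5}: {gap_percent(n):.0f}%")
```

Each step up multiplies the ratio by another 1.024, which is why the gap keeps widening.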
Speaking of impossibly large numbers, if you're like most people reading this article, then you probably arrived here through Google. Google is a tragically but forever misspelled version of Googol:
A googol is 10^100, i.e. a 1 followed by 100 zeros. In official SI prefix terms, a googol is approximately a yotta squared, squared. Even larger is the googolplex, which is equal to 10 to the power of a googol (10^googol); this number is about the same size as the number of possible games of chess. Even larger numbers have been defined, such as Skewes' number, Graham's number, and the Moser, which I won't even try to describe.
But I digress. When we use gigabyte to mean 2^30 bytes, that's an inaccurate and informal usage. Instead, we're supposed to be using the more accurate and disambiguated IEC prefixes. They were introduced in 1998 and formalized with IEEE 1541 in 2002.
| kibibyte | KiB | 2^10 |
| mebibyte | MiB | 2^20 |
| gibibyte | GiB | 2^30 |
| tebibyte | TiB | 2^40 |
| pebibyte | PiB | 2^50 |
| exbibyte | EiB | 2^60 |
| zebibyte | ZiB | 2^70 |
| yobibyte | YiB | 2^80 |
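For what it's worth, labelling a byte count unambiguously takes very little code. The following is a minimal sketch, not any particular library's API; `format_size` and its `binary` flag are hypothetical names chosen for this example.

```python
IEC_UNITS = ["B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB", "ZiB", "YiB"]
SI_UNITS = ["B", "kB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]


def format_size(num_bytes: int, binary: bool = True) -> str:
    """Format a byte count with IEC (base 1024) or SI (base 1000) prefixes."""
    base, units = (1024, IEC_UNITS) if binary else (1000, SI_UNITS)
    value = float(num_bytes)
    index = 0
    # Step up one prefix at a time until the value fits or we run out of units.
    while value >= base and index < len(units) - 1:
        value /= base
        index += 1
    return f"{value:.2f} {units[index]}"


print(format_size(500 * 10**9))                # binary prefixes
print(format_size(500 * 10**9, binary=False))  # decimal prefixes
```

The same 500,000,000,000 bytes come out as 465.66 GiB or 500.00 GB depending on the flag, and the unit label says which one you are looking at.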
You occasionally see these more correct prefixes used in software, but adoption has been slow at best. There are several problems:
It's good to keep in mind the discrepancy between the decimal and binary meanings of the SI prefixes. The difference can bite you if you're not careful. But I think we're stuck with contextual, dual-use meanings of the SI prefixes for the foreseeable future. Or perhaps we're all overthinking this, as Alan Green notes:
Whenever I try to discuss [this] with my friends, they say, "Yotta getta life".
"If you're wondering where 35 megabytes of your 500 Gigabyte drive just disappeared to"
You mean 35 gigabytes, presumably.
Pete on September 11, 2007 2:25 PM
Yep, I meant 35 Gibibytes! :)
Jeff Atwood on September 11, 2007 2:44 PM
You said it in the middle there: it's us 1024 types who are in the wrong here, and the sooner we give up this particular windmill the better we will all be.
When I'm buying a hard drive, I really don't care that the 1TB drive has 931 "GB" of space. I *do* care that one "1 TB" drive will have significantly more space than another "1 TB" drive, but I can usually find that particular info on the side of the box or the manufacturer's web site after a bit of digging.
I don't know how you go shopping for a new hard drive, but I look at what data I have today, compare that to what I had a year or so ago to come up with a yearly growth factor, then look for hard drives which are about two years' growth larger than my current needs. Then, I look at what's there, decide the wife would have a fit if I spent $1000 on a new drive, and compromise down.
*EVEN IF* that number that I started out with (how much space I'm using today) was in *bibyte values (as typically reported by the OS), the end decision wouldn't change at all, because anything within, say, 25% of the desired size is just lost in the noise.
Still, though, when I look at how much space I am using, I use the byte counts (OS X gives that in parentheses right next to the "*bibyte" "friendly" count so it's not like this is much more work to come up with; seems like Windows offers a quick route to the actual byte counts too, right?).
All in all, it seems a moot point. So, in the end, I agree with Alan Green's friends.
Tom Dibble on September 11, 2007 2:48 PM
I noticed this disparity first when I bought my first CD-R. Since then, I orient myself on the much more meaningful unit "minutes", usually in CD-Quality audio that can be packed on a CD-R (MPEG-2 compresses movies when it's DVD+/-R(W)).
I've noticed (or maybe this is my imagination), that DVRs use the "minutes of video", too, for their hard drive capacities.
I don't know about you, but I can grasp "30 hours of video" better than "500 GB".
Phillip Gawlowski on September 11, 2007 2:50 PM
You're so right - those binary prefixes sound soo ridiculous. Like teletubbies, if anyone remembers. ;)
Christoph on September 11, 2007 2:51 PM
I don't understand the assertion that hard drive makers are pulling a fast one by using 'real' SI units in order to make the capacity seem higher. The only thing anyone ever does with that number is to compare it to other hard drive capacities, so what's the problem?
Also, I'm hard-pressed to think of a scenario where you really need to keep in mind the discrepancy between the two definitions. When are you ever comparing hard drive GBs vs. something legitimately measured in GiBs?
Ha! I've never even heard of a kibibyte.
The annoying thing with the power of 2 vs power of 10 stuff is that it didn't used to be that way. The vendors switched at some point to juice their numbers - extremely weaselly.
Kevin Dente on September 11, 2007 3:01 PM
"It's us computer science types who are abusing the official prefix designations".
That's one interpretation. I often prefer to think that we're challenging SI's jurisdiction. Since metric units of measurement predate any common usage of non base 10 numbers by quite some time, my guess is that the decision that kilo was base 10 was pretty arbitrary anyways.
Why not have the unit's suffix determine the base, instead of the prefix? It's not just a bytes issue -- kibimeters is patently ridiculous too. Bytes should always be base 2, meters should always be base 10, and if not, it should be specified explicitly.
If the above diagram is correct (I'm sure it is), when "giga" was specified in 1960, there was already prior base 2 usage. In other words, they broke pre-existing business logic, so it's their fault!
;-)
Sean Reilly on September 11, 2007 3:17 PM
> The annoying thing with the power of 2 vs power of 10 stuff is that
> it didn't used to be that way. The vendors switched at some point to
> juice their numbers - extremely weaselly.
I'd like to see proof of this. I don't think there ever was a switch, I think it always was like this. The oldest hard drive ad I can find advertises 5 million 7 bit characters. Floppy disks and CDs were in 1000 byte kilo/mega bytes. Except some floppies which were even weirder with 1024 * 1000 byte megabytes. Can anybody produce a hard drive advertisement with 1024 * 1024 megabytes as the unit?
FigBug on September 11, 2007 3:25 PM
The second calculation has "465" twice. The first instance should be "500".
Tim McCormack on September 11, 2007 3:27 PM
The problem has an easy solution:
The OS needs to start using SI prefixes correctly. Linux already does.
$ dd if=/dev/zero of=test bs=1MB count=10
10+0 records in
10+0 records out
10000000 bytes (10 MB) copied, 0.0261481 seconds, 382 MB/s
There's no reason to use powers of two for displaying file sizes. It's ridiculous and makes it more confusing for the user.
Sean on September 11, 2007 3:38 PM
It should be noted that network speed has ALWAYS been in base10.
Your ancient 10baseT ethernet card? That was 10 million bits/second.
A gigabit card is 1000 Mbits, not 1024.
"Why not have the unit's suffix determine the base, instead of the prefix? It's not just a bytes issue -- kibimeters is patently ridiculous too. Bytes should always be base 2, meters should always be base 10, and if not, it should be specified explicitly."
(first, it's not base-2; it's base-1024. What would "deka" and "hecto" map to in "base 2"? "8" and "128"?)
Huh? Why would I want a completely different series of multipliers with the same name as the universal (not just meters: EVERY SI unit of measure uses the base-10 system!) standard? Why propagate confusion? Instead of just learning "kilo- means 1000", the rule is "kilo means 1000 when the base is meter, liter, gram, watt, ampere, joule, [list continues for a page or so and needs to get updated with each new "thing" to measure]; it means "1024" when the base is byte"? That's ludicrous!
First: what reason is there for defining a kilobyte as 1024 bytes instead of 1000? Why not measure in 2^8 increments (256, then 65536, then 16777216, etc), since that reflects the number of 8-bit bytes used to store it? Does anyone use 10-bit bytes anymore? Is, in fact, the ONLY reason we use 1024 as the base for file sizes that it APPROXIMATES 1,000 as a power of 2???
Second: why on earth are OS's still so bass-ackwards and reporting these 1024-power numbers instead of a sensible standard that matches what the rest of the world uses?
I worked on the system information utility for Windows (msinfo32.exe). There were periodic bugs filed of the form "I have a 500GB drive and msinfo only reports it as 465GB!".
You can't please everyone. If it was changed to powers of ten it wouldn't match the rest of the operating system. And if you changed the whole operating system, it wouldn't match legacy systems. It's hard to turn that barge once it has momentum...
Jim Martin on September 11, 2007 3:51 PM
Jim: you've got to start somewhere. The programs that currently use "KB" to describe 2^10 are wrong. File a bug. Linux is currently in a transitional period. Nautilus has several bugs filed against it for reporting file sizes incorrectly... most GNU programs, like the one I demonstrated above, are correct.
I don't understand why geeks are so set on using powers of two divisions. Jeff's complaint is that he doesn't want to sound stupid by saying "kibibyte"; it never even occurred to him to use powers of ten. WTF?
If you're going to use base-10 numbers, you should be using base-10 prefixes. As soon as you start talking about a 0x1f4 GiB harddrive you can start complaining.
Sean on September 11, 2007 4:09 PM
Honestly, it never used to be a problem. SI prefixes have always been powers of two for binary quantities (which is only bytes) and powers of ten for decimal quantities.
It's well known by anyone that actually needs to know - I never get confused. Bytes are always powers of two. (Network communications speeds are number of *bits*, not *bytes* and take powers of ten prefixes.)
The worst was the "1.44MB" disks. These are actually 1044 kilobytes. (1044 * 2 ^ 10 bytes, in case people aren't keeping up.)
Hard Drive manufacturers definitely used to be more generous. I remember a 20 MB hard disk drive that had a bit more than (from memory) 21 million bytes capacity - back in those days, they actually made sure they met what it says on the box and you actually got more, no matter how you measure your megabyte.
(And kibibyte? It's stupid. Sounds like the unit of food eaten by an ISO standard cat in a cat food eating time unit. And a bunch of nerds trying to be nerdy.)
shannon on September 11, 2007 4:30 PM
Speaking of google and bytes, google still thinks 1 kilobyte (1 KB, to them) is equal to 1024 (2^10) bytes: http://www.google.com/search?hl=en&q=1+KB+in+bytes
Jacob on September 11, 2007 4:35 PM
Google's wrong.
http://www.google.com/search?hl=en&q=1+megabit+in+bytes&btnG=Search
That's not correct by anyone's definition.
Sean on September 11, 2007 4:38 PM
I can picture future cyber punks and general underground hoodlums now...
"Hey homey, you bustin' some yo yo yo worth of yobibyte warez for me?" 8^D
Sean Patterson on September 11, 2007 4:39 PM
Personally I think zetta and yotta are pretty lame prefixes, so I'm glad I won't have to deal with saying them in my lifetime. Maybe I'm just more familiar with the smaller prefixes, even the relatively exotic peta and exa. When was the last time you actually *needed* to express something in terms of petabytes?
My children can deal with the yobibi and zebibi controversy.
Jeff Atwood on September 11, 2007 4:52 PM
Shannon: "The worst was the "1.44MB" disks. These are actually 1044 kilobytes. (1044 * 2 ^ 10 bytes, in case people aren't keeping up.)"
I think you mean 1440 kilobytes (1440 * 1024 = 1,474,560 bytes).
Sebastian on September 11, 2007 4:53 PM
Sean: Yes it is correct, it's just one eighth of "1 megabyte in bytes".
Peter K. on September 11, 2007 5:03 PM
Peter: megabit is 1 million bits. It has never meant 1,048,576 bits.
I still don't get why you are so stuck on measuring file size in powers of two.
Is my 3ghz machine running at 3.2 billion hertz? No. It's running at exactly 3.0 billion hertz.
When you start seeing a file that's 1,500,000 bytes as 1.5 megabytes and not 1.43 megabytes, your brain will feel much better.
Sean on September 11, 2007 5:14 PM
Jeff, You're losing it.
The 1024 vs 1000 issue is so irrelevant. Every hard drive manufacturer uses the 1000 measurement so when you're deciding which drive to buy you can safely compare and know that you aren't getting one product with a smaller capacity than the other.
So what if you don't get a nice round free space number when you install the drive in your PC. The only time the issue might be a problem is if you have exactly 500GB (1024) of data and you try to buy a hard drive to hold it.
This article was just padding.
Jason Stangroome on September 11, 2007 5:34 PM
The problem is not when you compare 2 hard drives.
We have a problem when we have a 500GB (1000 based) and you need a real 500GB (1024 based).
Workers in computer (Admins, programmers...) can deal with that.
But ordinary people are very confused about that.
Memory is one area where the power of 2 meaning makes complete sense. Since it's all manufactured as 2^n, it would just confuse people to call your 1GB RAM "1.073 GB of memory".
coderprof on September 11, 2007 5:58 PM
Yobibi & Zebibi were the names of Sudam's sons right? =P
Josh on September 11, 2007 6:01 PM
Bah!
If you ask me, the standards organisations should be making sure that our conventions are well documented and standardised. Not running around self-importantly defining and redefining things we've used for years. There *is* a reason we use binary measurements.
Ideally they could have gone out and defined "when is kilo binary and when is it decimal?" - after all - that is often confusing. Heck they could have simply defined the standard to say "kilobyte often means this, but sometimes that".
Then they could have run around adding well-defined "kibi" stuff to their hearts content.
Instead they made the confusion *worse* by actively trying to delete the common definition. A bunch of nerds with no understanding of the social outcome of their actions, if you ask me.
Andrew Russell on September 11, 2007 6:08 PM
Isn't the important thing that, when we start saying "tebibyte", that our theme song is already mostly written? http://youtube.com/watch?v=19MNzKL5Swk
Geno Z Heinlein on September 11, 2007 6:37 PM
"There *is* a reason we use binary measurements."
Andrew: care to fill us in?
Please, just list one benefit for having a file that's 2,500 bytes be represented as 2.44 KiB instead of 2.5 KB.
Sean on September 11, 2007 7:07 PM
Most annoying about the SI prefix are the graph legends....
when you see a graph with a little 'm' next to the y axis, is that milli or mega?
The general rule of lowercase being less than one helps, except that 'k' is kilo, meaning 1000 (or preferably 1024 depending on the actual metric). Shame there is always an exception!
Let's not kid ourselves: the only reason we all already KNOW that difference is because practically ALL of us got cheated once and then asked someone and learned why we were some MB's short. I'm pretty sure that in a lawsuit it could be reasonably argued that no normal first time customer can be expected to know the difference, and hence that it is intentionally misleading or something of that sort.
J. Stoever on September 11, 2007 7:46 PM
I work for a networked storage solutions company that does both hardware and software, and I will tell you that the IEEE notation is never used. In fact, I wrote a .NET class to manage storage capacities into the 'yotta' range, and during the work came across much of the research you did. End result: most people don't care since we are talking about 3 orders of magnitude between prefix, even though the discrepancy grows larger.
Kit on September 11, 2007 8:06 PM
Sean ("care to fill us in?"):
Sure. Pretty much anything that involves working with memory involves measuring in binary units. Because that's how memory gets addressed. A 32-bit computer is so named because it uses 32 binary bits to address... wait for it... 4GB (sigh: GiB) of memory!
As a programmer I have no interest running around talking about 4.294967296 "GB" of memory. Or convoluting my tongue with silly made-up words.
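The figures in this comment are easy to check. A small sketch, assuming byte-addressable memory and a flat 32-bit address:

```python
ADDRESS_BITS = 32

# One distinct address per byte, so the address space is 2^32 bytes.
addressable_bytes = 2 ** ADDRESS_BITS

print(addressable_bytes)            # 4294967296
print(addressable_bytes / 2**30)    # size in units of 2^30 bytes
print(addressable_bytes / 10**9)    # size in units of 10^9 bytes
```

In binary units the result is a round 4; in decimal units it is the awkward 4.294967296 mentioned above.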
Andrew Russell on September 11, 2007 8:24 PM
@sean
I am a different Andrew, but I will answer your question. Most things are stored, accessed, or otherwise dealt with in powers of two. There is a technical reason why you use powers of two, mostly dealing with the fact that you have a binary state (high voltage vs. low voltage) which is used to compute everything. Memory, for instance, is a series of binary elements indexed by an address defined by a series of binary elements. Wikipedia can give you more information (look up a muxer, for instance) but basically everything is stored in a medium (almost everything, anyway) that in the end is a power of two. As was previously mentioned, 1 gibibyte of RAM is an exact number, because that is how it is built (1024 mebibytes). So it makes sense to advertise it as such, rather than 1.074 gigabytes. Early geeks just thought it was kind of cool how you could round down the numbers and not lose much, which is what got us to where we are today.
Nowadays, however, the difference between 1000 and 1024 (ambiguous) gigabytes is five movies, or your music library, or your photography collection. There is a practical aspect to the binary notation, it is just unfortunate that there is so much momentum to stick with it.
Hope that helps clear up some confusion about the topic.
Andrew on September 11, 2007 8:27 PM
Many fields of study invent their own terminology for their use by co-opting words from general use. Using these terms differently does not make them "wrong", it just makes them technical jargon specific to the field.
Within the field of computer science, one kilobyte = 1024 bytes. This isn't wrong, in fact, the other view (1 KB = 1000 bytes) IS wrong. It's wrong on several levels.
First, it attempts to use the wrong meaning of an overloaded word, rather than the one that is correct in context. You don't complain when a physicist talks about the color or spin of a fundamental particle, even if you know the particle has no "color" nor is it "spinning". The physicist isn't wrong, he's just using terms correctly in a physics context, where their meaning differs from other contexts. Or is the assertion here that physicists can co-opt words for their own meaning, but computer scientists are "wrong" for doing the same?
Second, it mistakes "byte" for an SI unit. According to SI, a kilometer is 1000 meters. This is very true. However, it's absolutely false that, according to SI, a kilobyte is 1000 bytes. SI has no more to say on how many bytes are in a kilobyte than it has to say on how many feet are in a mile, since neither miles nor bytes are SI units. Incidentally, kilobyte has traditionally been abbreviated KB; note the capital K. The SI "kilo" prefix is abbreviated with a small "k", but since the "kilo" in kilobyte is NOT the SI prefix, this is irrelevant.
andrews: you both explained why memory is sized in powers of two (actually it's multiples of bus width, but that's a power of two).
Neither explained why it's better for explorer and other apps to describe filesize and diskspace in powers of two. Why is the average user exposed to this?
Sean on September 11, 2007 9:30 PM
"Google's wrong.
http://www.google.com/search?hl=en&q=1+megabit+in+bytes&btnG=Search
That's not correct by anyone's definition"
Actually what google says: 1 megabit = 131 072 bytes
is in fact correct.... do the math: 1024*1024/8 = 131072
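The two competing answers in this exchange differ only in which "mega" is assumed. A quick sketch of both readings (the variable names are just for illustration):

```python
BITS_PER_BYTE = 8

decimal_megabit = 10**6   # SI reading: one million bits
binary_megabit = 2**20    # binary reading: 1,048,576 bits

print(decimal_megabit // BITS_PER_BYTE)  # 125000 bytes
print(binary_megabit // BITS_PER_BYTE)   # 131072 bytes
```

Google's 131,072 is the binary reading; under the SI definition the answer would be 125,000 bytes.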
> I don't know about you, but I can grasp "30 hours of video" better than "500 GB".
Yeah, but then you get to the next Monty Python question - are those 30 hours MPEG-2, DivX, DVD-quality, HD-quality at 720, 1080, or...? Same with the "songs" metric for iPods - bitrate isn't taken into account, just the default length & size of a pop song.
As long as we can get 'm to switch after tera, then I'll be happy enough.
Rob Janssen on September 11, 2007 11:49 PM
although "Yotta getta life" dude
Baha on September 12, 2007 12:05 AM
Sean: They're not really exposed to it - simply because "that drive has 120 and that drive 160". 160 of what? they don't really know - except one can hold more illegally downloaded movies.
Frankly, users should be seeing size expressed as either "30 hours of video" so they can understand it, or the commonly accepted industry standard (which, sadly, is decimal for HDDs and binary for everything else) so products can be fairly compared (because you just *know* that one vendor's hour of video will be at a different bitrate to another's).
It's actually unfortunate that end-users could end up having to also deal with this messy "rename", along with software developers.
And if I may trigger a linux-Windows flamewar for a moment: This is why Linux won't be a consumer operating system - because it has the attitude that this whole messy rename thing should be exposed to end-users. The same applies to Wikipedia. I'd say it's a form of elitism.
Andrew Russell on September 12, 2007 12:08 AM
Yes hard drive manufacturers use SI prefixes correctly, as you observed. So do CPU manufacturers, as Sean observed. Did you complain about the GHz of your recent CPU purchase not being binary GHz?
When you hire someone for $100K, you pay them $102,400, right?
When you measure response time in milliseconds, that's how many times 1/1024 of a second, right? You buy a 47K resistor, that means 47 x 1024 ohms? No ... because you said you don't use metric. In your recent power supply purchase, you measured it in BTUs because you don't use metric. 1K BTU is 1024 BTUs, right?
Don't think about petabytes. You don't want to estimate how much spam there is each day.
Catherine Zetta-Yones on September 12, 2007 12:23 AM
gi-bE-BYE-TUH
bin-ary
or
gi-bye-bye-tuh?
as in bye-nary
apeinago on September 12, 2007 1:20 AM
What I don't see explained or knew was: How can 1TB drives have different capacities? I thought they always sported 10^12 raw bytes
You know what is even more cheatish than the fact above? Some laptops and computers have a hidden "rescue partition" with a copy of the OS installed in it for easy recovery. A friend of mine got so angry at a computer store that sold him a 60 GB laptop that had only 25 GB of free space after a clean installation - apparently Windows XP + all the "apps" took 15 GB *and* there was a 20 GB recovery partition. After a lot of swearing he got an 80 GB laptop of the same brand though (which still had 50 GB free space - here the rescue partition of an otherwise same install was only 15 GB)
Tijmen Stam/IIVQ on September 12, 2007 1:36 AM
The 1.44 3 1/2" floppy shows the problem best
Unformatted capacity 2MB (as per hard drive so 2,000,000 bytes)
Formatted capacity = 1,474,560 Bytes
or as 1024x1024 = 1.41 MiB
or as 1000x1000 = 1.47 MB
but 1.44M as quoted is the size in units of 1000x1024!
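The three floppy figures above all come from the same byte count divided by a different "mega". A short sketch that reproduces them (names invented for the example):

```python
FLOPPY_BYTES = 1_474_560  # formatted capacity quoted above

mib = FLOPPY_BYTES / (1024 * 1024)    # binary megabytes
mb = FLOPPY_BYTES / (1000 * 1000)     # decimal megabytes
mixed = FLOPPY_BYTES / (1000 * 1024)  # the hybrid unit behind "1.44"

print(f"{mib:.2f} MiB, {mb:.2f} MB, {mixed:.2f} 'MB' (1000 x 1024)")
```

Only the mixed 1000 x 1024 unit yields exactly 1.44, which is the number that ended up on the label.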
The hardware people love this because it makes their drives look bigger
The Software people (especially OS people) love this because it gives them an excuse as to why there is a large difference between the advertised size and the usable capacity, see above the advertised size is 2MB but the normal usable capacity is however you show it less than 1.5MB (There were Microsoft's DMF formatted floppies which were 1,763,328 bytes (1.68 MiB) but still way off the total capacity)
I'm just waiting for the hard drive manufacturers to define a byte as 10 bits.
That's essentially what this whole argument comes down to. Bytes are arrangements of bits, and kilobytes are arrangements of bytes. Megabytes arrangements of kilobytes, bytes, and bits; and so on. It doesn't make any sense to arrange bytes in decimal values because they are themselves a binary representation, 2^3 bits.
We could, as programmers, abstract it all to decimal for the end user. The question is whether or not we should, just to legitimize the choice of the hard drive manufacturers (one of the very small number of groups that actually distort these values). The end users don't care a bit until they buy a hard drive that turns out to be smaller than they thought it would be.
Another area in which this is often done is with network speeds, especially in the world of dial-up modems. Unfortunately for most people, these devices rarely deliver even the full speed promised in decimal (because of the network they're on, usually), so it's rarely an issue that people face with their newly purchased modem or NIC.
Vizeroth on September 12, 2007 1:53 AM
The worst is Broadband ....
That Lovely 10Meg Broadband connection how fast is it
10MB per second ...?
10MiB per second ...?
No 10,000,000 *Bits* per second .....
Jaster on September 12, 2007 2:04 AM
Magnetic disks tend to store files in multiples of 512 bytes (sectors). Optical discs tend to use 2048 byte data sectors. Ignoring powers of 2, the capacity is always a multiple of 2048 or at least 512. Any disk that's got a capacity below 1 MB is probably using 2^10 even if it says KB.
It looks like that's about where they switch from 2^10 to 10^3: http://www.buildorbuy.org/floppydisk.html
josh on September 12, 2007 2:08 AM
Although I understood the difference between SI and IEC units,
this still bit me when I was creating disk images for installing
to flash (compact flash, usb keys etc.).
Even though flash is based on powers of 2 logic internally,
manufacturers use _different amounts_ of space for wear leveling etc.
So if you use the max space available for one flash device, the
image may be too big for another flash device.
The only safe thing to do is to only use the SI space.
I.E. if you have a 64MB flash, only use 64000000 bytes of it.
I find the unix units command handy for these conversions:
units -t '500GB' 'GiB'
http://www.pixelbeat.org/cmdline.html#math
jaster, The broadband situation is even worse than that.
Here's a handy calculator so you can switch between MB/s Mib/s ...:
http://www.pixelbeat.org/speeds.html
Given all the confusion here in the comments section of this blog, I think making a distinction between kilo (=1000) and kibi (= 1024) is very useful.
In our company, we have been doing this for several years and it saved us from a few embarrassing errors in our software.
For those that do not like the word 'kibi': now that is what I call a relevant argument. Get over it.
Karel Thönissen on September 12, 2007 3:02 AM
@Verizoth
If they could, hard-drive manufacturers would be much more likely to redefine a byte as 5 bits. This gives a bigger number on the front of the box <cynicism>
Cog on September 12, 2007 3:36 AM
@Jaster
Wrong answer!
10 megabits/s = 10 485 760 bits/s
It seems totally absurd to redefine well established units of measurement in order to create metric units. This is like saying that from now on, a mile will be 1000 yards: get over it.
A more appropriate course of action would have been to define new metric units of measurement.
kidebytes?
fgb on September 12, 2007 4:16 AM
I remember when drives were defined using power-of-2 designations. The manufacturers definitely changed at some point. Problem is, if one changes, they all have to change, or their drives look smaller in comparison.
The problem, for those who ask, is that computer memory is always defined with the power-of-2 system. So, there's a mismatch. Back when disk drives were close to the size of RAM in your machine it mattered more, I guess.
Thanks for the info, Jeff. Hadn't seen the SI power-of-2 system before. Think I'll start using it just to confuse everybody I know.
A. Lloyd Flanagan on September 12, 2007 5:03 AM
Hey Now Jeff,
I'm so glad I read this post, I always wondered why there was space missing. Now I know the reason why the drives show up as less space. I discovered your great blog through a shrinkster link on a DNR show not googol.
Thx,
Catto
@Tema
No, he is correct. Network speeds have always been in bits per second, and have never used powers of two. Your old 28.8k modem was 28800 bits per second if it could negotiate that rate over a potentially noisy phone line. And those bits were the signaling speed of the line, of which there was usually framing and error protection and detection overhead. Even with RS-232 signaling for example at 9600 you have an overhead for a start bit and a stop bit (assuming 8 bits per word, no parity, 1 stop bit).
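The framing overhead described here can be put into numbers. The sketch below assumes plain asynchronous serial framing with one start bit, and ignores any protocol overhead above that; the function and parameter names are hypothetical.

```python
def payload_bytes_per_second(
    baud: int, data_bits: int = 8, parity_bits: int = 0, stop_bits: int = 1
) -> float:
    """Data words per second on an async serial line (one start bit per word)."""
    bits_per_frame = 1 + data_bits + parity_bits + stop_bits
    return baud / bits_per_frame


# 9600 bits/s with 8 data bits, no parity, 1 stop bit: 10 line bits per byte.
print(payload_bytes_per_second(9600))
```

With that framing, a 9600 bits/s line carries 960 bytes per second rather than the 1200 you would get by dividing by 8.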
Personally, I think software should start to standardize on using the SI meanings to display sizes of things (file sizes, drive sizes, download speeds, et cetera). Those things are almost never an even power of two. The only place it doesn't make sense to use that notation is total size of memory that is inherently a power of two because of how the hardware is made (e.g., CPU cache, system memory). That's the only place I can think of you'd need some kind of qualifier on the spec sheet or on the package.
Brendan Dowling on September 12, 2007 5:18 AM
I can handle the drive-capacity issue, but I wonder why we don't use the SI prefixes for RAM?
Daniel 'Dang' Griffith on September 12, 2007 5:29 AM
Sean: why list file sizes in KiB, rather than KB? Because even though hard disk manufacturers have squatted on the traditional descriptions of size like dogs in mangers, hard disks are still naturally sized in 512-byte units. So all file sizes are rounded up to multiples of at least 2^9 - if not more, when one considers clusters of blocks.
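The rounding described here is simple to sketch. This assumes space is allocated in whole units (512-byte sectors by default, or a larger cluster size) and ignores filesystem metadata; `size_on_disk` is a made-up name for the example.

```python
def size_on_disk(file_bytes: int, allocation_unit: int = 512) -> int:
    """Round a file size up to a whole number of allocation units."""
    units_needed = -(-file_bytes // allocation_unit)  # ceiling division
    return units_needed * allocation_unit


print(size_on_disk(2500))        # with 512-byte sectors
print(size_on_disk(2500, 4096))  # with 4096-byte clusters
```

A 2,500-byte file occupies 2,560 bytes with 512-byte sectors and 4,096 bytes with 4 KiB clusters, so the space actually consumed is always a multiple of a power of two.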
I think that's what galls me the most - hard drive manufacturers aren't even using the best units for their products in their keenness to pull a fast one on their customers.
Meanwhile, Sean, have you noticed that you're the only person mounting a strident defence of the new way of doing things - almost to the point of telling anyone over 25 that they're brain-damaged...? Methinks thou dost protest too much.
As for the question of Mbits/s, once upon a time there was a unit that naturally encapsulated the "bits per second" measurement; it was called Baud. I remember 300 baud modems; somewhere around the 14.4 era, Kbaud (which, as has correctly been stated, was always a decimal measure, having long predated the era of binary computers) suddenly became Kbps. If one were to call "gigabit Ethernet" "gigabaud Ethernet" instead, the confusion would go away. (No, it's a complex unit - so is the volt, but nobody talks about joules per coulomb.)
So I suggest that the status quo is just fine:
A megabyte is 2^20 bytes, the natural measurement for memory.
A megabyte per second is the natural measurement for data transfer across parallel buses.
A megabaud is 10^6 bits per second, the natural measurement for data transfer across serial lines.
There. What's the problem? What's ambiguous about it? Why should we change the way things have been done for decades just because hard drive manufacturers are greedy? (They've always been greedy. Remember "unformatted capacity", anyone?)
gwenhwyfaer on September 12, 2007 5:55 AM
Rob wrote:
>Yeah, but then you get to the next Monty Python question - are those 30
>hours MPEG-2, DivX, DVD-quality, HD-quality at 720, 1080, or...? Same
>with the "songs" metric for iPods - bitrate isn't taken into account,
>just the default length & size of a pop song.
Well, usually, it is MPEG-II in movies (since it is the most common format for consumers), and 128kbit/s VBR MP3 (most common, again) for music.
But yes, it still needs qualifiers. Still, it is more "life-like" in a "What can I do with so much space?" kind of way, than the dry, scientific, and very-much-geek-friendly state of bytes.
In a way, it is a usability thing: It puts something in relation to something else familiar. Which is, of course, error prone. No silver bullet, either.
Phillip Gawlowski on September 12, 2007 6:04 AM
Why do I get the feeling you wrote this entire post to deliver the pun at the end?
Yakka on September 12, 2007 6:19 AM
From my understanding IEC prefixes were developed by Mushmouth formerly of Fat Albert and the Cosby kids during his brief employment at the Institute of Electrical and Electronics Engineers.
Mike on September 12, 2007 6:24 AM
Baud is not bit per second, it is symbol per second. That is something completely different if symbols consist of more than one bit.
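The relationship is just a multiplication, sketched below. The modem figures are illustrative assumptions rather than anything stated in this thread: 1 bit per symbol for a 300 baud modem, and 2400 symbols per second at 6 bits per symbol as the way a 14.4k modem is commonly described.

```python
def bit_rate(symbol_rate_baud, bits_per_symbol):
    """Bits per second = symbols per second * bits carried by each symbol."""
    return symbol_rate_baud * bits_per_symbol

print(bit_rate(300, 1))   # 300: baud and bits/s coincide
print(bit_rate(2400, 6))  # 14400: same word "14.4k", very different baud
```

Baud and bits per second only agree when each symbol carries exactly one bit.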
The confusion is even larger than I thought it was. Our job as developers is to be precise in our specifications, designs and code, so we have to change something. Too bad so many here are not willing to see that there is a problem, or expect that the *rest* of the world should change.
Karel Thönissen on September 12, 2007 6:45 AM
Shannon wrote: "Honestly, it never used to be a problem. SI prefixes have always been powers of two for binary quantities (which is only bytes) and powers of ten for decimal quantities."
I don't think so. If I'm not mistaken, kilobytes/second has always meant 1000 bytes/second, when referring to modem transfer rates.
The issue is that the use of the kilo/mega/giga prefixes is *ambiguous*, even if followed by the word "byte". For anyone who thinks this whole discussion is stupid, then you won't mind if I borrow $1024 from you (a kilobuck), and eventually pay you back $1000 (a kilobuck)?
I work in software development with some very smart people. If I had a file 5,323,123 bytes in size, and someone asked me "How big is that file?", I would be forced to answer "about 5.3 *megabytes*". You know why? Because, in an informal discussion, they would understand "5.3 megabytes" to mean "about 5,300,000 bytes". If I had bothered to do the math and answered (more correctly) "about 5.08 *megabytes*", pretty much no one would think I meant "about 5.08 * 1024 * 1024 bytes". Binary calculations may be easy for computers - not so for human beings. We all know that the proper convention for kilo/mega/gigabyte is binary, but in practice, nobody is shy about using those prefixes in the decimal sense. And nobody says "5.3 million bytes", although you'd think that'd be a reasonably unambiguous alternative.
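The two readings of that file size are easy to check. A minimal sketch, with a made-up helper name, that prints the same byte count under both conventions:

```python
def describe(num_bytes):
    """Format a byte count using decimal MB and binary MiB (hypothetical helper)."""
    decimal_mb = num_bytes / 10**6   # SI: mega = 1,000,000
    binary_mib = num_bytes / 2**20   # binary: mebi = 1,048,576
    return f"{decimal_mb:.2f} MB (decimal) = {binary_mib:.2f} MiB (binary)"

print(describe(5_323_123))  # 5.32 MB (decimal) = 5.08 MiB (binary)
```

Same file, same bytes, and a 5% gap in the headline number depending on which "mega" the listener assumes.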
For those who still say it's not ambiguous - to the average customer it is, for all practical purposes, if they have to remember/understand that:
- Windows/Linux will report memory/file size in binary units
- Hard drive sizes are specified in decimal units
- DVD sizes are specified in decimal units (a 4.7 GB DVD packed full of data will show up as ~4.38 GB in your favourite OS)
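The conversion behind those numbers, as a small sketch. It assumes the packaging uses 10^9 bytes per GB and the OS divides by 2^30; the function name is illustrative only.

```python
def reported_gib(marketed_gb):
    """Capacity an OS would show if it divides the byte count by 2**30."""
    return marketed_gb * 10**9 / 2**30

for gb in (4.7, 250, 500, 1000):
    print(f"{gb} GB on the box -> {reported_gib(gb):.2f} GB in the OS")
```

The DVD case comes out at about 4.38, and the 250, 500 and 1000 GB drives at roughly 232, 465 and 931, matching the figures quoted in the post and in this thread.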
And just try explaining to the average person why this is so. I've seen plenty of people asking the following question on various PC/gaming/tech forums:
"I just bought a 250 GB hard drive. How come it only shows up as 232 GB (or whatever) in Windows?"
Among the misleading answers I've seen:
"Windows 'lies' to you about the disk space"
"The hard drive manufacturer 'lies' to you about the drive size"
"250 GB is the 'unformatted' capacity. After you format the drive, you only have 232 GB left" (*)
Just because the industry has been doing the wrong thing for decades, doesn't mean it's correct or user-friendly to continue doing so.
(*) You may be laughing about how bogus this explanation sounds, but I've heard it from professional IT managers. If someone in the industry cannot be bothered to know/understand that GB has 2 meanings, good luck explaining that to the average joe on the street. This is by no means a criticism of them - it is actually an indictment of the tech industry. It's no wonder that "techies" have a rep for poor communication skills, since we feel the need to redefine well-known prefixes with ambiguous meanings.
Will on September 12, 2007 6:57 AM
They should label the HDDs like monitors. 19"(17.4" Viewable), 500GB (465GB Usable).
Mattkins on September 12, 2007 7:10 AM
I'm not sure a field that has wholeheartedly embraced silly made-up unit names (byte, nybble), in-jokey recursive acronyms (GNU, LAME), stupid, obfuscatory acronyms (PCMCIA), and the brain-melting stupidity of deliberately meaningless non-acronym acronyms (NT, XP, .NET) has any standing to complain that kibibytes and mebibytes "sound ridiculous." We're already soaking in ridiculous. Pissing on "mebi" et al is like pissing into an ocean of piss.
While it may be natural to use ordinary decimal measurements for storage media like optical and hard disks, solid-state storage like RAM and flash are likely to come in powers-of-two-sized chunks for the foreseeable future. That particular opportunity for confusion isn't likely to be cleared up entirely by some simple policy change.
Western Infidels on September 12, 2007 7:16 AM
How is NT a "non-acronym acronym"? Looks like a regular acronym to me. It even makes sense, which is something I can't say about every acronym.
J. Stoever on September 12, 2007 8:42 AM
Anyone with a programming history going back at least to the 16-bit machines is likely to think something like:
1 byte = 8 bits
1 KB = 1024 bytes
1 MB = 1024 KB
1 GB = 1024 MB
1 TB = 1024 GB
If GB is 10^9 bytes for hard drives, then consistency dictates the same for memory. So your memory would actually come in chunks of 1.073741824GB.
"You need to upgrade this machine to at least 2.147483648GB of memory"
For programmers it makes sense with a notation that uses the 1024 bytes = KB shorthand if you do anything related to memory.
It's just a sad day when it got hijacked by the SI standards crowd.
It's not like there are things like centibytes or anything. Or maybe we should create a "decibyte" which then would be a practical 1/10th of a byte, or in other words 0.8 bits.
Yeah. Now we're talking.
Nibbles, here we come.
Andrew Russell> This is why Linux won't be a consumer operating
Andrew Russell> system - because it has the attitude that this
Andrew Russell> whole messy rename thing should be exposed to
Andrew Russell> end-users.... I'd say it's a form of elitism.
Why are you saying this like it's a bad thing? We have a consumer operating system, from Microsoft, and it's a poor product for a premium price. And exposing end-users to knowledge? Oh, no, where will this horror end!
We should expect people to raise their standards, not lower ours. Elitism is good. When did striving to be better informed and more capable become some kind of insult?
Geno Z Heinlein on September 12, 2007 9:16 AM
Jeff-
Back in the day of FAT16 (Windows 95), "large" (4 gig) hard drives suffered from inefficiency on cluster size. For example, I believe under FAT16 the smallest cluster size on such a drive was 32K, so if you had a 1K file, it would waste 31K. FAT32 improved on this, but I believe space was still lost, though not as much, since smaller (<32K) clusters could be defined.
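The wasted space ("slack") is just rounding up to a whole number of clusters. A minimal sketch; the function names and the 4K comparison cluster are illustrative assumptions, not figures from any particular filesystem:

```python
def allocated_bytes(file_size, cluster_size):
    """Space a file occupies when it must fill whole clusters."""
    clusters = -(-file_size // cluster_size)  # ceiling division on integers
    return clusters * cluster_size

def slack_bytes(file_size, cluster_size):
    """Bytes allocated to the file but not used by its contents."""
    return allocated_bytes(file_size, cluster_size) - file_size

KIB = 1024
print(slack_bytes(1 * KIB, 32 * KIB) // KIB)  # 31 (KiB wasted with 32K clusters)
print(slack_bytes(1 * KIB, 4 * KIB) // KIB)   # 3 (KiB wasted with 4K clusters)
```

Note that this loss is a property of the filesystem, separate from the decimal-versus-binary labeling of the drive itself.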
So, a 1 Terabyte drive is for "marketing" purposes by the hard drive manufacturer. You'll never physically store 1 Terabyte.
I'm sure someone can explain the gory details on this better than I can.
Jon Raynor
J. Stoever: Microsoft originally said NT meant "New Technology." As opposed to that Old Technology they usually push, I guess. Later Microsoft divorced itself from that acronym, insisting that NT didn't stand for anything and adopting it as a simple product-line name. The Windows 2000 boot screen even says "built with NT technology," which clearly makes no sense if NT is an acronym.
Western Infidels on September 12, 2007 9:32 AM
Is there any other example of an ISO standard that redefines accepted usage and makes up something totally new to replace it? That strikes me as exactly what standards bodies are *not* supposed to do.
Chris on September 12, 2007 9:32 AM
Jeff, I think we ought to just forget about making the power-of-two-types use the silly IEC names. It's just not going to happen.
Alternatively, we could go on writing "megabyte", but use the abbreviation "MiB". I wouldn't mind using the abbreviations, but I'm just not going to say "meh-buh-bite". It's just silly.
aggieben on September 12, 2007 9:55 AM
Gigabytes weren't an SI unit until the IEEE decided to make them one. It's rather insulting, actually - reminds me of the gritty cop shows where the FBI would step into a police investigation and say "You guys go home, let the experts handle this". Except that the FBI actually has that authority, whereas the IEEE just wishes it does.
The meaning of SI prefixes only applies to SI (metric) units, which bytes aren't. A byte is already 8 bits, and it isn't divisible into centibytes or millibytes, so it doesn't even make sense as a metric unit. The composite units were so named because they approximated metric units, not because they were equivalent. That's not "wrong"; it happens in every industry, it's just that the nerds in other industries don't kick and scream anal-retentively about it.
For those claiming that the inconsistency was always there because the network industry used kbps, think again. One of the reasons they used kiloBITS per second was to disambiguate it from storage units. The term was adapted from baud - slightly different meaning, but essentially equivalent by the time of 14400 baud modems, when baud was becoming an awkward measurement anyway. There was a legitimate need to compare bandwidth with storage (the Internet), but it also did not make sense to use powers of 2 because bandwidth was actually provided in powers of 10 (bits). There was no foul play here, just pragmatism.
Memory capacity, on the other hand, is always 2^n bytes. Hard drives are generally multiples of 512, too; when 500 gigabytes is used to mean 500 * 10^9 bytes, it is actually an approximation. The real number might be something like 499,289,948,160 bytes, though it could be more or less depending on the geometry. 500 GB is never quite accurate using ANY convention.
I think it's obvious that the units for memory and disk should be the same, since data is constantly being swapped from one to the other. So let's put the question about why the rules for memory should apply to hard drives to rest.
Of course I know what the proposed solution is. Just have everyone switch to the dorky "bi" prefixes! That's nice, except that every part of the industry EXCEPT for the hard drive manufacturers has been using the same convention for 50 years. You don't just stomp your foot, shake your fist and tell us to mend our evil non-standard ways. Standards should reflect conventions that are already widely used, not fight them. Frankly, I'd rather deal with the hard drive capacity gap than deal with the silly new SI units invented by academic suits with hardly any practical experience.
Aaron G on September 12, 2007 10:13 AM
Who can name the bigger number?
Just a follow-up on Jeff's digression about really big numbers: some of you might have heard about a family of numbers called "Busy Beavers" (they were introduced by Rado in 1962).
I think these numbers still hold the record for the biggest number series ever imagined, and the way they are defined - which has a lot to do with computer theory - is fascinating :
Imagine a <a href=http://en.wikipedia.org/wiki/Turing_machine#Informal_description>simple Turing machine</a>, which reads its instructions from a tape.
Then, for a given N, feed this machine with all possible programs which can be coded with N instructions on the tape. Out of these programs, some will never end, and some will halt at some point. Out of those which halt at some point, let's consider the one which halts after the largest number of steps (which you could see as processor cycles): this particular program is called the "Busy Beaver", and we can then define BusyBeaver(N) as the number of steps it takes before it halts.
So, how big is BusyBeaver(N) (or BB(N) )?
As a matter of fact, it is big, very big: the values of BB are known for N=1...4, while the other values (for N=5 and higher) are still unknown and may well be out of reach of any human brain or computer processor. It was proved that BB(6) is higher than 8,690,333,381,690,951, but it might well be much greater.
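For a feel of what is being counted, here is a minimal simulator sketch. It assumes the usual formulation, where N counts machine states rather than "instructions", and uses the commonly cited 2-state, 2-symbol champion as its transition table; the names and the `max_steps` cutoff are my own.

```python
from collections import defaultdict

# (state, symbol read) -> (symbol to write, head move, next state); "H" means halt.
CHAMPION_2_STATE = {
    ("A", 0): (1, +1, "B"),
    ("A", 1): (1, -1, "B"),
    ("B", 0): (1, -1, "A"),
    ("B", 1): (1, +1, "H"),
}

def run(program, max_steps=1_000):
    """Run on a blank tape; return (steps, ones written), or None if cut off."""
    tape = defaultdict(int)  # unbounded tape, every cell starts as 0
    state, head, steps = "A", 0, 0
    while state != "H":
        if steps >= max_steps:
            return None  # gave up: we cannot tell "slow" from "never halts"
        write, move, state = program[(state, tape[head])]
        tape[head] = write
        head += move
        steps += 1
    return steps, sum(tape.values())

print(run(CHAMPION_2_STATE))  # (6, 4)
```

The `max_steps` cutoff is the whole difficulty in miniature: for larger machines there is no general way to know whether a program that is still running will ever stop.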
Now, suppose that one particularly gifted programmer writes a complex mathematical library that can handle any large numbers and advanced arithmetical operators (such as <a href=http://en.wikipedia.org/wiki/Knuth%27s_up-arrow_notation>Knuth's up-arrow notation</a>). Suppose then that this programmer uses this library to write the best possible program, one that would output the largest number anyone ever thought of. To be fair, let's also accept that the execution of this program might require billions of years or more.
Rado proved that this result will be far (by galactic dimensions) below the Busy Beaver values: as a matter of fact, BB(N) eventually outgrows any computable function!
For those who are interested, here is a recommended link for more information about those numbers : <a href=http://www.scottaaronson.com/writings/bignumbers.html>http://www.scottaaronson.com/writings/bignumbers.html</a>
And an extract which explains this in detail:
"
Turing proved that this problem, called the Halting Problem, is unsolvable by Turing machines. The proof is a beautiful example of self-reference. It formalizes an old argument about why you can never have perfect introspection: because if you could, then you could determine what you were going to do ten seconds from now, and then do something else. Turing imagined that there was a special machine that could solve the Halting Problem. Then he showed how we could have this machine analyze itself, in such a way that it has to halt if it runs forever, and run forever if it halts. Like a hound that finally catches its tail and devours itself, the mythical machine vanishes in a fury of contradiction. (That’s the sort of thing you don’t say in a research paper.)
"
Whenever one inured to the inconsistent KB/MB/GB definitions used in some computing contexts first hears the kibibyte, mebibyte, gibibyte KiB/MiB/GiB construction, they think it silly. I did, too.
But after a few years of being bitten by related problems, and having to explain/argue the exceptions, and familiarity with the new words/abbreviations, it looks better.
The use of powers-of-2 internally by computers is an implementation detail that only insiders need to optimize for, in their minds and communications. For everyone else, base-10 works better. There's no reason for average users to understand or even see KiB, MiB, GiB names/numbers, in disk sizes, file sizes, bandwidths, clock speeds, etc. Everything can and should be in base-10, shift-units-at-a-glance SI. And the proportion of average users to insiders keeps growing. SI will win.
As for Jeff's question about ever needing to use 'petabytes': many workplaces are now dealing with petabytes of data. We have a few petabytes of spinning disks at the Internet Archive; I know commercial and big-science entities have far more.
And, regarding being "glad I won't have to deal with saying" zetta and yotta, why so pessimistic about the progress of technology and/or your own lifespan?
Permalink for that Ned Batchelder post: http://www.nedbatchelder.com/blog/200709.html#e20070909T081225
Kartik Agaram on September 12, 2007 11:38 AM
Karel: "Baud is not bit per second, it is symbol per second."
Indeed. That's why I wrote "naturally encapsulates" rather than "is" and took care to specify a serial line.
Sebastian: Err, yes. You're right - 1440, not 1044 KB. Sorry about that!
Sean: There are very good reasons why RAM is going to be in powers of two - it would be quite a lot of effort to allow for 1000 megabytes of RAM on one DIMM and 1000 on another, compared to 1024 on each. (RAM is addressed by a computer on an address bus. Each line of that address bus is a bit of the address; allocating addresses to RAM DIMMs thus naturally falls on the boundary of an address bus line. Which translates to a power-of-two in the address space. That's why 1 GB of RAM is always going to be 2^30; because 10^9 is not going to divide easily on an address bus. Doing divisions by 1000 is going to add an extra cycle or two to every RAM access plus some extra chips!) Hard drives aren't addressed this way, so that's why they can be sizes that aren't otherwise 'nice' for computers.
Will: Modems always were weird; mainly because they (usually) used 10 bit bytes. Yeah, I was aware of some confusion with communications people, but I got the impression they hardly dealt with bytes anyway.
(Meanwhile, we're skipping the octet vs byte debate? 8 bits wasn't always standard, you know! :-)
shannon on September 12, 2007 1:10 PM
I don't think it's a "trick" that storage manufacturers use ... it's simply the well-established tradition. And besides, it is correct. Hard to fault them for being right.
I've been writing low-level software that deals with storage (drivers, filesystems, CD/DVD burning, etc) for about ten years now. In order to stay sane, I have become pedantic about it and always specify binary vs decimal KB/KiB, MB/MiB, GB/GiB, etc. In verbal communication I don't bother so much, but you really have to be explicit in code and other written communications.
The only problem I have with the whole thing is that it's not an easy 1:1 mapping. The new prefixes have the virtue of being unambiguous: when someone writes "2 GiB" it's perfectly clear what they mean. But the old prefixes haven't been fixed. When someone writes "2 GB" you have to consider the context and decide whether they mean 2.00 decimal GB or 2.00 GiB = ~2.15 decimal GB. Ugh.
Drew Thaler on September 12, 2007 2:07 PM
When I was 10, I remember my dad trying to explain to me the relative capacity of 20 MEGABYTE hard drive he got in a new computer. I asked him if it was possible to ever fill up that much space on a hard drive. He said that, practically, it was not possible.
Then when I went to high school, a friend of mine had a father with an Audio/Video production facility, and he told me he had an external 1 GB drive (he pronounced it "Jigga-byte"). I nearly fainted at the sheer magnitude of drive space.
> 19"(17.4" Viewable), 500GB (465GB Usable).
Ah yes, excellent example.
The discrepancy on monitor size only existed for CRTs, because the CRT tube itself was always partially obscured by the bezel of the monitor, and not all the tube could be used for display phosphors anyway. So a 19" CRT tube ends up with 17.4" of viewable space after you factored this stuff in.
Now that we've all pretty much switched to LCDs, this is a moot point. LCD monitors don't use tubes; every inch of the flat panel (well, probably 99% of it) is filled with RGB elements visible from edge to edge. Thus, a 19" LCD is by definition, a 19" viewable LCD. :)
Jeff Atwood on September 12, 2007 4:10 PM
Besides, a company that sold good CRT monitors rarely made it even remotely hard (they usually printed it on the box, though generally in smaller print) to find out what the viewable area is on the monitor.
Half the time you have a pretty hard time finding out what the actual formatted size of the hard drive is before you put it in your computer and format it (even if you're using a very common file system).
In my personal use, I rarely run into issues with this sort of thing, except when I have to explain to someone why their new 500GB hard drive can't actually hold 500GB (usually just explaining the powers of 10 vs 2 is enough for them without going into gritty details, since Windows still displays drive space in powers of 2 (ie my primary partition is listed as 89,589,747,712 bytes - 83.4 GB in the drive properties dialog)).
Vizeroth on September 12, 2007 5:14 PM
Great, now we have to deal with imperial metric measurements. d'oh!
In the longer term having a measurement which is 1024 instead of 1000 will seem as silly as having 12 inches in a foot or 3 feet in a yard.
The simplest solution that I can see is that everyone switches to K=1000. Your average Joe wouldn't notice the difference.
-Andrew
Andrew on September 12, 2007 5:49 PM
Memory sizes weren't originally binary, they were originally decimal, just like hard disk sizes. 8,000 digits of RAM meant 8,000 decimal digits, and if that's what you had, then 8K wasn't even an approximation, it was exact. 20,000,000 digits of disk storage meant 20,000,000 decimal digits, and if that's what you had, then 20M wasn't even an approximation, it was exact.
Of course you can fit more information into memory by storing values in pure binary instead of in BCD. If you have 4 bits plus parity bit plus some other stuff, and only store digits 0 to 9 plus some other stuff, you're wasting resources. Also computations in BCD are far slower than in pure binary. But it was easier for customers and programmers to use decimal, so computer manufacturers delivered BCD.
Then computer manufacturers decided that customers and programmers could understand binary well enough, so in order to maximize storage capacity and speed with the same amount of resources, they started delivering pure binary instead of BCD.
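The storage cost of BCD can be made concrete with a short sketch. It assumes 4 bits per BCD digit and ignores the parity bit and "other stuff" mentioned above; the helper names are hypothetical.

```python
def bcd_bits(digits):
    """Bits needed to hold `digits` decimal digits at 4 bits per digit."""
    return 4 * digits

def binary_bits(digits):
    """Bits needed to hold the largest `digits`-digit number in pure binary."""
    return (10**digits - 1).bit_length()

digits = 8_000
print(bcd_bits(digits))     # 32000
print(binary_bits(digits))  # 26576
```

Under those assumptions, pure binary needs about 17% fewer bits for the same range of values, which is the resource saving the comment describes.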
Well, it looks like they were wrong. Customers don't understand binary well enough. Even Jeff Atwood gets confused. Computer manufacturers should have stuck with BCD.
Catherine Zetta-Yones on September 12, 2007 6:02 PM
'Aaron G' writes: "A byte is already 8 bits, and it isn't divisible into centibytes or millibytes, so it doesn't even make sense as a metric unit."
In information theoretic contexts, even bits can be fractional. And when describing very slow links, it could be meaningful to speak of such exotic and peculiar things as centibytes or centibits per second.
Contrived and weird, yes, but not totally nonsensical.
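One standard place fractional bits show up is Shannon entropy. A minimal sketch, using a biased coin as an assumed example (the function name is mine):

```python
import math

def binary_entropy(p):
    """Average information, in bits, of one flip of a coin that lands heads with probability p."""
    if p in (0.0, 1.0):
        return 0.0  # a certain outcome carries no information
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

print(binary_entropy(0.5))            # 1.0
print(round(binary_entropy(0.9), 3))  # 0.469
```

A fair coin carries exactly one bit per flip; a coin that lands heads 90% of the time carries less than half a bit.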
Gordon Mohr on September 12, 2007 6:32 PM
As any sys-admin from the 90's knows, NT is an acronym for "Nice Try".
'They should label the HDDs like monitors. 19"(17.4" Viewable), 500GB (465GB Usable).'
---
That isn't the best analogy.
A) For the 2 reported measurements of the CRT monitor, there's a discrepancy in *what's being measured* ('tube size' versus 'viewable size'), but the unit of measurement (inches) is the same.
B) For the 2 reported measurements of the hard drive, what's being measured (storage space) is the same, but the *unit of measurement* (decimal GB vs. binary GB) is different.
To say a hard drive has 500 GB (465 GB usable) of space is misleading to say the least. I could just as easily turn around and say that *500 GB are usable*, as long as you define 1 GB = 1,000,000,000 bytes. And as we all know, that is exactly what hard drive manufacturers do.
@ Matias > When I was 10, I remember my dad trying to explain to me the relative capacity of 20 MEGABYTE hard drive he got in a new computer. I asked him if it was possible to ever fill up that much space on a hard drive. He said that, practically, it was not possible.
Consider the amount of work you have to do to generate simple text (without all the cruft of Word files) - if you have to do it yourself, 20mb means a lot of typing. If that's what your dad had in mind, his point is quite understandable for that time.
I recall a friend of mine getting one of the first Pentiums (at 90MHz, which wasn't spectacular because our 486 ran at 80) with a gigabyte of harddisk space. I thought the same - how to fill all of that? Luckily, games that came on CD wanting to have their content installed from the harddisk because double-speed drives weren't fast enough solved that :).
Anyway, back to the 20mb drive. Go to a more advanced mode of content generation; image editing. Autodesk Animator makes 320 x 240 GIF files in 256 colors - if the computer could handle this in the first place. I think the time of 20 megabyte drives still had amber or greenscreen most of the time (I only had a C64, never an ST or Amiga). Every file is at most 60kb (if you'd make it random noise; if it contains actual art it's probably less). By the time you've made several of these files, you'd use floppies anyway (because you don't want to completely fill up the harddisk).
Now, shoot a few 4 megapixel pictures and you're through those 20 mb.
Rob Janssen on September 13, 2007 12:28 AM
I can't even bring myself to read all the bs in the responses. Here's a quick fix. Look at Western Digital's new drive. It's called a 1TB drive. Here are the physical specs:
Physical Specifications
Formatted Capacity 1,000,204 MB
Capacity 1 TB
Interface SATA 3 Gb/s
User Sectors Per Drive 1,953,525,168
see the rest at:
http://www.westerndigital.com/en/products/Products.asp?DriveID=336
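Those spec-sheet numbers are consistent with each other, as a quick check shows. The only assumption in this sketch is 512 bytes per sector, which the listing above does not state.

```python
SECTOR_BYTES = 512              # assumption: not given in the quoted specs
user_sectors = 1_953_525_168    # "User Sectors Per Drive" from the listing

total_bytes = user_sectors * SECTOR_BYTES
print(total_bytes)                     # 1000204886016
print(total_bytes // 10**6)            # 1000204 (decimal MB, as on the spec sheet)
print(round(total_bytes / 2**30, 1))   # 931.5 (what a power-of-two OS would show)
```

So the "Formatted Capacity" line is the sector count in decimal megabytes, and the same drive reads as roughly 931 in binary units.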
The Scientists see units at 2^10; just the way it is. Consumers see units in 10 base units. Advertisers used the discrepancy to pad pocket liners. Western Digital was sued over this and has made the change. Now, we cannot make the Scientists change from counting data in binary units (which is how we got here to begin with), so let's just grow up and get over it.
stoogemaster on September 13, 2007 4:31 AM
One thing that leapt out at me from the table in the article is the etymological derivation of the prefixes -- the higher ones, at least.
peta, exa, zetta, yotta = penta, hexa, septa, octo = 5, 6, 7, 8
Now if those prefixes stand for 2^50, 2^60, 2^70 and 2^80 they make sense, etymologically-speaking. 5, 6, 7, 8, see?
But if they stand for 10^15, 10^18, 10^21 and 10^24 there's no obvious relation between the names and the values -- they seem entirely arbitrary, and that's not good for memorability.
(Actually they're not arbitrary, they're a factor of three out, but that's not really going to help them from a mnemonic point of view.)
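The numbering does line up once you count in steps of a thousand: the n-th prefix is 1000^n in decimal and 1024^n in binary. A small sketch (helper name is mine) that also reproduces the "Difference" column of the table in the post:

```python
PREFIXES = ["kilo", "mega", "giga", "tera", "peta", "exa", "zetta", "yotta"]

def gap_percent(n):
    """How much larger 1024**n (= 2**(10n)) is than 1000**n (= 10**(3n)), in percent."""
    return (2 ** (10 * n) / 10 ** (3 * n) - 1) * 100

for n, name in enumerate(PREFIXES, start=1):
    print(f"{name:>5}: 1000^{n} = 10^{3 * n}, 1024^{n} = 2^{10 * n}, gap {gap_percent(n):.1f}%")
```

Peta, exa, zetta and yotta land on n = 5, 6, 7, 8 in both systems, so the etymology fits the decimal values too; the exponent is simply three times n.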
Considering that not many people who aren't storage nerds use peta- yet, let alone the others, the SI might as well come up with *different* decimal-friendly prefixes and leave the ones that actually refer to the binary powers alone. Surely?
Andrew Clegg on September 13, 2007 6:00 AM
I think it's a bit out of line to characterize adoption of the International System of Units as "an old trick perpetuated by hard drive makers."
To a whole lot of buyers of those drives, 500G just means five hundred billion.
John Pirie on September 13, 2007 9:24 AM
Makes me want to pay for the drive in Canadian dollars.
brian on September 13, 2007 11:16 AM
To me, it seems there is a perfectly logical reason why hard drives are measured in derivatives of base-2 - and that is that any and all programs, or any other bits of data, when viewed in their rawest form, are stored in binary on the actual disc. As such, it makes complete and total sense to use base-2, or some derivative of it, to measure the size of the disc.
Oh, and John Pirie? To most people buying hard drives, whether it be by themselves, or as part of a pre-built system, '500GB' means nothing, really, by itself. To them, the only thing it means is that it should be the case that this is how much space Windows says the drive has. Currently, Windows, just like most other operating systems, will report a '500GB' drive as 465GB. In my opinion, drive manufacturers should measure the size of their drives the same way most computers would, simply in the interests of being clear to the end user.
Zmiddy on September 13, 2007 3:38 PM
Damn whippersnappers.
Drives have been measured in powers of ten since the dawn of time, back when 32K (yes, K) was a big expensive disk. There's no conspiracy here. There's simply *NO GOOD REASON* to measure hard drives in powers of two, never was, never will be. If you want to blame someone, try DOS. They probably converted by (binary) shifting.
Memory, on the other hand, always grows by powers of two (add an address line, get double the memory.) There it makes some sense to measure in powers of two.
Next you young turks are going to say you want to measure your broadband speed in powers of two! Arrgh!
Now get offa my lawn!
-hans
Aaron G is completely right. The fact is, there is a meaning that most people have been assigning to terms like ‘kilobyte’ for decades. Some standards organization cannot come in and dictate that we all change how we use these terms any more than the French can be told not to use ‘e-mail’. You just cannot dictate human language. It won't change how people use these terms, but it will cause a lot of confusion instead.
I can understand the usability reasons for using decimal notation, but redefining decades-old common terms won't do it. Instead, all it does is cause people either to ignore the “standards” or simply to change to the new funny-sounding binary prefixes. In the end, the only ones using the decimal system are marketers. If they really wanted people to switch to base 10, new terms should have been invented for that instead (such as ‘kidebyte’).
David on September 13, 2007 4:41 PM
Yobibyte sounds funny in Russian. "Yob" is a root of the Russian equivalent of "f#ck".
Dmitry on September 13, 2007 10:05 PM
Sextillion sounds funny in English, for the same reason.
Catherine Zetta-Yobs on September 14, 2007 1:14 AM
...any more than the French can be told not to use ‘e-mail’...
Which is being replaced by a French equivalent not without success. In the same way that they (the French) forced upon us the metric system, the use of family names, a birth register, etc.
Things like these are possible and done all the time.
Karel Thönissen on September 14, 2007 2:02 AM
Let us not forget that screen measurements are on the diagonal.
I still remember taking a tape measure with me to buy a TV, because at that time there was "confusion" over the "proper" way to measure the display's size. IMO, it should always have both dimensions listed, just like the video cards do.
I know using 1024 as 1k etc. is part of computing history, and it especially makes sense in memory referencing etc. But why the hell does Windows (and other programs are just as guilty) count files on hard drives this way? Ever tried comparing file sizes when one's in megabytes and one's in gigabytes across, say, two different programs, and you don't know if they're working off 1024 or 1000? You end up having to look at the full number in bytes to be sure you're correct.
Pete on September 14, 2007 8:27 AM
Mainstream computers (this means every PC, Mac or otherwise) are binary computers without exception. We store our data on hard drives that write the data as binary (ones and zeros only please).
Data is written to the hard drive in blocks whose sizes are ALWAYS multiples of 512 bytes. For example, 512 bytes, 1024 bytes, 4096 bytes.
There was an example of using the Linux command "dd", which was:
>The problem has an easy solution:
>
>The OS needs to start using SI prefixes correctly. Linux already does.
>
>$ dd if=/dev/zero of=test bs=1MB count=10
>10+0 records in
>10+0 records out
>10000000 bytes (10 MB) copied, 0.0261481 seconds, 382 MB/s
>
>There's no reason to use powers of two for displaying file sizes. It's ridiculous and makes it more confusing for the user.
>Sean on September 11, 2007 03:38 PM
This is a horrible example of I/O use. The bs variable should be one of the example block sizes above (512, 1024, 4096), and specifically should be the block size of the filesystem to which you are writing. I agree that Linux is literal in its interpretation of MB, but you are using the "dd" command poorly, Sean. Linux will do what you are asking, but please don't waste your lovely OS's time with commands like that. :P
Finally, when you are referencing data in the processor you use binary to do so, and we humans read hexadecimal more easily. In hexadecimal, 1024 KiB is 0x400 KiB, and 1000 KB is 0x3E8 KB.
If you want to use metric, be an engineer. Meanwhile, leave my metrics alone. 1024 bytes is 1 KB. The computer knows this. The programmer knows this. Even web developers use hex. Color codes are hexadecimal.
Do mechanical engineers want me to redefine 1 meter as 99.53 cm? No, they don't. Leave your decimal out of my pure binary computer.
overlordofmu on September 14, 2007 9:30 AM
I think the problem is that in CS, we can use whatever word for whatever purpose and it's OK. Now, CS is a large field that has a great tendency to overlap other fields like finance (for example). For a computer scientist, it's OK for kB to be 1024 bytes. For a manager, it's not. Before, kB was used only by computer scientists; now it is used everywhere.
I personally think that all this is an issue of whether we tell the world about that (like in school when we learn about kilometers) or we change to using kilo to mean one thousand.
I also think that 1 kB as 1024 is useful when programming.
loki.jf on September 14, 2007 12:37 PM
Yup, that happened to me. I bought two 500 GB drives, installed them in a RAID configuration, and installed my OS expecting to see 1 TB, and all I got was 931 GB. :(
If the drive manufacturers are going to do that, then they should be selling 537 GB drives instead of 500 GB drives.
I WANT MY "1 TB"!!!!
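The arithmetic behind that complaint can be sketched in a few lines of Python. The helper names `reported_gib` and `advertised_gb_needed` are made up for this example; the only facts used are that the box counts in units of 10^9 bytes and the OS in units of 2^30 bytes.

```python
GB = 10**9    # decimal gigabyte, as printed on the box
GIB = 2**30   # binary "gigabyte", as the operating system reports it


def reported_gib(advertised_gb: float) -> float:
    """Size the OS shows for a drive sold with the given decimal capacity."""
    return advertised_gb * GB / GIB


def advertised_gb_needed(wanted_gib: float) -> float:
    """Decimal capacity a drive must have to show up as the wanted binary size."""
    return wanted_gib * GIB / GB


print(f"{reported_gib(1000):.1f}")         # two 500 GB drives combined
print(f"{advertised_gb_needed(500):.1f}")  # label needed to see a full "500" in the OS
```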
Tim on September 14, 2007 1:39 PM
Instead of forcing these ridiculous kibibytes and gibibytes on us, which is impossible for us macho ADA .Net programmers, and has created even more confusion, why can't the HDD manufacturers stay in SI-land without overloading our KB's, MB's and GB's?
My solution is simple: write 1K B (1000 bytes) instead of 1 KB (1024 bytes for programmers and computers, but 1000 bytes for HDD manufacturers and buyers).
Everybody can save face: HDD manufacturers won't be lying any more, just by subtly moving a space, and programmers won't have to agree on disagreeing or agreeing with the kibibytes whenever they talk about volumes of bytes.
Joost on September 14, 2007 2:34 PM
I'd rather wait for a price drop in the terabyte drives. I want to fill a T with my 0wn rainbow tables :)
Joost wrote: "My solution is simple: write 1K B (1000 bytes) instead of 1 KB (1024 bytes for programmers and computers, but 1000 bytes for HDD manufacturers and buyers)."
---
And how would you pronounce "1K B" so it's distinguishable from "1 KB"? Not all communication is written or electronic. Sometimes you actually have to *talk* to people.
"I am a CS major. I am utterly unaware of the world outside my tiny little shell. EVERYBODY IN THE WHOLE WORLD thinks that the SI prefixes mean powers of 2, and there is SO much history behind this usage -- literally DOZENS of years! Nobody uses the peta- prefix except for people talking about HARD DRIVES!"
(I'm a computer engineer myself, see above, just seriously amused by all this.)
Drew Thaler on September 17, 2007 6:33 PM
DOZENS == sixteens, right?
Catherine Zetta-Yobs on September 17, 2007 7:00 PM
"It's an old trick perpetuated by hard drive makers-- they intentionally use the official SI definitions"
Actually it isn't intentional. There is a good reason why hard disks use SI while others like RAM use binary. It's a bit confusing for the customers though. I already wrote about this in my blog.
http://instantfundas.blogspot.com/2007/08/1-gigabyte-is-not-equal-to-1024_22.html
kaushik on September 17, 2007 9:58 PM
I don't know if this would be more clear.
I had made a comment to the effect that 1024 bytes is 0x400 bytes in hexadecimal and that 1000 bytes is 0x3E8 bytes in hexadecimal.
Maybe if I were to use binary it would be more obvious.
1024 bytes is 10000000000 bytes in binary.
1000 bytes is 01111101000 bytes in binary.
I call 1,048,576 bytes a megabyte, or 1 MB.
1,048,576 bytes is 100000000000000000000 bytes in binary.
1,000,000 bytes is 011110100001001000000 bytes in binary.
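The bit patterns above are easy to reproduce. The following Python sketch prints them and also checks which values are exact powers of two; the helper name `is_power_of_two` is an assumption for illustration.

```python
def is_power_of_two(n: int) -> bool:
    """A positive integer is a power of two when exactly one bit is set."""
    return n > 0 and (n & (n - 1)) == 0


for n in (1024, 1000, 2**20, 10**6):
    # The 'b' format spec renders the integer in binary without a '0b' prefix.
    print(f"{n:>9,} = {n:b} (power of two: {is_power_of_two(n)})")
```

Python drops the leading zero that the lists above use to line the digits up, but the patterns are otherwise the same.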
Our computers run countless billions of binary operations all day long and only convert to decimal when we humans need to see the data. Often it will display in hexadecimal for a kind human willing to meet the computer halfway.
At the end of the day the computer is the final judge, and it clearly prefers to think of KB, MB and GB in terms of a binary number in the form of a 1 followed by 10, 20 or 30 zeros respectively.
This lovely machine has been programmed to convert the binary to decimal when needed, so let's not force our "arbitrary" metric on it.
I know the poor marketing sod is a soulless bag of crap and is lying his/her booty off on the front of the box with a statement that 1,000,000,000 bytes is a GB.
I mean really now! Are you telling me that you are not skeptical of advertising already? Don't we as a planet take it for granted that all marketing people on earth are liars? They have gotten degrees in deception making and make a living distorting truth for the financial gain of their employer to the detriment of everyone else on earth.
There is no need for this debate. Computers will continue to use KB, MB and GB internally as powers of two. Marketing people will use powers of ten because it is a convenient lie. (They love the convenient ones, as they make their worthless lives easier.) Educated consumers already know the exchange rate of marketing to computer science MBs.
As a final note, I would like to suggest that all marketing professionals commit suicide.
Again, that is simply a suggestion.
It took me about a day to get over the retarded names of the SI units. Then I realized that they are not much worse than the base 10 units.
It's obvious that memory is addressed via bits and therefore the maximum theoretical addressable memory is always a power of two. Memory requirements don't necessarily scale like that.
Also it's funny that Knuth comments on this, where he basically agrees but says the names are too funny to be taken seriously.
http://www-cs-staff.stanford.edu/~knuth/news99.html
Once we slam a spacecraft or two into something in space we'll probably think they are less funny.
papa sleeze on September 23, 2007 11:42 PM
I cannot believe the reaction from people decrying the new binary prefixes. Here is the issue.
Point #1. It can make sense to talk about both a decimal gigabyte and about a binary gigabyte. Can people accept that?
Result: We need two different units in order to talk about these things unambiguously. This gives us three options: Use the existing usage to mean the binary unit, use it to mean the decimal unit or define two new prefixes for each 'type' of gigabyte.
Two new units is rather foolish. And which sounds more reasonable - to state that the kilo prefix has an exception for certain units, but that there's this new prefix for that unit that maps to the more standard usage of kilo? Or to make it so that kilo always, always, always means 10^3, and create the new prefix to always mean 2^10?
I think you know the answer.
Qmanol on October 7, 2007 4:45 AM
Quite relevant to this post.
buddydvd on November 2, 2007 1:27 PM
What we have:
•kB means 1000 B by official SI definition.
•kB means 1024 B in traditional memory-related computer domain.
•KiB means 1024 B by official IEEE 1541 definition (note the capital "K").
So kB is more or less ambiguous, depending on how the context relates to memory:
-RAM uses k=1024, else we get holes in address space (yikes!).
-Bandwidth uses k=1000, because it has no power-of-2 constraint.
-Hard drive capacity uses k=1000, but chunk allocation uses k=1024 to fit nicely in RAM.
-Audio CD uses non-power-of-2 chunk allocation because it is a streaming medium (data rate being more important than addressing).
-Flash memory is treated like RAM if it holds a BIOS, like a hard drive if it's assembled into a USB pen.
etc. etc.
Some suggestions to remove ambiguity:
•State how much your kB is (visible every time a size is displayed, not buried deep in the doc).
Verbose, but easy fix to add.
•Use "KdB" to mean 1000 B (d standing for decimal).
Non-standard, but as compact as can be.
•State both kB and KiB.
Might look bloated, but is also the most informative.
Use k=1024 only when really necessary.
Remember the rest of the world uses k=1000, and rightly so.
I see no reason why there can't be an option for using both kilobytes (1000 bytes) and kibibytes (1024 bytes), like the labels here that say something like "1Gal. (3.8L)." Slowly people will begin to understand the relation between the two, like how many people learn that a yard is almost a meter.
As for sounding ridiculous, that's just ridiculous. They may sound funny, but so does the mole (mol) and the joule (J). In fact, my chemistry teacher in high school had us make a mole (the animal) for a grade!
Once again, the drives could use the metric standard and the binary standard, as in "500GB (465GiB)," allowing consumers to see the difference and keep them happier with the manufacturers because they knew the two possible measures that could be used, instead of feeling they were ripped off.
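A dual label like that is trivial to generate. Here is a minimal Python sketch; the function name `dual_label` is hypothetical, and rounding both figures down (so the label never overstates the capacity) is an assumption of this example rather than any manufacturer's practice.

```python
def dual_label(capacity_bytes: int) -> str:
    """Build a label showing decimal and binary capacity side by side."""
    gb = capacity_bytes // 10**9   # decimal gigabytes, rounded down
    gib = capacity_bytes // 2**30  # binary gibibytes, rounded down
    return f"{gb}GB ({gib}GiB)"


print(dual_label(500 * 10**9))  # a drive sold as "500 GB"
```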
In the programming sense, using the standard 10^x is a rather annoying convention because of the nature of bits - 0 or 1. If they were to somehow come up with a 10 state bit (easily possible with quantum computers), then I could see the warrant for using the standard metric definitions, but until then, no thanks. This difference in systems - base 2 instead of base 10 - led to the rise of other counting systems, such as octal and hexadecimal (hex). Personally, I like to count memory and the like in hex in the binary notation. In hex, the "strange" numbers tacked onto the end disappear. For example:
1024 Bytes = 0x00000400 Bytes = 1KiB
1024 KiB = 0x00100000 Bytes = 1MiB
1024 MiB = 0x40000000 Bytes = 1GiB
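Those hex values can be checked with a short Python sketch. The helper `as_hex` is a made-up name; it simply pads to eight hex digits to match the table above.

```python
def as_hex(num_bytes: int) -> str:
    """Format a byte count as an eight-digit, zero-padded hex literal."""
    return f"0x{num_bytes:08X}"


for exponent, name in ((10, "KiB"), (20, "MiB"), (30, "GiB")):
    size = 1 << exponent  # 2**10, 2**20, 2**30
    print(f"1{name} = {as_hex(size)} Bytes = {size:,} Bytes")

# A decimal "kilo" has no such tidy pattern in hex.
print(f"1000 Bytes = {as_hex(1000)} Bytes")
```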
It also comes in handy to use the KiB notation in small systems, where you need to know exactly how much memory you have left and if it's enough for a 4KiB image.
Oh yes, the reason we use binary measurements is because computers use binary! Addressing for both RAM and hard disks is done using the binary/hex system. That being the case, it makes sense to me that they use the binary versions of the prefixes, but that would confuse people. So once again, I think listing both notations on the package makes plenty of sense.
Not to mention, if a byte were a standard SI unit, then it would be made of 10 bits. Then you could have a real decibyte. But naturally, if there was such a change, all the software out there right now would wind up being pretty useless because it isn't built for 10 bit architectures (although that can fairly quickly be remedied).
In the end, I think placing both labels on products will help get people used to the relation of a GB and a GiB. I have started to be able to tell approximate size of large files from one system to another, similarly to the conversion of yards and meters.
Anyhow, that's what I think.
Aylon Tonok on November 20, 2007 8:10 PM
@Luc: "We have a problem when we have a 500GB (1000 based) and you need a real 500GB (1024 based).
Workers in computer (Admins, programmers...) can deal with that.
But ordinary people are very confused about that."
I think we're using a very odd value of "ordinary". If you need a 500GiB(base 2) hard-drive (which are pretty rare), surely it's more sensible just to get a 600GB(base 10). Easier to find, and one touch more space.
deworde on November 21, 2007 2:20 AM
I think you are right, and I think that we are being short changed. I also bought a 500GB hard drive and found the same problem.
When I buy a car, I expect to get it home with four seats - not three. I also expect to get four wheels - not three.
If computers are logical systems, then we should try to talk about them in the same way. In maths, it is acceptable to round figures up or down, accurately, for simple notation. Thus if the drive is 465GB (rounded up or down to suit), then that is what it should say. If all manufacturers followed suit, there would be no problem, no exaggerations, and no feeling among the public of being short changed.
I think software should start incorporating units as if they were fonts. If you are European, you leave the inches and feet units out and you never encounter them anywhere on your system again! Same with bits, bytes and kilos. If you are a nerd, you install a 1024 kilo unit, if you are a stock trader you insert 1000 as kilo, if you are a drug dealer you insert K as kilo and if you are blond you insert "big" as kilo. And if this function is not useful enough to incorporate for the kilo-nerds, then please do it to get rid of that freakin imperial shit. Oh, and make page sizes font-like too! Trash the letter and tabloid. A0-A6 is all we need.
Blanka on March 25, 2008 3:50 PM
"they intentionally use the official SI definitions of the Giga prefix so they can inflate the sizes of their hard drives"
What a load of crap. Hard drives have been measured in powers of ten since they were first invented; long before the DOSes and MacOSes of the world started reporting sizes in powers of two.
The problem here is Microsoft, not marketing. What conceivable benefit is there to reporting a 100,000,000,000 byte drive as "93 GB" in one place and "95,367 MB" in another place? None. Microsoft's notation is stupid and useless.
Western Digital was absolutely correct in their response to getting sued:
'Surely Western Digital cannot be blamed for how software companies use the term “gigabyte”—a binary usage which, according to Plaintiff’s complaint, ignores both the historical meaning of the term and the teachings of the industry standards bodies. In describing its HDD’s, Western Digital uses the term properly. Western Digital cannot be expected to reform the software industry. ... Apparently, Plaintiff believes that he could sue an egg company for fraud for labeling a carton of 12 eggs a “dozen,” because some bakers would view a “dozen” as including 13 items.' http://paulhutch.com/wordpress/?p=214
Using "G-" to mean "1,073,741,824" is just wrong, plain and simple.
How stupid on April 13, 2008 12:10 AM
For those who haven't yet, you'll want to check XKCD for a definitive standard on the topic:
Just passing on July 7, 2008 7:21 AM
Well if it's "tradition" to always use binary prefixes then someone should change the Ethernet spec and other networking standards which have always used decimal prefixes, not binary.. the only thing that's naturally binary is memory (RAM).. hard drives shouldn't necessarily be.
What it comes down to is that consumers are STUPID! They believe Microsoft Windows when it tells them a file's size is 1 GB when really it's 1 GiB.. Maybe the dumbasses should try suing Microsoft for supplying faulty software instead of going after hard drive manufacturers.
KiB MiG GiB TiB on November 10, 2008 6:05 PM
The MBR disk format for hard drives has an upper limit of 2TiB per partition. If you have a disk that's more than 2TiB in size, you need to switch to the GPT format, which most OSes have only recently made available (e.g. Windows XP 32-bit doesn't support GPT).
That definitely is a case where the binary/decimal confusion arises; while 2TB drives are pretty rare, it's not hard to build a big RAID array over the 2TiB limit. You do have to keep track of the fact that it's a 2TiB limit, not a 2TB limit. Single HDDs are up to 1.5TB, so it really won't be long before HDD manufacturers make disks that don't work with Windows XP. That will really annoy the anti-Vista zealots.
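Where the 2TiB figure comes from can be sketched in Python. This assumes classic 512-byte sectors and treats the 32-bit sector fields in the MBR partition table as allowing about 2^32 sectors; the names `MBR_LIMIT` and `fits_in_mbr` are made up for the example.

```python
SECTOR_BYTES = 512   # classic sector size (assumed here)
MAX_SECTORS = 2**32  # MBR partition entries use 32-bit sector fields (approximate ceiling)
MBR_LIMIT = SECTOR_BYTES * MAX_SECTORS  # 2**41 bytes = 2 TiB

TB = 10**12   # decimal terabyte, as drives are sold
TIB = 2**40   # binary tebibyte, as the limit is defined


def fits_in_mbr(capacity_bytes: int) -> bool:
    """True if a volume of this size stays within the MBR limit."""
    return capacity_bytes <= MBR_LIMIT


print(f"MBR limit: {MBR_LIMIT:,} bytes = {MBR_LIMIT / TIB:.0f} TiB = {MBR_LIMIT / TB:.3f} TB")
print(fits_in_mbr(2 * TB))  # a drive sold as "2 TB"
print(fits_in_mbr(3 * TB))  # e.g. an array built from three 1 TB drives
```

The binary/decimal gap matters here because a drive sold as "2 TB" is still under the limit, while anything past about 2.2 decimal terabytes is not.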
Richard on December 10, 2008 7:07 AM
Content (c) 2009 Jeff Atwood. Logo image used with permission of the author. (c) 1993 Steven C. McConnell. All Rights Reserved.