I <3 Steve McConnell*
Coding Horror
programming and human factors
by Jeff Atwood

March 23, 2007

Folding: The Death of the General Purpose CPU

A few recent articles have highlighted the disproportionate contribution Playstation 3 consoles are making to the Folding@Home effort. The OS statistics page for Folding@Home tells the tale:

            TFLOPS   Active CPUs   Total CPUs
Windows        152       160,173    1,626,609
Mac/PPC          7         8,776       95,435
Mac/Intel        9         2,864        7,400
Linux           43        25,239      216,067
GPU             43           733        2,228
PS3            659        26,911       29,843

There are a couple caveats to bear in mind when reading this chart:

  1. The measurement of FLOPS isn't an exact science. It would be more accurate to compare actual work units returned, but I don't see any way to do that from the folding statistics page.
  2. Current PC and Mac / PPC contributors span the entire gamut of CPUs released in the last seven years.
  3. Folding does cost money, in the form of electricity. Superior clients offer efficiency: bang per watt. You could make a compelling argument that certain clients with low efficiency aren't worth the cost of the electricity they're using. For reference, a PS3 and a gaming-class PC both use about 200 watts of power under load.

The Playstation 3 is indeed dominating the charts; as of this writing, the PS3 is responsible for a whopping 72 percent of the computing power in the entire Folding@Home project.
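
As a quick check on that figure, the share can be recomputed from the TFLOPS column of the chart above. This is only a sketch in Python; the dictionary and variable names are illustrative, and the inputs are the rounded values from the chart, so the result is approximate.

```python
# TFLOPS per client type, copied from the OS statistics chart above.
# These are rounded values, so the computed share is approximate.
tflops = {
    "Windows": 152,
    "Mac/PPC": 7,
    "Mac/Intel": 9,
    "Linux": 43,
    "GPU": 43,
    "PS3": 659,
}

total_tflops = sum(tflops.values())
ps3_share = tflops["PS3"] / total_tflops

print(f"Total: {total_tflops} TFLOPS")
print(f"PS3 share: {ps3_share:.1%}")
```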

UPDATE: as of 3/26/2007, the F@H network has arbitrarily halved the TFLOPS score for the PS3.

[Image: PS3 folding user interface]

It's only a matter of time-- a few weeks at most-- before the PS3 constitutes more than 95 percent of the computing power in the entire Folding@Home network. This doesn't surprise me in the least. The Playstation 3 can harness the considerable power of its specialized Cell CPU to crunch work units far more efficiently than any general purpose CPU ever could.

If you look closely at the chart, you'll see even more powerful evidence of the dominance of specialized processors.

[Image: TFLOPS per CPU type graph]

GPU clients run on modern, high-end video cards. The GPU on these video cards is even more specialized than the Cell processor in the PS3.

The GPU client is limited to the current high-end ATI X1800 and X1900 video cards at the moment, which are already a generation behind NVIDIA's newest 8800 series. Even so, the GPU clients are almost 2.5 times faster than the PS3. Of course, this performance differential is more than balanced by the fact that the PS3 is an easily obtainable (albeit somewhat expensive) consumer item; it's trivially easy to add one to the Folding@Home network, whereas the Folding@Home GPU client is quite immature, and few users have the necessary high-end ATI video cards to use it.

But the real lesson of this chart lies in the OS X / Intel data point. Intel-based Macs are, by definition, based on only the newest Intel processors-- Core Duo or better. Even so, it's an utter blowout:

Intel Core Duo    1x
PS3               7.8x faster
GPU               18.6x faster
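
Those ratios can be roughly reproduced by dividing each client type's TFLOPS by its active CPU count. The sketch below (Python, with illustrative names) uses the rounded chart values, so it lands near, not exactly on, the figures quoted above. It also treats the Mac/Intel row as the Core Duo baseline, as the post does.

```python
# (TFLOPS, active CPUs) per client type, from the OS statistics chart.
# The TFLOPS values are rounded, so the ratios are approximate.
stats = {
    "Mac/Intel": (9, 2_864),
    "PS3": (659, 26_911),
    "GPU": (43, 733),
}

def gflops_per_client(tflops, active):
    """Average GFLOPS contributed by one active client."""
    return tflops * 1000 / active

# Use the Mac/Intel row as the baseline (assumed to stand in for Core Duo).
baseline = gflops_per_client(*stats["Mac/Intel"])
relative = {
    name: gflops_per_client(*row) / baseline for name, row in stats.items()
}

for name, ratio in relative.items():
    per_client = gflops_per_client(*stats[name])
    print(f"{name}: {per_client:.1f} GFLOPS per client, {ratio:.1f}x")
```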

With these kinds of performance ratios-- and I expect the performance gap to widen every year-- there's almost no point in adding general purpose CPUs to the folding network any more. It's a waste of time, effort, and electricity.

For folding and other distributed computing efforts, it's the death of the general purpose CPU as we know it.

Posted by Jeff Atwood

 


Comments

If you own an ATI X1800 or X1900 series video card, try out the GPU folding@home clients here:

http://folding.stanford.edu/download.html

Normally I recommend the console client, but for GPU support, I recommend the graphical client. It's the easiest one to set up. Complete GPU client FAQ is here:

http://folding.stanford.edu/FAQ-ATI.html

If you're curious what the performance/watt breakdown is for the CPU vs. GPU, there's a nice investigation of that here:

http://techreport.com/etc/2006q4/gpu-folding/index.x?pg=1

Also, based on some forum comments by Jason Cross, GPU clients may get even faster:
---
[The GPU client] uses DirectX as the interface, which is causing a lot of slowdown. Word on the street is that there should be a client that uses ATI's "CTM" (close to the metal) programming model which will bypass DirectX, eliminate some of those inefficiencies, and possibly speed up things as much as 2x or more. Hopefully they'll do the same with Nvidia's similar "CUDA" thing. If they can bypass DirectX as a GPU interface and use the DX10 cards, those GPU clients could easily turn in way more than 200 gigaflops each. Nutty!
---

Jeff Atwood on March 25, 2007 12:08 AM

One other bonus the PS3, and consoles in general, have in distributed computing initiatives, is that the option to start working is one of very few options on the console's menu.

Rather than on a PC, where someone has to install the program then remember to run it - there is the option with the press of a few buttons on the console.

If they can get 50% of their installed user-base running the folding program instead of just turning off their machine, they will have a huge processor base to use (hopefully for good rather than evil).

Ian Tyrrell on March 25, 2007 12:58 AM

Also, a gaming console like the PS3 isn't being used most of the day, so it can use 100% of its capacity to fold all the time. A computer is likely to be used continually, and can only get some folding work done when it wouldn't hinder the user.

Jeff, you mistyped "Core" instead of "Cell" in the first link.

Chris Nahr on March 25, 2007 01:13 AM

This gap has always been there. ASIPs (and even more ASICs) have always outperformed general purpose CPUs by huge factors (in the ASIC case a very huge factor, but then that's not exactly a fair competition). In a lot of cases it's actually quite easy to outperform modern general purpose CPUs with an ASIC by multiples, even on old "cheap" technology.

Chris on March 25, 2007 02:24 AM

This is very definitely the dog's bollocks. I was hoping the PS3 would rock, but I didn't expect it to be providing the bulk of the project's total flops this soon.

I disagree with your conclusion about CPUs being futile, though. Humble as my Athlon-64 may be, it is contributing more to the project with the F@h client running than it would be with the F@h client not running.

Russell Wallace on March 25, 2007 03:06 AM

So? Those graphics cards outperform the CPU at graphics calculations as well.

I mean, it's kind of interesting, but it's an expected result - specialized processors perform better at what they are specialized for. It just so happens that folding is a similar type problem to graphics processing.

Mike on March 25, 2007 03:36 AM

> it's an expected result

In less than a week after the PS3's folding client went live:

- 4,000 PS3s equalled the computing power of the existing 160,000+ CPUs currently in the F@H network.
- the PS3 now constitutes 73.3% of all computing power in the entire F@H network (note that this is up from 72.2% when I wrote the article, only a few hours ago)

That isn't just an expected result, it's an overnight sea change. We're waking up in a whole new world for distributed computing projects.

> Humble as my Athlon-64 may be, it is contributing more to the project with the F@h client running than it would be with the F@h client not running.

And with a $200 video card upgrade, you could be contributing 18.6 times as much ;)

Jeff Atwood on March 25, 2007 03:53 AM

> We're waking up in a whole new world for distributed
> computing projects.

Not really. We're waking up in a whole new world where Joe Average has some serious stream-processing power. Give it two years more and every Intel / AMD PC will have a GPU capable of running Folding@Home. Then you'll see some serious increase in the TFLOPS.

Whether the processing model that GPUs (and other stream-processing hardware such as Cell) provide can be used for distributed projects is still dependent on just what the distributed project is trying to do.

Put another way - I suspect Folding@Home was chosen over other distributed projects because it is a perfect match for the Cell processor, and not because it is the most distributed or something similar.

Either way - I think this proves that we can no longer limit ourselves to one processing model. Instead of trying to shoehorn everything into the one true model we have to provide multiple types of processing hardware so that algorithms can run on the most suitable hardware.

tcliu on March 25, 2007 04:33 AM

"For folding and other distributed computing efforts, it's the death of the general purpose CPU as we know it."

The general purpose CPU's days are numbered (at least for desktop machines) by the next attempt by AMD or Intel to grab the x86 technological lead and introduce asymmetric cores for x86 CPUs. The question is at what transistor density/core count this becomes feasible.

Factory on March 25, 2007 05:44 AM

"I mean, it's kind of interesting, but it's an expected result - specialized processors perform better at what they are specialized for."

ATI X1900s are most definitely not specialized for protein folding, yet at the moment they're the most powerful protein folders on the market. (Barring ASICs and $100K HPC clusters.) Your argument only applies to Cell, which was designed for HPCC from the ground up, not 3D rendering. It's a vindication of, for lack of a better name, general purpose stream processing.

Foxyshadis on March 25, 2007 07:17 AM

While this is great news and all, it is absolutely no reason for people to stop using their CPU clients. The GPUs and PS3s are only capable of solving 1/8 of the total work units; the other 7/8 of all work units need to be solved by CPUs.

ubergeek42 on March 25, 2007 08:08 AM

From what I understand (and I may be wrong) the PS3 clients can only process certain specific work units, and if that's the case you'll never be able to accurately compare the PS3 client and the PC client, because they're not doing the same work.

Scabies on March 25, 2007 08:23 AM

So Jeff, are *you* folding?
At home?

Haacked on March 25, 2007 08:42 AM

The hardware was designed to stream large datasets to which you are applying a very similar set of instructions, with little to no branch prediction.

The particular instance of that type of problem it was aimed at was computer graphics.

I guess I would say that it was novel the first time someone realized that folding and computer graphics have that sort of similarity in how you solve them. Once someone realized that, though, I'm not surprised at all that it is faster on the GPU.

I don't see how it makes the CPU obsolete unless all you are ever doing is attacking streaming type problems. If you try to do any sort of difficult computation involving lots of branches, a GPU falls flat on its face. I've heard similar issues with programming for the PS3 and Xbox - while the graphics are stellar, any ai and gameplay calculations take a huge performance hit compared to a normal CPU.

So... sorry to be all negative. It IS really cool how well specialized hardware does specialized tasks, and the degree of speed you can get out of it for those tasks... I'm just saying it's not that unexpected, or why would you have that hardware to begin with?

Mike on March 25, 2007 09:23 AM

> ATI X1900s are most definitely not specialized for protein folding, yet at the moment they're the most powerful protein folders on the market.

X1900s are extremely specialized stream-processing units, and protein folding is a very good fit for stream processing.

> I've heard similar issues with programming for the PS3 and Xbox

Should actually only be the PS3: the Xbox 360 has a tri-core general-purpose PPC processor, while the PS3 has a 9-core CPU composed of a general-purpose PPC-based PPE and 8 stream-processing SPEs with very high floating-point performance.

Masklinn on March 25, 2007 11:31 AM

The bottom line is the amount of science done.

Are the PS3/GPU work units full work units, or are they "dumbed down" to work around the limitations (?) of the PS3?

That's the real question. TFLOPs are nice for stats-nuts, but for us science-nuts, it's not so important...

Michael Graham Richard on March 25, 2007 12:09 PM

For reference, when I had my Athlon X2 4800+ folding in the evenings from work (around 7pm - 8am), I achieved ~6,000 points per month.

http://folding.extremeoverclocking.com/user_summary.php?s=&u=79665

Compare with the Maryland PS3 team:

http://fah-web.stanford.edu/cgi-bin/main.py?qtype=teampage&teamnum=55312

Several users already have scores of more than 1,800 points, even though the PS3 folding client has only been available since Thursday, 3-22.

Jeff Atwood on March 25, 2007 02:55 PM

A little bit on electricity costs, eg, how much would it cost you if you left your PS3 or gaming PC on 24/7, folding?

http://www.codinghorror.com/blog/archives/000426.html

Our average in this area is 14.28 cents per kilowatt-hour, so..

200 watts * (8,760 hours per year) / 1,000 = 1,752 kilowatt-hours
1,752 kilowatt-hours * 14.28 cents / 100 = $250.18 per year

So folding all the time means you're "donating" $250.18 each year to the folding project. Just FYI.

Jeff Atwood on March 25, 2007 03:02 PM
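
The arithmetic in the comment above can be sketched as a small Python helper. The function name and parameters are hypothetical; the 200 watt draw and the 14.28 cents per kilowatt-hour rate are the figures quoted in the comment, and the machine is assumed to run at full load all year.

```python
def annual_cost_dollars(watts, cents_per_kwh, hours_per_year=8_760):
    """Return (kWh per year, dollars per year) for a device that is always on."""
    kwh = watts * hours_per_year / 1_000
    dollars = kwh * cents_per_kwh / 100
    return kwh, dollars

# Figures from the comment: 200 W under load, 14.28 cents per kWh.
kwh, cost = annual_cost_dollars(200, 14.28)
print(f"{kwh:,.0f} kWh per year, about ${cost:,.2f}")
```

Plugging in a different wattage or local rate gives a rough estimate for other machines; the result lands at roughly $250 a year for the numbers above.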

Hmm. Looks like F@H changed their TFLOPS calculation.

Right now it's showing 520 TFLOPS for 30,253 active PS3s, which conflicts with the numbers I quoted last night: 659 TFLOPS for 26,911 active PS3s.

WTF?

Jeff Atwood on March 25, 2007 03:42 PM

I thought I'd mention something rather important a lot of people seemed to miss:

From: http://folding.stanford.edu/FAQ-PS3.html
"What type of calculations the PS3 client is capable of running?"
...
"In a nutshell, the PS3 takes the middle ground between GPU's (extreme speed, but at limited types of WU's) and CPU's (less speed, but more flexibility in types of WU's)." [WU = Work Units]

I'm probably wrong, but I think that means there will be plenty of Folding@Home work for general-purpose CPUs that GPUs and PS3s can't cover. So saying that a CPU on the Folding@Home project is a waste of energy may be true, but they probably need what CPUs can do, which GPUs and Cell can't process very well at all.

Although I know nothing about it, it's definitely something to consider.

Mick on March 25, 2007 04:41 PM

> 1,752 kilowatt-hours * 14.28 cents / 100 = $250.18 per year

On this blog we see people writing their own software and building their own computers, so why not generate your own energy? Solar power is a good solution, if a bit too expensive at the moment (with an average of 1.7 years to pay back the energy fabrication costs). But these guys are going to revolutionize the way we think about energy: http://www.nanosolar.com/economic.htm

Max

MaxL on March 25, 2007 06:35 PM

Apples and oranges.

GPUs and the PS3's SPEs (aka Cells) are very fast at doing specific vectorized calculations seen in graphics, animation, physics, video, etc. But when it comes to generic programming, the vast majority of code, the CPU is still king due to its flexibility.

Aside from GUI drawing, most applications won't benefit from a GPU. Having a fancy video card won't make Excel or Word run any faster.

However, many games, graphics/video applications, simulations can benefit greatly from a GPU.

Blaine on March 25, 2007 07:39 PM

If you're interested in GPU folding, read on. According to this thread:

http://www.storageforum.net/forum/showthread.php?p=93429

The ATI X1950Pro offers the best combination of low price and lots of pixel shaders, which is the key determinant of folding perf.

They also recommend 512 MB cards, as non-gaming general video performance can be significantly hindered when GPU folding on 256 MB cards. Here's a list of X1950PRO cards on newegg with 512 MB RAM:

http://www.newegg.com/Product/ProductList.aspx?Submit=ENE&Description=x1950pro%20512mb&bop=And&Order=PRICE

Not bad for under $200. Coincidentally, these are the exact same cards I picked for the Hanselman Ultimate Developer Rig Throwdown.

Jeff Atwood on March 25, 2007 10:36 PM

yeh -- as you caught on already Jeff, the figures for the ps3 have been revised, and pretty much halved.
they're still fairly good, just not as outstanding.

some info here:

http://gizmodo.com/gadgets/ps3folding

lb on March 25, 2007 11:30 PM

What's really funny about this is that F@H themselves have to set arbitrarily low point values so as to avoid gaming their own system.

Even though GPU and PS3 clients do FAR more computations, they can't award them scores relative to their computing power-- otherwise people would have no incentive whatsoever to run the slower clients. From the PS3 FAQ:

--
The PS3 is outrunning all the rest of the FAH client types. Should I stop my existing PC/GPU/... FAH clients?

No, the other clients are valuable to us too and we have chosen a points system to try to reflect the relative merits of each different platform to our scientific research. For example, the SMP client has been producing some very exciting scientific results and continues to be very important in our work. By supporting machines with lots of different functionality, we have a very rich set of hardware on which to run calculations, allowing us to tailor calculations to the hardware to achieve maximum performance.
--

Jeff Atwood on March 26, 2007 12:24 AM


"UPDATE: as of 3/27/2007, the F@H network has arbitrarily halved the TFLOPS score for the PS3."

Given that I'm reading this at 08:57 BST on March 26th 2007, and there is currently nowhere on the planet that has reached March 27th yet, I'd say that the folding project has taken a leaf out of the 'Dune' novel by Frank Herbert and has started folding space to effect time travel.

;-)

Bob Armour on March 26, 2007 12:59 AM

Programmers are so very intolerant of each others' mistakes. All the more reason to love 'em. :)

Jeff Atwood on March 26, 2007 02:16 AM

> there's almost no point in adding general purpose CPUs to the folding network any more.

in the ps3 faq they state that you should still use normal clients too, because the ps3 client cannot perform all needed calculations.

Ralph on March 26, 2007 02:31 AM

If a card with a suitable GPU is 18x faster, and costs the same as 4/5 of a year's electricity, then maybe you could buy one and run your client for 1/10 the time, and feel virtuous on both folding work and energy fronts (though I don't know how much energy or other resources the card takes to manufacture), and get a spiffy new card in the process.

Pete

Pete Kirkham on March 26, 2007 04:49 AM
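
The trade-off in the comment above can be put into rough numbers. This Python sketch is illustrative only: it reuses the 18.6x GPU ratio and the $250.18 yearly electricity figure quoted earlier on this page, rounds the card price up to $200, and assumes (this is not stated anywhere in the thread) that the machine draws the same power whether it is running the CPU client or the GPU client.

```python
speedup = 18.6              # GPU vs. Core Duo ratio quoted in the post
duty_cycle = 0.10           # run the GPU client one tenth of the time
yearly_cost_24x7 = 250.18   # dollars per year at 200 W, from the earlier comment
card_price = 200.0          # "under $200" card, rounded up

# ASSUMPTION: power draw is the same with or without the GPU client running.
relative_work = speedup * duty_cycle            # work done vs. a CPU folding 24/7
yearly_cost = yearly_cost_24x7 * duty_cycle     # electricity at the reduced duty cycle
yearly_savings = yearly_cost_24x7 - yearly_cost
payback_years = card_price / yearly_savings

print(f"Work done relative to a 24/7 CPU client: {relative_work:.2f}x")
print(f"Electricity: ${yearly_cost:.2f} per year, saving ${yearly_savings:.2f}")
print(f"Card pays for itself in about {payback_years:.2f} years")
```

Under those assumptions the card would still do more folding work than the always-on CPU client while using a tenth of the electricity, which is the point the comment is making.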

Just a clarification, Intel Macs can also be Minis with the Intel Core Single running at 1.5GHz (I think that's the slowest Intel Mac there is).

Ryan Collins on March 26, 2007 10:01 AM

Mike wrote:
The hardware was designed to stream large datasets to which you are applying a very similar set of instructions, with little to no branch prediction.

I would say then that more than anything, this is a vindication of old-school vector processing :)

JCMay on March 26, 2007 10:01 AM

Ars Technica just wrote something about this:

http://arstechnica.com/news.ars/post/20070326-why-the-playstation-3-owns-the-pc-in-fh.html

Michael Graham Richard on March 26, 2007 10:14 AM

Well, as others have already said, you're really comparing apples and oranges (general-purpose vs. highly specialized processors). But, stream processing does have a big future. Look at this Wikipedia article for more info: http://en.wikipedia.org/wiki/Stream_processing

Especially at the bottom of that page there are a number of interesting links. For example:

Stream processing for the Masses? I don’t think so!
http://www.thinkingparallel.com/2006/09/22/stream-processing-for-the-masses-i-dont-think-so/

Note that the Cell processor (the processor of the PS3) isn't really a stream processor like a GPU is; it is actually a PowerPC CPU with eight floating-point co-processors.

Jesper on March 26, 2007 11:24 AM

"The measurement of FLOPS isn't an exact science. It would be more accurate to compare actual work units returned..."

That's one heck of an understatement.

The "flops" measurement is utterly meaningless, because it's based pretty much entirely on Sony marketing. Sony has understood for years that if you make big impressive-sounding statements about the raw power of your console, people who really should know better will almost always take them at face value.

Jeff on March 26, 2007 12:04 PM

It would not be better to report work units in Folding. Folding consists of the client and the different "cores"; the latter are downloaded by the former as needed. These cores do the actual calculations and they are rather different from each other. One kind could take an hour and another several days, and different CPU models are stronger on different cores.

That's the reason Folding started counting "points" in the first place, which is a sort of manually weighted score for each CPU (since they do different work). So it is only natural it takes a while for them to adapt their score calculations to the new PS3 architecture.

So I suspect the GPUs and CPUs don't always do the same kind of work, making comparisons somewhat an apples and oranges kind of thing.

Jonas B. on March 31, 2007 03:56 AM

How does folding proteins make the PS3 specialized? Lots of users are using these for other things, some people are even playing video games on this designed-for-protein CPU.

The 8086 is a badly designed and outdated standard that only remains the standard because volume of production and software compatibility prevent a better designed CPU competing. CISC is otherwise dead.

While you could say the PS3 is more specialized since it performs well only when running in parallel, I would say an 8086 is even more specialized since it is designed primarily around being able to run existing software with performance being secondary.

Interestingly, 8086 seems to be going this direction as well. Dual core is almost standard, quad core is common, and it seems the number of cores is going to keep increasing. Soon 8086 will have the similar limitation of only performing well in parallel.

If Linux ever becomes more mainstream than Windows (don't laugh if you haven't tried Vista) or if .NET and Java and other interpreted languages overtake compiled-to-machine-level code (more likely), I think we can finally bury 8086.

Brad B on April 10, 2007 10:26 AM

> CISC is otherwise dead.

Yes. Except for the fact that it's the most common processor architecture in the world today, CISC is otherwise dead. :)

Architecturally, modern x86 chips are RISC cores that emulate x86, and have been for a long time.

Jeff Atwood on April 10, 2007 01:49 PM











Content (c) 2008 Jeff Atwood. Logo image used with permission of the author. (c) 1993 Steven C. McConnell. All Rights Reserved.