December 29, 2004
Back when I was writing for Tech Report, I had an epiphany: the future of CPU development had to be multiple cores on the same die. Even in 2001, a simple extrapolation of transistor counts versus time bore this out: what the heck are they going to do with those millions of transistors they can add to chips every year? Increase level two cache to twenty megabytes? Add to that the well-known heat and scaling problems of Intel's "more GHz, less work per MHz" Pentium 4 architecture and you've got a recipe for both lower clocks and lots of transistors. When you can't go forward, go sideways: more CPUs on the same die.
Unfortunately, as Chris Sells points out, our current languages are extremely poorly suited for the kind of development necessary in a world where CPUs don't get faster:
Because CPU speeds have topped off recently even though I/O speeds continue to increase, Herb Sutter posits that the Moore's Law free performance lunch is over, i.e. no more getting faster software by waiting for the next gen of faster hardware. Instead, we'll have to write our apps to be a lot more concurrent to take advantage of hyper-threading, multi-core CPUs and multi-CPU machines if we want our apps to continue to run faster.
What this means to me is that we'll have to have much better language-level support for concurrent programming. What we have now sucks. Rocks. Hard.
Does it suck rocks? Hard? Rick Brewster recently wrote this about the threading code in Paint.NET:
[Algorithms for splitting rendering work between processors], as well as the thread synchronization that goes with the progressive effect rendering, is easily the most complex code in Paint.NET. It's worth it though because this gives us a huge performance boost when rendering effects.
You can make a very strong case that, as developers, we're pretty screwed if the only way to get more performance out of our apps is to make them heavily multithreaded. Writing software is hard enough as-is without adding a pinch of "the most complex code in our app" throughout... your app. The best treatment of the perils of threading I've found is in Dan Appleman's book Moving to VB.NET: Strategies, Concepts, and Code:
In his classic science fiction novel The Stars My Destination, Alfred Bester describes a psychokinetic explosive called PyrE -- so powerful that a single grain of it can blow up a house. And all that is needed for it to blow up is for anyone to just think at it and want it to explode. The hero of the story has to decide whether to keep it locked up and secret or to spread it around the planet leaving the fate of the world in the hands and thoughts of every single person on earth.
Which brings us to multithreading.
It's a useful technology -- one that has the potential to improve your application's real (or perceived) performance. But it is the software equivalent of a nuclear device because if it is used incorrectly, it can blow up in your face. No -- worse than that -- used incorrectly, it can destroy your reputation and your business because it has nearly infinite potential to increase your testing and debugging costs.
Multithreading in VB.NET scares me more than any other new feature. And as is the case with a number of new .NET technologies, the reason for this has to do with human factors as much as with technology.
Several months before the .NET PDC preview, I was doing a session with Bill Storage at a VBits conference. I asked the audience, which consisted of fairly experienced Visual Basic programmers, whether they wanted free threading in the next version of Visual Basic. Almost without exception, their hands quickly went up. I then asked how many of them actually understood what they were asking for. Only a few hands were raised and there were knowing smiles on the faces of those individuals.
I'm afraid of multithreading in VB.NET because Visual Basic programmers have little in their experience to prepare them for designing and debugging free-threaded applications. VB6 provides enormous protection (along with severe limits) in its implementation of multithreading. The only way to use free threading safely is to understand it and to design your applications correctly.
Again, I stress, design your applications correctly. If your design is incorrect, it will be virtually impossible to patch up the problems later. And again, the potential cost to fix threading problems has no upper limit.
I've always felt that it's my responsibility as an author to not only teach technology but to put it into context and help readers choose the right technology to solve their problems. Because multithreading is such a serious issue, I've decided to take a somewhat unusual approach in teaching it. Instead of focusing on the benefits of multithreading and why you would want to use it, I'm going to start by doing my best to help you gain a healthy respect for the technology and the kinds of problems you will run into. Only towards the end of this chapter, once you understand how to use multithreading, will I discuss scenarios where it is advisable to use it.
This book is written for VB developers; however, I believe Dan's cautionary tale applies to virtually all developers currently working in Windows. Programming with threads is hard because:
- our current programming models don't deal with concurrency well
- most of the programming we do is linear in nature
- programmers have a hard time thinking in terms of events that can interrupt each other at any time
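To make that last point concrete, here is a minimal Python sketch of a lost update. It is not from Dan's book, and the function names are made up for illustration. Two threads each read a shared value, get interrupted, and then write back a stale result. A barrier is used purely to force the unlucky interleaving so the outcome is repeatable; in real code the same interleaving shows up only occasionally, which is what makes it so hard to debug.

```python
import threading


def lost_update_demo():
    """Two threads each add 1 to a shared value, but one update is lost."""
    shared = {"value": 0}
    barrier = threading.Barrier(2)  # used only to force the bad interleaving

    def unsafe_increment():
        seen = shared["value"]       # step 1: read
        barrier.wait()               # both threads have now read the old value
        shared["value"] = seen + 1   # step 2: write back a stale result

    threads = [threading.Thread(target=unsafe_increment) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared["value"]


def locked_update_demo():
    """Same work, but the read-modify-write is guarded by a lock."""
    shared = {"value": 0}
    lock = threading.Lock()

    def safe_increment():
        with lock:                   # only one thread at a time in here
            shared["value"] += 1

    threads = [threading.Thread(target=safe_increment) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared["value"]


print(lost_update_demo())    # 1, not 2: one increment vanished
print(locked_update_demo())  # 2
```

The fix is a single lock here, but only because the shared state is one variable. The hard part in a real application is knowing every place that needs one.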
While threading can-- and should-- be made easier in .NET 2.0, I seriously doubt programmers can adapt to a concurrent world without deeper, more radical changes.
Posted by Jeff Atwood
Interesting side note; BeOS had a completely asynchronous API, which is cited as one of the reasons for its *failure*:
BeOS has a very elegant API, really a pleasure to work with, but it is not as powerful as any of its competitors. Additionally, there are no good development tools for BeOS, no good visual GUI designers, no full-featured debuggers, no profilers... Also, under BeOS you constantly need to take care of multithreading issues and write your code around the fact that everything is so multithreaded on BeOS that you could create deadlocks where you would least expect it. Writing small apps for the BeOS is a joy, writing anything more complex or serious though is a real pain.
I've been hearing for the last few years how BeOS has an amazingly responsive GUI because of the way it is written. You almost never wait for the GUI to respond like you do under Windows or OS X.
That seemed too good to be true; something was obviously missing from the picture. Finally, after reading this article on OS News I understood the BeOS thing: it's not fast, it simply has a largely asynchronous API. This provides a good GUI experience but is a pain in the ass to program.
At least with traditional programming languages. But say you used a language based on the join calculus like the Polyphonic C#... Hm...
Come on: concurrency is a solved problem. So the next step is to make it easy to do right without a comp-sci degree.
I don't agree that it's solved; even if it was, how do you deal with the order-of-magnitude increase in difficulty? Clearly we need a new paradigm.
The good news is, it'll take a solid 3 years before multi-cpu machines* are common.
* beyond hyperthreading, which is only good for 15 percent performance increase in the most absolutely optimal of programming conditions. Hyperthreading is a "mini-me" CPU that isn't capable of full computing tasks. Interesting commentary from Raymond Chen on this here:
Okay, I take back most of what I said. I thought it would be easier to abstract away some of the complexity of concurrency that the BCL leaves uncovered, but my attempts in the past week to build a novice-friendly class library for multithreading proved fruitless. I'd forgotten how difficult it is to solve even application-specific threading problems, such as implementing the asynchronous methods BeginDoMyWork(), CancelDoMyWork(), and EndDoMyWork() alongside the synchronous method DoMyWork().
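For a feel of what that Begin/Cancel/End shape involves, here is a rough Python sketch. The four function names mirror the ones above; the work itself (summing a list), the `WorkCancelled` exception, and the handle layout are all assumptions made up for illustration, not anything from the .NET BCL.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

_pool = ThreadPoolExecutor(max_workers=2)


class WorkCancelled(Exception):
    """Raised by the worker when it notices a cancellation request."""


def do_my_work(items, cancel_event=None):
    """Synchronous version: sums the items, checking for cancellation between steps."""
    total = 0
    for item in items:
        if cancel_event is not None and cancel_event.is_set():
            raise WorkCancelled()
        total += item
    return total


def begin_do_my_work(items):
    """Start the work on a pool thread and return a handle for End/Cancel."""
    cancel_event = threading.Event()
    future = _pool.submit(do_my_work, items, cancel_event)
    return future, cancel_event


def cancel_do_my_work(handle):
    """Request cancellation; the worker notices at its next check."""
    future, cancel_event = handle
    cancel_event.set()
    future.cancel()  # only has an effect if the work has not started yet


def end_do_my_work(handle):
    """Block until the work finishes; re-raises whatever the worker raised.

    After a cancel this raises either WorkCancelled (the worker saw the flag)
    or concurrent.futures.CancelledError (the work never started).
    """
    future, _ = handle
    return future.result()


handle = begin_do_my_work([1, 2, 3])
print(end_do_my_work(handle))  # 6
```

Even at this size there are already two different "cancelled" outcomes, plus a window where cancellation races with normal completion, and the caller has to cope with all of it.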
I think that's what Chris was getting at: we don't have the right toolset to solve the concurrency problem. Someone like Anders Hejlsberg has to attack it and change the framework... possibly even come up with a new language entirely.
All we can offer is band-aids like BackgroundWorker. Which is certainly better than nothing, but it's nowhere near the paradigm change we're gonna need three years from now if these predictions pan out!
It's odd to see a discussion like this and not see anyone mention Erlang. http://www.erlang.org
My personal opinion is that Erlang is too "functional" to pick up the necessary programmer support... although I think the "the general public never really masters good multi-threaded programming and it ever remains a thing for the masters" outcome probability is non-zero and in that case something semi-obscure like Erlang could still win out.
But I suspect that if nothing else, we're going to end up with the basic Erlang model: Extremely cheap messaging, extremely cheap threads (want 5000? go for it), and absolutely no modifying the data in one thread from another, at least logically. (For performance reasons behind the scenes there may be "shared data", but it'd be copy-on-write.) Might as well chuck in the network-transparency on the messaging while you're at it.
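As a rough illustration of that model in a conventional language, here is a Python sketch of an actor with a private mailbox. The `Actor` and `Tally` classes and the message names are invented for this example. Python threads are ordinary OS threads, so this says nothing about Erlang-style cheapness; it only shows the discipline of "messages in, copies out, no shared mutable data".

```python
import copy
import queue
import threading


class Actor:
    """A thread with a private mailbox; its state is never touched from outside."""

    def __init__(self):
        self.mailbox = queue.Queue()
        self._thread = threading.Thread(target=self._run)
        self._thread.start()

    def send(self, kind, payload=None):
        # Copy on send so sender and receiver never share mutable data.
        self.mailbox.put((kind, copy.deepcopy(payload), None))

    def ask(self, kind, payload=None):
        # Like send, but waits for a reply on a one-shot private queue.
        reply = queue.Queue(maxsize=1)
        self.mailbox.put((kind, copy.deepcopy(payload), reply))
        return reply.get()

    def stop(self):
        self.mailbox.put(None)  # sentinel: no more messages
        self._thread.join()

    def _run(self):
        while True:
            message = self.mailbox.get()
            if message is None:
                break
            self.handle(*message)

    def handle(self, kind, payload, reply):
        raise NotImplementedError


class Tally(Actor):
    """Counts the words it is sent."""

    def __init__(self):
        self.counts = {}  # only ever touched by this actor's own thread
        super().__init__()

    def handle(self, kind, payload, reply):
        if kind == "add":
            for word in payload:
                self.counts[word] = self.counts.get(word, 0) + 1
        elif kind == "report":
            reply.put(dict(self.counts))  # reply with a copy, not the original


tally = Tally()
words = ["lock", "thread", "lock"]
tally.send("add", words)
words.clear()  # harmless: the actor received its own copy
print(tally.ask("report"))  # {'lock': 2, 'thread': 1}
tally.stop()
```

Because the mailbox is first-in, first-out and only the actor's thread touches `counts`, there is nothing to lock in user code.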
Technically, there's no reason that these three characteristics can't exist in a "conventional", non-functional language, which is why I don't think Erlang is going to be the Big Winner since that new language is likely to get a lot more support from the general programming public. But the relationship will be clear.
But I don't think you're going to be able to patch your way to it from current languages anytime soon. Dynamically scheduling your heavy threads onto multiple processors under varying loads and trying to share data space the entire time is just fundamentally going to be too much to deal with. Much better to break the task up into lots of small parts and let the computer manage scheduling. Erlang proves it's possible, at least if you build it into the language from day one, which was really the hard part of believing this.
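Here is a small Python sketch of the "lots of small parts, let the computer schedule them" idea, using the standard library's `concurrent.futures`. The helper names, chunk size, and worker count are arbitrary choices for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(items, size):
    """Split a list into consecutive pieces of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def sum_of_squares(chunk):
    """One small, independent task: no shared state, just input and output."""
    return sum(n * n for n in chunk)


def parallel_sum_of_squares(numbers, chunk_size=1000, workers=4):
    chunks = chunked(numbers, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # The pool decides which worker runs which chunk, and when.
        partials = pool.map(sum_of_squares, chunks)
        return sum(partials)


print(parallel_sum_of_squares(list(range(10_000))))  # 333283335000
```

One caveat: in standard CPython, threads do not speed up CPU-bound work like this because of the global interpreter lock. `ProcessPoolExecutor` offers the same `map` interface and is the usual substitute when the goal is to use several cores. The point of the sketch is the shape of the code: the tasks share nothing, so scheduling can be left to the pool.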
I think one of the biggest hurdles will be to get the general business developer (i.e. average VB, .NET developer) to think about their application in terms of concurrency. They need to start identifying how their application can perform multiple business tasks in parallel. If they decide that identifying these concurrent processes is too difficult or that it takes too much time, they will then just rely on designing the application as they always have: serially.
Multi-threaded, concurrent programming is the hardest programming I have ever done, and I took the course (literally - CSC375) from a master - Doug Lea, who wrote ALL of the java.util.concurrent classes. But once I "got" it, it changed my entire outlook on developing software.
Also, for what it is worth, the java.util.concurrent classes can be translated into .NET (for the most part) as the source is available. As a bonus, the javadocs that Doug wrote are excellent.
I don’t have any expectation that anytime soon, a massive breakthrough will occur that will make parallel programming much easier. It’s been an active research project for many years. Better tools will help and somewhat better programming methodologies will help. One of the big problems with modern game development with C/C++ languages is that your junior programmer who’s supposed to be over there working on how the pistol works can’t have one tiny little race condition that interacts with the background thread doing something. I do sweat about the fragility of what we do with the large-scale software stuff with multiple programmers developing on things, and adding multi-core development makes it much scarier and much worse in that regard.
That's from John Carmack, hardest of the hardcore C programmers, and the author of Doom, Quake and other games.
Concurrent programming is difficult because the thread model sucks. The thread model is popular because it maps well to the underlying hardware.
The solution is to change to another paradigm - by switching to a language that integrates it. One option is Erlang. Another is occam, based on the CSP model (http://www.wotug.org). It is extremely easy to write concurrent software with occam - and it's not even functional (although I regard this as a drawback rather than desirable).
A concurrent programming language implements the hard parts of concurrency once and for all. All the programmer has to do is to adopt the habit of writing easily understandable concurrent programs instead of sequential ones. This shouldn't be too hard - most problems aren't strictly sequential by nature. And the "hard part" of concurrency isn't actually so hard to implement if you have a good underlying model.
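For readers who haven't seen the CSP style, here is a loose Python approximation: independent processes that communicate only over channels. This is not occam, and the names are made up. occam channels are unbuffered rendezvous points, whereas the `queue.Queue(maxsize=1)` used here is only a one-slot stand-in.

```python
import queue
import threading

DONE = object()  # sentinel marking the end of a stream


def producer(out, numbers):
    """First stage: writes each number to its output channel."""
    for n in numbers:
        out.put(n)
    out.put(DONE)


def squarer(inp, out):
    """Middle stage: reads from one channel, writes squares to the next."""
    while True:
        n = inp.get()
        if n is DONE:
            out.put(DONE)
            return
        out.put(n * n)


def run_pipeline(numbers):
    # Bounded channels: a put blocks until the reader catches up.
    a = queue.Queue(maxsize=1)
    b = queue.Queue(maxsize=1)
    stages = [
        threading.Thread(target=producer, args=(a, numbers)),
        threading.Thread(target=squarer, args=(a, b)),
    ]
    for stage in stages:
        stage.start()
    results = []
    while True:
        item = b.get()
        if item is DONE:
            break
        results.append(item)
    for stage in stages:
        stage.join()
    return results


print(run_pipeline([1, 2, 3]))  # [1, 4, 9]
```

Each stage is plain sequential code; all the coordination lives in the channels.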
Of course, limiting yourself to one model of concurrency prevents you from doing certain things that might be crucial to performance in certain situations. But this is little different from when you have to switch to assembler from within C. Anyway, unless you have a very good understanding of what you're doing, limits are actually a good thing.
How about a web survey hosted on stackoverflow.com that tracks what proportion of developers are doing web/desktop/embedded development?
Sure, it's only likely to be roughly accurate, but it'd be interesting statistics just the same...
It occurs to me that one of the advantages of functional- or functional-supporting languages is that functional programming is very much about defined interfaces. In the case of concurrent processing, the protocols for interchanging data and how they operate can be analysed easily in terms of functional concepts. On that basis, I think that good concurrent primitives and support are most likely to be handled best by those who have a good backing in functional programming, in the same way that functional programming concepts help very much in object oriented or procedural development.
I think concurrency is simply taught incorrectly. Event-based code execution is not that complicated; it's trying to make multiple threads somehow finish at the same time that is complicated (in fact, nearly impossible).
I think the general consensus is simply that imperative programming is fundamentally the wrong paradigm for dealing with concurrent applications. Functional (and semi-functional, such as Erlang) languages are making leaps and bounds in this area (such as Erlang's actor model, or Haskell's STM), while 'traditional' imperative languages seem to have reached a local maximum. Concurrency in these models does still require some thought, but overall I would say that there are far fewer opportunities for horrible bugs to slip in.