March 9, 2009
The performance of any computer is akin to a shell game.
The computer performance shell game, also known as "find the bottleneck", is always played between these four resources:

- CPU
- Memory
- Disk
- Network
At any given moment, your computer is waiting for some operation to complete on one of these resources. But which one: CPU, memory, disk, or network? If you're interested in performance, the absolute first thing you have to do is determine which of these bottlenecks is currently impeding performance -- and eliminate it. At which point the bottleneck often shifts to some other part of the system, far too rapidly for your eye to see. Just like a real shell game.
So the art of performance monitoring is, first and foremost, getting your computer to tell you what's going on in each of these areas -- so you can make your best guess where the pea is right now.
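If you want to play along from a script rather than a GUI, here's a minimal sketch of the idea, assuming Python with the third-party psutil library; any similar counter API would do:

```python
# Sample all four shells of the game -- CPU, memory, disk, network --
# once a second, using the third-party psutil library.
import psutil

disk_prev = psutil.disk_io_counters()
net_prev = psutil.net_io_counters()

while True:
    cpu = psutil.cpu_percent(interval=1)  # blocks ~1 second while sampling
    mem = psutil.virtual_memory().percent
    disk_now = psutil.disk_io_counters()
    net_now = psutil.net_io_counters()
    disk_mb = (disk_now.read_bytes + disk_now.write_bytes
               - disk_prev.read_bytes - disk_prev.write_bytes) / 2**20
    net_mb = (net_now.bytes_sent + net_now.bytes_recv
              - net_prev.bytes_sent - net_prev.bytes_recv) / 2**20
    disk_prev, net_prev = disk_now, net_now
    print(f"CPU {cpu:5.1f}%  Mem {mem:5.1f}%  "
          f"Disk {disk_mb:6.2f} MB/s  Net {net_mb:6.2f} MB/s")
```

Whichever number spikes while your machine feels slow is your best guess at where the pea is.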
My previous performance drug of choice was Task Manager, or its vastly more sophisticated bigger brother, Process Explorer. But now that I've discovered the Reliability and Performance Monitor, I can't stop watching it. It is my crystal meth. While the previous tools were solid enough, they both had one glaring flaw. They only showed CPU load and memory usage. Those are frequently performance bottlenecks, to be sure, but they're only part of the story.
The Reliability and Performance Monitor, while continuing the fine Microsoft product tradition of absolutely freaking horrible names, is new to Windows Vista and Windows Server 2008. And it rocks.
Right off the bat you get a nice summary of what's going on in your computer performance shell game, with an overview graph and high water marks for CPU, Disk, Network, and Memory, along with scaled numbers. Eyeball this one key set of graphs and you can usually get a pretty good idea which part of your computer is working overtime.
There are also collapsible detail sections for each graph. On these detail sections, bear in mind the numbers are all live, and the default sort orders tend to bring the most active things to the top. And they stay at the top until they're no longer using that resource, at which point they disappear. The detail sections are a quick way to drill down into each resource and see what programs and processes are monopolizing it at any given time.
The CPU detail section gives you a moving average of CPU usage, which is much saner than Task Manager's always shifting numbers. Admittedly, this section isn't radically different from taskman -- and it's functionally identical to the Unix top command. But the moving average alone is surprisingly helpful in avoiding obsessing over rapid peaks and valleys.
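The smoothing trick itself is trivial; here's a sketch of the concept in Python with psutil -- just the idea, not how the monitor actually implements it:

```python
# The smoothing idea: average the last N one-second samples so momentary
# spikes don't dominate. A sketch of the concept, not the monitor's code.
from collections import deque

import psutil

WINDOW = 10  # seconds of history to average over
samples = deque(maxlen=WINDOW)

for _ in range(60):
    samples.append(psutil.cpu_percent(interval=1))
    avg = sum(samples) / len(samples)
    print(f"instant {samples[-1]:5.1f}%   moving average {avg:5.1f}%")
```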
The Disk detail section shows which processes are reading and writing to disk, for what filenames/paths, and how long it's taking to service those requests -- in real time. I generally alternate between read and write sort order here, although sometimes response time can be informative as well.
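You can approximate that per-process view from a script, too. A hedged sketch with psutil -- it gives cumulative byte counters only; the live filenames and response times are where the monitor earns its keep:

```python
# Top processes by cumulative disk I/O -- a rough, script-level stand-in for
# the Disk detail section. psutil gives per-process byte counters (Windows
# and Linux), but not filenames or response times.
import psutil

totals = []
for p in psutil.process_iter(['name']):
    try:
        io = p.io_counters()
        totals.append((io.read_bytes + io.write_bytes, p.pid, p.info['name']))
    except (psutil.AccessDenied, psutil.NoSuchProcess):
        continue  # some system processes won't let us peek

for total, pid, name in sorted(totals, reverse=True)[:10]:
    print(f"{total / 2**20:10.1f} MB  pid {pid:6}  {name}")
```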
The Network detail section shows which processes are sending the most data over the network right now. On a public website, this gives you an at-a-glance breakdown of which IP addresses are hitting you the hardest. In fact, while checking this, I just laid down another IP ban for some random IP that was scraping the heck out of our site.
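psutil offers a rough stand-in here as well, counting established connections per remote IP -- connection counts rather than bytes, and it may need elevated privileges, so treat it strictly as an approximation:

```python
# Rough stand-in for the per-address network view: count established
# connections per remote IP. Counts connections, not bytes transferred.
from collections import Counter

import psutil

remotes = Counter(
    conn.raddr.ip
    for conn in psutil.net_connections(kind='inet')
    if conn.raddr and conn.status == psutil.CONN_ESTABLISHED
)
for ip, count in remotes.most_common(10):
    print(f"{count:4} connections  {ip}")
```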
The Memory detail section shows the five most essential metrics for memory usage in real time. Hard Faults are, of course, forced reads from disk into memory -- something you want to keep a close eye on. And Working Set is the best general indicator of how much memory a process is actively using to do its thing.
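One last sketch for the memory shell: listing the top processes by working set, which psutil reports on Windows through its rss field:

```python
# Top processes by working set -- the monitor's best "actively used memory"
# number. On Windows, psutil exposes the working set as the rss field.
import psutil

procs = psutil.process_iter(['name', 'memory_info'])
by_wset = sorted(
    (p for p in procs if p.info['memory_info'] is not None),
    key=lambda p: p.info['memory_info'].rss,
    reverse=True,
)
for p in by_wset[:10]:
    print(f"{p.info['memory_info'].rss / 2**20:8.1f} MB  {p.info['name']}")
```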
The computer performance shell game is nothing new; it is as old as computing itself. And it is a deeply satisfying game for those of us who love this stuff.
I thought I knew how to play it, until I discovered the Reliability and Performance Monitor. Now that I have a utility like this to let me suss out exactly where that performance pea is, I realize how much I was missing.
Now, on to three card monte. Watch my hands closely!
Posted by Jeff Atwood
A cool tool for performance monitoring is Spotlight by Quest. Spotlight for Windows is now a freebie download. It has helped me out several times... It has also wasted hours of my life :(
Screen I/O is a huge bottleneck! Especially writing to a textbox.
I once nearly doubled the speed of a file parser (read line from file, insert into database) by changing the onscreen log/status to fill the current view without scrollbars, then clear it, repeatedly. Performance doubled again when I disabled the onscreen log altogether.
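The fix generalizes: update the display once per batch of records instead of once per record. A minimal sketch of the pattern in Python -- parse_line and insert_row are hypothetical stand-ins for the real work:

```python
# Batch the status updates instead of writing one per record.
# parse_line and insert_row are hypothetical stand-ins for the real work.
import sys

BATCH = 1000  # refresh the display once per 1000 lines, not once per line

def parse_line(line):
    return line.rstrip().split(",")  # stand-in parser

def insert_row(row):
    pass  # stand-in for the database insert

def process(path):
    with open(path) as f:
        for i, line in enumerate(f, 1):
            insert_row(parse_line(line))
            if i % BATCH == 0:
                sys.stdout.write(f"\rprocessed {i} lines")
                sys.stdout.flush()
    print()
```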
Console I/O is a bottleneck... on Windows. It is blisteringly fast, an absolutely trivial part of program execution time, on other operating systems.
In Vista, it's under a big button on the Task Manager Performance tab labelled Resource Monitor....
And yes, it is truly addictive, especially the Disk activity list... which process (PID) is pounding which disk file... ooooh look there goes Norton Ghost! Disk *and* Network! :)
The performance monitor completely misses the *other* memory bottleneck: memory latency. On modern systems, cache-efficient data structures can be a lot faster than ones that require fewer CPU instructions but more accesses to main memory. This overhead shows up as CPU load even though the CPU is actually sitting idle, stalled on memory. With multicore, where many cores compete for the memory bus, things get even more interesting.
Now Jeff, you didn't put those IP addresses out there on purpose, subtly hinting at the ban in an effort to make the offender famous, now did you?
@Charles - agreed. Lack of morning coffee has the biggest impact on performance.
I'd just like to point out the obvious factor left out here: performance monitoring should be part of development, and should be done pre- and post-change in order to determine the performance impact of a change.
Sure you can try to monitor and deal with individual applications, but when developing you should be aware of your program's impact - both on resources and other applications.
Let's not forget the other key monitoring variable, and that's monitoring frequency. On the *nix side of the world I'm amazed how many people run sar with the default monitoring interval of 10 minutes/sample, thinking anything more frequent than that is going to hurt them, and don't realize [or care] that the resultant data is mush. While it may be nice to summarize the day with 144 data samples, it's next to worthless for any real analysis.
When I wrote collectl (thanks for the earlier plug), not only did I make the default monitoring interval for the daemon 10 seconds (that's 8640 samples/day), the default interactive sample is 1 second, and you can even go sub-second when you need to -- and yes, there are times you need to. And as other posters noted, like many other tools it DOES show CPU, disk, network, memory and more. How about InfiniBand? While it IS a network of sorts, most tools don't show it. See http://collectl.sourceforge.net/ to read more.
That also raises another point - the infatuation people have with graphics. While they certainly DO have their place to give you a high level view, you still need to occasionally drill into the raw data. Collectl gives you both, the ability to look at text and the ability to save the data in a format suitable for loading into a spreadsheet or pumping through gnuplot.
At the very least, if you really do insist on running sar, please lower your interval to 10 seconds! I promise, it won't hurt. Or at least drop it to a minute.
btw, nice post Jeff. I'd like it if you wrote more about performance and reliability :-)
Actually, this utility will never show you a thread exhaustion problem. CPU, disk I/O, and working set can all be fantastic, but if IIS is just plain out of threads to handle the requests, you're hosed, and your response time will show it. To diagnose that, look at the HTTP queue length.
Perfmon has been the utility of choice for a long while in diagnosing performance problems on Windows. It has *WAY* more information than the utility mentioned here. Netstat also helps.
To be fair, Process Explorer does show disk performance as well, although I'm not sure it does it as well as RaPM.
cool tool! (-:
I get a nice rare item because of you.
Thank you, Jeff.
It is a very useful tool for a young Jedi knight just like me. (-:
Aha, thank you for pointing this out!
For those of you curious how to access it, run perfmon /s.
I love how MS decided to include this in Home Premium, rather than making users shell out money for the Ultimate edition to get it. (I'm looking at you, lusrmgr.)
You forgot the 5th bottleneck: Bad Code. Processes that loop excessively, mismanage memory, read/write disk badly or use the network poorly will appear to be the wrong bottleneck, when the real problem is a poorly written program.
But it's not available for XP, is it? Oh well, Windows 7 coming along shortly...
Now this looks damn good. Much better than Task Manager and Process Explorer. I wish they would make a version for Windows XP users, though. I haven't migrated to Vista, concerned about the heavy DRM and the performance of the operating system.
Someone had to say this:
But in *nix land we've had text based tools like truss (that attaches to running processes and lets you see what they're up to) for about 20 years, no joke, plus top and so on. The GUI tools are there now too, of course.
The tool is renamed, at least in the beta, in Windows 7. It's called Resource Monitor there.
Of course, for gamers you have to worry about bottleneck number 5 (the GPU) as well.
That's a pretty easy one to detect, though. Do you still have all your arms and legs? If yes then you didn't spend enough on a GPU :p
Um, isn't perfmon in XP Pro and Windows 2003 as well?
One thing we need is a smarter Task Scheduler. Not a lot smarter, but enough to monitor resource consumption levels and sequence jobs based on system-wide thresholds and queues allowing at least hi/low prioritization of jobs. Start time alone is not enough.
There are lots of tasks that can stand to be batch processed, and this is something that can be used to better exploit multi-core/CPU systems.
Working against system-busy setpoints, such work could be done by serializing tasks while allowing parallel processing based on resource availability. Those setpoints could also be used to avoid starving interactive processes and those directly serving near-real-time needs (such as web servers), suspending batch processes as need be.
Sort of reinventing mainframe-style scheduling.
You guys should see some of the cool performance monitoring tools on linux and BSD/OS X.
Dang. Now I wonder if Apple borrowed the idea from Microsoft - after all, the task manager in OS X is called Activity Monitor and not at all iBusy, iWhatsRunning, or even FindAHog Pro.
Yeah, but does it have the minimizing/maximizing/close buttons on the LEFT? No, I don't think so. I'll wait for Windows 8.
Jeff, Drug analogies? Really? What's next, are you smoking crack? You can do better than that...
Another bottleneck -- one I'm not surprised you missed, given how dismissive you have been of quad cores -- is synchronization overhead when trying to achieve parallelism.
Thread-switching overhead can show up as excessive kernel time when you know there shouldn't be excessive time spent in the kernel (e.g. in device I/O, low-level encryption, etc), while blocking overhead shows up as low CPU usage even though there's plenty of work to do and no excessive strain on memory or I/O.
imissmyjuno: But Activity Monitor shows you your computer's activity (so I'd say the name works). And it's really just a nice GUI for other tools (like 'top'). In fact, nothing that's core in OS X is i-Anything. The add-on tools (iWork, iLife, iDisk etc) have the i-name scheme. OS X is pretty i-Void when you get down to it.
Following a previous comment by DAVE, if you live over on the *nix side of the world, we have a cornucopia of performance and monitoring tools at our disposal: sar, uptime, vmstat, swapon, ps, the performance analysis script (by Red Hat), top, dstat, /proc/interrupts, iostat, /proc/meminfo, /proc/slabinfo, date, ntop, unixbench, iozone, bc, and HP's collectl.
Using any combination of these tools, I think one could easily fall into analysis paralysis, if one was so inclined. Surely no one around here is obsessed with gigantic data sets..
There's an additional factor beyond bytes-per-second to consider when monitoring disk performance, and this is where average response time comes in to play.
Consider two systems: one showing 600 MB read per minute, the other showing only 60.
At a glance one might assume the system pulling 600 MB/min is under more disk strain, but what if those 600 MB/min are coming sequentially off the disk? Any disk you'll find can easily stream 10 MB/sec sequentially without a blink. This system might not be under disk load at all.
The system pulling 60 MB/min (or 1 MB/s), on the other hand, could be thrashing insanely across the platter trying to service a heavy random load, working as hard as it possibly can.
This is where response time comes in to play. Of course there are (always) other factors involved, but with server type workloads an overloaded disk subsystem will tend to show higher IO response times, even while suffering lower total throughput.
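To make that concrete, here's a back-of-the-envelope sketch; the ~10 ms seek time and 4 KB read size are assumptions, not measurements of any particular disk:

```python
# Back-of-the-envelope: why 1 MB/s of random I/O can mean a disk working
# far harder than 10 MB/s sequential. Both figures below are assumptions.
SEEK_MS = 10.0  # assumed average seek + rotational latency per random read
READ_KB = 4     # assumed random read size

iops = 1000.0 / SEEK_MS              # ~100 random reads per second, flat out
random_mb_s = iops * READ_KB / 1024  # throughput at 100% disk utilization

print(f"A saturated disk doing random {READ_KB} KB reads: "
      f"{random_mb_s:.2f} MB/s ({iops:.0f} IOPS)")
```

So a disk delivering barely 1 MB/s of random reads may already be running flat out, while 10 MB/s of sequential streaming barely registers.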
What about the load average?
A system showing low CPU, network, or disk usage can still have a high load average in any of those three areas, and so can perform poorly while appearing free of duty.
Lack of morning coffee is my biggest performance bottleneck...
This is all well and good for a closed source system where the solution to a bottleneck is to upgrade the offending hardware (or maybe if you're lucky you can play with some configuration item).
But I want to emphasize that if you are *developing* the code in question, there are much more effective tools for profiling your code directly. This will find the specific system calls which are at fault, so that hopefully you can optimize your code better...
Three cheers for gkrellm, which I've been using on Linux for years now. Without it, I feel naked *because I don't know what my computer's doing*.
As for a smarter task scheduler: There's one simple thing that could help a lot: Replace the current priority system with one that's more informative.
Tasks can indicate what sort of CPU priority they should get, but that's it. That's useless for a background task that generates lots of disk activity.
How about making priorities: (System), Realtime, Interactive, Batch, Background.
These would apply to *ALL* activities by the thread, not just CPU time. The System one would be special, only the OS could use it and it's only used for things that need to be able to interrupt high priority tasks. (Example: The task manager--otherwise you would have no way of killing a realtime task that went wild.)
Tasks of higher priority execute before tasks of lower priority even if that means starving them. Unlike the current system it would apply to more than just the CPU. When a program does something that initiates disk activity that activity gets tagged with the priority of the task.
Thus you could kick off large disk-intensive tasks at low priority without making a dog of your system.
Don't discount Internet connection speed as a bottleneck either.
While your processor executes billions of instructions per second, typical Internet rates are in the low millions of bits per second -- roughly a thousand-fold slower, or worse.
The hard time I am having with all of this progress is that my old 1982 CP/M floppy-based Osborne computer is able to bootstrap its OS and load WordStar and my document in under 15 seconds, and with all of the modern wonderful advances we are not able to get anywhere close to that...
Oh, speaking of Osborne.
I was googling a few facts and discovered he just passed away, my condolences to his family....
David E., all the modern wonderful advances are also going towards making software easier to write and manage, it's not all about performance.
Am I misremembering things? I haven't worked in the Windows world for several years, but I thought all this stuff was available in perfmon since the first version of NT?
Jeff, there is no top command in Unix. It's a Linux command.
Oh crap, ignore the above comment.
Ayende Rahien (go ahead, google him) recently maintained that screen I/O, network I/O, and disk I/O are the main bottlenecks, the first being the surprising one. I've seen this, though: in python scripts I've written for solving Project Euler problems, it's often the case that print statements take a large amount of the runtime. Strange but true.
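It's easy to measure for yourself; a quick sketch timing per-line prints against one buffered write:

```python
# Measure console I/O cost: print one line at a time versus building the
# whole output in memory and writing it once.
import sys
import time

N = 100_000
lines = [f"result {i}" for i in range(N)]

start = time.perf_counter()
for line in lines:
    print(line)
per_line = time.perf_counter() - start

start = time.perf_counter()
sys.stdout.write("\n".join(lines) + "\n")
buffered = time.perf_counter() - start

print(f"per-line: {per_line:.2f}s  buffered: {buffered:.2f}s", file=sys.stderr)
```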
bahh, Vista only.
(Btw, your captcha text is still 'orange')
'perfmon' is part of the MMC all the way back to the NT days. It is available in XP.
If you like numbers, give that one a shot. You can even set alerts for resource limits and set it up to log/send message/run program.
Wow, you dedicated an entire blog post to a 1990 invention. MicroFanBoy syndrome is working overtime with all those lawsuits against free software, now isn't it?
Take your blinders off Jeff, the special gifts from Ballmer are getting to your head a bit much.
Maybe this is a silly question, but what's w3wp.exe?
I've never seen this process, but TBH I've never administered a live public website.
Also, I thought the network detail section was per process, not per IP. Was that changed, or did I simply miss something there?
Did you check that the IP you're banning isn't google's spider or some such?
w3wp stands for WWW worker process -- it's the IIS worker process used to host an ASP.NET application.
"The CPU detail section gives you a moving average of CPU usage, which is much saner than Task Manager's always shifting numbers."
If you are still using XP and hate the jumping process feature, enable the CPU Time column and sort by it. You can group all of the heavy CPU use processes together and get a good idea of what your system is doing over time.
Try I/O Read Bytes, I/O Write Bytes, and I/O Other Bytes (network) to see over time usage of hard drive and network resources. You can't see the files that are being accessed but it will give you a little better picture of what is going on. If you see something posting massive read byte usage and it isn't your virus scanner or search indexer, you need to figure out what it is asap.
The first thing I do on every new XP install is add Available MBytes to the default counters in Performance Monitor, and change the scale of Average Disk Queue Length. I then add perfmon to my startup folder under Start Menu/Programs. I generally have perfmon, taskman and Process Explorer open at all times, so when the pc does start slowing down, I can very quickly check what the bottleneck is. Filemon usually comes next, leading to my killing the offending process (often virus scan or SMS agent service).
I did my Master of Engineering thesis on performance monitoring and root-cause analysis, and I also enjoy this field. It's amazing how little even knowledgeable and experienced computer users know about how their computers work.
Thanks for a great blog Jeff, I've been reading it for years, though this is the first time I've commented.
Looking at your graphs, I would say you could utilize the memory a bit more.
You can try using some web accelerator like Squid. (http://www.squid-cache.org/)
See if it alleviates a bit of your Disk/CPU.
Does anyone know of similar tools for other OSes?
This is *great*! I've been looking for something *exactly* like this....
Poor Jeff for having been limited to Process Explorer (and, *cringe*, task manager). As others have already mentioned, NT has had much better stats available for years, through perfmon. Try start-run, perfmon.msc :)
I'm guessing from a previous post that you use My Yahoo! Check out what they have done to that zoomed-in and cropped picture of the walnut shell. Maybe I fail the ink blot test, but for a second I wondered if this was going to be NSFW...
Yes, I'm addicted to performance monitoring too. And yes, the Vista tool is great.
Think first, then measure.
Does anyone know of something (besides Process Explorer) that does a similar set of things but works on XP?
To all the *NIX heads saying "you should see what we've got on Unix": the stuff Jeff is showing is only what's baked in for all users. As Z pointed out, serious perf investigation will involve Windows performance counters (baked into the OS), ETW (also baked in), and tools like xperf. For example, xperf will let you look at CPU utilization on a per-thread basis, and drill down into CPU utilization at the method level in Windows.
As at least one other person has pointed out, perfmon has been around for quite some time, and if you are running a Windows OS prior to Vista/2008 you should check it out. It's not quite as visually appealing as what you describe, but it is useful for the overview, and gives access to a reporting engine for graphing results.
Well, one guy already mentioned perfmon /s. You can then add a new graph for each resource on your computer you want to watch.
One more thing that a process could be waiting on is any arbitrary shared resource that is controlled by a synchronization primitive.
For a database that could be a row- or table-lock. For an application it could literally be anything.
That's where things start to get complicated.
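A toy demonstration in Python: every graph in the monitor stays flat while the second thread waits, because it's blocked on a lock rather than on CPU, disk, or network:

```python
# A thread that looks idle to every resource graph: it is blocked on a
# lock, not on CPU, disk, or network.
import threading
import time

lock = threading.Lock()

def slow_holder():
    with lock:
        time.sleep(5)  # stand-in for a long critical section

def waiter():
    start = time.perf_counter()
    with lock:  # blocks here; no resource shows any load meanwhile
        pass
    print(f"waited {time.perf_counter() - start:.1f}s on the lock")

holder = threading.Thread(target=slow_holder)
blocked = threading.Thread(target=waiter)
holder.start()
time.sleep(0.1)  # ensure slow_holder grabs the lock first
blocked.start()
holder.join()
blocked.join()
```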
Cool!!! this stuff is better than the unix one!
If I'm not mistaken, I think perfmon.msc is available in XP as well. You just have to enter it from a DOS window or the run command.
This is wow and new in Windows? Bwahaha! :D
OK, to be serious: a momentary snapshot is good, but good capacity management over time is more valuable.
** At any given moment, your computer is waiting for some operation to complete on one of these resources. **
You missed one: the user. Oftentimes, there's nothing to do at all.
Remember to turn on the extra columns in Task Manager for such nice things as GDI objects, handles, and threads. Very useful.
Explanation of above bug: I failed to enter my name. When the post was presented to me for resubmission it had already been htmlified internally. The unaccepted comment should have been dehtmlified before repopulating the comment box (or not htmlified until accepted), but it was not. The undehtmlified text was rehtmlified into the hash you see. The correct link is http://www.youtube.com/watch?v=7jKuHiY397U
@Jeff - I must admit I skipped Vista after toying with it for a while, but I have been appreciating the same sort of graphing and detailed output Windows 7 offers. There are some really nice desktop widgets you can get, too, which will report this stuff all the time. Unfortunately, they seem to introduce a brief lag each time they report when I'm trying to play some of my more demanding games, and so have to be turned off during that period. Generally though, they're extremely handy :-)
@BTTF - Some sort of Task Tuner would be pretty cool, adjusting priorities in the system to favor whatever your current high-priority task is. Something like that would be really great when Windows is booting up and all you want to do is check your email; set it at the top of the stack and check your mail while the rest of your system comes up.