April 22, 2008
Lately I've been delving into the WordPress ecosystem, as it seems to be the most popular blogging platform around at the moment. I've set up two blogs with it so far. In the process, I've gotten quite comfortable with the setup, interface, and overall operation of WordPress.
I've been thoroughly impressed with the community around WordPress, and the software itself is remarkably polished. That's not to say that I haven't run into a few egregious bugs in the 2.5 release, but on the whole, the experience has been good bordering on pleasant.
Or at least it was, until I noticed how much CPU time the PHP FastCGI process was using for modest little old blog.stackoverflow.com.
For context, this is running on a Windows Web Server 2008 virtual machine with a single core of a 2.13 GHz Xeon 3210 entirely dedicated to it.
This is an incredibly scary result; blog.stackoverflow.com is getting, at best, a moderate trickle of incoming traffic. It's barely linked anywhere! With that kind of CPU load level, this site would fall over instantaneously if it got remotely popular, or God forbid, anywhere near the front page of a social bookmarking website.
For a bare-bones blog which is doing approximately nothing, this is a completely unacceptable result. It's appalling.
As evidence of what a systemic problem this is, there's an entire cottage industry built around shoehorning better caching behavior into WordPress. Take your pick: WP-Cache, WP-Super-Cache, or Bad Behavior. The caching add-ins don't work very well under IIS because they assume they're running on a *NIX platform, but they can be coerced into working.
Does it work? Does it ever. Here's what CPU usage looks like with basic WP-Cache type functionality enabled:
I'm not alone; just do a web search on WordPress CPU usage or WordPress Digg Effect and you'll find page after page of horror stories, most (all?) of which are solved by the swift and judicious application of the WP-Cache plugins.
It's not like this is a new issue. Personally, I think it's absolutely irresponsible that WP-Cache-like functionality isn't already built into WordPress. I would not even consider deploying WordPress anywhere without it. And yet, according to a recent podcast, Matt Mullenweg dismisses it out of hand and hand-wavingly alludes to vague TechCrunch server reconfigurations.
A default WordPress install will query the database twenty times every time you refresh the page, even if not one single element on that page has changed. Doesn't that strike you as a bad idea? Maybe even, dare I say it, sloppy programming?
I understand that users may have umpteen thousand WordPress plugins installed, all of which demand to change on every page load. Yes, the easiest path, the path of least resistance, is to mindlessly query the database every time you're building a page. But I cannot accept that a default, bare-bones WordPress install hasn't the first clue how to cache and avoid expensive, redundant trips to the database.
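The idea behind WP-Cache style output caching is simple: render the page once, write the finished HTML to disk, and serve that file on subsequent requests instead of hitting the database again. A minimal sketch in Python (the `render_page` function, cache directory, and TTL are all illustrative, not WordPress internals):

```python
import os
import time

CACHE_DIR = "cache"
CACHE_TTL = 300  # seconds a cached page stays valid

def render_page(slug):
    # Stand-in for the expensive path: template rendering plus
    # the ~20 database queries a default WordPress page would issue.
    return f"<html><body><h1>Post: {slug}</h1></body></html>"

def get_page(slug):
    os.makedirs(CACHE_DIR, exist_ok=True)
    path = os.path.join(CACHE_DIR, f"{slug}.html")
    # Serve the cached copy if it is fresh enough.
    if os.path.exists(path) and time.time() - os.path.getmtime(path) < CACHE_TTL:
        with open(path) as f:
            return f.read()
    # Otherwise render once, cache the result, and serve it.
    html = render_page(slug)
    with open(path, "w") as f:
        f.write(html)
    return html
```

Every request inside the TTL window costs one file read and zero database queries; the expensive render runs at most once per expiration.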
It's frustrating, because caching is a completely solved problem in other programming communities. For example, the .NET framework has had page output caching and page fragment output caching baked into ASP.NET for years.
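For reference, the ASP.NET page output caching mentioned above is literally a one-line directive at the top of a page; this caches the rendered output for an hour:

```aspx
<%@ OutputCache Duration="3600" VaryByParam="none" %>
```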
I sure am glad I started this blog in Movable Type way back in 2004. Their classic static rendering blog engine approach may be derided today, but I shudder to think of the number of times the Coding Horror webserver would have been completely incapacitated over the years by the naive -- no, that's too tame -- brainlessly stupid dynamic rendering approach WordPress uses.
What I just don't understand is why, after all these years, and all these documented problems, WordPress hasn't folded WP-Cache into the core. If you're ever planning to have traffic of any size on a WordPress blog, consider yourselves warned.
Update: Matt Mullenweg kindly responded to this post and offered his recommended MySQL configuration optimizations. I definitely agree that the Query Cache is extremely important to performance, and for some reason it defaulted to off (zero size) on my installation. You may also want to look into innotop and mysqlreport to ensure that all your MySQL caches are functioning at appropriate levels. Also, thanks to a few commenters for letting me know that one of this year's Google Summer of Code projects is integrating caching into the core WordPress code. It is badly needed.
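For anyone who wants to check their own setup: the query cache is controlled by two variables in the `[mysqld]` section of my.cnf, and if the size is zero the cache is off no matter what the type is set to (the sizes here are illustrative, not a recommendation):

```ini
# my.cnf -- [mysqld] section
query_cache_type = 1      # 0 = off, 1 = cache cacheable SELECTs
query_cache_size = 32M    # a size of 0 disables the cache entirely
```

You can confirm the running values with `SHOW VARIABLES LIKE 'query_cache%';` from the MySQL client.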
Posted by Jeff Atwood
I totally agree with this article. I did a test a few months back of Drupal and WordPress. I used Drupal 6 and WP 2.3.3 for the test, with the same base data set (content, categories, authors, and comments) for each. I also enabled the core caching for each system (in WP that was the file-based object cache, which is now out of the core), and found Drupal was delivering more than four times the requests per second.
Now that WordPress has the new UI, hopefully the developers will spend time on core optimization. It seems that a lot is lost by focusing on keeping plugins backwards compatible, but you get to a point where you must cut your losses and have the plugin developers update their code.
What's funny is the WordPress pundits go on and on about how it's an advantage to serve up dynamic pages and how blogs that serve up cached pages are lame. Obviously most of them have never run enterprise-level websites.
Ask yourself this, is a database really required for a blog?
An average engineer adds complexity, a good one removes it.
What is wrong with static HTML?
I agree with Simon. The blog should be static pages that get refreshed every once in a while, either after every blog post or on a timer (say, every 30 seconds). Put a cache header on the page telling clients not to cache it, so they always pick up the latest static copy. Then nearly all the load falls on the web server serving static pages, with only occasional load on the database for querying the content and rendering the page (and the rendering could even occur elsewhere).
I once worked on a content site that we updated nightly. It was very fast because the pages were static. Version 2.0 used dynamic queries for everything, and it made the site much slower (users complained all the time about the performance). I believe the better solution would have been to keep the static nature of the site but refresh the content more frequently, or refresh sections when their content was updated, instead of going totally dynamic.
Blogging and blogs are content, and mostly static content at that, so a dynamic retrieval engine is probably not needed. Every call into the database is costly regardless of OS, platform, or RDBMS.
Blog articles themselves may be static, but blog comments are not.
Therefore a DB is needed.
In some cases a database is appropriate. Doing something like Reddit, Slashdot or Digg without a database is just silly.
However, for most blogs, you still don't need a database even if you allow comments. Why not just update the static HTML file with the new comment?
Using a database should not be automatic. They're hard to set up, hard to design right, use lots of memory and CPU, and are difficult to scale.
Even for sophisticated blogs, you probably want to write the changes to a database then update the static files as necessary. You don't want the brunt of your web-requests hitting application code unless it is required.
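That write-through approach can be sketched as: persist the comment, then immediately regenerate the static file so read requests never touch application code. A toy Python version (the JSON file stands in for the database, and the template is hypothetical):

```python
import json
import os

DATA_FILE = "comments.json"   # stand-in for the database
OUTPUT_FILE = "post.html"     # static file the web server serves directly

def load_comments():
    # Read the persisted comments, if any exist yet.
    if os.path.exists(DATA_FILE):
        with open(DATA_FILE) as f:
            return json.load(f)
    return []

def regenerate_static_page(comments):
    # Rebuild the whole page. This runs once per write,
    # not once per read, which is the entire point.
    items = "".join(f"<li>{c}</li>" for c in comments)
    with open(OUTPUT_FILE, "w") as f:
        f.write(f"<html><body><ul>{items}</ul></body></html>")

def add_comment(text):
    # Write path: persist first, then regenerate the static file.
    comments = load_comments()
    comments.append(text)
    with open(DATA_FILE, "w") as f:
        json.dump(comments, f)
    regenerate_static_page(comments)
```

Reads are now plain static file serves; the application code only runs on the (comparatively rare) writes.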
I love all of the comments about running PHP on Windows being the problem. 20 database hits per page? I'd be hung if I wrote code that did that. There is no way around the fact that this is both unnecessary and expensive.
Jeff, I'm not sure if you are aware, but running PHP on windows is not only slow, but emboldens the terrorists.
Blah blah blah rabble rabble Linux rabble Ubuntu blah blah rabble blah Windows Server!!?!?!?? Blah blah corporations blah blah blah inferior blah freedom blah blah borg blah blah blah fascists rabble rabble!
Blah blah STOP USING WINDOWS NOW!!!!!
Now I'm going back to slashdot.
I used to have a number of sites running WordPress on various shared servers (Linux), and when traffic spikes, CPU usage goes through the roof (WP-Cache has a problem in 2.5 with gzip; I haven't figured that out yet), shared hosts get panicky, and they shut down sites -- pretty much like everyone else's WP site with traffic.
I decided to dump WP and move all my sites to ExpressionEngine (running on Linux and Xserve), which has a highly effective multi-level caching system. Rock solid, even with regular Digg hits. EE has a price tag, but the benefits outweigh the dollar cost. Love it.
I understand that it's beneficial to have full control over your blog if it is a major part of what you do, but for most people blogger or something should suffice.
I believe thinkchristian.net is on wordpress. This would explain its sluggishness.
WP 2.5 removed the native file-based object cache. There is a caching API in place, and it does eliminate a lot of DB calls within each page request, but the data is not stored anywhere after that.
As far as the object cache being better than wp-cache, that depends. If you aren't relying on a lot of truly dynamic data and have a small user base, then wp-cache is the way to go. If you need dynamic content that changes every few page requests, then the object cache is the better method. Also, if you require registration for comments on a larger blog, wp-cache will kill your server invalidating old cache files for each user every time a new comment is posted. I found that out the hard way on a site with 60,000 users that averages over 1,000 comments a day. Now I use my own caching for non-logged-in users that works on the same premise as Super Cache: logged-in users get dynamic pages (with the object cache using an APC backend), and non-logged-in users get the static file (through mod_rewrite). Works out really well that way.
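The mod_rewrite arrangement described above looks roughly like this in an Apache .htaccess. This is a simplified sketch of the kind of rules WP Super Cache generates; the exact cookie name, cache path, and conditions vary by install and version:

```apache
RewriteEngine On
# Only plain GET requests with no query string are cacheable.
RewriteCond %{REQUEST_METHOD} GET
RewriteCond %{QUERY_STRING} ^$
# Logged-in users (WordPress auth cookie) always get dynamic pages.
RewriteCond %{HTTP_COOKIE} !wordpress_logged_in [NC]
# Serve the pre-built static file if it exists; otherwise fall through to PHP.
RewriteCond %{DOCUMENT_ROOT}/wp-content/cache/supercache/%{HTTP_HOST}%{REQUEST_URI}index.html -f
RewriteRule ^(.*)$ /wp-content/cache/supercache/%{HTTP_HOST}%{REQUEST_URI}index.html [L]
```

The key property is that for anonymous visitors the request is answered by Apache alone; PHP and MySQL never even start up.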