Over the last three years, this site has become far more popular than I ever could have imagined. Not that I'm complaining, mind you. Finding an audience and opening a dialog with that audience is the whole point of writing a blog in the first place.
But on the internet, popularity is a tax. Specifically, a bandwidth tax. When "Why Can't Programmers.. Program?" went viral last week, outgoing bandwidth usage spiked to nearly 9 gigabytes in a single day.
That was enough to completely saturate two T1 lines-- nearly 300 KB/sec-- for most of the day. And that includes the time we disabled access to the site entirely in order to keep it from taking out the whole network.* After that, it was clear that something had to be done. What can we do to reduce a website's bandwidth usage?
1. Switch to an external image provider.
| Content | Size |
|---|---|
| Post text | ~4,900 bytes |
| Post image | ~46,300 bytes |
| Site images | ~4,600 bytes |
The text only makes up about ten percent of the content for that post. To make a dent in our bandwidth problem, we must deal with the other ninety percent of the content-- the images-- first.
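A quick way to check that split is to run the numbers from the table. This sketch assumes the three rows make up the whole page payload (it ignores HTML markup, CSS and JavaScript), which is roughly how the table frames it:

```python
# Sizes (in bytes) from the table above.
sizes = {
    "post text": 4_900,
    "post image": 46_300,
    "site images": 4_600,
}

total = sum(sizes.values())
shares = {name: size / total for name, size in sizes.items()}

for name, share in shares.items():
    print(f"{name:12s} {share:6.1%}")

# Everything that is not text is an image.
image_share = shares["post image"] + shares["site images"]
print(f"all images   {image_share:6.1%}")
```

With these sizes the text comes out near 9% and the images near 91%, which is where the ten/ninety split comes from.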
Ideally, we shouldn't have to serve up any images at all: we can outsource the hosting of our images to an external website. There are a number of free or nearly-free image sharing sites on the net which make this a viable strategy, including ImageShack, Photobucket, and Flickr.
I like ImageShack a lot, but it's unsuitable for any kind of load, due to the hard-coded bandwidth limit. Photobucket offers the most favorable terms, but Flickr has a better, more mature toolset. Unfortunately, I didn't notice the terms of use restrictions at Flickr until I had already purchased a Pro account from them. So we'll see how it goes. Update: it looks like Amazon S3 may be the best long-term choice, as many (if not all) of these photo sharing services are blocked by corporate firewalls.
Even though this ends up costing me $25/year, it's still an incredible bargain. I am offloading 90% of my site's bandwidth usage to an external host for a measly 2 dollars a month.
And as a nice ancillary benefit, I no longer need to block image bandwidth theft with URL rewriting. Images are free and open to everyone, whether it's abuse or not. This makes life much easier for legitimate users who want to view my content in the reader of their choice.
Also, don't forget that favicon.ico is an image, too. It's retrieved more and more often by today's readers and browsers. Make favicon.ico as small as possible, because it can have a surprisingly large impact on your bandwidth.
2. Turn on HTTP compression.
| Content | Size |
|---|---|
| Post size | 63,826 bytes |
| Post size with compression | 21,746 bytes |
We get a 66% reduction in file size for every bit of text served up on our web site-- including all the JavaScript, HTML, and CSS-- by simply flipping a switch on our web server. The benefits of HTTP compression are so obvious it hurts. It's reasonably straightforward to set up in IIS 6.0, and it's extremely easy to set up in Apache.
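Your web server does the actual work once compression is switched on, but the mechanism is simple enough to sketch. The client advertises `gzip` in its `Accept-Encoding` request header; the server compresses the body and labels it with `Content-Encoding: gzip`. This is a minimal illustration using Python's standard `gzip` module, not how IIS or Apache implement it, and `encode_response` is a hypothetical name:

```python
import gzip


def encode_response(body: bytes, accept_encoding: str) -> tuple[bytes, dict[str, str]]:
    """Return (payload, extra_headers), gzip-compressing only if the client allows it.

    Simplified: it only looks for the "gzip" token and ignores q-values
    such as "gzip;q=0", which a real server must honor.
    """
    # Tell caches that the response differs depending on Accept-Encoding.
    headers = {"Vary": "Accept-Encoding"}
    tokens = [t.split(";")[0].strip().lower() for t in accept_encoding.split(",")]
    if "gzip" in tokens:
        headers["Content-Encoding"] = "gzip"
        return gzip.compress(body), headers
    return body, headers


html = b"<p>Why Can't Programmers.. Program?</p>\n" * 500
payload, hdrs = encode_response(html, "gzip, deflate")
print(len(html), "->", len(payload), hdrs)
```

Sending `Vary: Accept-Encoding` matters because a proxy might otherwise hand a gzipped copy to a client that never asked for one.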
Never serve content that isn't HTTP compressed. It's as close as you'll ever get to free bandwidth in this world. If you aren't sure that HTTP compression is enabled on your website, use this handy web-based HTTP compression tester, and be sure.
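If you'd rather check from a script, the same test is easy to approximate: send a request that advertises compression and look at the `Content-Encoding` header on the response. A minimal sketch using only Python's standard library; the helper names `is_http_compressed` and `check_url` are made up for illustration:

```python
import urllib.request
from collections.abc import Mapping


def is_http_compressed(headers: Mapping[str, str]) -> bool:
    """True if the response headers say the body was gzip- or deflate-encoded."""
    encoding = headers.get("Content-Encoding", "") or ""
    return encoding.strip().lower() in {"gzip", "deflate"}


def check_url(url: str) -> bool:
    """Fetch url while advertising compression support; report what the server did."""
    request = urllib.request.Request(url, headers={"Accept-Encoding": "gzip, deflate"})
    with urllib.request.urlopen(request, timeout=10) as response:
        # Only the headers are inspected; the (possibly compressed) body is never read.
        return is_http_compressed(response.headers)
```

Calling `check_url("http://www.codinghorror.com/blog/")` performs a real network request and returns `True` only if the server chose to compress the page. Note that `urllib` does not decompress the body for you, which is fine here since only the headers matter.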
3. Outsource your RSS feeds.
RSS is great. That is, until you realize just how much bandwidth all that RSS feed polling is consuming. It's staggering. Scott Hanselman told me that half his bandwidth was going to RSS feeds. And Rick Klau noted that 60% of his page views were RSS feed retrievals. The entire RSS ecosystem depends on properly coded RSS readers; a single badly-coded reader could pummel your feed, pulling uncompressed copies of your RSS feed down hourly-- even when it hasn't changed since the last retrieval. Now try to imagine thousands of poorly-coded RSS readers, all over the world. That's pretty much where we are today.
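Part of what separates a properly coded reader from a badly coded one is the conditional GET. The reader remembers the `ETag` from its last fetch and sends it back in an `If-None-Match` header; if the feed hasn't changed, the server answers `304 Not Modified` with an empty body instead of the whole feed. A minimal server-side sketch in Python, assuming the ETag is a hash of the feed bytes (real servers often derive it from file size and modification time instead); `feed_response` is a hypothetical name:

```python
import hashlib
from typing import Dict, Optional, Tuple


def feed_response(feed_xml: bytes, if_none_match: Optional[str]) -> Tuple[int, Dict[str, str], bytes]:
    """Serve a feed with an ETag, answering 304 Not Modified when it hasn't changed."""
    etag = '"' + hashlib.sha1(feed_xml).hexdigest() + '"'
    headers = {"ETag": etag, "Content-Type": "application/rss+xml"}
    if if_none_match is not None:
        # If-None-Match may carry several comma-separated tags, or "*".
        candidates = [tag.strip() for tag in if_none_match.split(",")]
        if etag in candidates or "*" in candidates:
            return 304, headers, b""  # reader's cached copy is current: no body sent
    return 200, headers, feed_xml


feed = b"<rss version='2.0'><channel><title>Coding Horror</title></channel></rss>"
status, headers, body = feed_response(feed, None)           # first poll: full feed
status2, _, body2 = feed_response(feed, headers["ETag"])    # repeat poll: 304, empty
print(status, len(body), status2, len(body2))
```

Every poll answered with a 304 costs a few hundred bytes of headers instead of the full feed.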
Serving up endless streams of RSS feeds is something I'd just as soon outsource. That's where FeedBurner comes in. Although I'll gladly outsource image hosting for the various images I use to complement my writing, I've been hesitant to hand over control of something as critical as my RSS feed to a completely external service. I emailed Scott Hanselman, who switched his site over to FeedBurner a while ago, to solicit his thoughts. He was gracious enough to call me on the phone and address my concerns, even walking me through FeedBurner using his login.
I've switched my feed over to FeedBurner as of 3pm today. The switch should be transparent to any readers, since I used some mod_rewrite-style ISAPIRewrite rules to do a seamless, automatic permanent redirect from the old feed URL to the new feed URL:
```
# Redirect everyone except FeedBurner: [I] ignore case, [RP] permanent (301) redirect, [L] last rule.
RewriteCond User-Agent: (?!FeedBurner).*
RewriteRule .*index\.xml$|.*index\.rdf$|.*atom\.xml$ http://feeds.feedburner.com/codinghorror/ [I,RP,L]
```
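Read as plain logic, the rule says: if the request is for one of the old feed files and the user agent does not begin with `FeedBurner`, answer with a 301 permanent redirect to the FeedBurner URL. FeedBurner itself has to be exempt, or it would be redirected to its own copy of the feed. The same decision as a Python sketch, for anyone porting the idea to a different server; `feed_redirect` is a hypothetical helper and the file names come from the rewrite rule:

```python
import re
from typing import Optional, Tuple

FEEDBURNER_URL = "http://feeds.feedburner.com/codinghorror/"
# The old feed file names, matched case-insensitively like the [I] flag.
OLD_FEED_PATH = re.compile(r".*(index\.xml|index\.rdf|atom\.xml)$", re.IGNORECASE)


def feed_redirect(path: str, user_agent: str) -> Optional[Tuple[int, str]]:
    """Return (status, location) if this request should go to FeedBurner, else None."""
    if not OLD_FEED_PATH.match(path):
        return None  # not a feed request: serve normally
    if user_agent.startswith("FeedBurner"):
        return None  # mirrors (?!FeedBurner).*: FeedBurner still gets the real feed
    return 301, FEEDBURNER_URL
```

Using a permanent redirect tells well-behaved readers to update their stored feed URL, so over time fewer requests reach the old address at all.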
And the best part is that immediately after I made this change, I noticed a huge drop in per-second and per-minute bandwidth on the server. I suppose that's not too surprising if you consider that the FeedBurner stats page for this feed is currently showing about one RSS feed hit per second. But even compressed, that's still about 31 KB of RSS feed per second that my server no longer has to deal with.
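To put that figure in daily terms, here is the arithmetic, assuming 1 KB = 1,000 bytes and that the rate of roughly one compressed feed hit per second held around the clock:

```python
SECONDS_PER_DAY = 24 * 60 * 60       # 86,400
feed_bytes_per_second = 31 * 1000    # ~31 KB of compressed feed, ~1 hit per second

offloaded_per_day = feed_bytes_per_second * SECONDS_PER_DAY
print(f"{offloaded_per_day / 1e9:.1f} GB/day no longer served")
print(f"{offloaded_per_day / 9e9:.0%} of the 9 GB peak day")
```

That works out to about 2.7 GB a day, or roughly 30 percent of the 9 GB peak day described at the top of this post.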
It's a substantial savings, and FeedBurner brings lots of other abilities to the table beyond mere bandwidth savings.
4. Optimize your JavaScript and CSS.
There's a handy online CSS compressor which offers three levels of CSS compression. I used it on the main CSS file for this page, with the following results:
| Stage | Size |
|---|---|
| Original CSS | 2,299 bytes |
| After removing whitespace | 1,758 bytes |
| After HTTP compression | 615 bytes |
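The online tool's exact rules aren't described here, but the core of whitespace removal is easy to sketch. This naive Python version strips comments, collapses whitespace, and trims spaces around punctuation. It is an illustration only: it assumes the stylesheet has no string literals or `url()` values whose internal whitespace matters, which a real minifier would handle by tokenizing the CSS properly. `minify_css` is a hypothetical name:

```python
import re


def minify_css(css: str) -> str:
    """Naive CSS whitespace remover (illustrative; does not tokenize strings)."""
    css = re.sub(r"/\*.*?\*/", "", css, flags=re.S)   # drop /* comments */
    css = re.sub(r"\s+", " ", css)                    # collapse whitespace runs
    css = re.sub(r"\s*([{};,])\s*", r"\1", css)       # trim around { } ; ,
    css = re.sub(r":\s+", ":", css)                   # "color: red" -> "color:red"
    css = css.replace(";}", "}")                      # last semicolon is optional
    return css.strip()


print(minify_css("/* main */\nbody {\n  color: red;\n  margin: 0;\n}\n"))
```

Whitespace before a colon is deliberately kept, since `div :first-child` and `div:first-child` are different selectors.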
We can do something similar to the JavaScript with this online JavaScript compressor, based on Douglas Crockford's JSMin. But before I put the JavaScript through the compressor, I went through and refactored it, using shorter variable names and eliminating some redundant and obsolete code.
| Stage | Size |
|---|---|
| Original JS | 1,232 bytes |
| After refactoring | 747 bytes |
| After removing whitespace | 558 bytes |
| After HTTP compression | 320 bytes |
It's possible to use similar whitespace compressors on your HTML, but I don't recommend it. I only saw reductions in size of about 10%, which wasn't worth the hit to readability.
Realistically, whitespace and linefeed removal is doing work that the compression would be doing for us. We're just adding a dab of human-assisted efficiency:
| | Raw | Compressed |
|---|---|---|
| Unoptimized CSS | 2,299 bytes | 671 bytes |
| Optimized CSS | 1,758 bytes | 615 bytes |
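Working out the percentages from that table shows how much of the whitespace savings HTTP compression already captures on its own:

```python
# Byte counts from the raw/compressed CSS table above.
raw_before, raw_after = 2_299, 1_758
gz_before, gz_after = 671, 615

raw_saving = 1 - raw_after / raw_before   # whitespace removal alone
gz_saving = 1 - gz_after / gz_before      # once HTTP compression is also on

print(f"without HTTP compression: {raw_saving:.1%} smaller")
print(f"with HTTP compression:    {gz_saving:.1%} smaller")
```

Removing whitespace cuts the raw file by about 24%, but once both versions are compressed the gap shrinks to about 8%.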
It's only about a 10 percent savings once you factor in HTTP compression. The tradeoff is that CSS or JavaScript lacking whitespace and linefeeds has to be pasted into an editor to be effectively edited. I use Visual Studio 2005, which automatically "rehydrates" the code with proper whitespace and linefeeds when I issue the autoformat command.
Although this is definitely a micro-optimization, I think it's worthwhile since it reduces the payload of every single page on this website. But there's a reason it's the last item on the list, too. We're just cleaning up a few last opportunities to squeeze every last byte over the wire.
After implementing all these changes, I'm very happy with the results. I see a considerable improvement in bandwidth usage, and my page load times have never been snappier. But these suggestions aren't a panacea. Even the most minimal, hyper-optimized compressed text content can saturate a 300 KB/sec link if the hits per second are coming fast enough. Still, I'm hoping these changes will let my site weather the next Digg storm with a little more dignity than it did the last one-- and avoid taking out the network in the process.
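For a rough sense of "fast enough", divide the link's capacity by the compressed post size. This assumes 1 KB = 1,000 bytes and counts only the compressed page itself (images, feeds, CSS and JavaScript are assumed to be offloaded or cached):

```python
link_bytes_per_second = 300 * 1000   # the ~300 KB/sec link
compressed_page_bytes = 21_746       # compressed post size from the compression table

pages_per_second = link_bytes_per_second / compressed_page_bytes
print(f"~{pages_per_second:.1f} page views per second fill the link")
```

That is only about 14 page views per second.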
* The ironic thing about this is that the viral post in question was completely HTTP compressed text content anyway. So of all the suggestions above, only the RSS outsourcing would have helped.
Posted by Jeff Atwood
Well said!
Jon Galloway on March 6, 2007 12:30 AM
Interesting observations. For those of us who don't consume hosting bandwidth of this magnitude, what's the dollar impact of a 9 GB day?
(oh yeah.. good move disabling comments at the height of traffic on that day!)
Dave on March 6, 2007 12:38 AM
Very useful and in-depth. Thank you very much for sharing this! I would say that the benefit of optimizing CSS/JS would be minimal if HTTP compression is enabled, but I'm not sure (would really appreciate some statistics).
Hafiz on March 6, 2007 12:50 AM
In your page size comparison you're forgetting about the effect of blog comments, which probably account for > 80% of your text bandwidth on popular posts!
Not sure what the answer is - maybe page the comments rather than presenting the whole list, maybe don't show any comments for the primary link - but it's a big part of the problem.
And yes, HTTP compression is a big part of the solution.
- Roddy
Roddy on March 6, 2007 01:03 AM
Are you going to consider chopping your posts into comment and non-comment pages, like most blogs?
The most visible RSS change is that it no longer updates at precisely 11:59 every night. ;)
Foxyshadis on March 6, 2007 01:21 AM
Another useful content provider network is Amazon S3. We use a combination of Limelight (expen$ive CDN) and Amazon S3 on our site.
- Dave
David on March 6, 2007 01:23 AM
Unfortunately, we are at the mercy of poorly coded aggregators.
The polling nature of RSS is unfortunately a huge bandwidth leech. One protocol that can help stem the tide (for a short while at least) is to implement "RFC3229 for feeds" as described by Bob Wyman (http://www.wyman.us/main/2004/09/using_rfc3229_w.html).
This is an HTTP delta encoding protocol, but applied to RSS and ATOM feeds. I spent a lot of time implementing this (http://haacked.com/archive/2005/07/01/Potential_For_A_Subtle_Bug_in_RFC3229_Implementations.aspx) in Subtext and testing it in RSS Bandit.
But like many of my noble but lost causes (http://haacked.com/archive/2007/03/02/A_Comparison_of_TFS_vs_Subversion_for_Open_Source_Projects.aspx), I think adoption rate is too poor to really make a difference. At least RSS Bandit is a good citizen in this regard.
I'll keep banging this drum, but will aggregator developers listen?
Phil Haack on March 6, 2007 01:55 AM
Another way is of course to write shitty articles that nobody cares to read ;-)
Denis
As a precursor to outsourcing your image serving, I would have thought that you could do much to reduce the size of your image files. That 46,300 Bytes for the image seemed a lot to me. Clearly, outsourcing your images altogether is a total fix, but one day those nasty T&C may come back to bite you. Plus, smaller images will be served faster wherever they are retrieved from.
I would strongly recommend Macromedia Fireworks (now owned by Adobe, I think) and Adobe ImageReady to allow you to see the same image with different image compression levels side by side. Mind you, from past posts, I realise that you know all about image compression...
Presumably, you now have an extra step in your blogging process in terms of uploading images to your image server. I'd be interested in how this affects your blogging...
Nij on March 6, 2007 03:31 AM
These are all very good suggestions, most of which I have been coding into a CMS I'm working on. One thing that you have not mentioned though is to enable content caching by making sure that all of your page elements are served with 'Last-Modified' and 'ETag' headers, and then responding with an HTTP '304 Not Modified' when the client is smart enough to do a conditional get -- that brings the bandwidth for page refreshes as close to zero as possible. You can also add an 'Expires' header with a time value in the future and get the clients (and/or intermediate caches) to not even go back to the server for fresh copies of content. 'Expires' is most appropriate for images, but also for javascript and css files. If you add a future Expires header to the content pages themselves, then people will not see comments right away, but you could use a five minute offset or some other very low value for content that is routinely modified.
zachofalltrades on March 6, 2007 03:50 AM
> As a precursor to outsourcing your image serving, I would have thought that you could do much to reduce the size of your image files. That 46,300 Bytes for the image seemed a lot to me.
> Mind you, from past posts, I realise that you know all about image compression...
He could be much better though. E.g. the codinghorror-bandwidth-usage.png image in this post is 5593 bytes.
Opening it in Photoshop and then saving it for the web by reducing the number of indexed colors from 64 (!) to 8 brings it down to 3633 bytes, and running the output PNG into PngOptimizer (as simple as dragging & dropping it) further reduces it to 2992 bytes.
A near 50% size reduction in around 50 seconds...
Masklinn on March 6, 2007 04:04 AM
Three naive questions from a desktop application developer:
1. Is it possible to exclude inefficient RSS feeds?
2. Is it possible to throttle the polling frequency of the feeds, e.g., set a cap of one poll per minute?
3. Is it possible to have a "no images" version of your posts, and restrict RSS access to only those versions?
What about Amazon S3?
jeka911 on March 6, 2007 04:54 AM
It's a great article, BUT...
I think you should've included some conditional statements instead of suggesting to blindly perform these actions to instantly save bandwidth.
For example, you posted an article a couple of years back (yes, it's really been that long ago...) about being careful with HTTP compression:
http://www.codinghorror.com/blog/archives/000059.html
These are probably perfect courses of action for your specific configuration.
There could be a lot of pro IT people trying to enable compression on an IIS 5 server right now...
Do your homework people!
Thanks for the great articles!
Roger on March 6, 2007 05:05 AM
We hit a huge bandwidth bottleneck at my previous employer, and the first thing we did was turn on IIS compression. That worked pretty well, but we noticed significant improvements when we switched to x-compress (www.xcompress.com). It caused lower CPU usage on the server, and seemed to compress the pages more.
Danimal on March 6, 2007 05:05 AM
I second Dave's idea. S3 is a great, inexpensive service to use for hosting media. I use it for hosting podcasts I set up for my church.
Michael A. Vickers on March 6, 2007 05:06 AM
Well, your RSS feeds seemed to work OK. I was reading it in Google Reader and it took a few seconds before it sunk in. (yes, it's early).
Eric D. Burdo on March 6, 2007 05:15 AM
It's pretty common for shared hosting providers to offer more than a terabyte of bandwidth a month for ~$7.00. That's no excuse to slack on the optimizations but you can serve 10 gigabytes every single day for a month and still not even use a third of your allowance.
Just stick with one of the big providers that don't particularly care that your site just got on Digg or Reddit or Del.icio.us, as long as you're within the bandwidth limit and your site isn't using 100% of the CPU on their multi-processor, multi-core enterprise blade systems. (Smaller hosting companies will shut you down on a dime because of "CPU usage"; larger hosting companies give you a lot of latitude.)
Cool article though, lots of good stuff.
Patrick Hunlock on March 6, 2007 05:18 AM
I would also update your ISAPIRewrite to include the feedvalidator user-agent.
FeedBurner's FeedMedic "Validate your Source Feed" link points to feedvalidator, so in case you would like to check your feed for errors, feedvalidator should get the real source, not FeedBurner's version.
German Rumm on March 6, 2007 05:38 AM
An alternative JavaScript compressor that I've used is Dean Edwards' Packer (http://dean.edwards.name/packer/). On the pre-filled example from the JS Minifier page the result was worse, but I just tested some code from a personal project and the results are as follows:
Pre-Compression Size: 5257
JS Minifier Size: 2695
Packer Size: 2134
I just wanted to throw another tool out there in our quest for smaller bandwidth bills.
Josh Bush on March 6, 2007 05:52 AM
Jeff,
Have you looked into AllYouCanUpload for your image hosting?
http://allyoucanupload.webshots.com/
http://www.techcrunch.com/2006/05/29/cnets-allyoucanupload-is-disruptive/
Matt Blodgett on March 6, 2007 05:59 AM
I'm also a fan of Photobucket. I'm a little intrigued about Imageshack, especially after looking at the "common questions" to which you linked and spotting your error. According to that, their hourly limit is 100GB, not 10GB.
Brother Erryn on March 6, 2007 06:00 AM
Don't forget your ISP. I use Road Runner and there is a 25MB limit for 'users home page', which I use only to store images for my web page.
Mat on March 6, 2007 06:14 AM
Feedburner has worked very nicely for me. I have over 10,000 subscribers to my feed at macmegasite, so it's made a huge difference in my traffic. The ads in my feed also pay very nicely. I like your mod_rewrite code better than what I'm doing now - specifying a private feed for feedburner and redirecting the public feeds to feedburner. I'll probably have it use the user_agent instead.
Mike Cohen on March 6, 2007 06:28 AM
Nice article; it has given me something to think about myself.
gregf on March 6, 2007 06:38 AM
Hi Jeff,
I had some initial fears in using FeedBurner as well. I went with their "my brand" feature which lets me use my own URL via a CName record for about $3/month. This way if they ever go belly-up or get sold/etc I can simply remove the CName and handle the feed on my own again.
-Scott
Scott Watermasysk on March 6, 2007 06:45 AM
How about "none of the above" and ditch the T-1 and get 10/100/1000 fiber? We went from a T-1 ($600/month) to 10 megabit Fiber, unlimited transfer, for $700. Easily paid for itself. Linux ISO downloads in 5 minutes!
indy on March 6, 2007 06:58 AM
Flickr should work for you... the price of putting a few bits of anchor taggage around the image (to comply with their TOS) is minor compared to offloading the image bandwidth.
Aaron B. Hockley on March 6, 2007 07:09 AM
I've used Django (the Python web framework) for a while, and they have a simple template tag called {% spaceless %} which automatically removes all unneeded whitespace and line breaks from your HTML without having to un-uglify your code every time you want to edit it. It's the bee's knees!
Dan on March 6, 2007 07:16 AM
I would love for one of the Amazon adopters to contact me at michael@sophio.com to help with integration of our software (paying gig).
Michael Birnholz on March 6, 2007 08:01 AM
>Three naive questions from a desktop application developer:
>1. Is it possible to exclude inefficient RSS feeds?
>2. Is it possible to throttle the polling frequency of the feeds, e.g., set a cap of one poll per minute?
Not really. The problem is not the feeds (which are just XML files on a server). The problem is inefficient feed readers (programs that pull the feeds, possibly way too often). There's probably no reliable way of identifying them. If they set the user agent header (part of the HTTP message that identifies the program making the request), you can detect which one is making it, but a poorly coded reader probably didn't bother to set this optional header, and even if you could, you probably don't want to tell a user which feed reader they're allowed to use.
You could track the frequency of traffic by IP and not serve up the feed more than once a minute to any given IP, but I'd imagine proxies would mess up that method since they are requesting on behalf of many different users. You could use an expires header that tells the proxy not to re-request for some amount of time, but that only works if they bother to honor it. And blocking that re-request may hurt the end user experience because even a well written feed reader might be blocked because of a poorly written one on the same proxy.
>3. Is it possible to have a "no images" version of your posts, and restrict RSS access to only those versions?
RSS readers only access an XML file that lists a number of articles and some text content. They shouldn't be requesting any images in the content, at least not until someone tries to actually view the content, in which case you would want them to be able to load the images anyway.
This particular feed could probably shed 90% of its size if the whole article weren't present in it though. Most feeds I've seen only have a short description. I'm sure including the whole article was a conscious choice though because the owner didn't want users to have to navigate to his site to read the post.
Lee on March 6, 2007 08:07 AM
Jeff, glad you looked into my comment on the oop vs poo comment, hope it didn't come off as overly sarcastic.
Darren Kopp on March 6, 2007 08:10 AM
Flickr's Terms of Use indeed say, "professional or corporate uses of Flickr are prohibited", but Yahoo!'s don't. Flickr was acquired by Yahoo and accounts now fall under the Yahoo terms: http://www.flickr.com/terms.gne
Robert Brewer on March 6, 2007 08:16 AM
Nice tutorial. Useful if bandwidth really is a big problem.
Ivan Minic on March 6, 2007 08:17 AM
I think S3 is a pretty good solution. I just completed a test for http://www.famundo.com using it for all static content and the results are great. We are going to deploy this into production in a few days.
You can even serve compressed js/css directly from S3, but there are some limitations with it. You can see my post about doing it here: http://devblog.famundo.com/articles/2007/03/02/serving-compressed-content-from-amazons-s3
It has code samples showing how to upload compressed assets to S3.
The big lure of S3 is the price. Even if you transfer 30GB a month, we're talking a few dollars only. Nothing else even comes close. And it is a pretty stable network.
Guy Naor on March 6, 2007 08:18 AM
This is good stuff, Jeff! Thanks for the tips. The HTTP compression stuff was news to me -- I didn't know that was built into Apache and IIS. My bandwidth is creeping up there so I might have to look into that, along with hosting my images elsewhere (it's kind of a chore, though). Thanks!
vemrion on March 6, 2007 08:18 AM
However nice external image hosters can be, I did notice that all of the images on the post I read yesterday were ugly "this user has exceeded bandwith" images. It makes the article rather hard to track.
JD on March 6, 2007 08:18 AM
I've always been curious as to how much bandwidth savings you get for doing HTML, CSS and JS compression in addition to HTTP Compression. I'm no compression expert, but it seems that HTTP Compression would be doing roughly the same thing.
Phil Scott on March 6, 2007 08:30 AM
LiveJournal.com also has very inexpensive image hosting, $25 per year. (I don't think there is a bandwidth limit, at least none that I've hit. lol)
Jim on March 6, 2007 08:32 AM
I've never load tested my own little web server, so I guess I won't say anything but that I get 400 GB in transfer from Network Solutions for 15 or 20 bucks a month. And I'm just buying as a consumer, so I always thought I was grossly overpaying in the name of having a name which would care if I griped loud enough.
I might guess that if you called up your hosting company and said, "Um, yeah... I run a blog that's mostly text and I went through 9 GB of transfer yesterday. It turns out that my userbase is heavily biased towards highly skilled programmers, and I'm wondering if you would like to put a link to my hosting company on the lower left hand side of my page's sidebar in exchange for some leniency next time I go overboard on the hosting bill?" your bandwidth issues might not reappear.
I'm reading Founders at Work; generally it appears that on the Internet popularity is a problem which solves itself.
Dylan Brams on March 6, 2007 08:39 AM
There is also the ETag (If-None-Match) and Last-Modified headers, but again this depends on the feed readers being aware of them. I think setting up Apache to handle Etag is particularly difficult when on an NFS filesystem, alas. At least it was when I last looked, or I misunderstood something.
hgs on March 6, 2007 09:07 AM
I'm using a Pro Photobucket account for the simple reason that the interface for uploading/editing images is so much better than what comes with Wordpress. Pro was well worth not having to worry about months like February when I hit digg/reddit/stumbleupon/lifehacker/problogger multiple times.
I also have a Pro Flickr account but that's just used for photography. I don't find that the tools are as good as Photobucket's for blogging.
I use a free FeedBurner account for RSS. It's great. FeedBurner is *such* a good product. Feed flares, reader chicklets, email subscriptions, good stats. Amazing tool.
http://feeds.feedburner.com/engtech
engtech on March 6, 2007 09:12 AM
> especially after looking at the "common questions" to which you linked and spotting your error. According to that, their hourly limit is 100GB, not 10GB
> I did notice that all of the images on the post I read yesterday were ugly "this user has exceeded bandwith" images
That's because I exceeded the 100GB/hour limit on imageshack.us a number of times.
It's why I switched from imageshack.us to a paid Flickr account..
Jeff Atwood on March 6, 2007 09:14 AM
Oh, and the "worrying about handing over my feed to FeedBurner" issue can be solved by buying a paid upgrade that lets you use your own domain name for the feed.
engtech on March 6, 2007 09:14 AM
This is a slightly different issue, and I'm not sure what blog software you're using, but a lot of people recommend wp-cache for Wordpress blogs that are getting hit by huge traffic spikes.
http://mnm.uib.es/gallir/wp-cache-2/
Makes Wordpress as efficient as a static site.
engtech on March 6, 2007 09:19 AM
> Have you looked into AllYouCanUpload for your image hosting?
This service looked good until it forcefully converted the 5kb PNG in this post into a 22kb JPG.. (!) WTF? Most photo sharing sites respect the original file format.
Also, Masklinn, this PNG requires more than 16 colors due to the ClearType font aliasing. We've covered this topic before..
http://www.codinghorror.com/blog/archives/000464.html
But yes, OptiPNG is pretty cool, and I do periodically run it on my /images folder.
Great article.
I've toyed with the idea of outsourcing the image hosting on my site, but I'm concerned that if anything should happen to the hosting service (going out of business, change of TOS, etc.) it would be a nightmare to find another host, re-upload years worth of images, and change all the links on my site. Yes, Flickr seems like a pretty safe bet as far as Web 2.0 goes, but who can really guarantee it will be around in 2010?
Call me a control freak, but I'm keeping my images on my own server for the time being. It's worth the extra few dollars per month to me.
Podophile on March 6, 2007 09:49 AM
I can't recommend Cachefly enough... http://www.cachefly.com
Jason on March 6, 2007 09:53 AM
This causes a problem with software such as DotNetNuke, in which downloads (such as a zip download) are piped to the response through the ASP.NET runtime.
For example, using Response.WriteFile.
Phil
Phil "doofus" Haack on March 6, 2007 09:57 AM
The only problem with using Flickr as your image server is that more and more corporate filters are blocking it. I see nothing but red x's.
Mike on March 6, 2007 10:07 AM
I've gotten an email about the flickr block, too. For those of you behind a corporate filter, does ANY of this stuff work?
http://www.flickr.com/
http://www.imageshack.us/
http://www.photobucket.com/
http://www.webshots.com/
[update from reader: ALL of these are blocked, with a weird oversight in the case of imageshack: they don't block the imgxx.imageshack.us servers]
Looks like Amazon S3 is the only viable choice now. I've updated the post to reflect that choice.
Jeff Atwood on March 6, 2007 10:21 AM
Thanks Jeff, good ideas. I just added some new TODO items to the web publishing system I'm working on -- access policies that restrict too-frequent requests from the same source, and automatic whitespace stripping when you edit a node.
I would guess most RSS readers also are sending proper headers with their requests so the website only has to send them the RSS document if it changed-- are they?
Reed
I would think that you could improve much further simply by restructuring your blog. There is no need to post several blogs on one page. Why show all of the images and everything for blogs that are several days old? Just show a header if you need to. Or add a table of contents. Also, as someone else suggested, paging the comments would help tremendously (but at the cost of usability). Finally, I come to your blog often but don't always read the article itself if it doesn't interest me. Having a table of contents and forcing people to actually navigate to the blog would save bandwidth from people like me who only need to see a header and a short description to know that I am not interested. Why send me pictures, article text, and several old blogs if I'm not going to even read it?
Matt on March 6, 2007 10:34 AM
Say, what about coral cache?
Linking through Coral: http://www.codinghorror.com.nyud.net:8080/blog/images/coding-horror-official-logo-small.png
All you do is use their domain. After a few requests it ought to cache it for you. Downsides are that it is slower than a direct request, and like the other offloading solutions, you are dependent on their DNS system to work.
Reed Hedges on March 6, 2007 10:39 AM
see http://www.coralcdn.org btw.
Reed Hedges on March 6, 2007 10:41 AM
For photos I really like photobucket and imageshack. But I think a good lesson is never blog on a quasi program, due to people's desire to see if they can do it.
Brandon on March 6, 2007 10:54 AM
I'm going to second (or third) some of the commenters here and say you should also explore caching if you're concerned about rapid-fire access. If the bottleneck is processor speed you can just do internal caching, but if the bottleneck is purely the number of HTTP requests you should check out external caching. External might be pretty pricey though... haven't used it personally, just for work.
Jenn on March 6, 2007 11:04 AM
Huh, I didn't even notice the feed switchover until I checked the logs of the feed client I've been writing in my free time:
```
I, [2007-03-05T17:01:07.650092 #30592] INFO -- : fetching feed http://www.codinghorror.com/blog/index.xml
I, [2007-03-05T17:01:08.479549 #30592] INFO -- : Feed http://www.codinghorror.com/blog/index.xml had status code 304
I, [2007-03-05T19:09:09.354200 #30592] INFO -- : fetching feed http://www.codinghorror.com/blog/index.xml
I, [2007-03-05T19:09:09.920126 #30592] INFO -- : Feed http://www.codinghorror.com/blog/index.xml has permanently moved to http://feeds.feedburner.com/codinghorror/
I, [2007-03-05T19:09:09.933376 #30592] INFO -- : fetching feed http://feeds.feedburner.com/codinghorror/
I, [2007-03-05T19:09:11.381360 #30592] INFO -- : Got http://feeds.feedburner.com/codinghorror/ (id 210) with status OK
I, [2007-03-05T19:09:11.381550 #30592] INFO -- : Parsing fetched feed http://feeds.feedburner.com/codinghorror/ (id: 210)
I, [2007-03-05T19:09:12.351278 #30592] INFO -- : Updating http://feeds.feedburner.com/codinghorror/
I, [2007-03-05T21:10:59.955918 #30592] INFO -- : fetching feed http://feeds.feedburner.com/codinghorror/
I, [2007-03-05T21:11:00.132586 #30592] INFO -- : Feed http://feeds.feedburner.com/codinghorror/ had status code 304
```
I suppose enabling compression would also be a good thing to do (it's on my todo list). With the checking for whether a feed's updated though, it hasn't been as high a priority.
If you're worried about feedburner, one thing I've seen a lot of sites do while tailing my logs is to just do a temporary redirect for feeds, either a 302 Found or a 307 Temporary Redirect. While it won't completely eliminate your bandwidth usage, it should allow you to switch underlying external providers without having to pay for any "professional" accounts on any single provider's service.
emag on March 6, 2007 11:30 AM
Caching needs to be considered, if it's not already done, at every level: web server, PHP, database, etc.
Does the network, at every network interface, have traffic shaping/TOS enabled? There should be no possibility of an external influence "taking down" the internal network.
softdev on March 6, 2007 11:54 AM
I would also suggest Joyent's BingoDisk (http://www.bingodisk.com/) for your list. It has low annual fees and no bandwidth charges or limitations (although they do have a somewhat vague warning against abusing the bandwidth). They specifically mention that it is useful for hosting podcasts, images, etc.
scotje on March 6, 2007 12:06 PM
Isn't it better to just buy a virtual host with 1,000 GB of transfer (or even more) from a hosting provider? You say: switch to an external image provider. I say: why not just switch to a hosting provider?
Are there any advantages to hosting the blog yourself, especially when hosting providers offer competitive rates and Remote Desktop access to virtual servers with 1,000 GB of data transfer or more at very reasonable prices (some are less than 50 bucks a month)?
I've never hosted out of my house, so I wouldn't know, but I've never had problems with a lot of good hosting providers. But then again, I've never hit crazy traffic like your site does ;)
So why do you host the site yourself? Are there any added advantages of doing that?
rajiv on March 6, 2007 12:14 PM
I'd just like to put a little warning here for people who are planning to mess with the IIS metabase. If you're not careful with it, Notepad will change every X hundredth character to a "?", totally corrupting the file. Saving it as UTF-8 (which isn't the default; nice) may help, but using WordPad instead will definitely work.
HitScan on March 6, 2007 12:15 PM
Goodness, that is a fierce amount of traffic for a blog!
Maybe you could further limit the bandwidth use by requiring people to solve FizzBuzz before they're allowed to subscribe to RSS? ;)
Rick Brewster on March 6, 2007 12:19 PM
If you use the Smarty template engine for PHP (http://smarty.php.net), it has a built-in "strip" function that can strip white space from your HTML.
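For readers not using Smarty, the same idea can be approximated in a few lines. This is a rough Python analogue of what Smarty's `{strip}` block is documented to do (trim whitespace at the start and end of each line and join the lines), not Smarty's actual implementation. Like `{strip}`, it is unsafe inside `<pre>` or `<textarea>`, and it can glue together inline words that relied on a line break as a space.

```python
def strip_lines(html: str) -> str:
    """Trim leading/trailing whitespace on every line and join the lines together.

    Approximates Smarty's {strip} block; do not apply it to <pre>/<textarea>
    content or to inline text where a line break stands in for a space.
    """
    return "".join(line.strip() for line in html.splitlines())


if __name__ == "__main__":
    template_output = "<ul>\n    <li>one</li>\n    <li>two</li>\n</ul>\n"
    compact = strip_lines(template_output)
    print(compact, len(template_output), "->", len(compact), "bytes")
```

Stripping whitespace saves far less than HTTP compression does, but the two stack: the compressor has slightly less to encode.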
Bill on March 6, 2007 12:22 PM
It should be noted that Flickr is a _photo sharing_ site and not an _image hosting_ site. There is a big difference in what those words mean and what Flickr's purpose is.
Matt on March 6, 2007 12:30 PM
DreamHost offers Files Forever if you have an account with them.
http://wiki.dreamhost.com/index.php/Files_Forever
Logan Lindquist on March 6, 2007 12:36 PM
Excellent information, Jeff! Thanks for sharing. -RR
RRanger on March 6, 2007 12:54 PM
You might also find the book "Speed Up Your Site" handy. It's got a lot of good advice, and it has a companion URL that can review your site for you:
http://www.websiteoptimization.com/services/analyze/
Randy Peterman on March 6, 2007 01:06 PM
I was taught a dozen years ago to REDUCE image size whenever possible. I see you used a png in your example, but I was able to take that 46k png and make an acceptable 18k jpg. The book I used years ago, Designing Web Graphics by Lynda Weinman (Lynda.com), was a big help. Sadly, today, with bandwidth at home being nice 'n' fat and print graphic designers working on the web, everyone just puts up the biggest, fattest image without thinking about reducing every image to that 'sweet spot' of small file size that still maintains a good-looking image. Cheers.
joe on March 6, 2007 01:12 PM
Re: Lee on March 6, 2007 08:07 AM
Thanks for the answers, Lee. I can code up just about any kind of OO app conceivable, but this web thingy gets me all confused, what with the intertubes and all that stuff. My limit is hacking HTML and a bit of JavaScript.
Whoa, thanks for the link to websiteoptimization.
I have a ton of 17k thumbnails in my sidebar on every page that I thought were a lot smaller.
I'm serving 210 KB for a page that has nothing but icons and sidebar images.
engtech on March 6, 2007 01:14 PM
> was able to take that 46k png and make an acceptable 18k jpg
If by "acceptable" you mean "full of nasty compression artifacts." The image you're referring to has strong, delineated areas of color, like a comic strip. Thus, PNG is a better format in this case:
http://www.codinghorror.com/blog/images/gag-fake-dog-poo.png
I've written all about JPG in the past, so believe me, I know the tradeoffs:
http://www.codinghorror.com/blog/archives/000464.html
http://www.codinghorror.com/blog/archives/000629.html
9 GB in a day still isn't that much. If you have good hosting, like dreamhost.com, where you get 1 TB of bandwidth a month (and it grows), you aren't worried about 9 GB in a day.
bryan on March 6, 2007 01:20 PM
Good advice!
I am using gzip to compress my HTML and JavaScript files at http://www.bizdiggers.com, and it has reduced their size by about 70%. You can try it too. I also tried to gzip the CSS file, but it doesn't display properly on some computers (maybe a browser issue).
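To estimate what HTTP compression would save on your own pages before enabling it, you can gzip a representative response body and compare sizes. This sketch uses Python's standard `gzip` module at level 6 (zlib's default level); the sample markup is made up, so real pages with less repetition will save less, and already-compressed files such as JPEG or PNG images gain essentially nothing.

```python
import gzip


def gzip_savings(body: bytes, level: int = 6) -> float:
    """Fraction of bytes saved by gzip-compressing a response body (negative if it grew)."""
    compressed = gzip.compress(body, compresslevel=level)
    return 1 - len(compressed) / len(body)


if __name__ == "__main__":
    # Hypothetical markup: repetitive HTML, like a long comment thread, compresses very well.
    page = ('<div class="comment"><p>Great article!</p></div>\n' * 200).encode("utf-8")
    print(f"{len(page)} bytes, gzip saves {gzip_savings(page):.0%}")
```

Running it against a saved copy of a real page (read with `open(path, "rb").read()`) gives a more honest number than the synthetic example.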
Interesting. When I used the compression tool you linked to against lazycoder.com/weblog, it reported around 86% compression of the returned page vs. 68% when I ran it against codinghorror.com/blog. I've got a lot more going on in my page than your minimalist theme, so I'd think mine would be harder to compress. I wonder where the big differences were. I'd guess images, but I've got a lot more images.
Scott on March 6, 2007 01:47 PM
How about swooshing your website? http://www.redswoosh.net/
Matt on March 6, 2007 01:47 PM
Thought I'd mention a bug from a while back in various versions of mod_deflate for Apache. Reloading the service to pick up configuration changes could sporadically cause pages to hang when loading their CSS files. Just putting it out there in case someone happens to run into it.
doubledjd on March 6, 2007 01:51 PM
Good tips; however, I find it unprofessional to host images on third-party sites. One reason being that they can go down or, quite commonly, be slow!
Design House Hervey Bay on March 6, 2007 02:10 PM
You could further optimize image size by using PNGs for some images. PNG compresses simple images much better than JPEG; sometimes I see a size reduction of more than 50%, depending on the kind of image.
Using advanced JPEG tools, like GIMP or Photoshop, can also help you trim precious KB off JPEG photos.
Eric on March 6, 2007 02:13 PM
You know how some loser websites block certain browsers (look at them with Firefox and you get "You appear to be using Netscape. Please upgrade to Internet Explorer 4!")? Is it possible to do a similar but less intrusive thing with RSS readers? That is, keep a list of known "evil" RSS readers, and serve up a little preamble at the top of each article reminding the user of their free-as-in-beer alternatives. If RSS readers send an identifier like browsers do, it should be vaguely possible...
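Feed readers do usually send a `User-Agent` header, so the idea above can be sketched as a check run while generating each feed item. Everything named here is hypothetical: the reader substrings, the notice wording, and the `with_reader_notice` function. User-Agent strings are voluntary and easy to spoof, and if the feed passes through shared caches, a response that varies this way should also carry a `Vary: User-Agent` header.

```python
from typing import Iterable, Optional

# Hypothetical substrings identifying feed readers you want to nudge.
DISCOURAGED_READERS = ("ExampleReader/1.", "LegacyFeedBot")

# Hypothetical wording for the preamble.
NOTICE = ("<p><em>Your feed reader downloads the full feed on every check. "
          "Free readers that support conditional GET are available.</em></p>")


def with_reader_notice(user_agent: Optional[str], item_html: str,
                       discouraged: Iterable[str] = DISCOURAGED_READERS) -> str:
    """Prepend NOTICE to an item's HTML when the User-Agent matches a discouraged reader."""
    ua = user_agent or ""  # the header may be missing entirely
    if any(marker in ua for marker in discouraged):
        return NOTICE + item_html
    return item_html
```

Substring matching keeps the list short (one entry can cover every 1.x version of a reader), at the cost of occasional false positives.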
Eric TF Bat on March 6, 2007 02:21 PM
Does mod_deflate work under SSL (mod_ssl)? I enabled mod_deflate and mod_ssl, but the log doesn't record any compression ratio; it just writes "-". The HTTPS response header, however, shows "Content-Encoding: gzip".
S Mandal on March 6, 2007 02:34 PM
Great article, but you forgot to mention that compressing the data on the fly will surely increase the load on your server (possibly a lot). That might not be a problem for some, but when you own your own server, load is a big problem when you're squeezing out all the resources.
danny on March 6, 2007 02:38 PM
Your math is no good!
"..completely saturate two T1 lines-- nearly 300 KB/sec-- for most of the day..."
A T1 is 1.544 Mbps, or ~193 kB/s, so two of them = 386 kB/s. Were transactional TCP headers really 86 kB/s, or about 22% of your bandwidth?! That seems high even for small pages.
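Written out, the T1 arithmetic above works like this (8 bits per byte, decimal kilobytes), with the 300 KB/s figure taken from the post:

```latex
\begin{aligned}
2 \times \frac{1.544\ \text{Mbit/s}}{8\ \text{bit/byte}} &= 2 \times 193\ \text{kB/s} = 386\ \text{kB/s} \\
\frac{386 - 300}{386} &= \frac{86}{386} \approx 0.22
\end{aligned}
```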
Also, I imagine you're paying a small fortune for a real T1 from Telepacific Communications to your location in Richmond, CA (Vertigo Software).
Shop around for a cheap hosting solution; I suspect it'll be safer to put the page on a hosting site than to deal with it on the company network. Or buy the fiber link Verizon is offering out here on the East Coast (verizonfios.com): the $50/month offering gives 5 Mbps upstream, and the $189/month offering gives 30 Mbps down / 5 Mbps up.
Just a thought.
theorem on March 6, 2007 02:50 PM
One thing you forgot to mention: while compression does greatly increase a page's download speed, if the user has a crappy proxy or a not-so-perfect link in general, pages will inevitably drop key bytes. That's not so much of an issue with plain HTML or CSS, but it's quite a big deal when it's 30 bytes of compressed data.
thereign on March 6, 2007 03:46 PM
I just added a few bytes to your bandwidth :-)
Here's some better advice: if you're concerned about 9 GB of traffic, you should probably switch web hosts. Bandwidth and storage prices drop every year--if your plan isn't automatically upgraded or priced down, that's a strong sign you're with the wrong people. My host has a limit of 70 GB per day (2,100 GB/month) for less than $7/month and continuously gets better as time goes on. It features Ruby on Rails, PHP/MySQL, etc. I would strongly recommend:
http://www.servage.net/?coupon=bonusbytes
aaron on March 6, 2007 04:38 PM
I'd highly suggest *against* using Flickr.
Personally, I love Flickr, for photos. Using Flickr to store random charts and graphs feels like muddying the waters, so to speak. It also means that if enough sites use Flickr for off-site storage and even a few of them get dugg hard enough, the entire Flickr user base will suffer. Flickr is also known to go down from time to time, so that's another reason not to use it for site-image linking.
On the other hand, we've done some research as well, and S3 comes out on top. It's a perfect fit for what you're proposing, and it's also the most cost-effective.
Using Flickr is like using a big rock to pound a nail: sure, it works, but it's not really what it's for, while S3 is looking to be a rather nice hammer for the storage nail.
Shawn Oster on March 6, 2007 04:52 PM
I suppose this doesn't matter in some situations, especially if the in/out stats are gathered from the router, but...
If you want accurate numbers, make sure to use mod_logio, as the bytes-out field in the regular mod_log_config can report figures that vary by 10-50% on successful, complete requests.
Apache on March 6, 2007 05:21 PM
Many hosting firms offer 1 TB/month of transfer nowadays. There would probably be problems if all the sites hosted on each server actually used that amount, but I'm sure they're using at least a T3. It's probably best to get a dedicated server hosted somewhere if you want your own server, unless you either manage the transfer really well or aren't expecting too much traffic. Your own T3 or OC3 would probably be fairly pricey!
James on March 6, 2007 06:46 PM
@thereign
"pages will inevitably drop key bytes"
I think you misunderstand how TCP sockets work. If data is dropped, either the connection is lost or the data is retransmitted; this happens regardless of whether the packet carries compressed data or not.
Of course, with uncompressed data there is a higher probability of loss (more packets are transmitted). Moreover, compressed data needs fewer packets overall, increasing the effective transfer rate by spending much less time in acknowledgement cycles (which are there to prevent data loss in the first place).
stephbu on March 6, 2007 08:48 PM
Thanks a lot for these tips. I've been using Photobucket for my images...
Rekzai on March 6, 2007 08:51 PM
Wouldn't it be great to just have a faster/bigger pipe and not care about bandwidth? I hope bandwidth issues die, like modem issues did, sometime before I die...
Dave Parker on March 6, 2007 09:53 PM
Hosting images on a different server might cause a minor performance hit on the client side, due to the extra DNS lookup and connection for the additional domain.
Brian Cantoni on March 6, 2007 10:22 PM
Kudos on switching to S3. I hear it is a great service. I do have a comment, though:
"the ironic thing about this is that the viral post in question was completely HTTP compressed text content anyway."
Since you're now using Amazon S3, have you thought about putting your semi-permanent static files up there (e.g., CSS/JavaScript)? IIRC they set HTTP headers suggesting the data remain cached forever, so both you and your readers win.
Cheers!
Erik Karulf on March 6, 2007 10:38 PM
I've had success with S3 -- it's very easy to set up, especially with the S3 Firefox plugin (https://addons.mozilla.org/firefox/3247/).
Also, I threw together some quick numbers here in case anyone wants to compute their bandwidth costs: http://tinyurl.com/yr6rl6
One last comment on compression -- apparently HTTP compression is buggy in some older browsers, so be careful (http://www.thinkvitamin.com/features/webapps/serving-javascript-fast).
Great article.
Kalid on March 7, 2007 12:39 AM
One simple optimisation that most people forget: make sure to set the Cache-Control HTTP header on all static content, typically .js, .css, .gif, etc.
A response with the header
Cache-Control: public, max-age=7200
lets the browser cache content locally for 2 hours (7,200 seconds); it doesn't even need to send a request with If-Modified-Since. It can cache the content even if it normally wouldn't (for example, when using HTTPS). Incidentally, this also makes your website much more responsive for people on high-latency connections.
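The policy just described can be sketched as a plain function that a Python web handler could consult before sending a response. The extension list and the 2-hour lifetime are assumptions taken from the comment; in practice this is usually configured in the web server (IIS, Apache) rather than in application code. Once a browser holds a copy with `max-age`, it won't see edits until the lifetime expires, which is why versioned file names are a common companion technique.

```python
from pathlib import PurePosixPath
from typing import Dict

# Assumed policy: which file types count as static content, and for how long.
STATIC_EXTENSIONS = {".js", ".css", ".gif", ".png", ".jpg", ".ico"}
MAX_AGE_SECONDS = 7200  # 2 hours, matching the example header above


def cache_headers_for(request_path: str) -> Dict[str, str]:
    """Return a Cache-Control header for static files, and no extra headers otherwise."""
    path = request_path.split("?", 1)[0]  # ignore any query string such as ?v=3
    if PurePosixPath(path).suffix.lower() in STATIC_EXTENSIONS:
        return {"Cache-Control": f"public, max-age={MAX_AGE_SECONDS}"}
    return {}
```

HTML pages fall through to an empty dict here, so they keep whatever revalidation behavior the server already has.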
Thanks, Jeff, for this helpful information, especially the CSS compressor and HTML compression hints; I had never heard of them before reading this post. Great writing!
Brian on March 7, 2007 08:21 AM
I started making web pages back when we were all using dial-up, and not only dial-up: 14.4 was the highest speed, and I personally was using 2400 baud. I had an AOL connection, and maintaining a website that was easy to use and small in footprint was highly desirable. Most of my images clocked in at around 4 KB, with no more than five per page at most. The page formatting and design were specially crafted for 640x480 resolution, and I managed to keep things small. I, of course, had a little-visited site, but I never exceeded the 2 MB of file space AOL's FTP area gave me, and I managed to have close to 20 pages.
The newer, younger generation doesn't know about optimization. I was also a web monkey for a company for a long time, and they could not fathom why I'd delay a page's launch. I'd explain carefully, in 1996, "We can't let a 50 MB page go through. I won't even let a 1 MB page go through. Give me time to re-tool it so we're not making visitors wait a week for the page to load."
Ultimately, that company failed spectacularly. Small wonder. Anyhow...
Optimization GOOD. 15 MB JPEGs BAAAAD.
Jae on March 7, 2007 08:44 AM
For compressing PNG images, PNGOUT (http://advsys.net/ken/utils.htm) is the best utility I've found. One sometimes has to fiddle with the (many) command-line parameters, but since it only overwrites the file if it can make it smaller, you can write a script that runs a number of common combinations to find the best one.
"pngout /c3 /f0" compressed the images in this article as follows:
bandwidth-usage 5593 -> 4295 (77% of original)
official-logo 1973 -> 1867 (95% of original)
text 1463 -> 1294 (88% of original)
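The script suggested above can be sketched in Python: run PNGOUT once per option set into a temporary file, then copy the smallest result over the original only if it actually beat it. Only `/c3 /f0` comes from the comment; the empty option set simply means PNGOUT's defaults, and the `pngout <input> <output> [options]` call form is an assumption, so check it against PNGOUT's built-in help before relying on it.

```python
import shutil
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import List, Optional, Sequence

# Option sets to try. "/c3 /f0" comes from the comment above; [] means PNGOUT's defaults.
CANDIDATE_OPTIONS: List[List[str]] = [[], ["/c3", "/f0"]]


def pick_best(original_size: int, trial_sizes: Sequence[Optional[int]]) -> Optional[int]:
    """Index of the smallest trial that beats the original, or None (None = no output file)."""
    best_index, best_size = None, original_size
    for i, size in enumerate(trial_sizes):
        if size is not None and size < best_size:
            best_index, best_size = i, size
    return best_index


def optimize_png(src: Path, option_sets: Sequence[Sequence[str]] = CANDIDATE_OPTIONS) -> Optional[Sequence[str]]:
    """Run PNGOUT per option set into temp files; keep the smallest result, if any."""
    with tempfile.TemporaryDirectory() as tmp:
        outputs: List[Optional[Path]] = []
        for i, options in enumerate(option_sets):
            out = Path(tmp) / f"trial{i}.png"
            # Assumed invocation form: pngout <input> <output> [options]
            subprocess.run(["pngout", str(src), str(out), *options], check=False)
            outputs.append(out if out.exists() else None)
        sizes = [p.stat().st_size if p is not None else None for p in outputs]
        best = pick_best(src.stat().st_size, sizes)
        if best is None:
            return None  # no trial beat the original; leave it untouched
        shutil.copyfile(outputs[best], src)
        return option_sets[best]


if __name__ == "__main__":
    for name in sys.argv[1:]:
        print(name, optimize_png(Path(name)))
```

Writing each trial to its own temporary file means a bad option set can never damage the original, which mirrors PNGOUT's own "only overwrite if smaller" behavior.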
Based on how many comments are on that linked article, you might want to limit comments to 100 or so. Or, even better, if an article is being requested more than X times per second, don't display any comments; link to a separate page with them instead.
I also remember reading an article years ago about Yahoo renaming their images directory to 'i' instead of 'images'; this alone saved them gigabytes of bandwidth because they serve up so many pages. Crazy but true.
Other tips:
- Use relative links.
- Shorten heavily used directory and file names.
Don't overlook the cacheability of your site as a way to improve performance. Take a look at this excellent article, and at the cacheability engine that can test your website for cacheability:
http://www.mnot.net/cache_docs/
Kuerwen on March 7, 2007 02:25 PM
Thanks for the tips! This is a very useful guide you've put out.
DaXtermGuy on March 7, 2007 05:44 PM
First of all, kudos for having this problem to begin with. There's great content here that's highly deserving of the attention.
That said, you missed one. Most blogging software already creates well-structured HTML that uses CSS for positioning and decoration, but not all of it does. If you're still using tables for positioning and applying styles inline (which, on really poorly formatted sites, are repeated over and over instead of relying on styles defined in CSS), that will make your HTML much bigger than necessary, too.
The best argument I've seen in a while for well-structured HTML that relies on CSS for positioning and decoration is this:
http://accessites.org/site/2007/02/graceful-degradation-progressive-enhancement/1/
It goes a step further and addresses accessibility, but the general rules for HTML structure stated here are a great guideline.
Pete on March 7, 2007 07:24 PM
I was hoping you'd have some more tips for this; 200 GB just isn't what it used to be. I need as many tips as possible. I've cut the load by remotely hosting my images on a paid image service, but I'm still getting killed here.
Walumba on March 7, 2007 09:39 PM
Here is another tip/hack that I thought of after reading this article:
1. For your images, host multiple copies and use banner-style rotation so each image can be fetched from several different image-sharing sites, not just one.
2. Take one large image, split it into multiple pieces, and then recombine those pieces so they appear as one image ;)
kualla on March 7, 2007 11:44 PM
One thing you can do with bad RSS readers, assuming they set a correct User-Agent header, is redirect them to a bandwidth-capped server.
I have a server that on IP x serves pages as fast as possible, but on IP y serves them at 4 KB/s: enough to keep clients downloading the material, but slow enough to keep them from hogging all the bandwidth.
Actually, I have a set of IPs on that server, and each IP serves data at a different rate: full speed, half, quarter, 1/8th, and then that 4 KB/s. This way I can prioritize and manage how the hosted sites use up my limited upload capacity.
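The tiered throttling described above happens at the server and IP level; the same idea can be sketched in application code as one token bucket per tier. Each bucket refills at its tier's rate, and a connection may only send as many bytes as its bucket currently holds. The tier names, the 4 KB/s floor taken as 4 × 1024 bytes, and sharing a single bucket among every connection in a tier are assumptions for illustration.

```python
import time
from typing import Callable, Dict


class TokenBucket:
    """Allow `rate` bytes per second on average, with bursts of up to `capacity` bytes."""

    def __init__(self, rate: float, capacity: float, clock: Callable[[], float] = time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity
        self.last = clock()

    def try_send(self, nbytes: int) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never beyond the burst capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if nbytes <= self.tokens:
            self.tokens -= nbytes
            return True
        return False  # caller should wait and retry, or send a smaller chunk


def tier_rates(full_rate: float, floor: float = 4 * 1024) -> Dict[str, float]:
    """Full, 1/2, 1/4, and 1/8 of the link rate, plus a fixed slow floor tier."""
    return {
        "full": full_rate,
        "half": full_rate / 2,
        "quarter": full_rate / 4,
        "eighth": full_rate / 8,
        "floor": floor,
    }


if __name__ == "__main__":
    buckets = {name: TokenBucket(rate, capacity=rate) for name, rate in tier_rates(190_000).items()}
    print(buckets["floor"].try_send(4096), buckets["floor"].try_send(1))
```

Mapping a request to a tier (by destination IP, User-Agent, or site) is then a lookup into `buckets`; the slow tier still delivers data, just at a rate that cannot crowd out everyone else.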
Raynet on March 7, 2007 11:46 PM
If only you had a nickel for each time someone read your blog...
John A. Davis on March 8, 2007 12:29 PM
Nice article. I especially liked the ideas around hosting images externally and the impact that has on your site.
Also worth mentioning: HTTP headers on certain resources on your site can tell browsers not to hit your static content so hard.
Check my post at
http://jacdev.blogspot.com/2006/12/tips-for-aspnet-website-performance.html
Jac on March 13, 2007 10:50 PM
How can I further reduce the size of 304/404 responses, etc., given that IIS doesn't compress the 404 response?
Vinod on July 25, 2007 02:28 AM
If it's that important to you to reduce the size of a 404 page, why not replace the error page with just "404 Not Found"?
Thanks for this. I have been looking at reducing my bandwidth for some time and hadn't thought of some of these ideas. Not serving images AT ALL is a totally new concept to me. Brilliant!
Donetsk on September 26, 2007 04:23 PM
Great insight here! I didn't know you could TOTALLY outsource your images. I thought you could only post certain images at places like Flickr and link to them from your site. Shows you how amateur I actually am! Outsourcing your images completely: fantastic idea! It makes perfect sense! And if I were getting hits like you... there's no doubt I'd be doing the same.
Thanks for the excellent idea, will definitely be looking more into it.
Great article! Thanks for the tips. I didn't know about HTTP compression until now...
Sotek on October 24, 2007 01:49 AM
Hi guys, I'm in need of some help. I just read the article and some of the posts. I'd like to implement some changes to my site to save bandwidth, and I'd also like tips on how to make my pages load faster. I've no idea how to start. My site is getting around 500 unique hits per day, and I expect that to rise to at least 1,000 by this time next month. I'm using roughly 5 GB of bandwidth a day and have 3,600 GB a month to play with, but as my site is growing in popularity, I need these changes.
What is this HTTP compression? How do I turn it on, and what are the downsides of doing it?
Where is the best place to put my images, if not on my hosting server? I want it to be as cheap as possible, but fast and reliable.
I've also got two Virgin Media accounts with about 50 GB of space, I think. I don't currently use them; are they a good place to store my pictures?
Why not automatically redirect only the visitors who come from Slashdot/Digg/Fark/etc.?
http://wiki.dennyhalim.com/htaccess
That would automatically 'protect' your site from being dugg even before you know it's happening.
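The linked page does this with .htaccess rules; the decision itself can be sketched in Python: read the host from the `Referer` header and, if it belongs to a flood-prone site, answer with a temporary (302) redirect to a mirror or cache. The referrer list, `mirror.example.com`, and the `mirror_redirect` function are placeholders. The Referer header is optional and is sometimes stripped by browsers or proxies, so this only diverts part of a flood.

```python
from typing import Optional
from urllib.parse import urlsplit

# Hypothetical flood-prone referrers and mirror host.
FLOOD_REFERRERS = ("slashdot.org", "digg.com", "fark.com")
MIRROR_BASE = "http://mirror.example.com"


def mirror_redirect(referer: Optional[str], path: str) -> Optional[str]:
    """Return a 302 target when the visitor came from a flood-prone site, else None."""
    if not referer:
        return None
    host = (urlsplit(referer).hostname or "").lower()
    # Match the domain itself or any subdomain (e.g. developers.slashdot.org).
    if any(host == d or host.endswith("." + d) for d in FLOOD_REFERRERS):
        return MIRROR_BASE + path
    return None
```

Matching on the parsed hostname rather than a raw substring avoids false hits such as `notdigg.com` or a URL that merely mentions digg.com in its query string.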
That's nothing: I got 30 GB in 5 HOURS! My server was overloaded by users uploading their porn.
Kyle Richelhoff on January 9, 2008 03:35 PM
I'd like to plug a CSS minifier I came across recently that gets quite good compression. It seems to take a different approach than just removing whitespace, etc. Anyway, you can read about it here:
http://www.artofscaling.com/css-minifier/
The only downside is that it isn't an online minifier; you need to run a JAR.
chris on January 22, 2008 03:41 PM
Michel john on March 24, 2008 05:12 AM
I have a question: where do all these companies that sell "bandwidth" buy theirs from? If I want to skip the middleman, what do I do?
JustWannaKnow on April 16, 2008 11:04 AM
Oyun on April 30, 2008 10:30 AM
Content (c) 2008 Jeff Atwood. Logo image used with permission of the author. (c) 1993 Steven C. McConnell. All Rights Reserved.