April 6, 2006
I like to periodically watch the HTTP traffic on my server. I can see what I'm actually serving up over the wire, and how much bandwidth I'm using.
That's how I noticed that I've become somewhat popular with direct-link image bandwidth thieves. In other words, people who thoughtlessly (or maliciously) embed these IMG links in their web page:
That means the image qbert_regex_16.png is served by my webserver to every user who happens to request this myspace profile page.
Warning: like all myspace pages, that page is
- Not really safe for work
- Incredibly, mind-bendingly ugly
- Filled with thousands of images, animated images, flash, MIDI samples, embedded MP3s
- Utterly and completely incomprehensible
In short, a trainwreck. Every time I visit myspace, I feel a little bit stupider, à la Billy Madison:
Principal: Mr. Madison, what you've just said is one of the most insanely idiotic things I have ever heard. At no point in your rambling, incoherent response were you even close to anything that could be considered a rational thought. Everyone in this room is now dumber for having listened to it. I award you no points, and may God have mercy on your soul.
Billy Madison: Okay, a simple no would've done just fine.
I have no idea why myspace is so popular. I guess the best I can hope for is that those damn kids stay off my lawn.
Anyway, back to business. The most common technique for blocking direct image links is to check the HTTP referer header. Here's the complete HTTP header set of an image request that just came through:
GET /blog/images/logitech_g15_keyboard.jpg HTTP/1.1
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)
Prior to serving up the image, we should check the Referer HTTP header, and make sure it's either:
- Blank
- In a list of known whitelisted referring domains
If it isn't, we will serve up either a 404 error, or a "hey, stop stealing our bandwidth" image of some kind. Because I'm a nice guy, I chose this image:
All this can be done through incredibly powerful URL Rewriting, which has been standard on Apache for some time. There's a nice walkthrough on how to set up image link blocking in Apache on Tom Sherman's site.
Unfortunately, IIS 6 doesn't have native support for URL Rewriting*, but there are any number of third party ISAPI filters that can do it. The one I use is ISAPI Rewrite. It's very similar to the Apache version, in that it is driven by the httpd.ini file in the root of each website. I struggled a bit with the rules, but thanks to a helpful forum post, I realized that I needed to put all the whitelisted domains on a single line to get a boolean "or" that included the empty referer case, like so:
# Block external image linking
RewriteCond Referer: (?!http://(?:www\.codinghorror\.com|www\.bloglines\.com|www\.google\.com)).+
RewriteRule .*\.(?:gif|jpg|png) /images/block.gif [I,O]
So, as outlined above: unless the referer is blank, or in the whitelist, they get shunted to the blocked image.**
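To see why a single pattern covers both the whitelist and the blank referer case, here is a rough Python sketch that applies the same regular expression outside of IIS. It assumes the pattern is matched from the start of the Referer value (Python's `re.match`), which may not be exactly how ISAPI Rewrite applies it. The function name `is_blocked` and the sample referers are made up for illustration.

```python
import re

# Same pattern as the RewriteCond above. The negative lookahead rejects
# referers that start with a whitelisted site, and the trailing .+ needs
# at least one character, so an empty referer never matches.
BLOCK_PATTERN = re.compile(
    r"(?!http://(?:www\.codinghorror\.com|www\.bloglines\.com|www\.google\.com)).+"
)


def is_blocked(referer: str) -> bool:
    """Return True when a request with this Referer would get the block image."""
    # Assumption: the pattern is anchored at the start of the header value.
    return BLOCK_PATTERN.match(referer) is not None


if __name__ == "__main__":
    samples = [
        "",  # blank referer
        "http://www.codinghorror.com/blog/",  # whitelisted
        "http://www.myspace.com/someprofile",  # hypothetical hotlinker
    ]
    for sample in samples:
        print(repr(sample), "->", "blocked" if is_blocked(sample) else "allowed")
```

Note that the whitelist entries are literal prefixes, so a whitelisted site that sends its referer without the www. part would still be blocked by this pattern.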
Take that, 26 zillion myspace users.
* I'm pretty sure URL Rewriting will be in IIS7, since they're finally getting around to making a really good copy of Apache's modular architecture in version 7.
** This is done at the ISAPI level, so unlike the cheesy ASP.NET "URL rewriting" solutions, it also works on generic URLs, not just URLs that end in .aspx or some other extension that is sent to the ASP.NET handler. This has long been a pet peeve of mine, but it's really the fault of IIS. And it's changing in IIS 7.
Posted by Jeff Atwood
Thanks for adding bloglines to the list. There's probably a couple more you will have to add over time.
Nice article...I've long used Apache and now am trying to figure out how to do things that used to be easy on IIS for work. This will help.
Thought you might want to know that images are rewritten when seen through bloglines. Additionally, I am switching to http://rojo.com for RSS feeds - it's the best feed aggregator I've come across yet, it would be nice if you could add that to your list. I always read Coding Horror, you're my absolute favorite .Net related writer!
(I thought it might be that I often leave the www out of the url since that is kind of redundant, but in either case, still not seeing your images on Bloglines.)
Please add newsgator.com to the list.
I understand the cookie approach, but describe the GET/HEAD approach?
Also, I added live.com and newsgator.com to the whitelist based on some additional sniffer trace monitoring.
Also, I found a nifty tool that lets you test whether or not your anti-hotlink approach is working on your server:
Be *sure* to clear your browser cache before running the test; stuff on disk will always show up.
Looks like the various anti-hotlink alternatives are also enumerated on that site:
They sell a product that generates random URLs on the server side which are only valid for a fixed amount of time, eg, "ColdLink". Interesting.
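The general idea of time-limited URLs can be sketched roughly in Python. This is not how that product is implemented (its internals aren't described here); it is only a minimal illustration. The `/img/` URL prefix, the function names, and the in-memory dictionary are all assumptions made up for the example.

```python
import secrets
import time

# token -> (real image path, expiry timestamp).
# Hypothetical in-memory store; a real server would need something persistent.
_links = {}


def make_temp_url(image_path, ttl_seconds=300, now=None):
    """Return a random URL that maps to image_path for ttl_seconds."""
    now = time.time() if now is None else now
    token = secrets.token_urlsafe(16)
    _links[token] = (image_path, now + ttl_seconds)
    return f"/img/{token}"


def resolve_temp_url(url, now=None):
    """Return the real image path, or None if the URL is unknown or expired."""
    now = time.time() if now is None else now
    token = url.rsplit("/", 1)[-1]
    entry = _links.get(token)
    if entry is None:
        return None
    image_path, expires_at = entry
    if now >= expires_at:
        del _links[token]  # expired links are forgotten
        return None
    return image_path
```

A page would embed the result of `make_temp_url(...)` instead of the real image path, so a copied link stops working once the time window has passed.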
I'm glad I'm not the only one who's had to resort to image blocking because of those damn MySpace users.
Mate, feel your pain. You might get a chuckle out of the following article:
Andrew, that's hilarious, LOL!
I am switching to http://rojo.com for RSS feeds
I'll add that to the whitelist later tonight.
Been an avid reader for a while. Just noticed the added image-parsing required to post.
While I understand the need to avoid spamming the board, have you considered that a blind person will now require the aid of a friend to post a comment to your blog? The solution is very far from perfect. I can't give you a better solution off the cuff, but you should be aware that it does cause problems for some users.
This can be done with IIS 6
I've set it up at http://www.safecam.org.uk/ to stop other sites nicking the photos and maps.
I can't remember exactly what I did off the top of my head, if anyone is interested, I'll dig out my source.
Wow, I really should have turned down my speakers before clicking the myspace link....actually...I should have just not clicked the myspace link. Nothing good can ever come from that place.
Thanks for the great image blocking technique.
While understanding the reasons for this step it is also bad for some users. From now on I can see just WTF images in my own feedreader.
Ulrik Jensen: Why would a blind person care about image blocking posts?
For that matter, what percentage of CodingHorror readers are so blind that they have to use a screen reader as their only possible means of surfing the web? I will go out on a limb and say very few are.
My father-in-law is just about completely blind, but can see shapes out of the corner of one eye and he is STILL able to browse the web and look at images. Of course, he has a special magnification utility that goes far beyond the one built into windows, but so would anyone else that can barely see.
I'm more concerned about all the poor lynx users http://lynx.browser.org/ :(
Oh, and why is the captcha always "orange"? That's not very hard to defeat lol.
It works for pretty much all the standard spam-bots that are out there, which is pretty much all this site gets. It works for now, and is easy to habituate. I'm guessing if anyone bothers to "break" it, he'll change it to random words.
For IIS you can use ISAPI Rewrite (http://www.isapirewrite.com/); there is a free "lite" version available that works like magic.
Another way they seem to be able to reach you is using a redirect from google images, so be careful with what you add to your accept list.
Yes, I've had to do the same for coinop.org - funny that my q*bert pictures also get leeched. The other culprit (myspace is bad, yes) is ebay - people selling "emulator paks" while thieving other people's code are also likely to thieve on the bandwidth as well. For those I usually replace them with a funny custom image involving a baby and excrement and then report them to ebay for having offensive images. Then again I'm vindictive.
I have a custom image deliverer that can scale up and down images and it also checks to make sure the referer is me. It catches 99% of the links and returns an "image missing" - figure that will confuse people and waste their time.
MySpace is popular because it's chaotic and allows you to do what you feel like without much structure. You can do what you want where you want to do it. It's like IM gone mental, with the output stored for future reference.
Friendster was more structured and lost popularity for that reason, as well as having a hostile administrator and slow system response for a long period of time... but it was a lot more structured.
Anyone who thinks that the up-and-coming generation are tech-whizzes who can do great things with technology should take a look at MySpace as a counter-example. They're just consumers of what's put in front of them, and that's about the extent of it.
actually...I should have just not clicked the myspace link. Nothing good can ever come from that place
The other culprit (myspace is bad, yes) is ebay
And online forums. Some guy in the UK made that Q*bert image his forum avatar, so it showed up in every post he made.. :P
From now on I can see just WTF images in my own feedreader.
As long as your feedreader (I assume a Windows app?) is sending blank referers, it will work. I only disallow unknown referers, not blank or empty ones. There should be no "referer" for a Windows app to use, as it's not coming from a website!
Right now the whitelist is:
- my site (duh, that'd be funny ;)
- (blank referer)
If it *is* sending a referer, let me know what the URL is and I will happily add it to the whitelist for you.
I cannot WAIT for IIS 7 to be released and adopted widespread!
All the hoops I have to jump through with Subtext to allow you to create a blog in a "virtual" subfolder WITHOUT setting up a virtual directory in IIS and without mapping * to aspnet_IIS.
This would allow you to create a URL like http://example.org/MyBlogFolder/ without having a physical (or even virtual) folder named "MyBlogFolder".
In the end there's no way to do it without either mapping * to aspnet_IIS or using a custom 404 page (which is the choice I made).
Ideally, I want my URLs to be really pretty. Like ponies.
I cannot WAIT for IIS 7 to be released and adopted widespread!
It's gonna be a while. All versions of Vista come with IIS7 (as we found out at Mix), but those are all desktop operating systems. Are you gonna install Vista on your hosting services' servers? That's what I thought.
We can develop against it. But we'll all be waiting for Longhorn server before we can use IIS 7 for real, production websites. I have no idea when that will be out!
mapping * to aspnet_IIS
I do not think you should ever map * to the ASP.NET handler. Stated another way: I think this is a really bad idea.
There's no perfect solution right now, but that particular "solution" is gonna cause problems.
You should get a copy of ISAPI Rewrite and do this the right way. Obviously the subtext project can't make this a requirement, though, but as a personal workaround engine, it's nice.
Jon Galloway, stop thinking outside the box. Put yourself BACK in the box, man!
But seriously. I am a huge fan of Coral. I am not a huge fan of becoming dependent on another website for core functionality.. eg, Feedburner (RSS feed), Flickr (images), etcetera.
Wow... how many RSS reading site owners are gonna be on your whitelist? I hope it doesn't get too long to parse...
And yes, that's a reversed invitation to put mine up as well.
how many RSS reading site owners are gonna be on your whitelist
The use cases for web sites that tend to be aggregated are definitely different from those of a traditional website. I think either..
A) You're a giant RSS aggregator, so you'll be on a limited whitelist.
B) You're a small RSS aggregator, so you need to write image retrieval code that passes in blank referers.
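For option B, a minimal sketch of "image retrieval code that passes in blank referers" might look like this in Python. The function names and the example.com URL are placeholders for illustration; a real aggregator would have its own fetching code.

```python
import urllib.request


def build_image_request(image_url):
    """Build a GET request for an image that carries no Referer header."""
    request = urllib.request.Request(image_url)
    # urllib does not add a Referer on its own; removing it here is only a
    # defensive step in case other code set one on the request.
    request.remove_header("Referer")
    return request


def fetch_image(image_url, timeout=10):
    """Download the image bytes without sending a Referer."""
    with urllib.request.urlopen(build_image_request(image_url), timeout=timeout) as response:
        return response.read()


if __name__ == "__main__":
    # Placeholder URL; only builds the request, no network access needed.
    req = build_image_request("http://www.example.com/images/pic.png")
    print(req.has_header("Referer"))
```

Since the request has no Referer at all, it falls into the blank referer case rather than the unknown referer case.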
I'm not the only site that blocks unknown referers from retrieving images! As you know, all it takes is a few idiots to ruin it (free, unlimited remote image linking) for everyone.
It's interesting to see that I'm not the only guy out there using ISAPI Rewrite. I've found it to be very, very useful. You can pull off some truly neat tricks with it. For example: http://www.practicelink.com/jobs/ (this entire directory tree more or less just runs off of one .aspx page). I've got ISAPI Rewrite set up to map all requests that match /jobs/.+? over to my aspx page, while the user (or search engine, which is the real idea) is none the wiser.
I've also started getting into making it so I can add some virtual directories via ISAPI rewrite via some other aspx page. The page just generates the appropriate regexes and ISAPIRewrite code and uses filestreams to update the httpd.ini file. Since ISAPI Rewrite requires no IIS restart or anything like that after you update its httpd.ini for what you changed to take effect, this works like a charm.
Your image is too small for the teenyboppers on Myspace and the like to notice.
I would change your image to be *much* bigger, but use some sort of graphic format like gif that doesn't increase the filesize much. If you make the image say 800*800px then it will be noticed and get removed.
Rick Scott: I do agree that the problem is probably very limited. And the fact that the image is the same every time (so far) does make it easier.
However, this is a site that focuses a lot on usability, with which I feel accessibility is pretty tightly connected, so I think it is relevant to consider that the solution, although widely used, isn't anything near perfect.
It's been a pet peeve of mine since I had to help a blind friend sign up for a site that used this technique. There has to be a better way of protecting against spam-bots, although I am not myself smart enough to find one.
You could add the Yahoo! mail beta RSS reader, too.
Moreover, you could just output some more innocuous placeholder, or maybe nothing and let the browser fall back on its broken image link. I wouldn't mind clicking through to your site to see the images, so long as the replacement image isn't painful to look at.
Bloglines isn't working for me, I still get the wtf pics.
"I do not think you should ever map * to the ASP.NET handler. Stated another way: I think this is a really bad idea."
Is there a particular reason for this? With ASP.NET 2.0 and IIS 6 the ASP.NET handler is designed to be usable in this way and can pass back requests to IIS (so, for example, you can use Forms Authentication to protect ALL resources on your website (such as images) and not just aspx/ascx/etc. files).
I don't really know much about URL rewriting, but have been looking into it for a web app I'm working on and would appreciate any input. I was going to do a wildcard mapping to the ASP.NET handler, but will have another look if this is not a good idea.
re: bloglines, the issue is not having www. as Jeff's rewrites (all of them) require www.bloglines.com. We'll need to rewrite the rewrites. :)
This kind of thing works, BTW, re the www. or no www. thing. Fixed my bloglines and yahoo mail problems.
I think I'm going to go with (anything.)domain.com .. for all the whitelisted domains. I just haven't had a chance to update the rules yet. But I will!
I was going to do a wildcard mapping to the ASP.NET handler, but will have another look if this is not a good idea.
For one thing, this doesn't work for folders, eg, http://mywebsite.com/myfolder/
It's also unnecessary overhead for serving up basic files like CSS and images.
Hi! Is it possible to add livejournal.com to the whitelist? I have your blog syndicated with the RSS feed to my friends page...
Scott Hanselman: Oh! Finally I know what happened - at the beginning of April I thought that somebody hacked your blog :)
Um, so far when viewing from google reader, I still get WTFs.
I see that google.com is whitelisted, but as I am in Canada, I use google.ca
Can you please whitelist that one as well? (And I guess for other international users you may have to add google.co.uk, google.??)
OK I added livejournal.com and I modified the google check to
(anything).google.(2 or 3 characters)
Pesky canadians.. ;)
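The actual rule isn't shown here, but one possible reading of "(anything).google.(2 or 3 characters)" as a regular expression can be sketched in Python. The pattern and the function name `is_google_referer` are assumptions for illustration, not the rule from the httpd.ini file.

```python
import re

# Hypothetical translation of "(anything).google.(2 or 3 characters)".
# [^/]+ keeps the "anything" part inside the host name, and the match is a
# prefix match, like the whitelist entries in the original rule.
GOOGLE_REFERER = re.compile(r"http://[^/]+\.google\.[a-z]{2,3}", re.IGNORECASE)


def is_google_referer(referer):
    """Return True if the referer looks like some subdomain of a Google site."""
    return GOOGLE_REFERER.match(referer) is not None


if __name__ == "__main__":
    for sample in [
        "http://www.google.ca/reader/view/",
        "http://images.google.co.uk/imgres",
        "http://www.myspace.com/someprofile",
    ]:
        print(sample, "->", is_google_referer(sample))
```

Because it is only a prefix match, a host like google.co.uk passes (the "co" satisfies the two-character part), but so would any longer host name that merely begins the same way, so a stricter rule would also check what follows the country code.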
I know this is an older thread, but could you add the newshutch.com feed reader to your whitelist?
I added newsalloy.com and newshutch.com to the whitelist.
Wow, great site. I came over from spcr through a link to the blog about quiet computing. Good stuff. Myspace is pretty bad, I don't think it was built for so many users, there are always errors and maintenance going on. It's simply a poorly done, chaotic, but very open forum. I'm bookmarking this page instead of going to your main page just because of the Billy Madison quote.
I added any URLs beginning with "localhost" to the whitelist. Per one of the SharpReader developers, the IE ActiveX control always sends localhost:port as the referer when requesting images..
Worked like a charm
I think direct linking is a sin. So I always use my photobucket account to host my images I use on forums or in a blog I can't upload images to.
I thought it was just common sense that you get hosting that is your own anyways because I always feel very guilty when I get lazy and go direct link an image without uploading it to my photobucket account.
Even I know this even though I am a n00b and a total idiot and I lack common sense in most everything else; including properly commenting on a blog entry.
That's sticking it to the 122 viewers of the thread where I direct linked an image of the g-15 from your site. Woosh!
I have this problem and aren't myspace customers friendly when you tell them to take it off!
Is there any way of doing this with HTML? I don't have a database system running; I'm old school in that regard, with hand-written HTML. I know I'm prehistoric..
Can you help or point in the right direction?
Okay, I'll get mostly back in the box. But one more thing to think about - if _you_ coralize all your image links, everyone who copies your image links gets the coralized copies. Instead of spending your time chasing ISAPI rules, you could change your blog rendering code to coralize your image links when it writes them out, so you could turn it off with a config setting.
And yes, I should probably spend more time writing my own stuff and less time commenting on yours.
How many different inline linking attempts do you get?
I mean, would it not be simpler to blacklist instead of whitelist? That would solve the problem of other on-line aggregators, other search engines, and so on.
Not realistic if you get many attempts from many different domains, but if there are only a few big ones, then letting a small number of image hits happen once in a while may not be too bad if it allows all legitimate linking through...
Hello, I am using Ionic's ISAPI Rewrite filter and have a question about using it. If a url has 10 parameters how do you get the 10th one, or 11th one etc...?
Using $10 does not work, it returns the value of $1 appended with a zero on the end...