Coding Horror
programming and human factors
by Jeff Atwood

May 28, 2008

Designing For Evil

Have you ever used Craigslist? It's an almost entirely free, mostly anonymous classified advertising service which evolved from an early internet phenomenon into a service so powerful it is often accused of single-handedly destroying the newspaper business. Unfortunately, these same characteristics also make Craigslist a particularly juicy target for spammers and evildoers. Who knows; maybe it's karma.

I consider Craigslist a generally benevolent public service. Perhaps that's why I was so profoundly disturbed by John Nagle's wartime narrative of the raging battle between Craigslist and spammers.

Spam on Craigslist has been a minor nuisance for years. Not any more. This year, the spammers started winning and are taking over Craigslist. Here's how they did it. Craigslist tries to stop spamming by:

  1. Checking for duplicate submissions.
  2. Blocking excessive posts from a single IP address.
  3. Requiring users to register with a valid email address.
  4. Using a CAPTCHA to stop automated posting tools.
  5. Letting users flag postings they recognize as spam.

Several commercial products are now available to overcome those little obstacles to bulk posting. CL Auto Posting Tool is one such product. It not only posts to Craigslist automatically, it has built-in strategies to overcome each Craigslist anti-spam mechanism:

  1. Random text is added to each spam message to fool Craigslist's duplicate message detector.
  2. IP proxy sites are used to post from a wide range of IP addresses.
  3. E-mail addresses for reply are Gmail accounts conveniently created by Jiffy Gmail Creator (ed. note: this does not break Google's CAPTCHA, as you can see in this screenshot.)
  4. An OCR system reads the obscured text in the CAPTCHA.
  5. Automatic monitoring detects when a posting has been flagged as spam and reposts it.

CL Auto Poster isn't the only such tool. Other desktop software products are AdBomber and Ad Master. For spammers preferring a service-oriented approach, there's ItsYourPost. With these power tools, the defenses of Craigslist have been overrun. Some categories on Craigslist have become over 90% spam. The personals sections were the first to go, then the services categories, and more recently, the job postings.

Craigslist is fighting back. Its latest gimmick is phone verification. Posting in some categories now requires a callback phone call, with a password sent to the user either by voice or as an SMS message. Only one account is allowed per phone number. Spammers reacted by using VoIP numbers. Craigslist blocked those. Spammers tried using number-portability services like Grand Central and Tossable Digits. Craigslist blocked those. Spammers tried using their own free ringtone sites to get many users to accept the Craigslist verification call, then type in the password from the voice message. Craigslist hasn't countered that trick yet.

Much of the back and forth battle can be followed in various forums. It's not clear yet who will win.

I've used Craigslist quite a few times in the past, mostly to sell things that are too unwieldy to ship, with generally positive results. But that's the "for sale" section, and the spammers seem to be concentrating on the personals and services. I was curious about this, so I delved into the local personals section in what I guessed to be the most popular category. (Note to my wife: this is research! Research! I swear!)

Almost immediately I found a personals ad with the following "image":

Craigslist anti-scam image

It's an encoded wartime transmission from someone battling Craigslist spammers. It ends on this dire warning:

99.9% of the ads these days are fakes. Sad but true. REALLY, ALMOST ALL THE ADS ARE FAKE!

But is it true? I saw some obvious spam in the personals section -- all of which had been flagged for removal by the time I clicked on it -- but certainly nothing to corroborate this 99.9% claim. I did a few unique term searches on random personals (my favorite at the moment is "no murderers please!"), and they came up unique.

Clearly, there's a war on, and there have been casualties on both sides. Even if the spammers aren't winning, every inch they gain further undermines the community's trust in Craigslist and devalues everyone's participation.

This is a topic I am acutely interested in as we build stackoverflow.com out. Like Craigslist, stackoverflow will offer a rich experience for anonymous internet users. We will not require you to create an account or "login" to answer or ask questions. We'll even track your reputation and preferred settings for you, as long as you allow us to store a standard browser cookie. While it's true that we'll initially be a low-value target due to limited traffic and a specialized audience, that will inevitably change over time. So you can expect some of the same measures on stackoverflow that Craigslist and Wikipedia use to mitigate anonymous evil:

  • Some form of CAPTCHA.
  • The ability to temporarily "lock" controversial questions so only registered users can edit or add responses.
  • An automatic throttle if we see rapid, bot-like actions from your IP address.
  • Some basic heuristics to detect "spammy" content, such as too many URLs.
  • An easy way for users with sufficient reputation to undo vandalism by reverting to an earlier version.
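
Two of the measures in this list, the per-IP throttle and the "too many URLs" heuristic, are simple enough to sketch in a few lines of Python. This is only an illustration, not how stackoverflow will actually do it: the names `looks_spammy` and `IpThrottle`, and every threshold below, are assumptions made up for the example.

```python
import re
import time
from collections import defaultdict, deque

URL_PATTERN = re.compile(r"https?://\S+", re.IGNORECASE)
MAX_URLS = 3            # assumed cutoff for "too many URLs"
MAX_ACTIONS = 5         # assumed number of actions allowed per window
WINDOW_SECONDS = 60.0   # assumed window length


def looks_spammy(text, max_urls=MAX_URLS):
    """Flag content that contains more than max_urls links."""
    return len(URL_PATTERN.findall(text)) > max_urls


class IpThrottle:
    """Sliding-window limit on the number of actions per IP address."""

    def __init__(self, max_actions=MAX_ACTIONS, window=WINDOW_SECONDS):
        self.max_actions = max_actions
        self.window = window
        self._hits = defaultdict(deque)  # ip -> timestamps of recent actions

    def allow(self, ip, now=None):
        now = time.monotonic() if now is None else now
        hits = self._hits[ip]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.max_actions:
            return False  # too many recent actions: throttle this one
        hits.append(now)
        return True
```

A request handler would call `IpThrottle.allow(ip)` before accepting a post and `looks_spammy(body)` before publishing it. As the Craigslist story shows, a throttle keyed on IP address alone is easy to sidestep with proxies, so it is best treated as one signal among several.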

The community itself can also assist. Every question and answer on stackoverflow can be rated Digg style; if a given bit of content rapidly accrues a large number of downmods, it is likely to be spam or inappropriate content, and will be automatically removed or directed into a moderation queue.
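
As a rough sketch of that downmod rule in Python: the `Post` fields, the `triage` function and all three thresholds are hypothetical, picked only to make the idea concrete.

```python
from dataclasses import dataclass


@dataclass
class Post:
    upmods: int = 0
    downmods: int = 0
    age_minutes: float = 0.0


def triage(post, queue_at=5, remove_at=15, rapid_minutes=60.0):
    """Decide what to do with a post based on its votes (assumed thresholds)."""
    net_down = post.downmods - post.upmods
    rapid = post.age_minutes <= rapid_minutes
    if rapid and net_down >= remove_at:
        return "remove"            # a fast pile-up of downmods: very likely spam
    if net_down >= queue_at:
        return "moderation_queue"  # suspicious, let a human decide
    return "keep"
```

Using net downmods rather than raw downmods keeps a contested but legitimate post from being removed automatically; whether that is the right trade-off is a judgment call.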

Don't get me wrong. I've been humbled by the quality -- and the sheer size -- of the community that has grown up around this blog. I expect the overwhelming majority of people who participate in stackoverflow.com will be absolutely upstanding internet citizens. Wikipedia is a living testament to the fact that goodness vastly outnumbers evil. We good guys can win, if we've had the forethought to put some controls in place first.

Allowing anonymous users write permission creates a volatile situation where a dozen sufficiently motivated spammers can easily poison the well for thousands of typical users. These spammers don't give a damn about the community we're building together. All they care about is getting paid by posting their links anywhere and everywhere they can. They'll run roughshod over as many websites and pages as possible in their frantic, abusive pursuit of money. If I didn't so desperately want to choke the life out of each and every one of them, I might actually feel sorry for the poor bastards.

But here's the problem: following the rules and being a good citizen is easy. Being evil is hard; it takes more work. Sometimes a lot more work. The bad guys get paid to learn about their exploits. Are you willing to educate yourself about the complex evil that a tiny minority of powerful users are prepared to unleash upon your site? As with so many things in life, this is best illustrated by a scene from Spaceballs:

HELMET So, Lone Starr, Yogurt has taught you well. If there is one thing I despise, it is a fair fight. But if I must then I must. May the best man win. Put 'er there. (offers to shake his hand)

Dark Helmet, from Spaceballs

LONE STARR goes to shake his hand. HELMET takes the ring off LONE STARR'S hand.

HELMET The ring. I can't believe you fell for the oldest trick in the book. What a goof. What's with you man? Come on. You know what? No, here let me give it back to you. (offers the ring back)

LONE STARR goes up to get the ring back. HELMET throws it in a grate. The ring goes in the grate. LONE STARR tries to catch it and falls to the grate.

HELMET Oh, look. You fell for that, too. I can't believe it man.

LONE STARR gets up and runs to a corner.

HELMET So, Lone Starr, now you see that evil will always triumph, because good is dumb.

As the good guys, we can't afford to be ignorant of the spammers' techniques. If that means spelunking through the grimiest corners of some scummy black hat forums, then so be it. I'll tell you this: I've never nofollowed a single link on this blog until today. The most effective way to fight the evil spammers is to understand them, and the first step toward understanding evil is openly linking to their tools and methods, exposing them to as much public scrutiny as possible.

When you design your software, work under the assumption that some of your users will be evil: out to game the system, to defeat it at every turn, to cause interruption and denial of service, to attack and humiliate other users, to fill your site with the vilest, nastiest spam you can possibly imagine. If you don't do that, you'll end up with something like blog trackbacks, which are irreparably busted at this point. Trackbacks are the source of countless untold hours of institutionalized spam pain and suffering, all because the initial designers apparently did not ask themselves one simple question: what if some of our users are evil?

When good is dumb, evil will always triumph.

Websites that allow users to post content will always be vulnerable to the actions of a handful of evil, spammy users. It's not pleasant. It is a dark mirror into the ugly underbelly of human nature. But it's also an unfortunate, unavoidable fact of life: some of your users will be evil. And when you fail to design for evil, you have failed your community.

Posted by Jeff Atwood

Comments

Spam is a severe problem, but I have noticed a few occasions where otherwise secure systems have had holes in their spam protection. I used to run a forum which received about 200 posts a day, and the spam protection hadn't failed. However, I noticed a few months later that the software's knowledge base feature (which barely anyone used or looked at) had been overrun with spam, as the CAPTCHA was missing on the submit link for the knowledge base. I ended up removing the section altogether as it wasn't any use, but if it had been a more public section such as the downloads db it would have caused a lot more trouble. (It was a gaming forum and it hosted a good few modifications.)

Tony on May 29, 2008 04:13 AM

That's SO right! It's about time people realised that exposing the evil is the best way to fight it. Be aware of the techniques so you know what to design for. It'll also keep the elitists busy coding up new tools and researching techniques that circumvent the new knowledge granted the majority of the public.

They'd have to tire sooner or later. And if they don't, at least make them work for their money!

Josh Smeaton on May 29, 2008 04:24 AM

The thing I’ve always thought about spam is that whilst software struggles to recognise it, humans can almost always spot it immediately. So I figure your best bet is to make it as easy as possible for humans to flag spam. I speak from no experience.

I do, however, have experience of using SpamSieve on my Mac. It does the heuristic thing of learning from what I flag as spam. Very few false negatives or false positives, although admittedly my spam traffic is peanuts compared to what a popular forum might receive.

Paul D. Waite on May 29, 2008 05:02 AM

Your suggestions (and more) have either already been implemented to no effect by Craigslist, or were not applicable in the first place.

What is a developer to do when there's nothing left to fight with and/or your resources are far outmatched by the spammers?

BSD on May 29, 2008 05:13 AM

Well Jeff, with all that knowledge, when will you change the keyword to enter? ;) It has been the same ever since my first comment to the page. I could write a SPAM tool right now. Actually I don't even need a tool for that. A simple BASH shell script with curl to post text to the page in a for-loop will do :P

Those CAPTCHAs are getting more and more useless. The better OCR software gets, the more useless they will become. And one day the only way to make them unreadable for OCR software will be to make them unreadable for human beings, too. Besides that, they don't work well for people with disabilities.

The biggest problem is: I'm a spammer and I want to spam a forum with CAPTCHAs that no OCR software can handle. No problem. I make a simple porn page and ask people on every access to the page to first solve the CAPTCHA. In fact this is not my CAPTCHA, but the one from the forum I want to spam. That way people are helping me to solve the CAPTCHA and spam the forum. Pretty easy, isn't it? Would also work on your page. The problem here is that CAPTCHAs don't tell people where they come from (what page or service they are trying to secure). A CAPTCHA should contain the URL of the page it belongs to!

Still, CAPTCHAs are not the way to go. Anyone thinking about alternatives to these? It must be something a human being can easily solve, but that's almost impossible for a computer to solve. Not a trivial task.

Mecki on May 29, 2008 05:15 AM

I have to ask.
I mean (I know it is going to sound insanely naive), I get that spam is incredibly lucrative, but how exactly?
What is the revenue model (not just Craigslist but mail, trackbacks, comments, wherever they spam)?
I don't know anyone who responds to a spam ad.
Who is the customer in all this money that spammers are making?
Clearly there's tons and tons of money in it for it to be such a concerted effort, but, um, how?
Thanks

Rahul Chandran on May 29, 2008 05:16 AM

If you're going to include CAPTCHA verification on stackoverflow, you should think about using http://en.wikipedia.org/wiki/Recaptcha . It's a great way to combine the verification with getting actual work done.

ballmer on May 29, 2008 05:19 AM

re: Mecki's question...I wonder if something subject-matter specific would solve the "mechanical turk pr0n site" way to get around captchas? If you're posting in a vaguely .net related forum then "which of the following words is not a reserved word in C#" - that kind of thing. Might also help improve the signal/noise.

JosephCooney on May 29, 2008 05:25 AM

@Rahul Chandran:

Spam is so insanely cheap to produce that it just takes one or two idiots per million clicking on it to make money.

Rhywun on May 29, 2008 05:39 AM

> re: Mecki's question...I wonder if something subject-matter
> specific would solve the "mechanical turk pr0n site" way to get
> around captchas? If you're posting in a vaguely .net related forum
> then "which of the following words is not a reserved word in C#" -
> that kind of thing. Might also help improve the signal/noise.

That's actually an excellent idea!

It also helps to determine which pr0n sites (if any) are used to bypass the security. Not quite sure what good that'll do but anyway...

Rix0r on May 29, 2008 05:40 AM

This project from Microsoft Research looks promising, using images of cats, instead of words, to identify humans: http://research.microsoft.com/asirra/

Darrell Mozingo on May 29, 2008 05:42 AM

I need toily!

TJ on May 29, 2008 05:53 AM

Mecki: your request is not entirely accurate. You want a problem that is simple to solve for humans, simple to verify for machines and hard to solve for machines. Otherwise, you could require mathematical proofs to post things. Those proofs are surely machine-safe, however, you cannot verify them mechanically in general.

Furthermore, given a problem with the above characteristics, you will also want a large set of possible answers to lower the probability of just guessing the captcha right. Just asking "does this calculation equal 2?" will stop only about half the spam, because a bot can just guess. (Assuming a uniform distribution of expressions that equal two and others.) The larger this set is and the better your distribution of answers is (that is, the more it looks like a uniform distribution), the more spam will be stopped.

Even more, you have to prevent that those problems are farmed, that is, placed on some other website and solved by humans again. I think you need to include the URL in the answer in order to prevent this.

Taking all of this together, I'd say it's a damn tough job to find such problems that are solvable by a great number of users.


On the other hand, I like the idea of having some server-side AI that tracks what posts are marked as spam by enough users and marks things that look similar as "possible spam". Judging from the learning rate of such agents with a single input, this could work, as it distributes the work of training such an agent to a large number of users (and they don't even need to know it)

Hk on May 29, 2008 05:54 AM

Jeff,

Instead of the Spaceballs bit, this might be better (from Star Trek, "The Omega Glory", Episode Number 52, Season Number 2, First Aired March 1, 1968). Dr. McCoy says:

"Spock, I've found that evil usually triumphs, unless good is very, very careful."

And, like Rahul Chandran, above, I just can't wrap my head around how anyone makes money with spam. I mean, it's OBVIOUSLY spam. Who's going to click on it?

David A. Lessnau on May 29, 2008 05:54 AM

I think the way to solve the spam problem once and for all (or at least make it very unprofitable) is to educate users. Educate every user of every computer on how to recognize spam and scams. The government should do this, pay for it, and do a literal blitzkrieg of advertising to educate consumers not to buy anything from spam.

It's the only 100% solution (or as near to 100% as it takes to make spam unprofitable).

Zeroth on May 29, 2008 05:54 AM

Hi Jeff,

just a question about stackoverflow, you comment here that you are planning to use a cookie to store and evaluate user reputation. But will this work even when I access the site from different computers?

Bruno on May 29, 2008 05:55 AM

The hackers will always win...

http://news.slashdot.org/article.pl?sid=08/05/28/1522250&from=rss

bryan on May 29, 2008 05:57 AM

Ah, but Lone Starr wins in the end, so I'm optimistic. :-)

Anyway, I like the idea of subject-related captchas but I doubt if they'll work in practice because the questions have to come from some sort of limited catalog. If (when) stackoverflow gets popular enough, specific attacks will spawn.

I'd rather opt to treat all anonymous posts with the most suspicion possible short of being rude. And to counterbalance this, make it easy to create an account and sign on (OpenID, anyone?). Perhaps this even can be combined. Creating simple subject-specific question/answer pairs should be very easy for most users and this could be used to constantly permute the catalog of available captchas.

Konrad on May 29, 2008 05:58 AM

What about something like when a user does a post it sends an e-mail to their account with a one-time link in it that they need to click on to activate their post?

I know you can check e-mail via programming to find the link to click back to, but it makes certain they have a valid e-mail address at least.

HB on May 29, 2008 06:17 AM

->JosephCooney
The problem with questions is that there is a limited number of them. Sure, questions are a good way to secure a page. For a human being, a simple question like "Which animal can fly? A lion, a bear, a monkey or a bird?" is trivial to solve. For a computer, this is impossible, unless it has complete understanding of the English language, can understand the meaning of the question and find the correct answer. If you can write such a program, you would not use it for a spamming tool, you would sell it to Microsoft for 10 billion dollars :P Finally you can just tell your computer what you want it to do and it will understand you. Combine this with speech recognition and you have a Star Trek like Computer: "Computer, ... do this and that and finally ...". The problem is: if your database contains 200 questions, one day someone will have collected them all together with their answers and placed all that into a database, and a tool can easily detect the right question and look up the answer. Such a scheme will only work if you update the questions at very short intervals. Intervals that are short enough that spammers can never keep such a database up-to-date.

Mecki on May 29, 2008 06:19 AM

It seems to me that we are going about fighting spammers completely the wrong way. I don't believe there will ever be a way to completely block one group of people from using an open forum while still allowing everyone else. No matter how ingenious the filter/protection it will always be circumvented eventually, because there is money to be made in it. So instead remove the cause. Do not let anon posts include links or email addresses. Remove their ability to make money off of spamming your site.

Cary Clark on May 29, 2008 06:23 AM

If you are going to work with ratings ( brownie points ) etc :

* New users can only post to a firehose section; regular users must want to go to the firehose and mark posts as spam or real. If the post is marked real, it goes to the proper forum/board.

* Once 1 post is marked as real, that user can post 1 message per day
* Once 2 more posts have not been marked as 'not spam', that user can help in the firehose section ( mark 1 message per day as non-spam/spam )
* Once a 'real' marked post is marked as 'spam' because it got approved by a spammer user, both the approver and the poster get rating 0, all their posts go to the firehose again.

Then of course you could go further and say that admins can give special powers to known users to have unlimited 'spam/not spam' voting power.

Either way, you can't solve the spammer problem with technology alone.

Cheers,
T.

Konijn on May 29, 2008 06:29 AM

> This project from Microsoft Research looks promising,
> using images of cats, instead of words, to identify humans

isn't that too easy to guess?
case-insensitive letters + numbers, 5 characters = (26 + 10) ^ 5 = 60,466,176 possibilities.

sorting 12 images into 2 categories: 2 ^ 12 = 4096 possibilities.

keppla on May 29, 2008 06:30 AM
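
keppla's comparison is easy to check. The short Python calculation below only reproduces the arithmetic from the comment; the variable names are made up for the example, and it assumes a bot that guesses uniformly at random.

```python
ALPHABET = 26 + 10          # case-insensitive letters plus digits
text_space = ALPHABET ** 5  # five-character text CAPTCHA
image_space = 2 ** 12       # twelve images, each sorted into one of two categories


def blind_guess_rate(space):
    """Chance that one uniformly random guess is correct."""
    return 1 / space


print(f"text CAPTCHA:  {text_space:,} answers, guess rate {blind_guess_rate(text_space):.2e}")
print(f"image CAPTCHA: {image_space:,} answers, guess rate {blind_guess_rate(image_space):.2e}")
print(f"the text answer space is {text_space / image_space:,.2f} times larger")
```

A bot that can recognize even some of the characters or images does much better than blind guessing, so these figures are an upper bound on how hard either scheme is.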

If I might make a suggestion--for a CAPTCHA you can't do much better than thephppro's text-only CAPTCHA. No website I've ever built with it has yet been broken.

http://thephppro.com/products/captcha/

Combined with a spam-fighting CodeIgniter plugin I've written, it seems to be amazingly effective. This plugin uses several techniques to fight spam:

An encrypted timestamp (combined with something unique about the user--perhaps an IP or user agent?) placed in a hidden field--if the form is older than one hour or so, or the IP/user agent doesn't match, block the submission.

A text field or textarea with an easily-readable, spam-worthy name, such as "comment" or "post", placed off the viewable area via CSS positioning--if it's filled in, block the submission.

A text or audio-based CAPTCHA--if there's no image to use OCR on, it's a little difficult to break it!

WesleyC on May 29, 2008 06:32 AM
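
The first two checks WesleyC describes, the hidden timestamp and the off-screen honeypot field, might look roughly like this in Python. This is not his CodeIgniter plugin: it signs the timestamp with an HMAC instead of encrypting it, and `SECRET_KEY`, `make_token` and `is_submission_ok` are hypothetical names chosen for the sketch.

```python
import hashlib
import hmac
import time

SECRET_KEY = b"replace-with-a-real-secret"  # hypothetical server-side secret
MAX_FORM_AGE = 3600                         # seconds; "one hour or so"


def _sign(timestamp, client_id):
    message = f"{timestamp}:{client_id}".encode("utf-8")
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def make_token(client_id, now=None):
    """Value for the hidden field, bound to the client (e.g. IP plus user agent)."""
    timestamp = int(time.time() if now is None else now)
    return f"{timestamp}:{_sign(timestamp, client_id)}"


def is_submission_ok(token, client_id, honeypot_value, now=None):
    now = time.time() if now is None else now
    if honeypot_value:
        return False  # only a bot fills in a field that humans never see
    try:
        raw_timestamp, signature = token.split(":", 1)
        timestamp = int(raw_timestamp)
    except (AttributeError, ValueError):
        return False  # missing or malformed token
    expected = _sign(timestamp, client_id)
    if not hmac.compare_digest(signature.encode("utf-8"), expected.encode("utf-8")):
        return False  # forged token, or the IP/user agent changed
    return 0 <= now - timestamp <= MAX_FORM_AGE
```

The server would render `make_token(ip + "|" + user_agent)` into the hidden field and call `is_submission_ok` on every post. A signature is enough here because the timestamp does not need to be secret, only tamper-proof.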

->Darrell Mozingo
This idea is stolen by MS. I saw this scheme in action (for posting to a blog) almost a year ago, and it was dugg on Digg.com (that's how I found it in the first place). Already at that time I found the weak spot: your database will have images of how many different cats? 100? Okay, if you know the MD5 checksum of each of the cat images, a spam bot can take all images, calculate the checksums, verify them against a database and pick out the cat images. So you would need at least some random data in each image that changes every time the same image is displayed. Even then a pattern-matching algorithm would work (I know a nice tool that finds duplicate images on your HD, even if the dupe has a different resolution, some text written on it not found in the original, some colors changed, and so on; it still knows it's basically the same image - and the failure rate of this tool is below 5%). Also you lock out blind people completely; how can they recognize a cat?

-> Hk
I pretty much agree with most of the things you wrote.

More generally spoken, the question is:

What is the real solution?

1) Avoid spam getting posted by some complicated CAPTCHA like scheme?

2) Don't care for spam getting posted, but have a computer find out what is spam by some super clever application (however this might work)

3) Don't care for either and hope users will mark spam posts as spam.

(3) is no good solution IMHO. Think of a site getting 10'000 spam posts a day compared to 200 user posts a day. You expect the 200 users to do all the work to tag the 10'000 spam posts as spam?

(1) is the problem I fail to see an ultimate solution for.

(2) would be perfect, but I fail to see how the application can really recognize at least 99% of all spam posts.

Spam is also a very subjective term. What I might see as acceptable might be tagged spam by someone else. Otherwise I'd say the solution is (4):

4) Outlaw spam all over the world, punish spammers hard and make sure this law is enforced by all means.

Laws are not always the way to go. Laws can't solve all problems of society. However, in some cases it has already worked. A lot of people all over the world already got arrested for spamming and had to pay high fines. However, since Internet is worldwide, as long as there is at least one country that will not act against spammers, spammers will simply spam from there.

Mecki on May 29, 2008 06:35 AM

Craigslist personals are targeted because there is a large number of desperate and stupid (a very bad combination) people on it. Do you really think spam is going to be a problem on SOF?

Spoon on May 29, 2008 06:36 AM

Caveman throws rocks at another caveman.
That caveman responds by wearing a thick animal hide for protection.
First caveman invents sharp pointy stick to stab through hide.
Second caveman invents shield to protect against sharp pointy sticks.
First caveman invents club to bash through shield.
Second caveman invents armor with extra padding to protect against club.

Thousands of years later:
First caveman invents long range missiles.
Second caveman invents interceptor missiles.
First caveman invents bomber aircraft.
Second caveman invents anti-aircraft.
First caveman invents stealth aircraft.
Second caveman invents radar rewritten to detect stealth.
First caveman invents nuclear ICBMs.
Second caveman goes about inventing a "Star Wars" shield.

And so it goes. And so shall the spam wars go.

The question is whether today's status quo is closer to throwing rocks or firing nukes. I suspect we're still fairly young in the evolutionary process.

J13 on May 29, 2008 06:39 AM

@Mecki: "For a human being, a simple question like "Which animal can fly? A lion, a bear, a monkey or a bird?" is trivial to solve. For a computer, this is impossible"

True, but even the dumbest computer has a 1-in-4 chance of guessing it at random.
For such questions to work they have to be more open-ended, rather than selecting from a limited choice. Which just ends up frustrating genuine users.

The other approach is to go for simpler multiple choice questions with far more possible answers. This at least reduces the hit rate from guessing. (e.g. show a 20 x 20 grid of coloured squares and ask the user to click on the red one. Reduces the hit rate from 1-in-4 to 1-in-400).

Graham Stewart on May 29, 2008 06:41 AM
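
To put numbers on the difference between 1-in-4 and 1-in-400, here is a small Python calculation of how often a bot that keeps guessing at random gets through. It assumes every attempt is independent and uniformly random, and the function names are invented for the example.

```python
import math


def pass_probability(choices, attempts):
    """Chance that at least one of `attempts` random guesses is right."""
    return 1.0 - (1.0 - 1.0 / choices) ** attempts


def attempts_for(choices, target=0.5):
    """Smallest number of random guesses that succeeds with probability >= target."""
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - 1.0 / choices))


for choices in (4, 400):
    print(choices, "choices:", attempts_for(choices), "guesses for a 50% chance of getting through")
```

More choices raise the cost of guessing but do not remove it, which is why a limit on failed attempts per client matters as much as the size of the answer space.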

Possibly objectionable to users (as it would require more effort), but could small culture-tuned rebuses be used to represent CAPTCHA phrases?

MDN on May 29, 2008 06:52 AM

I've been studying PHP's cURL (Client URL) library, some Python, and ASP.NET's HttpWebRequest to figure out how form bots work. I think you have to create a few bots yourself to thoroughly understand how they work and what countermeasures will defeat them. For example, JavaScript validation will not affect a bot. You also need to understand cookie jars, user agent spoofing, and referer spoofing.

But I still don't understand how IP proxy sites can be used by bots and this is important because many people still think IP address blocking is effective.

Robert S. Robbins on May 29, 2008 06:55 AM

Every ecology develops parasites. Unless we, as a culture, dedicate ourselves to tracking down and neutralizing anyone who games a system, the parasites win and flourish.

What we can do now is to develop resistance mechanisms - something that's already happening.

Wait until we start throwing AIs into the mix. That's when things will *really* get interesting.

ThatGuyInTheBack on May 29, 2008 06:55 AM

I used to run a dating site, and I used many techniques which I developed myself. The legality of some may be questionable. My favorite was "poisoning" an account: the user would have no clue that anything was wrong, they could post and interact as normal, but no one would see what they wrote (except for other poisoned people). Also "discourage" mode would randomly delay a user's page loads and discard a percentage of posts - a way to get people to WANT to leave, rather than wanting to get back in with another account. I also assigned each user a hidden "risk" based on a variety of factors like IP country, which was offset by "trust" gained by being a decent member for a while. I never took the easy/pseudo way out by banning entire countries or IP blocks.

I think the best way to handle these things by far, is to not let the enemy know that you're on to them. Don't give them a reason to upgrade their weaponry. Let them waste their time and get shoddy results. However, this is deceptive and could be illegal. It might not fly in a big corporation.

I also developed a system to automatically poison people if they sent messages that exceeded a risk threshold for words like "nigerian prince" and "millionaire". I had 100,000 members and I kept things pretty clean. Never had need of a captcha, maybe because signup involved some javascript & custom image clicking.

French Horn on May 29, 2008 06:58 AM
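
A toy version of the "poisoning" scheme described above, in Python. The phrase weights, the threshold and the function names are all assumptions; the real system evidently weighed many more factors, such as IP country and earned trust.

```python
RISK_PHRASES = {"nigerian prince": 5, "millionaire": 3}  # assumed weights
RISK_THRESHOLD = 5                                       # assumed cutoff


def message_risk(text):
    """Sum the weights of every risky phrase found in the message."""
    lowered = text.lower()
    return sum(weight for phrase, weight in RISK_PHRASES.items() if phrase in lowered)


def maybe_poison(author, text, poisoned, trust=0):
    """Silently add the author to the poisoned set if risk outweighs trust."""
    if message_risk(text) - trust >= RISK_THRESHOLD:
        poisoned.add(author)


def visible_posts(viewer, posts, poisoned):
    """posts is a list of (author, text) pairs; returns what this viewer sees."""
    if viewer in poisoned:
        return list(posts)  # poisoned users see everything, so nothing looks wrong
    return [(author, text) for author, text in posts if author not in poisoned]
```

The key property is that the poisoned user's own view never changes, so nothing tells them it is time to sign up again under a new account.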

I have experience in implementing anti-spam filters.

What works:
* throttling individual IPs, network blocks, unique e-mails, e-mail domains, usernames, etc. with different limit for each (and per hour, per day). Trending is really powerful.

* banning of IPs (look at X-Forwarded-For too, and see XFF project).
Spammers eventually run out of open proxies and cheap VPSes.
Unfortunately you have to keep a large whitelist and lift bans when an IP stops spamming (because of hijacked Windows machines spamming from the average Joe's IP)

* statistical (Bayesian) filtering does work well if you use 2 or 3 word sequences. If you have a lot of incoming ham and spam, an occasional spammer trying to game the filter won't skew it, and it might even learn to recognize these obvious attempts.

sblam on May 29, 2008 07:02 AM
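
A minimal sketch of the statistical filtering sblam mentions, using two-word sequences as features. It is a plain naive Bayes classifier with add-one smoothing written for illustration; the class and method names are made up, and a production filter would need far more training data and better tokenizing.

```python
import math
from collections import Counter


def bigrams(text):
    """Split text into lowercase two-word sequences."""
    words = text.lower().split()
    return [" ".join(pair) for pair in zip(words, words[1:])]


class BigramFilter:
    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.docs = {"spam": 0, "ham": 0}

    def train(self, text, label):
        self.counts[label].update(bigrams(text))
        self.docs[label] += 1

    def spam_log_odds(self, text):
        vocab = len(set(self.counts["spam"]) | set(self.counts["ham"])) or 1
        spam_total = sum(self.counts["spam"].values())
        ham_total = sum(self.counts["ham"].values())
        # Start from the smoothed prior, then add evidence from each bigram.
        score = math.log((self.docs["spam"] + 1) / (self.docs["ham"] + 1))
        for gram in bigrams(text):
            p_spam = (self.counts["spam"][gram] + 1) / (spam_total + vocab)
            p_ham = (self.counts["ham"][gram] + 1) / (ham_total + vocab)
            score += math.log(p_spam / p_ham)
        return score

    def is_spam(self, text):
        return self.spam_log_odds(text) > 0
```

Word pairs carry more context than single words: "cheap" alone appears in plenty of honest posts, while "cheap pills" mostly does not. That is one plausible reason short sequences work better, though the comment itself does not say why.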

I have to ask why this blog seems to get so few spam comments when the CAPTCHA word is always 'orange'. :|

Nidonocu on May 29, 2008 07:02 AM

Probably either a terrible idea or one that goes against some principles behind stack overflow. So think of this as an idea for some other site, preferably one that is so self assured that it doesn't mind a) making users jump through a few hoops to sign up and b) thinks it can charge a small fee for people to contribute and still draw people.

The idea is that to contribute you have to pay a small nominal amount of money - I'm thinking $2 to $5 - as a kind of good behaviour bond.

If you want the money back or want to stop contributing you cancel the account and 14 days later you get your money back. If you've been flagged as having done nasty stuff then you don't get your money back.

A few bonuses:
- interest on the money can be used to help run the site.
- the cost is a disincentive for people who might otherwise poke around and look for exploits
- if you need both email and a credit card to sign up (and if there's some idea of uniqueness of both) then you've got something approaching two factor auth.

A few drawbacks:
- a hassle to sign up
- locks out people who don't / won't / can't use a credit card online
- more to be managed for the site, including more security and especially accounting headaches.

So not a serious suggestion, just something to think about.

I guess if you wanted less hassle you could use an invite only model and only allow a small number of invites to be sent from each user and prune the tree and / or penalize people who invited spammers / hoodlums. But then you need arbitration for false accusations...

Dave on May 29, 2008 07:07 AM

Also I think craigslist bears most of the blame here. Either their programmers truly suck, or Craig is holding the reins too tightly.

Require user accounts, with a long slow verification process, instead of annoying verification for every post. And for Christ's sake, add some features. The internet now has image capability, Craig. Everybody with these big sites is so afraid to change ANYTHING because their business model might explode. Have some balls.

French Horn on May 29, 2008 07:07 AM

I'd like to see a site that used clever CSS and extra textbox honeypots to make it hard for a bot to tell what fields it should be putting data into. If you get data back from a field that the human users shouldn't even see, dump it in the bin. That'd be tricky from an accessibility standpoint though.
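
The server side of that honeypot idea could look roughly like this. It is a sketch only: the field names `website_url` and `email_confirm` are invented, and it assumes the page hides those inputs from humans with CSS so that only a bot filling in every field would populate them.

```python
# Hypothetical names of form fields that are hidden from human users via CSS.
HONEYPOT_FIELDS = {"website_url", "email_confirm"}


def looks_like_bot(form):
    """Return True if any hidden honeypot field came back non-empty."""
    return any(form.get(name, "").strip() for name in HONEYPOT_FIELDS)


def accept_comment(form):
    """Return the cleaned comment, or None if the submission is binned."""
    if looks_like_bot(form):
        return None
    return {"author": form.get("author", ""), "body": form.get("body", "")}
```

A human submission leaves the hidden fields empty and goes through; anything that fills them is dropped silently.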

Gary on May 29, 2008 07:19 AM

HB:
>What about something like when a user does a post it sends an e-mail to their account with a one-time link in it that they need to click on to activate their post?

>I know you can check e-mail via programming to find the link to click back to, but it makes certain they have a valid e-mail address at least.

I have seen this or very similar on craigslist already...it has been defeated as well. This also gets into the territory of it being way easier just to sign up for an account, which from what I gather does not meet Jeff's goal, he simply wants a way to not force a walled garden on people by allowing anonymous posters.

DanaL on May 29, 2008 07:20 AM

I think the people that say it is obviously spam are out of touch with the common user.

Do you know those banner ads that pretend to be Windows Update notifications or something? I've watched as people clicked those. I tried to stop them, but I was too slow.

While it may not be a problem for a site specifically designed for computer programmers, not everyone on the internet is as savvy. Most people don't expect to be fooled.

psycotica on May 29, 2008 07:23 AM

The other problem is that when someone does come up with an effective method for deterring spam that can't be worked around, the spammers fight back and fight back hard.

http://en.wikipedia.org/wiki/Blue_Frog

Harv on May 29, 2008 07:25 AM

Selling (and maybe using) automated spamming software should be a felony with harsh penalties and it should be strictly enforced.

Mike Cohen on May 29, 2008 07:27 AM

>Wikipedia is a living testament to the fact that goodness vastly outnumbers evil.

Not everyone agrees that Wikipedia is good, these intelligent (design) folks seem to think it's evil, so have started their own wiki trunk:

"The following is a growing list of examples of liberal bias, deceit, silly gossip, and blatant errors on Wikipedia. Wikipedia has been called the National Enquirer of the Internet:"
- Conservapedia: http://www.conservapedia.com/Bias_in_Wikipedia

Couldn't resist... :) Interesting post Jeff.

Adam Kahtava on May 29, 2008 07:29 AM

It seems that best way to beat an automated tool is with a human response. Why not disallow anonymous postings and require an account to be able to post new messages?

New users would have their posts moderated and they could only respond to an existing thread or subject. When they post a message, only the person who originated the thread would be able to see the message. They would decide if it's real or spam and take the appropriate action. If it's spam, that account gets tossed. If it's a real message, it gets marked as visible to the rest of the community. Messages not acted on within 48 hours get automatically purged.

New users would have some sort of threshold where they need to have their first 3 to 5 messages moderated before they become a standard user. It does pass some extra responsibility to the person starting the thread, but you get to load balance the message moderation across the user base. You could even open the moderation so that anyone who had previously participated in that message thread could moderate the new messages from unvalidated users.

Granted, this would be an annoyance. But it's a short term annoyance. I would put up with some initial annoyance if I knew that it would keep out the widows of Nigerian Princes.

Chris Miller on May 29, 2008 07:29 AM

Nobody will accept my spam defeating technique.

MAKE IT LEGAL TO VIOLENTLY MURDER SPAMMERS.

dnm on May 29, 2008 07:39 AM

If you want to stop linkspam - just disallow html except from trusted users...a healthy portion of existing internet content is already advertising. I don't know why people think the situation would be different with user-contributed content. It seems like bloggers like you -- I read and enjoy your blog regularly -- want to have it both ways. Implicitly, you want the benefits from user-contributed content but you don't want to do the manual work that is necessary to police it. Think about graffiti, how do people handle graffiti? I bet people handle it more with scrub brushes than they do with laws. You ever see those smart businesses that have a wall so attractive for graffiti that their best strategy is to hire a talented graffiti artist to create a mural? You need to quit pretending that people submitting forms to you are committing some sort of crime, and start thinking about a way to turn it to your advantage.

joe on May 29, 2008 07:57 AM

Ascii art captcha, that is awesome!

For using images of cats: let's say there are 6 images, 2-3 of which are cats. Sure, if you have only 100 cat images, you could theoretically MD5 them all, but only if you actually serve 6 separate images.

So instead of serving them separately, 'glue' the images all together into one large physical image in a scriptable image manip program before serving. At the same time, you can generate the supporting code needed. If a few of the sample images are procedural, along with the background for the whole image, that trashes the MD5 trick.

Daniel on May 29, 2008 07:58 AM

One of the (maybe) possible solutions: After comment gets submitted, show it immediately, but then run background process that will send mail to gmail account, and then check if the mail ended in Inbox or spam folder. Although, I'm not sure that google would be happy with such (ab)usage of their service :)

Mihael Konjevič on May 29, 2008 08:16 AM

Nice post Jeff, keep up the hard work

Amigoro on May 29, 2008 08:33 AM

Jeff,

it is apparent that you never used Craigslist. Had you ever used it to connect with buyers, sellers or to look for local help (landscapers for example), you would have learned that positive experiences overshadow the spam problem. Recognizing spam is very easy when dealing with people locally.

You should lighten up a little bit also. Read some Rants & Raves on craigslist for that.

BugFree on May 29, 2008 08:37 AM

BugFree,

it is apparent you cannot read. Jeff clearly stated that he HAS used CL. Further, before posting, he used sections of CL he had not used before to verify John Nagle's claims.

RTFA

Buggy on May 29, 2008 08:42 AM

To address a couple of issues. It has been pointed out that in the scenarios mentioned above there is a limited number of cat pictures (3000 for MS I believe). One way to deal with this is to continue to expand and change, using a method similar to how reCAPTCHA does it. Taking the cat images as an example, we can surely make changes to the images to change the MD5 sum and filename to try and fool machines, but we can also add a couple of random images from Flickr and/or other image services to each test. Keep track of which of these added images have been tagged as "cats" and once they have been selected [by humans] enough times, then they become part of the cat pool. These images of course would not count for or against the determination of a human. Don't forget, we must also expire images after they have been around too long; a success rate of even 1% for a machine can add up.
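
The bookkeeping for that growing pool might be sketched like this. Everything here is hypothetical: the `CatPool` class, the vote threshold, and the age limit are illustrative numbers, and "day" is just an integer counter standing in for a real timestamp.

```python
class CatPool:
    """Hypothetical pool of trusted cat images plus unverified candidates."""

    def __init__(self, promote_after=3, max_age_days=30):
        self.promote_after = promote_after
        self.max_age_days = max_age_days
        self.trusted = {}     # image_id -> day it joined the pool
        self.candidates = {}  # image_id -> number of human "cat" votes

    def add_candidate(self, image_id):
        """Register a freshly fetched image that nobody has classified yet."""
        self.candidates.setdefault(image_id, 0)

    def record_vote(self, image_id, user_passed, tagged_cat, today):
        # A vote counts only when the user already passed the test on the
        # trusted images; the candidate never affects the pass/fail result.
        if image_id not in self.candidates or not (user_passed and tagged_cat):
            return
        self.candidates[image_id] += 1
        if self.candidates[image_id] >= self.promote_after:
            del self.candidates[image_id]
            self.trusted[image_id] = today

    def expire(self, today):
        """Drop trusted images that have been in rotation too long."""
        stale = [i for i, added in self.trusted.items()
                 if today - added > self.max_age_days]
        for image_id in stale:
            del self.trusted[image_id]
        return stale
```

The expiry step matters as much as the promotion step: it limits how long a machine can profit from any image it has already learned.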

I like the idea of a database of questions that are specific to the website in question (i.e. programming for a coding site, hiking for an outdoors site, etc.) the questions could be even a little difficult and require some research. But again you would have to continue to add new questions [not from easily available public sources] and expire old ones. Not an easy task in this case.

It is a good thing that for most websites just a little deterrent is enough to keep spammers at bay.

Phil on May 29, 2008 08:56 AM

@Mike Cohen:
What about mailing lists? That is a legal and useful use of what some would term spam software. However, now we get into the tricky area of the DMCA. Are you for the DMCA?

An idea that just occurred to me: make the terms of use for your site stipulate that any form of bots or scrapers is tantamount to reverse engineering... which of course, is disallowed under the DMCA. Then, level a DMCA notice at the company being advertised.

It has a few flaws, I know, but maybe some feedback?

Zeroth on May 29, 2008 08:58 AM

Personally I feel it's a theoretically impossible task - the difference of bots and humans is the concept of humanity itself. We keep seeing that in order to overcome bots we keep making the validation technique more and more human; but it fails..

This is because it's like the allies are having a secret meeting at a Nazi headquarters - in German. IMO the only way you can avoid it is by developing a medium of information that computers can't intrude - speak in a language Nazis can't know, or get out of there. Not something they will have trouble with, but something they CAN'T.
Another thing is to eliminate anonymity - which is a scary 1984'ish idea..

Leafy on May 29, 2008 09:17 AM

MySpace has been fighting this battle for years. They're mostly sorta on top of it now, to the vast detriment of the flexibility of markup and javascript that they allow you to use to pimp your profile.

Since adding their new developer platform (with OpenSocial), there's been an upsurge in spam from badly coded apps. They're plugging those holes pretty quickly, but some of the solutions involve limiting access by apps to the system.

*Sigh*

I agree: you need someone on your side who understands the mind, tools and tricks of the enemy. That is, you need your own private police force.

Christian Knott on May 29, 2008 09:22 AM

Jeff,

you will need a (trained) Bayesian filter to win spam battles. Spam is your friend - it trains the filter which in turns becomes more effective against spam. Best way to fight spam is to use it against itself.

Captcha is annoying and useless, as there are scripts out there that can work around it. Nothing can work around Bayesian filters.

BugFree on May 29, 2008 09:25 AM

I'd be curious to know some statistics with regard to how many posts each section gets in a day. This problem seems ideally suited to traditional machine learning techniques, but maybe the size of the data sets makes it infeasible for Craigslist. Assuming the data set was small enough or Craigslist had sufficient resources to allocate, something like an unsupervised learner could cluster posts based on a series of attributes and then use community input to label the clusters (something which is already happening with the 'Mark as Spam' links on each post).

Erik on May 29, 2008 09:26 AM

CAPTCHAs are incredibly annoying for the good guys, and don't actually stop the baddies, so give up on them please - use the amusing alternative where you get 3 random tiny photos, and you have to click on the kitten.

I assume the most popular protection methods are going to be targeted by the spammers first, so using your own off-the-wall solution might actually work best of all!

Mags on May 29, 2008 09:26 AM

Ah, but sales of real Spam have sent Hormel's earnings up 14 percent.

http://www.nytimes.com/2008/05/23/business/23hormel.html?ref=business

.

Charles on May 29, 2008 09:26 AM

I have 2 cents to chime in about this topic. And it's a very philosophical 2 cents really, so bear with me or just skip over.

I notice that SPAM is almost developing into a hive mind that creeps into the regions which "deserve" it the most. Yeah, sure there's email spam, but one bayesian filter later I get practically no spam ever in my mail. Gray listing is also very powerful.

But what I notice on sites like ebay and craigslist is that whenever we as a society get lazy and try to do things "the easy way" - think of all the board room meetings parodies that we've all seen by now where a 22 year old "genius" says "We'll bring the pet store to their fingers and make profit" - whenever we do that, spam shows up.

I mean personals are a notoriously lazy way to date and really, even without spam, I could never trust a posting on something as anonymous as the internet. Call me untrusting or kooky for that, but it's simply absurd for me to try and reconcile something as intimate as dating with something as anonymous as the internet. Go clubbing/jogging/walking your dog if you want to meet random people.

All in all, the point I'm trying to make is that SPAM creeps in to places where we stretch the reach of our daily experience more than it's meant to go. The reason why craigslist or any other online site has difficulty separating SPAM from HAM might be that there's almost no difference between the two: indeed, how could you possibly tell if a personal ad is genuine or not?
In that sense, I think for stackoverflow.com that so long as there's a difference between genuine content and content to make profit, you will have no problem getting rid of spam. Patterns will be easy to detect, text will be easy to recognize, and you can even set up tests that are extremely task specific like programming language based riddles or whatever.
As soon as you introduce "profit making elements", like "hire a coder" style stuff, you will be faced with impossible to detect SPAM squall.

There's a reason why only certain areas of craigslist are spammed. When you are looking to buy cheap $5 stools from people moving out, you are unlikely to be a big spender and your tolerance is very low; you are after all looking for a five buck stool. But go to the real estate for sale area, and you will get hundreds of spam postings. In the same way, spammers naturally go to places where people have to either be gullible to begin with or lower their paranoia threshold to be able to participate (like personal ads).

My 2 long cents.

blah on May 29, 2008 09:30 AM

Buggy,

"stating" and "doing" are two very different things. Anyone who has used craigslist extensively will know that spam is just noise. As I just pointed out in another post, if you want to put spam on a backburner, use a Bayesian filter. Captcha is too primitive for that.

BugFree on May 29, 2008 09:31 AM

What about a Flash based CAPTCHA - put its impenetrability to use.

Ted on May 29, 2008 09:34 AM

Captchas (specifically pictures of words) have been broken. I spent a day researching the state of the art for breaking captchas and it turns out that there is code available out there (OCaml and Python were found in about 10 minutes). The supposed gold standard of captchas, GMail's sign up, has been reportedly broken. If you actually sit down and spend a day thinking about how to break it, and you're even remotely talented at programming, the solutions become pretty obvious. Clearly reasonably talented programmers are doing this (what programmer do you know that knows OCaml, but is also incompetent?).

Another issue is that certain types of attacks can be jump started by human interaction. It turns out that if you are spamming a site over and over you get a pretty good idea what the correct answers to captchas are. If you have a team of low paid workers work an hour on entering them, then that is usually enough of a seed to overcome the captcha. This can also work with the aforementioned pr0n site redirecting.

Realistically, for a spammer to be effective they only have to get the captcha right about 25% of the time. Anything with choices that can be guessed (like 4 picture options with kittens) is an immediate fail.

The underlying problem is that there is HUGE money in this activity. An out of work programmer could easily support himself. I know people who own million dollar per year businesses supported by this kind of activity. And those businesses are for a specific niche, I can't imagine what a general situation would be like.

J.V. on May 29, 2008 09:38 AM

Go read about Asirra before posting something like "oh, but spammers can build a database of all the 100 or so images and do an MD5 hash to determine which are cats and dogs".

Asirra has a database of about 3 million images and it's always growing, thanks to their relationship with petfinder.com. Imagine if all pet websites would contribute - Asirra would probably grow significantly faster than spammers can keep up.

Although, the fact that a user has to organize 12 images into 2 categories means 1/2^12 = 1/4096 spam posts will still go through. But, combine this with requiring users to register for an account to post, and give users the option to flag posts as spam, I think this could be very effective.
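
To put numbers on the guessing odds, here is a small sketch using exact fractions. The function names are made up for the example, and the one million attempts figure is an arbitrary assumption, not a measured spam volume.

```python
from fractions import Fraction


def blind_guess_rate(images, categories=2):
    """Chance of sorting every image correctly by pure guessing."""
    return Fraction(1, categories ** images)


def expected_successes(attempts, rate):
    """Average number of guesses that get through out of `attempts` tries."""
    return attempts * rate


twelve_images = blind_guess_rate(12)          # 12 images, cat or dog
four_choices = blind_guess_rate(1, 4)         # one pick out of 4 options
through = expected_successes(1_000_000, twelve_images)
print(twelve_images, four_choices, float(through))
```

So even at 1 in 4096, a bot that can afford a million attempts still lands a few hundred posts, which is why the extra layers (accounts, flagging) matter.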


With as popular as this blog has become, I'm surprised your ORANGE captcha still works so well.

KG on May 29, 2008 09:39 AM

@keppla

It may only be 4096 possibilities, but it resets when you get it wrong. It'd be pointless to blindly guess.

Ryan

Ryan C Smith on May 29, 2008 09:47 AM

Clearly no one here knows that the Old English typeface is an impenetrable cloak

Erik on May 29, 2008 09:50 AM

While the idea of having people prove themselves by adding non-spam content is attractive there might be an initial hurdle with people being disinclined to try out your site if the first time they try to contribute they are apparently ignored because they need to be moderated.

Could we address this by building a trust metric on top of OpenID? I'm thinking of something vaguely like Advogato, except you build up reputation on several participating sites and that serves as your letter of introduction to another site that trusts those sites to know who to trust...

Damian Cugley on May 29, 2008 09:58 AM

I've always wondered if a CAPTCHA that 'just' targets URLs is available. If a user were to spam a product, then surely the only way to get anything out of it is to add a URL to the message?

Perhaps the new method of fighting spam will consist of a centralised, human group that moderates every URL posted, checks it personally and verifies the message. Imagine something like Akismet for WordPress, but run only for URLs to verify whether the message is spam or not. I can imagine a centralised website that verifies every URL posted in any Forum or Blog software manually using paid workers could be very effective, although I'm positive that it cannot be that simple.

Mike on May 29, 2008 09:58 AM

One thing you failed to mention about Wikipedia is the vast amount of bot work that reverts vandalism over there. If humans were solely in charge of keeping Wikipedia in good shape then it would be in shambles. There is an IRC channel that receives every edit done to Wikipedia; a bot then checks the page for known bad URLs and strings and reverts if necessary. Also, Wikipedia has nofollow = true for all external links.
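
The kind of check such a bot runs on each incoming edit could be sketched as follows. This is not how any actual Wikipedia bot is written; the blacklisted hosts and phrases are placeholders, and a real bot would load them from a maintained list.

```python
import re

# Placeholder blacklists; a real bot would load these from a maintained list.
BAD_URL_HOSTS = {"spam.example", "pills.example"}
BAD_STRINGS = ("buy cheap", "click here now")

URL_RE = re.compile(r"https?://([^/\s]+)", re.IGNORECASE)


def should_revert(added_text):
    """Return True if the text added by an edit matches a known bad pattern."""
    # Extract host names, dropping any port, and compare against the list.
    hosts = {h.lower().split(":")[0] for h in URL_RE.findall(added_text)}
    for host in hosts:
        for bad in BAD_URL_HOSTS:
            if host == bad or host.endswith("." + bad):
                return True
    lowered = added_text.lower()
    return any(phrase in lowered for phrase in BAD_STRINGS)
```

Matching on the host rather than the full URL means a spammer can't dodge the list just by changing the path.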

Dave on May 29, 2008 10:07 AM

While captcha obviously isn't a perfect solution, better ones can help: http://alipr.com/captcha/

That uses a two pronged test: image recognition by clicking the geometric center of various superimposed images and by identifying an object in a random image.

A little excessive for many sites but I would imagine a lot harder to circumvent by automated means.

charfles on May 29, 2008 10:19 AM

"Spam is so insanely cheap to produce that it just takes one or two idiots per million clicking on it to make money." -- Rhywun on May 29, 2008 05:39 AM

Aaaaand there's your spam business model. The cost per transaction is essentially free.

Add a "cover charge" and you'd demolish that profit model. For example a smallish-but-not-micro credit card transaction (say, $5) to be able to post to the site forever.

As a happy side effect it would verify the reality (if not exactly the identity or humanity) of every prospective contributor. There would be no need to correlate user reality with site identity, thus preserving anonymity.

I was a CL true believer for years but I quit using it 2 years ago ... not just b/c of spammers but b/c the quality of all interactions was in a rapid decline. I would try to sell something at a reasonable -- no, an INSANELY CHEAP -- price (hey I just want to get rid of the thing, that's why it's not on EBay) but no matter how cheap I'd price it I'd get a flood of jerks offering literally nothing, with a healthy heaping of insults as well. And I won't even get into no-shows, abusive followups from no-shows (dude, I sold it to someone else because you NEVER SHOWED UP), and other general ass-hattery.

If CL just offered a teeny tiny cover charge the quality of interaction would skyrocket. Not because the service is "worth" the cover charge, but because of the very existence of the cover charge.

Paul Souders on May 29, 2008 10:26 AM

I'd recommend reading chapter 21 "Breaking the Rules" in "Rules of Play" by Katie Salen and Eric Zimmerman. (Then read the rest of it.) They are talking about games rather than software, but I believe the ideas apply to the type of social software you are looking at.

While Salen and Zimmerman are not specifically targeting your problem, the entire subject of gameplay lends itself to the type of 'social interaction hacking' that is required to avoid or mitigate these problems.

Steve Steiner on May 29, 2008 10:29 AM

"What about a Flash based CAPTCHA"

That seems like a really good idea - or an animated GIF or something. Is there a good reason why that might fail?

HB on May 29, 2008 10:33 AM

When doctors are trying to fight cancer or HIV, they use a cocktail of drugs - it is harder for diseases to adapt when every environment is different. That's why it's important not to hang your spam fighting strategy on one approach. (or even captcha implementation)

When analyzing security, one metric is the resources an attacker needs to muster in order to defeat the measures you put in place, and when the cost of spamming outweighs the value of the prize, generally you're safe. For your blog, the Orange captcha is probably fine because the existing tools that harass Wordpress and Movable Type users won't work out of the box so the cost is high enough not to be worth it. There are plenty of people with old and unmodified blogging package installs to give the Googlebot plenty of links to digest. (Although I am a little surprised you don't nofollow the links in your comments)

It's helpful to look at the measures that different sites put in place. On Slashdot, karma, the ability to moderate posts and comments up and down, is not distributed evenly to all users; it is doled out randomly. That is why Slashdot is gamed a lot less frequently than Digg or Reddit even though it's been around longer.

On Wikipedia, most of the juicy topics have people who really care about them and watch edits obsessively. Also, all edits are out in the open, tagged to at least IP addresses, and it is very easy to roll back undesired changes. Plus the external links are 'nofollow' to decrease the value of getting your spam up there.

Facebook sends external links through a redirect page and LinkedIn 'nofollow's them in attempt to devalue links on their sites.

Craigslist, by showing posts strictly chronologically, allowing anonymous users to flag an unlimited number of posts, and allowing almost all non-script HTML entities and elements invites bad behavior in areas where spam is profitable (real estate, services, etc.) There is a lot of innovation in the 'spamming Craigslist' space because of the enormous amount of traffic they get. It's a cat and mouse game - the Craigslist folks work hard to keep their identity while fighting spam, but they are outgunned and outnumbered. It is easy to generate a good income by spamming Craigslist if you're halfway good at it.

Cameron on May 29, 2008 10:34 AM

@HB:
Probably, yes.

You only need to solve one CAPTCHA term. If the flash or GIF CAPTCHA has 30 frames of animation, you have 30 different views of the term you need to solve (rather than just one), which would help heuristic OCR processes tremendously.

Eam on May 29, 2008 10:39 AM

Yet another suggestion:

You keep track of users such that you know who's a newbie, and who's trusted. Then for each new anonymous user, you pair them with a trusted user who is also about to submit. The trusted user's job is to verify the answer that the newbie was asked.

This means that the questions can get pretty arbitrary, like "Name a difference between these two pictures".

Any newbie would have to wait for a trusted user to use the system, which would increase frustration, but a spammer would be blocked by the system itself. Oh, but wait; sufficient spammers would choke the system and no one would get verified. Yeesh.

Yeah, this is hard business.

Rob Chansky on May 29, 2008 10:43 AM

Say... animated captchas is a great idea!

Generate a piece of text rendered in 3D, then rotate it in place over a few frames with lots of noise added. You could then add far more noise (because the human brain is great at picking out movement) into the animated gif than you could get away with for a stationary gif.

I do believe that would work... for a while.

Rob Chansky on May 29, 2008 10:47 AM

Lots of people are suggesting photos or multiple photos, so you get the benefits of exponentials, along with an adjustment to the image to stop any hash matching.

How about Magic Eye images? I know they are computer generated, but someone would have to write code to reverse them and OCR the result (if it is even text). Obviously you would again probably need multiple images for validation, and some way to stop hash matching.

Although this would be a problem if you can't do them; I don't know how many people can and can't.

pete on May 29, 2008 10:53 AM

The spammers are getting paid by the company at the end of their many, many links. The links they place go to someone who handles monetary transactions. This leaves trails. Why not seek remediation from the site at the end of the link? Spam will always be here until/unless this is done. Spammers are just contractors. We should go after the $$$ paying them.

Stephen on May 29, 2008 10:54 AM

Just saw the buddy system post. Sounds good: assign a new user to an existing trusted user who reviews their posts for a short period and approves them. If they get disapproved, all their posts are removed.

pete on May 29, 2008 10:54 AM

Craigslist is just the new Usenet.

There is no way to stop the spam, really.

The only thing to do is to run small, targeted sites using nonstandard APIs/paths/names/protocols; then there is no one big payoff for automatic spamming tools to target (major wiki software, major blog software, smtp).

Reed on May 29, 2008 10:54 AM

Cardspace

Brian on May 29, 2008 11:01 AM

How about making users interact with an obfuscated signed applet instead of just HTML/CSS/JS? With a secure applet running on the client machine, there are more options such as: encrypt verification traffic, confuse the client machine (and not the human actually using a browser), etc. You can even verify that the browser is actually a browser or inspect the client machine for spamware! C'mon, fight sophisticated software with same! Not a hard problem to outcode the evils...

peet on May 29, 2008 11:12 AM

"But here's the problem: following the rules and being a good citizen is easy. Being evil is hard; it takes more work."

You didn't listen to master Yoda...

Luke: Is the dark side stronger?
Yoda: No... quicker, easier, more seductive.

The problem for craigslist or wikipedia or the godforsaken wasteland of usenet is that it's so damn *easy* to be evil, and to do it anonymously. It takes moral character to be a good citizen and follow a social contract when there's no real punishment for violating others.

infidel on May 29, 2008 11:28 AM

You can keep it anonymous with registration like so:

1. You must register a username to use the service. You don't need to provide any personal information, not even an email address. There is no callback verification or captcha or any of that stuff.

2. You must apply from a non-blacklisted ip address (automatic blacklist of anonymisers, or other portals that spammers like to use). Note that this blacklist is ONLY for registration.

3. The system keeps track of what IP addresses are being used to register. If too many accounts are generated from that address, the system suspends that IP until a human decides to make the ban permanent or not. Perhaps allow a way for the human to raise the maximum allowed users for an address (thinking of universities with nat firewalls here).

4. Once you have registered from an allowed address, you may post all you want. If your posts get flagged for spam, a human comes in to check, and if it is spam, your account gets banned and the registration IP you used is blocked.

5. Add a way for a user to petition to have his IP address unblocked (perhaps his machine was commandeered into a botnet).
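
Steps 2 through 5 above can be sketched in a few lines. This is only an illustration under assumed names: `RegistrationGate`, the default limit of 5 accounts per address, and the in-memory dictionaries are all hypothetical, and the "human decides" parts are reduced to methods a moderator would call.

```python
class RegistrationGate:
    """Hypothetical tracker for anonymous, IP-limited account registration."""

    def __init__(self, default_limit=5):
        self.default_limit = default_limit
        self.limits = {}         # per-IP overrides set by a human (e.g. universities)
        self.blacklist = set()   # IPs blocked from registering
        self.suspended = set()   # IPs waiting for human review
        self.registered = {}     # ip -> number of accounts created
        self.account_ip = {}     # username -> registration IP

    def register(self, username, ip):
        """Steps 2-3: refuse blocked IPs, suspend IPs that register too often."""
        if ip in self.blacklist or ip in self.suspended:
            return False
        count = self.registered.get(ip, 0)
        if count >= self.limits.get(ip, self.default_limit):
            self.suspended.add(ip)
            return False
        self.registered[ip] = count + 1
        self.account_ip[username] = ip
        return True

    def ban_account(self, username):
        """Step 4: a human confirmed spam, so drop the account and block its IP."""
        ip = self.account_ip.pop(username, None)
        if ip is not None:
            self.blacklist.add(ip)

    def unblock(self, ip):
        """Step 5: a successful petition clears both the block and the suspension."""
        self.blacklist.discard(ip)
        self.suspended.discard(ip)
```

Note that nothing here checks the IP at posting time; as in the list, the restrictions apply to registration only.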

Karl on May 29, 2008 11:35 AM

J13:

Except that the "Star Wars" shield never worked and probably will never work as advertised (total ballistic missile defense against a full nuclear exchange). Which actually makes it an apt analogy: get things to the point where the defense against them is too costly and too difficult to pull off. It's been done with nuclear weapons—there is no effective technological defense against them except for counter-attack.

The question is: are the spammers the one making nukes, or are they making missile defense? I think the former, unfortunately—finding ways to keep people from spamming is a lot harder than spamming, on the whole.

Shmork on May 29, 2008 12:10 PM

It just seems to me that if there is money involved, then no matter what you do somebody somewhere will figure out a way to break it. I disliked the "programming question" solution posted previously. I am not a C++ programmer. I read this stuff because I want to be a decent programmer in the future, but certainly couldn't state if given a list what one word of these four is a reserved word. And that would then exclude me from participating.

Craig on May 29, 2008 12:25 PM

2 thoughts on how to counter spam

1) follow the money. the spam must have a way to find whoever is paying for it (or why would they pay)

2) spam the spammer. reply to the spam with as much useless information as you legally can. Give them 10M useless email addresses. If you can do it morally, DDOS them. Set up voluntary bot nets (nospam@home) that make it as costly as possible to get anything useful out of spam

BCS on May 29, 2008 01:06 PM

"Say goodbye to your two best friends! And I don't mean your pals in the Winnebago!!" --Dark Helmet

Dubs on May 29, 2008 01:10 PM

It seems to me the only realistic way to prevent spamming is to have some sort of vigilante task force that identifies the root source of the spam and posts their contact info publicly.

If there was a known spammer who lived near me, even if they didn't affect sites I use, I would gladly do things to make their life miserable. Maybe not the most legal of solutions, but probably effective. Of course we would need support around the world, too.

MrGreg on May 29, 2008 01:46 PM

The real solution is to start charging. 50 cents an ad isn't burdensome to the public, but it is devastating for spammers.

Right now spammers see themselves in a gray area. Might be breaking the law, but odds of being prosecuted are small. Few spammers would cross the line into credit card fraud.

Sean on May 29, 2008 01:53 PM

Hit 'em in the wallet.

1. Create a new tier of OpenID (or whatever) that requires a $1 buy in.
2. Keep separate repositories of blacklisted accounts. Sites can share blacklists as they see fit (so one bad apple site doesn't poison the list with non-spammers)

If, as a comment poster, all your sites use this central ID then a single $1 buy-in will get you posting rights to all of them.

Requiring a buy-in for each site wouldn't be worth it (to me, at least).

Sites with a pay-to-post model usually have a very high signal-to-noise ratio. You can look at SomethingAwful, for example. While it has its own share of shenanigans going on, it doesn't have a problem with spammers.

Ron H on May 29, 2008 02:10 PM

Why aren't captchkas simple questions that would require AI far more complex than what we have right now? For example, What was the last name of the wife of the 41st President of the United States?

Or if that's too easy, something more complex, but that would still be easily solved by a google search.

Sean on May 29, 2008 02:13 PM

@Phil re:MS Cats

I believe they are cycled in and out as the pets become adopted, as this is a secondary goal. By the time a bot has picked the right combination, the pool will probably have changed (and guessing would be really tough anyway, because it doesn't provide any feedback as to which ones it guessed wrong, and you get a whole new set on each fail).

I think even in a fairly small pool of 3000 that get rotated in and out, guessing is a losing game.

Ryan C Smith on May 29, 2008 02:15 PM

I really liked WesleyC's comments. There's a guy who is looking at the entire game board.

"http://thephppro.com/products/captcha/"

"An encrypted timestamp (combined with something unique about the user--perhaps an IP or user agent?) placed in a hidden field--if the form is older than one hour or so, or the IP/user agent doesn't match, block the submission."

"A text field or textarea with an easily-readable, spam-worthy name, such as "comment" or "post", placed off the viewable area via CSS positioning--if it's filled in, block the submission."
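The two quoted checks can be sketched roughly as follows. This is only an illustration under assumptions that are not in the quoted comments: it signs the timestamp with an HMAC instead of encrypting it (either way the client cannot forge it), and the names `SECRET_KEY`, `make_form_token`, `is_submission_allowed` and the `token` field are made up for the example.

```python
import hashlib
import hmac
import time

SECRET_KEY = b"replace-with-a-real-secret"  # hypothetical server-side secret
MAX_FORM_AGE_SECONDS = 3600                 # "one hour or so", per the quote
HONEYPOT_FIELD = "comment"                  # decoy field positioned off-screen via CSS


def _sign(issued, client_ip):
    """HMAC over the timestamp and the client's IP address."""
    message = f"{issued}|{client_ip}".encode()
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def make_form_token(client_ip, now=None):
    """Return 'timestamp:signature' to embed in a hidden form field."""
    issued = str(int(time.time() if now is None else now))
    return f"{issued}:{_sign(issued, client_ip)}"


def is_submission_allowed(form, client_ip, now=None):
    """Reject if the honeypot is filled in, or the token is forged or stale."""
    if form.get(HONEYPOT_FIELD, ""):
        return False  # a human never sees this field, so only bots fill it
    issued, _, signature = form.get("token", "").partition(":")
    expected = _sign(issued, client_ip)
    if not hmac.compare_digest(signature.encode(), expected.encode()):
        return False  # tampered token, or a different IP than the form was served to
    current = time.time() if now is None else now
    return 0 <= current - int(issued) <= MAX_FORM_AGE_SECONDS
```

The real input would use a less tempting name than the honeypot; a naive bot that fills in every field it finds trips the first check, and a bot replaying a form it scraped yesterday trips the age check.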

The comments noticing that there might need to be a higher 'cost' (CAPTCHA or some other way of testing for a human) for posts containing URLs, or being considered more spammy after Bayesian evaluation, are also on the ball.

Looking forward to your implementation of Stackoverflow, Jeff and Jarad!

Bryan on May 29, 2008 02:43 PM

> (my favorite at the moment is "no murderers please!")

It is a funny thing to say, but unfortunately it's also a valid concern. There are worse things than spammers. A girl I knew in high school was murdered as a result of responding to the wrong person's ad on craigslist: http://www.cbsnews.com/stories/2007/10/29/national/main3422072.shtml?source=related_story .

Mark Tiefenbruck on May 29, 2008 02:45 PM

Spam identification is a computationally difficult task, so use it as your CAPTCHA.

Present the user with 4 messages (1 known spam, 1 known ham and 2 others) and have them classify them. So long as they get the known spam and ham correct, you have a reasonable chance that they are human and that they have classified the unknowns correctly.
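A rough sketch of that scheme, assuming the site already has pools of labeled and unlabeled messages. The function names and the dictionary layout are hypothetical, chosen only for this example.

```python
import random


def build_challenge(known_spam, known_ham, unknown, rng=random):
    """Pick 1 known spam, 1 known ham and 2 unlabeled messages, then shuffle."""
    items = [
        {"text": rng.choice(known_spam), "truth": "spam"},
        {"text": rng.choice(known_ham), "truth": "ham"},
    ]
    items += [{"text": text, "truth": None} for text in rng.sample(unknown, 2)]
    rng.shuffle(items)  # the user must not be able to tell which items are the controls
    return items


def grade_challenge(items, answers):
    """answers[i] is 'spam' or 'ham' for items[i]. Returns (passed, new_labels)."""
    if len(answers) != len(items):
        return False, {}
    pairs = list(zip(items, answers))
    passed = all(answer == item["truth"]
                 for item, answer in pairs if item["truth"] is not None)
    # Only trust the labels for the unknown messages if the controls were right.
    new_labels = {item["text"]: answer
                  for item, answer in pairs if item["truth"] is None} if passed else {}
    return passed, new_labels
```

One caveat of this sketch: a bot guessing at random gets both controls right a quarter of the time, so the labels it returns would need to be cross-checked against other users' answers before being trusted.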

Larry H on May 29, 2008 02:57 PM

How do the black hats avoid their own message boards filling up with spam? Is it just the "good is dumb" thing and we "good guys" refuse to get down to their level and mess up their forums for swapping hints about how to mess up others?

Black Hate on May 29, 2008 03:38 PM

Sometimes I think the only way to end spam would be to amass a private army of henchmen-hackers, capable of tracing down commenters' locations and napalming the place unless innocence can be proved.

Remember that Russian spammer who got murdered, and the police admitted that literally millions of people have a motive? http://www.securityfocus.com/news/11256

If only we organized our hate...

pookleblinky@ on May 29, 2008 03:49 PM

I find it amazing how many ideas were posted on this blog alone (ones I hadn't thought of), yet there is some flaw in each and every one. Although the flash one or java one seem kinda convincing. Unfortunately, the better computer AI gets, the harder humans will have to work to prove their existence.

Dave on May 29, 2008 03:55 PM

I think one of the advantages that stackoverflow will have is that its subject matter is very limited--having a "discuss anything" area would be a bad idea, IMO, for exactly that reason.

In the Personals sections of websites, two things are problematic: it involves email, and it involves sex-related activities. Those are two things that spammers love and have.

Services also involves email, and is pretty easy to fake as certain services are often desired and you're going to get a lot of hits on those.

"For Sale" is harder to fake--each individual ad is unlikely to get many hits (with the exception of concert tickets or currently-popular electronics).

stackoverflow postings will be less likely to contain URLs or require email, and will definitely not be sex-related (let's hope)! I think the very nature of the site will make your job a lot easier than craigslist. It will also make it a lot easier for human beings to detect that something is spam.

The one thing to think about is attachments--if you allow anybody to attach anything, that allows people to attach redirects to websites and also to attach malicious JS that will operate in your domain.

-Max

Max Kanat-Alexander on May 29, 2008 04:01 PM

@Sean: "Why aren't captchkas [sic] simple questions that would require AI far more complex than what we have right now?"

Because coming up with hard questions is even harder than answering hard questions. Click on my URL to see how a bot can answer your sample question automatically. You had to employ a human brain to *invent* that question, and yet a machine cracked it in 0.27 seconds! Now imagine trying to make a question so hard that a machine couldn't answer it... and then try to imagine *making a machine* that could make questions so hard that a machine couldn't answer them.

There's plenty of research on questions machines can't answer; Google "The balloon hit a branch and burst." for more information. There's much less (useful) research on coming up with these questions in the first place.

But this is all academic. The right solution, as others have said, is (1) stop frequenting sites where spam is indistinguishable from ham, and (2) use human moderators to mod down spam, optionally using (2) to train a filter at the same time. It works for Wikipedia, YouTube, most competent blogs...

Anonymous Cowherd on May 29, 2008 04:07 PM

I see that the problem that needs to be solved is differentiating a human from a bot/machine. I was thinking of predetermined places to click in a Flash movie, then using the sequence of positions on the screen (which again changes on every reload for better spam protection). I believe this will give better protection, since it will be a timed way of doing it.

vamsi on May 29, 2008 04:17 PM

Spam makes money because it costs very little to send millions of spam messages (whether those messages are to a forum or to email addresses). Sending 1,000,000 messages costs next to nothing, and with a click rate of 0.1%, 1,000 clicks are still created. I think the attack to take is to charge for the message.

This is a very difficult thing to do on a small site, and impossible to do with your email address. How do you charge for people to send you email? Still, I think it is an obvious solution. What if it cost $0.01 per email? I send 5 or 10 emails per day. What is a dime for me to send email? Big deal! Plus charges could be avoided with white lists.
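The figures in this comment can be checked with a few lines of arithmetic. The sketch below uses only the numbers given above (1,000,000 messages, a 0.1% click rate, $0.01 per message); the function name `spam_campaign` is made up for the example.

```python
from fractions import Fraction  # exact arithmetic, avoids float rounding


def spam_campaign(messages, click_rate, cost_per_message):
    """Return (clicks, total_cost, cost_per_click) for one campaign."""
    clicks = messages * click_rate
    total_cost = messages * cost_per_message
    cost_per_click = total_cost / clicks if clicks else None
    return clicks, total_cost, cost_per_click


# The spammer from the comment: 1,000,000 messages at a 0.1% click rate.
clicks, total, per_click = spam_campaign(
    1_000_000, Fraction(1, 1000), Fraction(1, 100))

# An ordinary user sending 10 emails a day at the same price.
_, daily_cost, _ = spam_campaign(10, Fraction(1, 1000), Fraction(1, 100))
```

At one cent per message the campaign still produces 1,000 clicks but now costs $10,000, or $10 per click, while the ordinary user pays the dime a day mentioned above.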

I am not the first to have this idea. I believe MS tried to push it a while back. The problem with email was getting everyone to use it. The problem implementing the same solution on a forum would be getting users to enter a CC number (or PayPal or whatever) just to use your site's comment section.

Someone like PayPal should consider implementing an API for this type of thing. Imagine submitting a comment on CodingHorror, then being redirected to a PayPal page to log in and pay your 1 cent. Maybe it would be free for comments marked as good, and $1.00 for comments marked as spam...

Jason Jackson on May 29, 2008 04:28 PM

I'm a little disappointed this has turned into an extended discussion of CAPTCHA. If you have time, do refer to my previous articles on CAPTCHA, which covered all the suggestions (and more) I've seen outlined here:

CAPTCHA Effectiveness
http://www.codinghorror.com/blog/archives/000712.html

Has CAPTCHA Been "Broken"?
http://www.codinghorror.com/blog/archives/001001.html

CAPTCHA is Dead, Long Live CAPTCHA!
http://www.codinghorror.com/blog/archives/001067.html

In short, the Google CAPTCHA still works -- the amount of time necessary to get a response back from the "breaking" services is indicative of human intervention -- although some of the lesser ones have definitely fallen.

For stackoverflow, it's likely we'll use a lightweight "invisible" JavaScript captcha.

Jeff Atwood on May 29, 2008 04:48 PM

I think one of the reasons that spam is taking over the world is the mistaken philosophy of what Fake Steve Jobs would call the "freetards". Sometimes free is evil. If every commentary site in the world, if Gmail, Craigslist and similar services, required a $1 upfront payment to register an account then the spammers would go broke.

David on May 29, 2008 06:26 PM

A lot of commenters miss the point that SOF and other participation websites need to reduce the barrier to entry, not raise it. The spammers will always learn the ropes, so making an overly convoluted path to normal participation just reduces real participation: people either are not used to the process or can't be arsed, and the site dies. Spam drops off, but only because the spammers realize the site is a waste of their time.

I can't begin to count the number of BBs out there I've never bothered with because I don't want to sign up; I don't want an account with them. I'll never come back unless Google indexes it and I land there by mistake.

@dnm - I don't think you want the poster to be the primary moderator.
1. spammers will delete complaints that it's spam
2. posters will delete comments that their post is dog crap
3. posters will gain enough 'kudos' to allow them to spam, and then spam everywhere. (who watches the watchers)


Similarly:
4. You also have to guard against 'ganging up' on legitimate users. I'm sure there's a lot of spammers that are decent coders who could infiltrate the site to get high moderation status and abuse that power. It's also good in general because you get some right twits in this industry who think they're jesus's little brother in terms of worldly importance. They are more evil than spammers by far.

In a way I don't care if you've posted 1000 times or just the once. The only thing that matters is if you have something important to say. Participation, while great, isn't everything, and just means you have heaps of free time.


micro-payments:
I'm not giving my credit card info to some random company just so I can post. It's an idiotic suggestion. Do you trust every website you visit with your credit card details? Facebook is huge and there's no way I'd hand over that sort of info. I might think well of Jeff/Joel, but I'm not paying to make my 2 cents known.

Jeff is soliciting my response in the first place by placing the comment box there. Let's not forget Jeff gets paid through ad revenue by site traffic. I dare say this site wouldn't drive as much traffic if it was devoid of the little comments box.

Or maybe you think the free-ness of the internet should really be 2-tier. Those with a voice are those that can afford one.


So:
- commenting should be free of login if one wishes
- if people want accounts, maybe they get some minor privilege elevation (like pseudo-moderation) and the ability to post.

I like @French Horn's poisoned accounts, the alternate reality / honeypot. While I can see ways around it, it's pretty good to allow the spammers to think they're on top of things, while in reality they're not.

Ascii captcha is also damn brilliant. Not foolproof, but nothing can be.

I wonder if there could be a variant of the old-school style of verification I remember on video games: where they make you type in the third word of the second sentence on page 42, etc. The rendered page content becomes part of the captcha, which makes anonymous mechanical turks a little less effective (the unpaid variety at least).

KISS
Spam prevention has to be easy to use for the user, it has to be easy on admins, and it mustn't significantly raise the bar to participation. Remember that if a person can figure it out, so can a machine, because the machine is still programmed by the human.

freeman on May 29, 2008 07:15 PM

As some of the posters here have pointed out, I believe that spam is mostly an economic problem, and it will take an economic solution to fix it. CAPTCHAs and other forum/commenting moderation systems are really band-aids and incremental advances in stemming the spam tide.

I think that the easiest way to eliminate spam (or reduce it to minuscule levels) is to remove the financial incentive behind it. However, instead of attempting to target the spammers by making each posting a monetary transaction, we should be targeting the people who buy goods from these spamming agencies.

(Rough figures) Since it takes 1 person to buy something from a spam advertisement to make a profit for the spammer we should target that one person and fine/educate them for it.

Though somehow I don't see this solution being easy or simple by any means.

Dominik Grabiec on May 29, 2008 07:24 PM

Another thought on my last thought. If people spam spammers with junk replies, then they will be forced to filter out our spam, and whatever they use can be used to filter them out.
----
someone commented about really smart AI's; If an AI is smart enough I can't tell if it is an AI or not, I don't care. All I care about is if what is posted is useful for me.

BCS on May 29, 2008 07:25 PM

All CAPTCHAs can be beaten. It's just a matter of cost. So use that.

Ship the commenter a product of two primes and the JavaScript to extract the primes. Pick ones large enough that it takes about 15 seconds. Users won't care, but you just cost the spammer 15 computer seconds per post. That jacks up the cost of spamming. If you can find other problems that are hard to solve and easy to check, start using them as well. Even better, find someone with such problems who will pay for them to be solved.
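The asymmetry this relies on (slow to solve, fast to check) can be sketched as below. It is written in Python rather than the JavaScript the comment suggests, uses plain trial division, and the function names are hypothetical; a real deployment would have to tune the size of the primes to hit the intended delay.

```python
def factor_semiprime(n):
    """Client side: find p, q with p * q == n by trial division (the slow part)."""
    if n > 2 and n % 2 == 0:
        return 2, n // 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return d, n // d
        d += 2
    raise ValueError("n is prime or too small to factor")


def verify_answer(n, p, q):
    """Server side: one multiplication, so checking is cheap."""
    return 1 < p < n and 1 < q < n and p * q == n


# The server picks two primes (tiny here, for illustration) and ships only n.
challenge = 10007 * 10009
p, q = factor_semiprime(challenge)
```

The solver has to walk through candidate divisors up to the smaller prime, while `verify_answer` costs the server a single multiplication per post.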

(this had better be my last post or y'all think /I'm/ a spam bot :)

BCS on May 29, 2008 07:59 PM


Craigslist also uses rDNS as an antispam measure (when sending emails).

Spammers concentrate their efforts at major and active sites. So unless StackOverFlow becomes one, I don't believe it will be a target.

I don't see spam on this site and it's pretty active.

A Flash captcha sounds good. To make it really tough for spammers, create an animation where the image of the text is split in two parts that scroll sideways in opposite directions, and when they meet at some point, they create the text. I don't see how any software can figure this out.

Abdu on May 29, 2008 08:04 PM

I hope Craigslist survives... I found my current apartment (I live in NYC) there so easily and the thought of paying a realtor again makes me sick. Even the newspaper is a horribly stressful method here.

Rhywun on May 29, 2008 09:13 PM

Pretty much every technology, from the rock forward, was invented for the purposes of Good. (Caveman Ogg smashes wheat with rock, makes flour.) Almost inevitably, someone eventually comes along and uses the technology for Evil. (Caveman Grogg steals Ogg's rock, smashes Ogg in head, pwns Ogg's rock.) This has been repeated many, many times. We always think the problem is with the technology. (Maybe if we wrapped the rock in something soft so it couldn't hurt people... or if we only issued rocks to people we trust... or invented rock-proof armor...) Maybe we should investigate fixing people and not technology? :-)

Karim on May 29, 2008 09:49 PM

Cue Zed Shaw with Utu: http://savingtheinternetwithhate.com/
Think it's time for a new protocol that can punish spammers?

Nick Retallack on May 29, 2008 09:52 PM

One thing worth noting is that a would-be spammer can use a captured captcha image from your site as a captcha on their site, thereby getting humans to do the OCR. One way to undermine that strategy is to include information that identifies your site in the captcha. Similarly, bits of text that are obviously irrelevant (to a human) can break OCR-based attacks.

Nick Argall on May 29, 2008 10:01 PM

Reading over the comments here, there are a lot of good ideas for how to implement a better captcha. (Which is of course missing the point, but there we are.) Given motivation ($$$) and time, every single one of those can be broken.

While I was pondering this (and of course, thinking of my own ideas for a better captcha - javascript animations that follow a mathematical path onscreen to form legible text after a few seconds, semi-transparent (alpha-based) overlays to reveal hidden text like those magic glasses or color-blindness tests, liberal use of XHR to get pieces of an image, etc) I had an enlightenment: the key insight behind a captcha is not that people can read squiggly letters better than a machine, but that people are flexible where machines are not. Add into this that machines see webpages much differently than people do, and that while good is dumb, evil is lazy if we let it be. The spammers and scammers aren't going away, so let's make them work for their .000001% success rate.

A simple bot can easily repeat steps. Visit this url, grab this hidden field value, post this next step, find the image, OCR it, enter text. Changing that sequence even a little - say, changing a field name - will probably break some bots. Changing it a lot will probably make the bot authors rework their programs. A person, on the other hand, will notice zero difference, because the browser hides all that from them. It's not difficult for the bot writers to rework the programs - dump the page, fix the field values, set up a new url sequence, whatever. But it still takes them time. And in that time their bots are non-functional. So - change the flow of the submission. Often. Automatically. Think of 10 little things that you could change about the form submission that would each break a naive bot. Code them all up, and have the form posting method switch to the next one once a day. If it takes a spammer a day to rework the bot and distribute it, you've bought yourself 10 spam-free days. If you can think of 10 things that are independent and can each be included or not, you've bought yourself a year.
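A minimal sketch of the daily rotation described above. The ten toggle names, the `ROTATION_SECRET` and both function names are invented for the example; the only idea taken from the comment is that independent on/off changes combine, and ten of them give 2**10 = 1,024 distinct form variants, comfortably more than a year of daily changes.

```python
import hashlib
import hmac
from datetime import date

ROTATION_SECRET = b"replace-with-a-real-secret"  # hypothetical server-side secret

# Ten hypothetical, independent tweaks that would each break a naive bot.
TOGGLES = [
    "rename_fields", "extra_hidden_field", "reorder_inputs", "change_post_url",
    "add_intermediate_step", "swap_input_types", "rename_form_id",
    "token_in_cookie", "split_textarea", "decoy_submit_button",
]


def variant_for_day(day, toggles=TOGGLES):
    """Map a date to one of 2**len(toggles) on/off combinations."""
    index = day.toordinal() % (2 ** len(toggles))
    return {name: bool((index >> bit) & 1) for bit, name in enumerate(toggles)}


def field_name(base, day):
    """Derive a per-day name for a form field, e.g. the comment textarea."""
    message = f"{base}|{day.isoformat()}".encode()
    digest = hmac.new(ROTATION_SECRET, message, hashlib.sha256).hexdigest()
    return f"f_{digest[:12]}"
```

The server renders the form with `field_name("comment", date.today())` and reads the same name back on submission, so the browser user sees no difference while a bot with a hard-coded field name fails.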

In a way, spammers on websites - blogs, CL, etc - are much like email spammers, with one very important difference: we (the good guys) completely control the admission path. So we can make the admission path complicated (put instruction text in webpages, hide the text various different ways), and that puts the web spammers in the position of trying to figure out what is the spam and what is the ham in the form submission process. For instance: say you have an ok image-text captcha. Instead of having the user type the funny letters, instruct them to type just the first three letters. Now figure out a bunch of different ways to ask them to do it. Now the spammer-bot is in the position of needing to not only decode the captcha, but to parse the English text next to it. "Type in the first 3 letters in the box above." "Enter the frist[sic] three letters above." "Key in the letters before the E appearing in the image above". Not too hard to come up with new schemes here. Not too hard to break them either, but it is work that needs to be done. Keep them on their toes.

Any bot can be stopped - until the programmer adapts it. Any scheme can be broken, given time. The key is to take away their time to do it. The problem is that it takes our time to do that also. Which, all in all, makes a lot of sense: the spammers can be stopped, but it's a full-time job, because it's their full-time job to get by you.

Evil Otto on May 29, 2008 11:07 PM

Actually, that 99.9% spam claim may be high, but not by much. On this CL city that has a "combo" area/city name, all the ones that use the same name as the CL site itself ("visalia-tulare"), are faked:

http://visalia.craigslist.org/search/cas/?query=w4m

No real person says they are from "visalia-tulare"... it's one or the other, or some other nearby town, and all the single-name towns on this page do not exist near the "visalia-tulare" area. In other words, not a single "real" post on this page. Scanning the first 300 yields only two that are *possibly* legit.

AnonymousTul on May 29, 2008 11:30 PM

You're starting to sound more like Joel by the day. Especially with this i'm-cursing-all-the-spammers-to-the-lowest-levels-of-hell thing.

If you were in their shoes would you act so different?

Jazz on May 29, 2008 11:52 PM

"...user has to organize 12 images into 2 categories..."

at which point the user decided not to bother...

The problem with all anti spam systems is that if it annoys the genuine user too much then they won't bother ...

Spammers however will because they are more persistent than a real user

Jaster on May 30, 2008 01:32 AM

"If you were in their shoes would you act so different?"

Speaking for myself, I couldn't possibly be in their shoes. It takes a certain kind of person to be a spammer.

I don't even know what you mean exactly by "in their shoes." It's not like they were forced into being spammers by circumstances. ("They killed the woman I love... they ruined my life... they made me become a spammer!")

NJ on May 30, 2008 03:06 AM

@NJ:

Perhaps he means the poor spammers are trapped in a vicious cycle of poverty where the only way they can afford to feed their children is to make a few measly cents from spam.

People like Jeremy Jaynes, who apparently had to struggle by on just $750,000 a month.

http://en.wikipedia.org/wiki/Jeremy_Jaynes

Graham Stewart on May 30, 2008 03:39 AM

Isn't that "guess which picture below is a cat" unbreakable? Why doesn't Craigslist use it? I think that is less troublesome than a normal CAPTCHA.

Seriously, I really hate CAPTCHAs, especially badly made ones. The Torrent Leech CAPTCHA sometimes takes me 4 or 5 tries to log in. I fear a world where I will need to fill in a CAPTCHA in order to log in to WoW.

Hoffmann on May 30, 2008 05:34 AM

random limit generator http://sirnot.110mb.com/c/?f=limgen.py
+
captcha backend
=
captcha from hell! http://img154.imageshack.us/img154/1544/captchafromhelltm7.png

sirnot on May 30, 2008 05:45 AM

> It takes a certain kind of person to be a spammer.

Unfortunately the world has an endless supply of that kind of person.

Rhywun on May 30, 2008 07:20 AM

Can you blame them? Craigslist is huge! It's a great place to get rid of second-hand stuff or sell things in general. Plus it's free. Who wouldn't want to be there?

Michael on May 30, 2008 07:43 AM

Having to pay for everything can't be the answer to SPAM. SPAM is sometimes mail, sometimes SMS, sometimes a chat message, sometimes a comment on a blog, sometimes a comment in a forum. You want all of these to cost money? Do you think the Internet would be where it is right now if all this cost money? Some suggested not having people pay per submission, but only once to sign up. Some spammers make $10,000 a month. Do you think they care if they have to pay $100 to sign up for 100 forums, if they can then spam them into the ground? It will only stop them if they have to pay "per post" or "per mail". But that would also hit normal users and would be "the end of the world as we know it".

Have you ever considered it might be a war we can't win? Every war has a winner and a loser (usually wars don't end with a tie) and there is no reason to believe that this is a war we will certainly win. Maybe spammers will win! Maybe evil *can* actually win, and will finally win, and spammers will make the Internet as we know it today unusable.

Maybe the solution is a completely different one. IPv6 might be it. Stop dynamic IP addresses! Every user gets his/her own IP address (or address range, as IANA wants to give everyone a whole network bigger than the currently available IPv4 address space altogether). If you always come in with the same IP address, IP address blocking finally makes sense again. Spammers will get on blacklists and be permanently banned from a couple of sites, and finally, once on the blacklist, be banned from almost anything. Okay, they can use open proxies, but guess how fast those will end up on blacklists, too.

If the only way to ever get a new IP address is changing your provider (which can be very hard, since some say contracts will last for at least two years), this could be a real problem for spammers if all open proxies are blocked, too. Especially if providers say "If you ever come back to us, you will of course get your old IP block back. Thanks to IPv6 we have so many IP addresses that if we had handed every customer in our company's history a whole subnet block and reserved it thereafter, we would still not have used even 5% of all the addresses assigned to us by IANA".

Of course that means you lose anonymity completely. Should anyone ever find out who owns which IP block, you are "tagged" and everyone knows who you are, where you go and what you do. If you give up being anonymous, there might be an even easier solution. We need a way to 100% verify the identity of a person when signing up for anything on the Internet. E.g. a WWIC (World Wide Identity Card). Every new PC ever produced would have a card reader built in (such a reader costs $20 for an end user, $10 for a bigger company getting 1000 at once). If this card is unfakable (unless you are the Ueberhacker), you can actually block "people" from your service. They can't sign up for a new account, since they can't identify as another person than the one you already blocked. Actually the system does not even have to break anonymity completely. The WWIC may just transfer a worldwide unique number to you. So you know the number (that you can use for blocking) and the nickname of the user. It still won't give you the real name or street address.

Though maybe breaking anonymity alone in some way is already the perfect solution. Do you think many people will still be spamming if everyone can look up who they are? I don't think so. Not that I would support such actions in any way (fighting evil with evil makes you part of evil), but I guess if they overdo it, one day their car will be beaten to crap, their cat might be dead when they come home, or their house will mysteriously have caught fire. And these might be the more harmless things that can happen. People get killed weekly for much less than spamming.

Mecki on May 30, 2008 08:41 AM

"Ship the commenter a product of two primes and the JavaScript to extract the primes. Pick ones large enough that it takes about 15 seconds. Users won't care, but you just cost the spammer 15 computer seconds per post."

If javascript can factor it in 15 seconds, high-performance mathematical software will do it in 1/100 that time, with throughput even higher on a multi-core machine. In-browser computational challenges won't slow spammers unless you're willing to seriously inconvenience many regular users, because spammers can afford to develop efficient non-embedded challenge solvers that run much faster. Even if you could equalize solver performance, are you going to make someone with a 4 year old PC wait 2 minutes to solve a challenge that a brand new machine does in 15 seconds? Will the challenge take 15 minutes to complete on an iPhone?

Matt on May 30, 2008 11:43 AM

Ok, this idea probably fails the test of making it easier for the user, but what about having the poster add a question about the post along with one or more valid answers. For example, for this post, Jeff could ask "What word does Jeff use to describe what kind of target Craigslist makes for spammers?". The correct answer would be "juicy". It's open-ended so guessing wouldn't work. Using the Amazon turk method of having cheap people do it would become less cost effective because they'd have to actually read the article. The spammers would have to basically solve natural language to beat it. If they do, we thank them for advancing AI. The downside is people would have to actually read and comprehend the original post before commenting. The upside is that people would have to actually read and comprehend the original post before commenting.

Brian Deterling on May 30, 2008 03:25 PM

Have you noticed how hard it is to leave a comment these days? I remember the old days when you could just pop in to a blog, leave a comment, and hit the road. Then along came captcha technology. Now it seems that everyone wants you to register for an account to leave comments, and I'm just not down with that.

GreenLantern33 on May 30, 2008 06:30 PM

I would actually support the death penalty for spammers.

Jim on May 30, 2008 10:02 PM

Mecki: Well, if the spammers are adapting their methods of obscuring - and thus hiding - spam, we should stop thinking about just single recognizers for spam.
I am currently wondering if it would be possible to create a system of spam recognizers, message normalizers and similar agents that results in some emergent behaviour that digs through lots of obfuscations.

I think, at first, you have to normalize the messages in some way, because in recent years, the simple "buy via_gra" turned into some "buy v1agra", "buy viiIiIi1agraAaA4" and then turned into "buy v1i1iiIagraA4aa4aAsagrrwezrb". However, such obfuscations can be reverted with pretty good precision.

After that, you can throw your regular Bayesian network at it, or create some agent that tracks the distribution of words in all posts and flags anomalies as suspicious. For example, if all the comments here contain the words "This", "spam", "freaking", "annoying", all with a probability of like 22%, rest noise, and some other post contains none of those words, but loads and loads of noise, it's suspicious.

Given enough suspicious points, you will flag it as spam eventually.
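The two stages described here (normalize first, then compare against the thread's usual vocabulary) might look roughly like this. The substitution table, the `top_n` cutoff and all function names are assumptions for illustration, and the scoring is a much cruder stand-in for the word-distribution tracking the comment has in mind.

```python
import re
from collections import Counter

# Hypothetical table of common character substitutions.
LEET = str.maketrans({"1": "i", "4": "a", "0": "o", "3": "e", "@": "a", "$": "s"})


def normalize(text):
    """Undo simple obfuscation: case, look-alike digits, filler characters, repeats."""
    text = text.lower().translate(LEET)
    text = re.sub(r"[^a-z\s]", "", text)    # drop fillers such as "_" or "-"
    return re.sub(r"(.)\1+", r"\1", text)   # collapse runs ("iii" becomes "i")


def common_words(posts, top_n=20):
    """Words that appear in the most posts of this thread, after normalizing."""
    counts = Counter(w for post in posts for w in set(normalize(post).split()))
    return {word for word, _ in counts.most_common(top_n)}


def suspicion(post, vocabulary):
    """Fraction of the post's words that are NOT part of the thread's vocabulary."""
    words = normalize(post).split()
    if not words:
        return 1.0
    return 1 - sum(w in vocabulary for w in words) / len(words)
```

Collapsing repeated letters also mangles legitimate words ("annoying" becomes "anoying"), which is harmless here because the vocabulary is built from text that went through the same normalizer.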

I'm pretty sure that such a system might be fairly mighty, with regard to work required, CPU cycles eaten and spam recognized.

PS: haha, I am not allowed to post, gotta obscure things

Hk on May 31, 2008 02:17 AM

Reverse CAPTCHA.
It is much easier to make a bot/program prove that they are such than to make a human prove they are human.

Dave on May 31, 2008 03:38 AM

"Do you think they care if they have to pay $100 to sign up for 100 forums, if they can then spam them into the ground?"

The charge to register of course would not be the only measure. If they have to pay $1 to re-register for the forum every time their account is deactivated because they're spamming, it would become far too expensive to be worth it.

It's a decent idea on some level but not really practical. Aside from the obvious issue that quite a few people would not want to participate because they don't have a credit card or don't want to give it out, it would also probably result in more attacks aimed at hacking users' accounts, and then you will have users losing their buy-in because somebody hacked their account and used it for spam. I would rather deal with spam than hacked accounts.

There's probably never going to be a 100% foolproof way to eliminate spam, but anything you can do to reduce the negative impact spam has on the users' experience with your site, without unnecessarily burdening the legitimate users in the process, is probably worth doing, e.g. captchas that are easy for normal users, filtering, moderators, etc.

While these measures aren't going to stop all of the spam, having recently visited some of the bigger sites that don't use them (just try reading some comments on a USA Today story) vs. other big sites that use them, the difference is night and day. While spam is still a minor annoyance on some of the sites that do use anti-spam measures, it makes the comments sections pretty much useless on the sites that don't use any.

----

Personally I'm developing a blog for myself that is almost entirely Flash-based and is run through a Red5 media server instead of via HTTP. While I can see some drawbacks to it, some of the additional features will be pretty neat and it should help keep me free of spammers for a while. At least the spammers will need to work on an approach that is specifically aimed at my site in order to spam me; they will have to be able to decode my obfuscated Flash, or sniff and break my encoding scheme, etc.

Best Regards,
Gerald

Gerald on May 31, 2008 07:49 PM

I just saw a story about Wal-Mart doing a beta of free classified ads to compete with Craigslist. http://news.cnet.com/8301-10784_3-9958140-7.html?part=rss&subj=news&tag=2547-1_3-0-5

I wonder what they're doing about this?

Jeff Schwandt on June 3, 2008 08:30 AM

CAPTCHA is one of these useless things that are promoted as being effective. To whom?

To the average user the idea might seem amazingly smart and effective, and in that sense CAPTCHAs have done a great job of saying to their main target (the average user) "Great, the service/product I am using is up to date with new technologies and fighting for my interests! I like them". Or they piss them off completely (as I sometimes have a hard time recognizing what the letters/digits are).

How effective are they? Well... How effective is the serial key on a Windows installation (or for that matter on any other software product)? Useless! The only thing it achieves is to piss off the legit user with entering the code, and it presents the illegal user with the inconvenience of entering a few digits during the installation process. Same goes for the other "more complicated" schemes like online activation (for a product that relies entirely on your own computer)... Same goes for CSS DVD rubbish and other digital media nonsense...

It's very simple. You can't expect to be able to give the user the flexibility and technology of redistribution and then control them. Or to stay in context, with regards to the spam and CAPTCHAs - allow the user to post freely. This is the same as telling the people they can vote but then asking them to vote for whoever you want them to vote for! Good luck controlling them!

As for CAPTCHAs with images of animals, etc... that's the same rubbish as the previous. It creates more trouble for the user and little or no challenge for the person who has to break it. If there is a truly revolutionary strategy to combat misuse, it would very likely involve a change in the whole basic model.

My opinion is that instead we would enter an even more open world where the value of each entity would equalize. So whoever makes the mainstream something (music, movie, forum, blog, social network, etc...) will simply be making a more reasonable income (less than before), while the entities who have equally good products but are not as popular will also make a more reasonable income (more than before). We can argue whether that will increase or decrease abuse.

Bottom line: misuse (such as spam) is not a problem that can be fully eliminated by a technical solution (especially without troubling the legit user too much); it can only be reduced by an insignificant amount. It can probably be solved socially or some other way...

Nick on June 3, 2008 04:52 PM

Hey, fellas! Has anyone heard about an incident in Russia, where a spammer got caught and literally battered to death? Do you think it was deserved?

Clever guy on June 6, 2008 12:20 PM

Vigilante justice against spammers. In a "civilized" society, this cannot be allowed, so we tell ourselves.

But at what cost do we disallow the vigilante justice?

Let's take, for example, spam emails. A spam email sent out to 100,000,000 email addresses may reach say 10,000,000 individuals (remember, some of these individuals WILL NOT filter their emails because in certain cases the consequences of a single false positive detection may be disastrous to their businesses).

Now, if each of these 10,000,000 individuals spends 10 seconds downloading, reading the subject, making the decision to delete, and then deleting the message, we're up to 100,000,000 seconds of human life wasted (just over 3 years). Of course, that counts every hour of the day. Since humans need to spend some time sleeping, this would really take about 5 waking human years.

If said spammer sends, say, 1000 different spam emails over his/her spamming career, then said spammer has wasted (stolen) 5000 years of human lifetime.
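For anyone who wants to check the arithmetic, here is a quick sketch in Python. The 16 waking hours per day is my assumption (the figures above don't state one); everything else comes straight from the numbers in the comment.

```python
RECIPIENTS = 10_000_000          # individuals actually reached
SECONDS_EACH = 10                # time each one spends on the message
SPAMS_PER_CAREER = 1000          # different spam emails sent over a career
WAKING_HOURS_PER_DAY = 16        # assumption: 8 hours of sleep

total_seconds = RECIPIENTS * SECONDS_EACH

# Round-the-clock years versus years counted in waking hours only.
calendar_years = total_seconds / (365 * 24 * 3600)
waking_years = total_seconds / (365 * WAKING_HOURS_PER_DAY * 3600)

# Career total uses the rounded per-email figure, as in the comment.
career_years = SPAMS_PER_CAREER * round(waking_years)

print(f"{total_seconds:,} seconds per spam run")
print(f"{calendar_years:.2f} calendar years, {waking_years:.2f} waking years")
print(f"{career_years:,} waking years over a career")
```

So "just over 3 years" and "5 waking years" both hold up under that assumption, with the 5 being a rounded-up 4.76.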

Therefore, there is a school of thought; frightening, but perhaps justified. This school of thought would hold that a spammer