In November 2007 I called these three CAPTCHA implementations "unbreakable":
Google (unbreakable)
Hotmail (unbreakable)
Yahoo (unbreakable)
2008 is shaping up to be a very bad year indeed for CAPTCHAs:
Which means I am now 0 for 3. Understand that I am no fan of CAPTCHA. I view them as a necessary and important evil, one of precious few things separating average internet users from a torrential deluge of email, comment, and forum spam.
So reading that the three best CAPTCHA implementations have been defeated sort of breaks my heart. Even what I consider to be the strongest, Google's implementation, fell hard:
On average, only 1 in every 5 CAPTCHA breaking requests is successful, including both algorithms used by the bot, approximating a success rate of 20%.
A twenty percent success rate doesn't sound like much, but these spammers are harnessing networks of compromised PCs to send out thousands upon thousands of simultaneous sign-up requests to Gmail, Hotmail, and Yahoo Mail from computers all over the world. Even a five percent success rate against a particular email service's CAPTCHA would be cause for serious concern; with a twenty percent success rate you might as well put a fork in that thing-- it's done.
In the meantime, CAPTCHA still serves a useful purpose-- as speed bumps that keep evil bots and the nefarious people who run them from completely overrunning the internet, as Gunter Ollmann notes:
CAPTCHAs were a good idea, but frankly, in today's profit-motivated attack environment they have largely become irrelevant as a protection technology. Yes, the CAPTCHAs can be made stronger, but they are already too advanced for a large percentage of Internet users. Personally, I don't think it’s really worth strengthening the algorithms used to create more complex CAPTCHAs – instead, just deploy them as a small "speed-bump" to stop the script-kiddies and their unsophisticated automated attack tools. CAPTCHAs aren't the right tool for stopping today's commercially minded attackers.
There's simply too much money to be made in email spam for the commercial CAPTCHA algorithms, regardless of how good they may be, to survive forever. How old is Google's CAPTCHA now? Two to three years old? In the short term, perhaps proliferation and evolution of many different CAPTCHA techniques is the most effective prevention. You should emulate the techniques from the most effective and human-readable industrial-grade commercial CAPTCHAs, but avoid copying them outright. Otherwise, when they're inevitably broken, you're broken too. CAPTCHA-defeating tools are tailored to very specific inputs; if there's little to no monetary incentive, odds are nobody will bother to customize one for yours. My ridiculously simple "orange" comment form protection is ample evidence of that.
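Protection this simple needs almost no server-side code. The sketch below assumes, as the "orange" comments further down suggest, that the form just asks visitors to type the word orange. The field name `captcha` and the function name are hypothetical; this is not the blog's actual implementation.

```python
from typing import Dict

# Hypothetical site-specific challenge: the page tells visitors to type
# "orange", so the answer is public. It only stops generic bots that were
# never tailored to this particular form.
EXPECTED_ANSWER = "orange"


def passes_site_challenge(form: Dict[str, str]) -> bool:
    """Return True if the (hypothetical) 'captcha' field holds the expected word."""
    answer = form.get("captcha", "")
    # Forgive case and stray whitespace; humans type sloppily.
    return answer.strip().lower() == EXPECTED_ANSWER


if __name__ == "__main__":
    print(passes_site_challenge({"captcha": " Orange "}))  # True
    print(passes_site_challenge({"captcha": "apple"}))     # False
```

The check itself is trivial. What protects the form is that nobody has a financial reason to write a solver for one blog.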
Beyond diversification, the deeper question remains: how do we tell automated bots from people-- without alienating our users in the process? How can we build a next generation CAPTCHA that's less vulnerable to attack?
Here's some food for thought:
At some point, unfortunately, CAPTCHA devolves from a simple human reading test into an intelligence test or an acuity test. Depending on how invasive you want to be, you'll eventually be forced to move to two-factor authentication, like sending a text message to someone's cell phone with a temporary key.
I don't have all the answers, but one thing is for sure: I hate spammers. As fellow spam-hating internet users we all have a vested interest in seeing CAPTCHA techniques evolve to defeat spammers.
Posted by Jeff Atwood
What about language barriers?
Also your site uses CAPTCHA! :)
Jesus DeLaTorre on March 5, 2008 01:26 AM
What about using some CJK characters?
There are tens of thousands of characters.
Perhaps they would take longer to break.
Of course, the human would have to be Chinese, Japanese, or Korean.
hito on March 5, 2008 01:32 AM
On .NET Rocks, I heard them talking about an "invisible CAPTCHA", which was something along the lines of your trivia questions. The whole thing involved an invisible div containing a small math problem that only JavaScript-enabled browsers would answer, which would root out all bots, or something to that effect.
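Here is a rough sketch of how the server side of such an "invisible" challenge might look. It assumes a hidden field that the page's JavaScript fills with the sum of two numbers embedded in the page. The names (`make_challenge`, `verify_answer`, the signed token) are illustrative assumptions, not necessarily the scheme described on .NET Rocks. Signing the expected answer with an HMAC lets the server check it without storing per-visitor state.

```python
import hashlib
import hmac
import secrets
from typing import Tuple

# Hypothetical per-deployment secret; in practice load it from configuration.
SECRET_KEY = secrets.token_bytes(32)


def _sign(value: str) -> str:
    return hmac.new(SECRET_KEY, value.encode(), hashlib.sha256).hexdigest()


def make_challenge() -> Tuple[int, int, str]:
    """Pick two small operands for the page's JavaScript to add.

    Returns the operands plus a token (sent in a hidden field) that is
    the HMAC of the correct sum.
    """
    a, b = secrets.randbelow(10), secrets.randbelow(10)
    return a, b, _sign(str(a + b))


def verify_answer(answer: str, token: str) -> bool:
    """Accept the post only if the JavaScript-filled field matches the signed sum."""
    return hmac.compare_digest(_sign(answer.strip()).encode(), token.encode())


if __name__ == "__main__":
    a, b, token = make_challenge()
    print(verify_answer(str(a + b), token))      # True: what a JS-enabled browser sends
    print(verify_answer(str(a + b + 1), token))  # False
```

This token has no expiry and can be replayed. A bot that parses the page can also do the arithmetic itself. So the sketch only screens out clients that never run the page's JavaScript.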
charles graham on March 5, 2008 01:35 AM
Another solution: stop using these damn registration pages and use OpenID. Of course there will be OpenID spam servers, but they're easier to control and ban.
acemtp on March 5, 2008 01:39 AM
When will we actually hit the spammers where it hurts? And by hitting, I mean prosecution. Yes, they are in various countries that do not necessarily care, but maybe, just maybe, we can make them care? I would think the WTO is for that kind of thing...
bahbar on March 5, 2008 01:42 AM
And it is even Web 2.0 enabled... Isn't that great?
Here is the link:
I think bots will have a hard time breaking that one.
Heiko Hatzfeld on March 5, 2008 01:43 AM
P.S.:
And here is the link to the article where I found it... I know it's "old", but I found it quite interesting...
http://radar.oreilly.com/archives/2006/07/another-captcha-but-i-failed-p.html
Heiko Hatzfeld on March 5, 2008 01:48 AM
The text on the Google CAPTCHA breaking page suggests that they pay humans to solve the CAPTCHAs. I'm not sure if this is true or not.
===================
If you are unable to recognize a picture or she is not loaded (picture appears black, empty picture), just press Enter.
In no case do not enter random characters!
If there is delay in downloading images, exit from your account, refresh the page and go again.
The system tested in browsers:
Internet Explorer
Mozilla Firefox
Before each payment deemed by pictures checked Admin. We pay only correctly recognized pictures!
So what if the CAPTCHA turns into an intelligence test? Let's not have dumb people make comments either :)
Oh, damn. I can't spell "orange."
Matt Gibson on March 5, 2008 01:51 AM
I read the Websense report on Google's CAPTCHA last week. I was under the impression that it wasn't broken in the sense that machines were solving the CAPTCHAs automatically (via machine vision or whatever), but by duping humans to solve them (unknowingly, on a different site) in order to make money or get access to free porn (http://www.boingboing.net/2004/01/27/solving-and-creating.html)
As I understand it, the hard part about breaking Google's CAPTCHA was the bot getting the image to human eyes, and getting a response back to Google before the process timed out.
If this is the case, changing the CAPTCHA from a reading test to an intelligence test probably won't make much difference. The hard part is surely making the authentication process robust against this kind of attack?
Paul on March 5, 2008 01:55 AM
Asirra has no chance of success.
Users are usually dumb, and willing to become even dumber.
If you start forcing them to use their brains, they will look for the 'X' button instead of clicking on the photos of cats...
However, an ASCII art CAPTCHA is available for the Drupal CMS, and I started using it some time ago.
Seems to be fine for now.
Also
Another good trick is to use JavaScript along with the CAPTCHA, even just a simple onmouseover effect on the CAPTCHA image (for example, display the real CAPTCHA image only when the mouse is over a 'fake' CAPTCHA image).
Bots usually don't do that.
Or use split CAPTCHA images with different z-indexes, or animated GIFs (or just backgrounds).
Just use your imagination.
arty on March 5, 2008 02:09 AM
Well, as you say: "perhaps proliferation and evolution of many different CAPTCHA techniques is the most effective prevention".
However, many of these CAPTCHA alternatives you mention are broken much more easily than your average "type the characters from the picture" CAPTCHA. So how about just sticking with image CAPTCHAs, but using much more randomness in your rendering? There's no need to distort the picture heavily; you just need a bunch of different, not-so-distorted, easily readable CAPTCHA variants.
If you have a bunch of different algorithms (each requiring a different cracking approach), and switch them randomly (requiring the bot to be able to distinguish between them), bots will not get far.
Of course, coming up with continuous variations in your CAPTCHA rendering can be a part-time job on its own, but it is only necessary if you're a high-profile target. For most websites in existence, replacing a broken CAPTCHA algorithm with a different one is going to be enough to solve your problems for a long while... unless you have a cracker who's REALLY keen on spamming your site and your site only, enough to change his cracking approach every time you change the protection, even if it will never pay off (and as we know, most spammers are in it for money).
Let's face it: if you're Google, or Microsoft, or Yahoo - any of those "alternative" methods will be broken much faster than a new CAPTCHA rendering algorithm. Something to think about...
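dave's idea of rotating among several lightly distorted rendering styles can be sketched as a registry of renderers, with one chosen at random per challenge. The three renderers below are hypothetical placeholders. To keep the sketch dependency-free, they return labelled strings instead of images. A real renderer would draw the text with its own fonts and distortion.

```python
import secrets
import string
from typing import Callable, List, Tuple


# Hypothetical renderers, one per drawing style. Real versions would return
# image bytes; these return a labelled string so the sketch runs anywhere.
def render_wavy(text: str) -> str:
    return f"wavy:{text}"


def render_outlined(text: str) -> str:
    return f"outlined:{text}"


def render_mixed_fonts(text: str) -> str:
    return f"mixed-fonts:{text}"


RENDERERS: List[Callable[[str], str]] = [render_wavy, render_outlined, render_mixed_fonts]
ALPHABET = string.ascii_lowercase + string.digits


def new_challenge(length: int = 6) -> Tuple[str, str]:
    """Return (rendered_challenge, expected_answer) in a randomly chosen style.

    The expected answer belongs in server-side session state; only the
    rendered challenge goes to the browser.
    """
    answer = "".join(secrets.choice(ALPHABET) for _ in range(length))
    renderer = secrets.choice(RENDERERS)
    return renderer(answer), answer
```

Adding a style just means appending a function to `RENDERERS`. Each new style forces a solver to first work out which style it is looking at before it can attack it.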
dave on March 5, 2008 02:10 AM
> I was under the impression that it wasn't broken in the sense that machines were solving the CAPTCHAs automatically (via machine vision or whatever), but by duping humans to solve them (unknowingly, on a different site) in order to make money or get access to free porn
If that's the case, then Google's CAPTCHA generation algorithm isn't broken after all. These human farms would work against ANY Turing test.
Does anyone know for sure?
Jeff Atwood on March 5, 2008 02:14 AM
As with all anti-abuse measures, CAPTCHAs have to evolve to keep up; this is the nature of adversarial systems like anti-spam and anti-virus. They'll be broken eventually, by a sufficiently determined attacker.
also:
'Of course there will be OpenID spam servers, but they're easier to control and ban.'
Great hand-waving assertion there, acemtp ;) Same way it's easier to control and ban mail servers originating spam in SMTP-land?
Has anyone seen efforts for captchas that reveal the letters in an animation? Something that is easy to solve by human eye looking at the letter revealing/morphing animation but really hard for OCR technique to solve since there are too many frames to tie together to make sense of the word.
Erki Esken on March 5, 2008 02:29 AM
The "Find the doggie" and "select the word" tests are instant failures: when there are 10 alternatives to choose from, even the dumbest (meaning: random) bot gets a 10% success rate.
So all list-based alternatives are useless. (Of course, you can chain n tasks with Y alternatives each, making the probability of a correct guess 1/Y^n, which is still pretty bad.)
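Erki's arithmetic is easy to check. Assume a bot that picks uniformly at random, facing n independent multiple-choice tasks with Y options each. Its pass rate is (1/Y)^n. Exact fractions avoid floating-point noise:

```python
from fractions import Fraction


def guess_probability(alternatives: int, tasks: int = 1) -> Fraction:
    """Chance that uniform random guessing passes `tasks` independent
    multiple-choice tasks with `alternatives` options each: (1/Y)^n."""
    return Fraction(1, alternatives) ** tasks


if __name__ == "__main__":
    for tasks in (1, 2, 3):
        p = guess_probability(10, tasks)
        print(f"{tasks} task(s) of 10 options: 1 in {p.denominator}")
    # Even at 1 in 1,000, a botnet making 100,000 attempts expects 100 passes.
    print(100_000 * guess_probability(10, 3))
```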
ASCII art is just another failure. Convert the code area to PNG, and it is your regular OCR again.
And the words failed on regular OCR... well, they were too hard for me too.
My answer?
Registration to any web service costs $1, via one or another single sign-on service. Half of the money goes to the web site that got the registration, another goes to the sign-on service provider. (And if you manage to make your first million with this, please remember me. :)
There are already human farms, alright. Some involve unwitting users solving CAPTCHAs for access to porn, and others involve low-paid workers overseas solving CAPTCHAs for money a la the "gold farming" model.
Here are observed cases of "CAPTCHAs for porn":
http://www.linuxworld.com/community/?q=node/2400
http://www.theregister.co.uk/2007/10/31/captcha-busting_trojan/
And there are some cases of CAPTCHA farming:
http://ha.ckers.org/blog/20070427/solving-captchas-for-cash/ (be sure to read the comments for several farmers offering their services)
By the way, here is the source for all these recent "Google CAPTCHA broken" stories -- one Websense blog post:
http://www.websense.com/securitylabs/blog/blog.php?BlogID=174
To be honest I suspect this is blown out of proportion. It looks a lot like another CAPTCHA-solving farm behind a web service API. (Observe the timestamps in the logs -- 30 seconds to decode a CAPTCHA sounds like a human, not an algorithm, if you ask me.)
Justin Mason on March 5, 2008 02:51 AM
> Another solution: stop using these damn registration pages and use
> OpenID.
Hint: this problem is *not* a nail. Your hammer is of no use.
Peter on March 5, 2008 02:53 AM
What about developing a system that uses VoIP to call a number and read out the code, so as not to alienate users without cell phones? You could also sell ad space on the calls to make it generate some money.
It is probably like trying to kill a housefly with a bazooka, and not totally foolproof, but at least it makes some money too.
FireCracker37 on March 5, 2008 02:58 AM
The next step in the defense against spammers is probably external ID authentication (Google, Passport, or OpenID).
The spammers' next step will therefore be ID theft.
ISPs are very eager to go after a grandma who downloads an illegal song, but they don't seem very interested in fighting spammers.
The only solution would be to apply ARIN/RIPE policy strictly, but that would kill business, since most firms are not very careful about where their business mail comes from...
In contexts where people come together around a specific interest, you have a better point of cleavage -- not between people and machines, but between members of the in-group and everyone else, including people who wouldn't be interested in what you are about, as well as computers. As an example, an associate of mine has had a phpBB installation with just such a CAPTCHA replacement out on the 'net for the better part of a year now, and despite it being at the default location in the domain, no spam sign-ups have been recorded.
If you're one of "us" for the purposes for which this was written, then signing up here
http://www.obsessivemathsfreak.org/phpbb/
should be trivial. Without some significant AI this isn't going to admit a bot, and if you just play the captcha out of context in an unwitting mechanical turk attack (e.g. as part of a porn site login), you're not going to get very many false positives.
Quote
http://www.obsessivemathsfreak.org/phpbb/
should be trivial. Without some significant AI this isn't going to admit a bot, and if you just play the captcha out of context in an unwitting mechanical turk attack (e.g. as part of a porn site login), you're not going to get very many false positives.
/Quote
2 Problems:
1) Too "good", beat me too. (I know the movies; don't remember the names)
2) There seems to be a list of maybe 7 or 8 answers? If this were an interesting server (i.e., one that would, for example, let you send spam mail), a dumb bot that always gives the same answer would get a roughly 12% success rate.
Pre-defined lists are not an answer for the issue.
JP on March 5, 2008 03:53 AM
HotCaptcha is a little subjective; I wasn't sure there was a third hot one :)
I have an online savings account that has a terrible system, though. You have to put in about twenty secret questions. I'll never remember all the answers, or I'll have changed my mind, or something, by the time I have to access it again.
As well as putting together multi-factor authentication for stuff like online banking, there also needs to be a culture-change. Governments need to get tough and prosecute and also religions/moralists/parents/whoever need to educate the next generation that it is wrong to steal.
John Ferguson on March 5, 2008 04:45 AM
I've had good luck using form-morphing techniques to prevent spam: http://nedbatchelder.com/text/stopbots.html. It won't stop a human, but what will?
Ned Batchelder on March 5, 2008 04:46 AM
Considering most internet sites are niche sites (company sites, sites catering to some particular group, etc.), you should *always* tailor your authentication to that group. If you have a webmaster forum, just ask questions webmasters *should be able to* answer; if you have a tattoo forum, ask them about that kind of thing. This can be made more difficult by warping the text of the question differently every time, so that a bot has only a 1 in 5 success rate with OCR *and* still has to know the answer; you can use pictures too.
As a previous poster already said: if I run a forum/site about X, I want people *with* a brain to comment on things, not morons, so I don't care about people who 'fail' the test. They cannot join. Mala suerte (tough luck).
Another method is to hold back comments/forum posts by new members and auto-check them against a set of heuristics; I have quite a bit of success with that. I simply grep out all http addresses in posts (using heuristics to 'fix' URLs that have been broken up, etc.) and submit them to Google. If I find too many of them on unrelated forums (you can use Google queries to do that), I auto-flag the post and send myself a message. Spammers have a goal with their spamming, and they don't, currently, have infinite resources to prevent me from finding it and blocking it. I have a *very* high success ratio using this technique.
Basically, my (long-winded) point is that you should tailor your protection to the site you are protecting, and you won't have many spam problems.
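frank's link-checking heuristic might look roughly like the sketch below, which rests on a few assumptions. The `sightings` dictionary stands in for the search-engine lookups he describes (how many unrelated forums a domain turns up on). The threshold is arbitrary. His extra heuristics for repairing deliberately broken-up URLs are not attempted.

```python
import re
from typing import Dict, Set
from urllib.parse import urlparse

# Loose pattern for http(s) links and bare "www." hosts.
URL_PATTERN = re.compile(r'(?:https?://|www\.)[^\s<>"]+', re.IGNORECASE)


def extract_domains(post: str) -> Set[str]:
    """Return the host names of every link-like token in a post."""
    domains = set()
    for match in URL_PATTERN.findall(post):
        match = match.rstrip(".,;:!?)")  # drop trailing sentence punctuation
        url = match if match.lower().startswith("http") else "http://" + match
        host = urlparse(url).hostname  # urlparse lower-cases the host
        if host:
            if host.startswith("www."):
                host = host[4:]
            domains.add(host)
    return domains


def should_flag(post: str, sightings: Dict[str, int], threshold: int = 5) -> bool:
    """Flag the post for review if any linked domain has been seen on more than
    `threshold` unrelated sites (per the hypothetical `sightings` lookup)."""
    return any(sightings.get(domain, 0) > threshold for domain in extract_domains(post))
```

This works because spammers need their links to stay machine-readable, so the domains are the one thing they cannot disguise.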
frank on March 5, 2008 04:55 AM
I believe CAPTCHA has been broken, if you believe this post:
http://www.mperfect.net/aiCaptcha/
It has been broken for a very long time. With a little time and effort you could recognize any letter that has been distorted, especially if you analyze the average pattern of the letter.
Even if it hasn't been broken, services like the Mechanical Turk make it much harder to distinguish a human with good intentions from one with bad intentions.
http://www.mturk.com/mturk/welcome
And the CAPTCHA definitely isn't going away. I was just asked to create one for the ASP.NET MVC Framework for a project that I am working on.
http://www.coderjournal.com/2008/03/aspnet-mvc-captcha/
Solutions are only going to get harder and harder. One increasingly common method I have seen for stopping bots exploits their weakness when it comes to JavaScript. Basically, you add an AJAX authentication string to the POST, and the authentication string is only fetched from the server moments before the form is submitted. But that doesn't really solve the problem, because if AJAX can get it, so can a bot.
It is a no-win situation with the current stateless web.
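The "fetch a token moments before submitting" approach Nick describes can be sketched as a timestamped HMAC that the server accepts only within a short window. The endpoint that would serve it, the 60-second window, and the function names are all assumptions. As he points out, a bot that calls the same endpoint gets a valid token too.

```python
import hashlib
import hmac
import secrets
import time
from typing import Optional

SECRET_KEY = secrets.token_bytes(32)  # hypothetical server-side secret
MAX_AGE_SECONDS = 60                  # assumed freshness window


def _sign(issued: str) -> str:
    return hmac.new(SECRET_KEY, issued.encode(), hashlib.sha256).hexdigest()


def issue_token(now: Optional[float] = None) -> str:
    """What the AJAX call would fetch just before submit: 'timestamp.signature'."""
    issued = str(int(time.time() if now is None else now))
    return f"{issued}.{_sign(issued)}"


def token_is_valid(token: str, now: Optional[float] = None) -> bool:
    """Accept only unforged tokens issued within the last MAX_AGE_SECONDS."""
    try:
        issued, signature = token.split(".", 1)
        issued_at = int(issued)
    except ValueError:
        return False
    if not hmac.compare_digest(signature.encode(), _sign(issued).encode()):
        return False
    age = (time.time() if now is None else now) - issued_at
    return 0 <= age <= MAX_AGE_SECONDS
```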
Nick Berardi on March 5, 2008 04:55 AM
Sorry about the double submit; there was a hiccup in the form that said permission was denied for copying the HTML file.
Nick Berardi on March 5, 2008 04:57 AM
I wonder whether the ever-increasing battle between spammers and their victims (us) will result in the first true AI systems. Will they unintentionally end up making a computer that thinks like a human?
Jim Cook on March 5, 2008 05:08 AM
Quote
So what if the CAPTCHA turns into an intelligence test? Let's not have dumb people make comments either :)
Oh, damn. I can't spell "orange."
Matt Gibson on March 5, 2008 01:51 AM
/Quote
Better yet, let them enter a word that rhymes with orange.
Thijs on March 5, 2008 05:25 AM
I can't help but wonder whether adding a second CAPTCHA would significantly reduce the success rate. Currently, if an automated process gets 20% correct, adding a second CAPTCHA would cut that by an additional 80%, leaving it at around 4% success. Or at least that's my thought on it. I hate CAPTCHAs, by the way, but they do keep tons and tons of spam from getting to our inboxes.
Seth Braunstein on March 5, 2008 05:27 AM
How about a CAPTCHA that depends on -errors- that humans will make (reliably!).
MattF on March 5, 2008 05:29 AM
I really don't understand how spammers get so much money to make spamming worth the effort...
Nicolas on March 5, 2008 05:33 AM
The "distinguish pictures of dogs from cats" page just informed me that I am a bot.
It asked me to choose all of the pictures of cats.
I did so, including one that contained both a dog and a cat.
I guess that was supposedly a picture of a dog.
blah on March 5, 2008 05:51 AM
A while back I read about a way to make spamming prohibitively expensive.
It was Cringely, IIRC, who proposed making the sender of an email perform a small calculation sent to it by the email server. The calculation would be small enough not to be noticed by the average email user, but large enough that when a spammer tries to send huge amounts of mail at once, the computation becomes too time-consuming to be worth it.
This was over two years ago. Has anyone heard of this concept being used?
Now, I understand that this would unjustly penalize businesses that legitimately send bulk emails. But, do any legitimate bulk emailers send as much as a spammer?
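The scheme described above is usually called proof of work, and Adam Back's Hashcash is the best-known version. The sketch below is hashcash-style rather than Cringely's specific proposal, and the challenge format and difficulty are assumptions. The sender searches for a nonce whose hash starts with a required number of zero bits, which takes about 2^bits hashes on average. The server verifies the result with a single hash. At 20 bits that is roughly a million hashes per message: tolerable for one email, costly for millions.

```python
import hashlib
import itertools


def leading_zero_bits(digest: bytes) -> int:
    """Count the zero bits at the start of a digest."""
    bits = 0
    for byte in digest:
        if byte:
            return bits + (8 - byte.bit_length())
        bits += 8
    return bits


def _digest(challenge: str, nonce: int) -> bytes:
    return hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()


def solve(challenge: str, difficulty_bits: int) -> int:
    """Sender's work: try nonces until the hash has enough leading zero bits."""
    for nonce in itertools.count():
        if leading_zero_bits(_digest(challenge, nonce)) >= difficulty_bits:
            return nonce
    raise AssertionError("unreachable")  # itertools.count() never ends


def verify(challenge: str, nonce: int, difficulty_bits: int) -> bool:
    """Server's check: a single hash, however hard the puzzle was to solve."""
    return leading_zero_bits(_digest(challenge, nonce)) >= difficulty_bits
```

In a real deployment the challenge would presumably bind the message to its recipient and date, so a solved stamp could not be reused.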
I prefer technical solutions.
For web form spam, you can easily filter keywords and links (spammers can't obfuscate them, because they want links to be machine-readable; Google-readable, to be exact).
For other kinds of abuse it's trickier, but it might be possible to use trending, i.e., rather than checking individual requests, observe how the "user" behaves: how many registrations they make, how soon and how many emails they send, etc. This should reliably pick up bots until spammers learn to emulate human behaviour better.
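The "trending" idea above can be sketched as a sliding-window counter per account. It records each sign-up, sent message, or post, and flags accounts whose rate looks automated. The window and limit here are illustrative assumptions, not tuned values. Timestamps are assumed to arrive in non-decreasing order.

```python
from collections import defaultdict, deque
from typing import DefaultDict, Deque


class RateMonitor:
    """Flags users whose action rate inside a sliding time window looks automated."""

    def __init__(self, window_seconds: float = 3600.0, max_actions: int = 20) -> None:
        self.window_seconds = window_seconds
        self.max_actions = max_actions
        self._events: DefaultDict[str, Deque[float]] = defaultdict(deque)

    def record(self, user: str, timestamp: float) -> bool:
        """Record one action and return True if the user is now over the limit."""
        events = self._events[user]
        events.append(timestamp)
        # Forget actions that have slid out of the window.
        while events and events[0] <= timestamp - self.window_seconds:
            events.popleft()
        return len(events) > self.max_actions
```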
Actually, I believe dynamically generated CAPTCHA fields would do the job. Your server sends you a session ID and retains the properties of the generated CAPTCHA field.
A bot will not be able to find the CAPTCHA field, so it won't be able to insert text. Human CAPTCHA solving is not an option, as the field name will be different every time.
Also, you should map the other fields as well, so the whole page changes in an unpredictable way for each session.
This could also be done with the JavaScript within each page, changing function names throughout to make it even more difficult for bots to analyze.
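Here is a minimal sketch of per-session field names. It assumes the server holds a secret key and knows each visitor's session ID. Every real field name is replaced by an HMAC of the session ID and the name, and the server maps the posted names back. This is in the spirit of the form-morphing article linked earlier in the thread. The function names and the 16-character truncation are assumptions.

```python
import hashlib
import hmac
import secrets
from typing import Dict, List

SECRET_KEY = secrets.token_bytes(32)  # hypothetical server-side secret


def field_name(real_name: str, session_id: str) -> str:
    """Per-session obfuscated name for a form field (leading letter keeps it tidy)."""
    mac = hmac.new(SECRET_KEY, f"{session_id}:{real_name}".encode(), hashlib.sha256)
    return "f" + mac.hexdigest()[:16]


def decode_form(posted: Dict[str, str], session_id: str,
                real_names: List[str]) -> Dict[str, str]:
    """Translate posted obfuscated names back to real ones; drop anything unknown."""
    lookup = {field_name(name, session_id): name for name in real_names}
    return {lookup[key]: value for key, value in posted.items() if key in lookup}
```

A bot that posts the plain field names gets nothing through. As later commenters point out, though, a targeted attacker can still script the page the way a browser does.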
Why don't we skip the CAPTCHA and move to a pay-per-email system? Just like we do with stamps. No Gmail account for you until you provide a credit card number.
Akira on March 5, 2008 06:10 AM
When making a comment on the Wolfram or MathWorld site, they ask you: 3 + 4 = ?
The math is even in plain text on the page.
Not sure what their success rate is, but it is probably pretty good. For email providers, this would obviously be overrun quickly. Word problems would be even better. They could even be simple and take a lot of different forms: "What is 4 from seven?", "Jill gives Johnny two apples plus three oranges. How many apples does Johnny have?"
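Word problems like the ones Jason suggests can be generated rather than stored. The sketch below assumes a few hypothetical templates and small operands. Numbers are randomly shown as digits or words, and both spellings of the answer are accepted.

```python
import random
from typing import Set, Tuple

NUMBER_WORDS = ["zero", "one", "two", "three", "four", "five",
                "six", "seven", "eight", "nine", "ten"]

# Hypothetical templates; each phrases a small addition differently.
TEMPLATES = [
    "What is {a} plus {b}?",
    "Jill has {a} apples and buys {b} more. How many apples does Jill have?",
    "Add {a} to {b}. What do you get?",
]


def _spell(n: int, rng: random.Random) -> str:
    """Show a number as a digit or a word, at random."""
    return NUMBER_WORDS[n] if rng.random() < 0.5 else str(n)


def make_word_problem(rng: random.Random) -> Tuple[str, Set[str]]:
    """Return (question, accepted_answers); sums stay within NUMBER_WORDS."""
    a, b = rng.randint(1, 5), rng.randint(1, 5)
    question = rng.choice(TEMPLATES).format(a=_spell(a, rng), b=_spell(b, rng))
    total = a + b
    return question, {str(total), NUMBER_WORDS[total]}


def check_answer(answer: str, accepted: Set[str]) -> bool:
    return answer.strip().lower() in accepted
```

For live challenges, pass `random.SystemRandom()` so the sequence is not predictable. A seeded `random.Random` is handy for tests.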
Jason B on March 5, 2008 06:13 AM
I read that humans were being used (either wittingly or unwittingly) to read Google's CAPTCHAs. Either way, it's a bit over the top to suggest CAPTCHA implementations are broken for everyone. Google/Yahoo/MS need to do something new because the monetary reward for breaking their CAPTCHAs is high enough to make it worth paying people to do it. This is not so for the average blog or even a small web application.
Tom Clancy on March 5, 2008 06:16 AM
Wouldn't it be much easier for Google/Yahoo/Hotmail/etc. to limit outgoing emails for new accounts? Instead of a single Turing test (the CAPTCHA), these services could pose a series of tests for new users to complete over the course of weeks and months before email restrictions were lifted.
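This graduated approach could be expressed as a table of tiers keyed on account age and the number of follow-up tests passed. The tiers and limits below are made-up illustrations, not figures from any mail provider.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

# Illustrative tiers, ordered loosest-requirement first:
# (minimum account age, follow-up tests passed, messages allowed per day).
TIERS: List[Tuple[timedelta, int, int]] = [
    (timedelta(days=0), 0, 10),
    (timedelta(days=7), 1, 100),
    (timedelta(days=30), 3, 1000),
]


def daily_send_limit(created: datetime, tests_passed: int, now: datetime) -> int:
    """Return the limit of the highest tier whose age and test requirements are both met."""
    age = now - created
    limit = 0
    for min_age, min_tests, tier_limit in TIERS:
        if age >= min_age and tests_passed >= min_tests:
            limit = tier_limit
    return limit
```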
aikimark on March 5, 2008 06:19 AM
This sounds like The Matrix: human farms of unwitting subjects breaking the CAPTCHA algorithm for the machines.
Mandatory 1-year prison sentences for all convicted spammers. 6-figure fines for all ISPs that knowingly distribute spam. Ninkinpoops who attempt to respond to spam should be warned once, then have their internet connections disabled for 30 days if they do it again.
All internet advertising must be PAID advertising and belongs on commercial web pages and pop-up ads only. Everything else should be punishable by law.
Desperate measures for desperate times.
PaulG. on March 5, 2008 06:26 AM
I like the JavaScript solution that was suggested earlier. You could just have JavaScript populate a hidden field with a value that will be read when the form is submitted. If a bot is visiting the page, the JS won't be executed and the bot will be defeated. That would also unburden the user.
Brian K on March 5, 2008 06:33 AM
Some of the options that you offer as alternatives are no better. If they offer a multiple choice, then the probability of breaking the CAPTCHA becomes 1 in the number of options offered. The number of choices needs to approach a really, really big number (I originally wrote infinity) to make the approach effective.
I'm just saying, is all...
prairiedog2k on March 5, 2008 06:34 AM
To me the long-term solution is to figure out and define exactly what spamming is, and automatically detect that behavior.
Maybe this requires some kind of machine learning. It may also require shared databases of information about current spammers, something that has the capability to stay ahead of the spammers.
This would have to be combined also with some more sophisticated and fine-grained access control. (E.g. to beat the case where a spammer takes your captcha image and uses it to give other users access to a fake porn site, only serve captcha images to clients that you can be sure have already visited your site in the past N ms.)
A combination of countermeasures that are not uniform from site to site, or even from request to request, would also be best. I.e., imagine your CAPTCHA incorporated all kinds of variation [note how similar the example CAPTCHAs above are to each other for each of Google, Hotmail, Yahoo]. If, in order to beat your CAPTCHA, a spammer had to run several recognition passes tuned for different kinds of CAPTCHA distortion, that makes it that much more expensive and time-consuming.
We can also come up with more sophisticated ways of defining exactly what the characteristics of a "high-quality" blog comment are, score comments accordingly, and send lower-scoring ones into human moderation.
The community of people who don't like spam is much larger than spammers and people who don't care. We also have the advantage that the characteristic that unites us is that we hate spam, and want to fight it. Our disadvantage is that most of us who hate spam are just average users, and have a certain threshold of what hoops they're willing to jump through to get their actual work done.
So to me the best thing to do is to make our websites smarter, rather than forcing users to do too much work; and when we do have a task for the user to do (log in, captcha, whatever), make sure it's as streamlined and easy to deal with as possible.
Reed Hedges on March 5, 2008 06:41 AM
I had to receive a text message in order to sign up for Gmail. Don't they still do that?
Joe Beam on March 5, 2008 06:46 AM
orange
orange on March 5, 2008 06:53 AM
orange
orange on March 5, 2008 06:54 AM
The anti-bot method I like requires no script and no effort from the real user. A text input styled for display:none within the submission form, possibly with a name of "zipcode" or something similar. Most bots will attempt to populate it with "convincing" data.
When you process the form, reject any submission with data in that box.
I didn't come up with this, but it seems to work pretty well.
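The hidden-field trap Brother Erryn describes needs only a few lines on the server. This sketch assumes the field is called `zipcode`, as in his example. The markup string and function name are illustrative.

```python
from typing import Dict

HONEYPOT_FIELD = "zipcode"

# Markup for the trap: hidden from people by CSS, but present for bots that
# fill in every input they find. tabindex="-1" keeps keyboard users from
# tabbing into it; autocomplete="off" asks browsers not to autofill it,
# though not every browser honors that.
HONEYPOT_HTML = (
    f'<input type="text" name="{HONEYPOT_FIELD}" style="display:none" '
    'tabindex="-1" autocomplete="off">'
)


def is_probably_bot(form: Dict[str, str]) -> bool:
    """Reject any submission that put something in the hidden field."""
    return bool(form.get(HONEYPOT_FIELD, "").strip())
```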
Brother Erryn on March 5, 2008 06:56 AM
I think ASCII-art CAPTCHAs are as weak as image-based CAPTCHAs. It's a kind of security by obscurity, if you ask me. If Google decided to use them, it would take a day and they'd be solved :)
Please enter your Social Security number, mother's maiden name, date of birth and driver's license ID.
(It works for the bank...)
I am starting to think that some kind of global internet ID system that relates back to real world credentials is the only way to go. I know, it removes our anonymity, but it solves the problem.
The ID could be constructed in such a way that websites could not access the private information, just the fact that this ID is from a real person. Of course, regulating whoever has that information would be the challenge.
Jeff Davis on March 5, 2008 07:12 AM
Almost all the great suggestions on this "thread" are security by obscurity. Putting in JavaScript? Invisible fields named "zipcode"? Those things will be circumvented 30 seconds after they have been implemented. Remember, we are talking about Google and Hotmail here, not some private blog. On a private blog, even something as silly as Jeff's "orange" is enough.
J. Stoever on March 5, 2008 07:13 AM
Brother Erryn, that's rather easy to break. Naturally it will only stop bots that are not expecting your mechanism, but any sophisticated attacker of Google/Yahoo/Microsoft is going to spend some time studying the page to find minor obstacles such as those.
JavaScript/CSS tricks are easily broken.
Bobby on March 5, 2008 07:20 AM
Jeff,
From a recent post on the Joel on Software forums:
"Chenette said organized attackers are using automated tools to sign up for Gmail and other Web-mail accounts. When the CAPTCHA image appears, it's automatically sent off to a large and low-paid workforce, typically in another country, where a worker enters the code and sends it back so the account can be created."
http://www.theregister.co.uk/2008/02/08/microsoft_captcha_buster/
http://www.enterprise-security-today.com/story.xhtml?story_id=58602
How do you stop spammers from using low-paid humans to beat CAPTCHAs? Are CAPTCHAs' days numbered?"
So it appears that the Google CAPTCHA algo hasn't been broken at all, but simply circumvented by those willing to pay people to get them through.
KenW on March 5, 2008 07:21 AM
Oops! Forgot the link to the post at JoS:
http://discuss.joelonsoftware.com/default.asp?joel.3.600679.21
KenW on March 5, 2008 07:24 AM
I think the next step for CAPTCHA is to require a valid answer, not just repeating the letters.
Here are some ideas. To be successful, you would need a bank of X000 simple, first-grade question/answer pairs.
What color is this?
What is 1+3?
What year is it?
What is the first
What is 10/2?
What day comes after Monday?
How many hours are in a day?
How exactly are spammers any different from traditional marketing houses that send bulk mail advertising to your mailbox? Guess what... the difference is strictly due to public perception.
If you really want to stem the tide, it needs to be legitimized and regulated. Once that is done, the various governments would have a financial incentive to really punish rogue spammers. After all, a rogue spammer would be cutting into their own profit. Further, the traditional marketing companies would push the smaller guys out of the market.
Here's how I see it: all ISPs pool their email address lists into a giant database. A spammer would buy the right to send x number of messages to x number of addresses on that list. Say 40% goes to the government of the country of the recipient's ISP, and the rest goes to the ISP. If the spammer sends a message to someone on the Do Not Email list, they are fined something like $100 per instance; failure to pay the fine means jail time for whoever the government can capture. Maybe it costs something like $0.05 per address per message, which is pretty close to bulk mailing rates.
There's financial incentive for: 1. ISPs to join the list; 2. pretty much any government to enforce regulation, which is something they like doing anyway; and 3. spammers to register and follow the rules.
After all, from a spammer's perspective it's much more cost-effective to broadcast a message to a known good list of recipients than it is to try and harvest those addresses in the first place.
Chris Lively on March 5, 2008 07:32 AM
One solution I've seen (and only in one place - in a free 2chan-esque image board software package) is a 'spam trap' - basically invisible form fields that are only filled out by spambots. These fields are then tested and if they have any value, the input is discounted as spam.
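A minimal sketch of the hidden-field "spam trap", assuming form data arrives as a plain dict and the trap is hidden from humans with CSS. The field-name scheme, `SECRET`, and the helper names are hypothetical; deriving the name from a per-form id follows Gary's later suggestion of randomizing field names. As noted earlier in the thread, a bot written for your specific site can learn to skip the field, so this is a speed bump, not a CAPTCHA.

```python
import hashlib
import hmac

# Hypothetical server secret used to derive a per-form trap field name.
SECRET = b"replace-with-a-real-secret"


def honeypot_field_name(form_id: str) -> str:
    """Derive an ordinary-looking field name that differs per form."""
    digest = hmac.new(SECRET, form_id.encode(), hashlib.sha256).hexdigest()
    return "contact_" + digest[:8]


def render_trap(form_id: str) -> str:
    """HTML for the trap: invisible to humans, visible to naive form-fillers."""
    name = honeypot_field_name(form_id)
    return ('<div style="display:none">'
            f'<input type="text" name="{name}" value="" autocomplete="off">'
            '</div>')


def looks_like_bot(form_id: str, form_data: dict) -> bool:
    """A non-empty trap field means something filled in what humans can't see."""
    return form_data.get(honeypot_field_name(form_id), "").strip() != ""
```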
Phil on March 5, 2008 07:32 AM
I'm a fan of the reCAPTCHA project. But lately I've hit a lot of words on reCAPTCHA that I can't decipher! I love the idea of CAPTCHA using a picture instead of words; it'd be easier to internationalize such a system.
monsur on March 5, 2008 07:33 AM
Here's a question: could improvements in CAPTCHA-defeating technology be used to make super-reliable OCR?
Shmork on March 5, 2008 07:41 AM
As http://en.wikipedia.org/wiki/Captcha clearly states:
[quote]
A CAPTCHA system is a means of automatically generating new challenges which:
...
- Does not rely on the type of CAPTCHA being new to the attacker. Although a checkbox "check here if you are not a bot" might serve to distinguish between humans and computers, it is not a CAPTCHA because it relies on the fact that an attacker has not spent effort to break that specific form.
[/quote]
This point seems to be missed by just about everyone, and it's something worth considering. Just think "what would Bruce Schneier say?" and you'll get it right ;)
dave on March 5, 2008 07:44 AM
They just need Flash-based animated + audio captchas.
netduke on March 5, 2008 07:53 AM
An intelligence test like:
"You have a bucket that holds two gallons and one that holds three gallons. How many buckets do you have?" (smirk)
Mad Prophet on March 5, 2008 07:55 AM
BTW, I've expanded my comment into a post: http://taint.org/2008/03/05/122732a.html
Justin Mason on March 5, 2008 07:56 AM
I think it's high time that we stop trying to address the symptoms and start addressing the root cause of these sorts of problems.
Spammers should be legal hunting targets, plain and simple. I know I'd pay a hefty license and tag fee to be able to hunt spammers. This ought to be reality TV as well. Think of "Running Man" with Dog the Bounty Hunter as the host.
Enough is enough already. If spammers can't use the internet for good, then they should lose the privilege to use it (or live, I'd prefer it that way). They're a waste of humanity. They're also utterly stupid if they can't realize I'm not going to buy their stupid pills after the 200th e-mail...
Chris Holmes on March 5, 2008 08:00 AM
I hate them.
So many times they forced me to try again... and again...
Nowadays, I don't want to bother. I look at how difficult the captcha is.
If it is too difficult, I don't even bother to write anything and leave the site quickly.
They may be necessary, but as far as I am concerned, they make MY life harder. So I will not give in to them at all.
The web should be for people, not against people.
shev on March 5, 2008 08:01 AM
Has anyone tried Bayesian filtering?
http://en.wikipedia.org/wiki/Bayesian_spam_filtering
http://en.wikipedia.org/wiki/Naive_Bayes_classifier
Do you know what a PITA the CAPTCHA is on a site like JK on the Run? It's so blurry and grey that sometimes it takes me 3-4 tries to enter the correct combination.
I don't understand the need for this speedbump with people who use OpenID. WordPress has embraced OpenID and I can use it to comment at many places without having to compile a long list of usernames and passwords.
Why can't there be a method for sites to compile lists of valid OpenIDs so people like me can skip the CAPTCHA Hell?
Mike Cane on March 5, 2008 08:09 AM
The choose-cat-or-dog problem would probably work, but of course, you'd have to either do it iteratively (which people would get tired of), or have something like a 5x5 grid and choose which pictures had cats. A 5x5 grid with a 50% chance of any given picture containing a cat would result in about a 1 in 33.5 million (2^25) chance of a bot getting it right by random guessing. And the server can have a very large number of pictures stored for the purpose (each picture could conceivably be less than 10 kB in size). The CAPTCHA would have to apply random modifications to the pictures to prevent an attacker from just storing which picture corresponds to a given answer, however.
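The arithmetic behind the 5x5 grid, as a minimal sketch: assuming each of the 25 cells independently shows a cat with probability 1/2, any blind guess matches a given cell with probability 1/2, so the whole grid is right with probability (1/2)^25 = 1/33,554,432. The helper names are hypothetical, and the random image modifications the comment calls for are not shown.

```python
import random
from fractions import Fraction


def make_grid(rng, cells=25, p_cat=0.5):
    """True marks a cell showing a cat; each cell is drawn independently."""
    return [rng.random() < p_cat for _ in range(cells)]


def is_solved(answer, selection):
    """The user must mark exactly the cat cells: no misses, no extras."""
    return list(answer) == list(selection)


def blind_guess_odds(cells=25):
    """With p_cat = 1/2 every guess matches a cell with probability 1/2,
    so the whole grid matches with probability (1/2) ** cells."""
    return Fraction(1, 2) ** cells
```

Requiring an exact match on all 25 cells is what makes the odds multiply; accepting "mostly right" answers to be kind to humans would raise the bot's odds accordingly.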
Matthew Hui on March 5, 2008 08:10 AM
I don't like CAPTCHAs, but I see a major problem (from a web design perspective) with most of the new methods as well: most of them rely on Javascript.
Now, don't get me wrong, lacking support for non-javascript browsers isn't a show-stopper, but it does pose a problem for people who browse with javascript disabled. This includes users of the NoScript extension for Firefox, and people with text-based browsers like Lynx.
Jacob on March 5, 2008 08:12 AM
The website describing the GMail captcha crack was confusing, but it seemed to me that far from inventing a brilliant captcha-reading algorithm, they were just employing people to type in the captchas as they come in. No human-vs-machine principle can beat that.
Zack on March 5, 2008 08:17 AM
About the human farms, including a watermark/some kind of branding in the captcha image would at least generate some suspicion.
lmjabreu on March 5, 2008 08:30 AM
My own blog at http://smokinn.com/blog does something similar to the cats vs. dogs thing. I make people pick between fluffy and not fluffy.
I fully expect this to be the new wave in captchas. It's MUCH more user-friendly and there are so many implementation tricks you can use (mine is very naive but I can already think of 3 improvements I'll make if ever a spam bot gets through) that it can be very solid.
Guillaume Theoret on March 5, 2008 08:30 AM
I think it's time to add some intelligence to the process. What about 'questions' like these:
A banana costs $1. Three bananas will cost $_____
I had a $3 discount on a $15 product. I paid $____
And the text could be presented as a standard captcha.
(I am Brazilian, so my writing might have some mistakes.)
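A minimal sketch of generated word problems, assuming the question text is then rendered as an ordinary distorted captcha image (rendering not shown). The two templates mirror the examples above; the helper names and number ranges are hypothetical. Drawing fresh numbers per request makes a memorized answer list useless, although a bot that can read the text can still do the arithmetic.

```python
import random


def make_word_problem(rng: random.Random):
    """Return (question, answer) from a random template with random numbers."""
    if rng.choice(["unit_price", "discount"]) == "unit_price":
        price = rng.randint(1, 9)
        count = rng.randint(2, 9)
        question = f"A banana costs ${price}. {count} bananas will cost $_____"
        return question, price * count
    discount = rng.randint(1, 9)
    total = rng.randint(discount + 1, 50)  # keep the price above the discount
    question = f"I had a ${discount} discount on a ${total} product. I paid $____"
    return question, total - discount


def check(expected: int, reply: str) -> bool:
    """Accept '12', ' 12 ' or '$12'; reject anything non-numeric."""
    cleaned = reply.strip().lstrip("$").strip()
    return cleaned.isdigit() and int(cleaned) == expected
```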
Captcha has a grave usability problem. An alternative is to ask simple questions, e.g. "Is water liquid or solid?" Answer: liquid. Here is such a plugin for WordPress blogs.
Ades on March 5, 2008 08:49 AM
My knowledge of how CAPTCHA works is very limited, but I want to know: why are CAPTCHAs always static? Would bots be able to break CAPTCHAs that use kinetic typography? I would think that analyzing moving, morphing text/images would be much more difficult.
http://www.cs.cmu.edu/~johnny/kt/ (the demos are really neat).
chillings on March 5, 2008 08:51 AM
The major web-based email providers (GMail, Hotmail, etc.) should require the user's browser to perform some calculation in JavaScript for every email that is sent. The time required to perform this calculation would be minimal for normal users, but prohibitive if you're sending bulk spam.
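This is what is usually called a proof-of-work (or "hashcash"-style) scheme. A minimal sketch in Python rather than browser JavaScript, assuming the server issues a random challenge and a difficulty in bits (both hypothetical parameters): the client searches for a nonce whose SHA-256 hash starts with that many zero bits, about 2^difficulty hashes on average, while the server checks the result with a single hash.

```python
import hashlib
import secrets


def new_challenge() -> str:
    """Server: issue a random challenge string for one message."""
    return secrets.token_hex(16)


def leading_zero_bits(digest: bytes) -> int:
    """Count the zero bits at the start of a hash digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits


def _digest(challenge: str, nonce: int) -> bytes:
    return hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()


def solve(challenge: str, difficulty: int) -> int:
    """Client: try nonces until the hash has `difficulty` leading zero bits."""
    nonce = 0
    while leading_zero_bits(_digest(challenge, nonce)) < difficulty:
        nonce += 1
    return nonce


def verify(challenge: str, nonce: int, difficulty: int) -> bool:
    """Server: checking the work costs a single hash."""
    return leading_zero_bits(_digest(challenge, nonce)) >= difficulty
```

The server also has to remember which challenges it issued and reject reused ones; otherwise one solved nonce could be replayed for every message. A botnet still pays the cost, just spread across many machines.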
@JP on March 5, 2008 02:37 AM
Paying $1 for every web registration is a terrible idea. First, I don't want to give out my credit-card number to everyone. I even feel paranoid about Amazon.com trying to store it. Second, lots of people don't have credit cards (e.g., kids). Lastly, it would discourage me from posting on almost any discussion board because I'm too cheap to pay for the privilege of providing help or asking questions myself.
Logging into HSBC's personal internet banking account requires three things: a user name, a password you type in, and lastly another password that you enter by using your mouse to point and click on an on-screen keyboard.
If it is required to point and click an on-screen keyboard in order to enter information - would that help stop bots?
JR on March 5, 2008 09:10 AM
Actually you forgot a category: social synchronization. If you expect people to pick a word that best represents multiple pictures, you are expecting them to think the same way. I frequently fail this type of social sync test; I just don't seem to think the same way as most people for some reason. Crossword puzzles are perhaps the best example of this.
With bots cracking it 20% of the time, I would be interested in the failure rate for flesh and blood. I know that I don't hit 100%. Makes me wonder what the average is.
As a thought, put instructions into an image to complete a task and have the result of that task be the key. Of course, that would exclude the simple, the visually impaired, and those who speak a different language.
I suspect that ultimately we will have to fall back on a third party that could be used to verify our identity and provide websites a semi-anonymous yea or nay without passing on any of the personal details we provided to the third party to verify our identity.
Wait, I think that's been done and it failed the paranoia test...
We are our own worst enemies at times.
Xepol on March 5, 2008 09:15 AM
Here's a thought... use the captcha to do a little research as well.
While we're asking users a question... maybe we can make the answers they give relevant and useful to us.
Another thought... you could make the instructions an image... so the algorithms would need to get the OCR right... then interpret the directions... then figure out how to follow them.
Seems pretty bulletproof to me (at the moment).
Jim on March 5, 2008 09:19 AM
Any kind of Captcha is useless for determined spammers. If I were a spammer, I would hire a low-cost laborer who can, in an hour, manually open tens of new email accounts. Why waste time developing and launching anti-captcha bots?
Abdu on March 5, 2008 09:22 AM
Spammers are like child molesters. They only stop when you put bullets through their heads. I continue to hope that one day society will decide to step up to the plate, instead of endlessly and pointlessly playing Spy-vs-Spy. But (sigh) just like the "War on Terror", the real point is how much money MegaCorpGov can make out of it, and that is ultimately determined by how long it can be dragged out.
Ed Tuonine on March 5, 2008 09:57 AM
In regard to Guillaume's comment.
Cute.. fluffy is nice.
The biggest issue with these types of captcha is that it is not too complicated to build a pretty thorough library of images used, quite quickly.
For instance, one can go to http://smokinn.com/blog/app/img/spam_pics/1.png
to get the first image (cool... public folder!)
and then iterate through the images to the end, gathering each.
A human with malicious intent can correlate the filenames to fluffy/not fluffy and then build automation. To make it even more robust (just in case you decided to use a renaming script to rotate the images around internally) the actual image could be correlated to fluffy/not fluffy. The program can then check the data in each image (rather than the filename itself) and 'sense' the matches and thus appropriate selections.
You would need a rather substantial library to make this captcha strong enough... and enough time to manually go through each one noting if it is fluffy or not.
Interested in hearing what improvements you have thought of...
Demi Raven on March 5, 2008 10:01 AM
Most captchas rely on you being able to see and speak English, so you have instantly alienated all your potential blind and non-English-speaking users.
Most of the alternative systems mentioned above rely on a bot not knowing your captcha method, as soon as they do they can defeat it easily (at least often enough to be useful), or use subjective tests which humans regularly fail as well, or cost the user money ....
This last one is the best and the worst: it would stop all the spammers (if it costs more than the return, they will not spam), but it will also stop the majority of your potential users.
Universal ID is not an option. Universal ID is never universal: I, for one, will not have one and so will not pass. It also assumes that the ID system is perfect (cannot be cracked, cannot be faked), and every system can be, and is, if there is enough money involved.
Jaster on March 5, 2008 10:15 AM
Oh ...
Who transported Cleopatra's Needle to Central Park and shares a name with a firm of tailors in England?
Henry Honychurch Gorringe (Rhymes with ....)
Jaster on March 5, 2008 10:19 AM
I think that the actual problem is that somehow, even with all of us telling our parents, friends and children NOT to click on spam links or buy from spammers, some "people" still do it.
If there was no money in spam, they wouldn't have any incentive, but you have to remember that in direct mail (snail mail) 1% is considered a good return, while in email the costs are so much lower that .001% is still a good return.
So the real question is, How do we find that ignorant .001% of people & educate them???
Dave on March 5, 2008 10:34 AM
All this talk and not a single person suggested replacing the SMTP protocol with something more up to date with the real world?
Alex G on March 5, 2008 10:41 AM
I just say we make it legal to murder these bastards. Put out a bounty on their heads.
Where do I start?
dnm on March 5, 2008 10:45 AM
Check out the submission page on thoof.com - it uses a fairly novel approach where you must click on the kittens in a picture (it's a much more elegant implementation than the Microsoft proof of concept you link to above).
Ian Clarke on March 5, 2008 10:46 AM
Dave: With a claw hammer.
dnm on March 5, 2008 10:46 AM
@Alex G on March 5, 2008 10:41 AM
SMTP is slowly being replaced by web-based email.
KG on March 5, 2008 10:53 AM
We use reCaptcha - it works great so far :)
Lukas on March 5, 2008 11:40 AM
Captcha is good in theory, but I have come across many users who struggle to read the letters, including those with reduced visual acuity. Some captcha themes, though, are so obscure as to be difficult to read in any circumstance. A new method is certainly needed!
BlackWasp on March 5, 2008 12:12 PM
I have a simple idea that I think is only breakable on a site-by-site basis. I don't think it could be broken by automation except on a case-by-case basis.
The idea is simple, present the user with a paragraph of text describing something. Subsequently the user must answer a key question the solution to which was clearly presented in the paragraph previously. For example,
Fact: 20% of all dogs suffer from fleas.
Question: In a selection of 25 dogs how many are likely to suffer from fleas? (One word) Five
Of course thousands of these could easily be created, and certainly more complicated ones with non numeric solutions. Basically solving these would require a certain level of intelligence. There are still I guess several hurdles such as the language barrier, the intelligence barrier and the requirement that these be created by a human in the first place and will probably require regular updating. Of course if this idea took on it would be possible for a company to create a server of these puzzles and then charge for site/content providers to use their regularly updated set of solutions.
The advantage of this approach is that the user must show real intelligence in order to solve these sorts of problems. This has never been solved by automation, but before captcha was even invented there was already software that could do handwritten character recognition, so it was only a small step to cracking captcha.
I guess the other problem with this approach is that lots of internet users don't want to invest the time to read a paragraph of text just to sign up to something.
Southern Chess Player on March 5, 2008 12:55 PM
Any new method will be attacked and broken too, not that that's necessarily a bad thing. It drives us forward, makes us find new ways of protecting ourselves. Often, the technology used by the bad dudes becomes useful too.
I always disliked CAPTCHA though. I've never been able to decipher them. Hopefully whatever is used to replace them is a lot more user friendly.
Naked Programmer on March 5, 2008 12:58 PM
Well. CAPTCHAs are often hard for old people to make out.
And since they are generated with computers, it seems possible that a way to reverse it with a computer exists.
I really prefer clever, handmade captchas:
"What is the first name of Jeff Johnson?"
"What is the year of the end of the 2nd World War (1939-1945)?"
And so on... If you create handmade silly questions, then all the spammers will be definitively blocked. There is no software which is able to understand a question.
Somewhere I've read that using hidden inputs as bot traps can be effective. If something was entered into the hidden fields, it must be a bot. The bot isn't going to render the page to determine if a textbox is hidden. You'd probably have to constantly randomize the field names on high-profile sites, though.
Gary on March 5, 2008 01:18 PM
My Freakonomics thing tells me "CipherTrust has analyzed the effectiveness of various kinds of spam. It turns out that pornography is far and away the most effective spam, with a click-through rate of 5.6 percent. The next-best click-through rate is pharmaceuticals, at 0.02 percent."
The only way to solve spam forever is to stop people opening spam messages.
Best dumb-butted responses so far:
Pay $1 for all the (stupid) websites that ask for your info.
- The reason I give fake email addresses is because I don't want spam, and I don't want dodgy websites having my credit card either. Gee, let me pay to comment on forums? I'm already annoyed that I have to sign up in the first place.
Give up your identity, SSN, credit card, etc
- Why should you really know who I am? Spammers will still have fake ID's, while honest people pay the price.
- Do I trust you with that information. Do I trust your security and data retention policies?
Limit the number of emails for new accounts.
- Sure, but for how long. Spammers will then create accounts, have their fake accounts send a few 'real' messages, and after a period of time resume full-time spamming. All you did is introduce a temporary delay.
All internet advertising must be PAID advertising
- I'm sure someone paid the spammer, so ipso facto stupido. Do you think spammers are doing it gratis?
Universal ID
- Who manages this, and can you be sure that their captcha works? You're just pushing the problem up a level, and making one large target instead of many small targets.
- Conveniently you also get universal tracking of habits and selling my information to ... spammers. Thanks! Where's my tinfoil hat?
Charge people per email:
- Great idea. Maybe I already do pay you retard. I pay for hosting. I pay for internet service. I pay for bandwidth.
PS Banks don't use captcha. They have secure offline processes in place to set up your internet banking so that even their employees can't fake the system out. Multi-level authentication isn't captcha.
That is excellent food for thought. Distinguish a type of animal, bloody brilliant! At least then the captcha would be fun!
Ryan Allen on March 5, 2008 02:17 PM
Hehe:
http://www.ubersite.com/m/113411
Is this the future (possibly NSFW due to two swear words)
Bryan Childers on March 5, 2008 02:39 PM
A surprising number of the suggestions above are culturally dependent. For example:
Not everyone has a Social Security Number. In New Zealand, for example, there is no universal personal identifier (and long may that freedom continue).
Not everyone has a drivers license. I was 42 before I felt the need to get one.
Not everyone will recognise an athlete dribbling a round ball as related to the word "basket", as http://gs264.sp.cs.cmu.edu/cgi-bin/esp-pix just asked me.
Any trivia questions (no matter how simple) will be foreign to some people. ("What's the second world war?" asks someone in Chad.)
Spam is a tax we pay for having email. Use spam filters - all you can.
ISPs should have better spam filters than most do: Gmail does well. Learn not to let spam annoy you.
Make the captcha too hard, and you'll lock out many of your human readers. Life's too short to spend it straining my eyes at distorted text.
Captcha: apple? banana? grape? peach? I KNOW! it's cherry, right? No?
kiwi? guava? strawberry? secret? password? Oh please, just post my comment already.
Watermelon? Pineapple? Persimmon? Sweet potato? Lime? Lemon? Tangerine? Pomegranate? Olive? Nectarine? Pumpkin? Cantaloupe?
Izzy on March 5, 2008 02:57 PM
I run a website/forum for a World of Warcraft guild that I'm in. We used to get a lot of forum spam. What worked for us was to add a question to the registration form - a trivia question. In our case I asked a question that anyone who has leveled a character to 70 would know the answer to, but no one in a captcha-breaking sweatshop would be able to answer. A lot of topic-specific websites could use similar techniques to filter out spambots. You just need to tailor the questions for your audience.
After making this change we haven't seen any spam posts.
DancesWithLysol on March 5, 2008 03:05 PM
Forums / Blogs,
Seriously, any form of validation that requires a user to enter anything but their blog comment or forum message is useless. It may be partially effective against automated means, but a human farm of people can break any of these ideas EXCEPT bayesian filtering.
Once you train your bayesian filter by marking actual spam as spam, and good posts as good, only a very small percentage of spam makes it through. The ones that do make it you mark as spam manually, which further 'trains' your filter. Simple.
Email providers,
bayesian won't help you prevent people from creating spam sending accounts.
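The train-then-classify loop described above for forums and blogs can be sketched as a tiny naive Bayes classifier. This is a minimal illustration assuming simple lowercase word tokens and add-one (Laplace) smoothing; the `BayesFilter` class is hypothetical, and real filters use richer features and tuned thresholds. As the comment says, it scores posts, not account sign-ups.

```python
import math
import re
from collections import Counter


class BayesFilter:
    """Tiny naive Bayes filter, trained by marking posts as spam or ham."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = Counter()

    @staticmethod
    def tokenize(text):
        return re.findall(r"[a-z0-9']+", text.lower())

    def train(self, text, label):
        """Mark one post as 'spam' or 'ham' (e.g. a spam post that slipped through)."""
        self.doc_counts[label] += 1
        self.word_counts[label].update(self.tokenize(text))

    def spam_probability(self, text):
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_docs = sum(self.doc_counts.values())
        log_score = {}
        for label in ("spam", "ham"):
            # Smoothed prior: how common this class is among trained posts.
            score = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
            n_words = sum(self.word_counts[label].values())
            for token in self.tokenize(text):
                # Add-one smoothing so an unseen word never produces log(0).
                count = self.word_counts[label][token]
                score += math.log((count + 1) / (n_words + len(vocab) + 1))
            log_score[label] = score
        # Convert the two log scores into P(spam | text), clamped to avoid overflow.
        diff = max(-700.0, min(700.0, log_score["ham"] - log_score["spam"]))
        return 1.0 / (1.0 + math.exp(diff))
```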
>>>
If you're one of "us" for the purposes for which this was written, then signing up here
http://www.obsessivemathsfreak.org/phpbb/
should be trivial. Without some significant AI this isn't going to admit a bot, and if you just play the captcha out of context in an unwitting mechanical turk attack (e.g. as part of a porn site login), you're not going to get very many false positives.
<<<
Yeah, right. I wasn't able to answer a single one out of ten. :D Obviously, I don't belong to the targeted audience.
Vinzent Hoefler on March 5, 2008 04:03 PM
@KG
No, desktop SMTP is slowly being replaced by web based SMTP clients. SMTP is still there.
Alex G on March 5, 2008 04:17 PM
How about, instead of testing whether a human is filling out the form, just making sure to map an account to something pretty unique,
SUCH AS A CELLPHONE.
Problem solved :)
Greg Magarshak on March 5, 2008 04:21 PM
Thanks for the heads up regarding Asirra (the "click on all of the cat pictures" CAPTCHA). It's definitely way less tedious (and more fun) than the standard text CAPTCHAs...
Erik Novales on March 5, 2008 04:29 PM
There has to be a way, using Flash or some sort of randomly generated animated field, to show something a human can distinguish but a bot can't.
The simple ability to use ActionScript to randomly create a timeline for the flashed letterforms, and the ability to use the same scripting to create an endless combination of noise or disfigurement, would have to at least set the fully automated bots back a bit.
Of course the real-deal spam houses forcing actual people to do their dirty work may never be stopped, but the macro/bot programmers would have a hell of a time with a CAPTCHA that had its own timeline and actually moved.
If you were really clever you could also have the algorithms get MORE strict each time the person pressed the "show me a new one" button based on their session ID or cookies.
Perhaps also getting away from letterforms entirely, using a gradient slice of colors with a corresponding universal sound played on mouseover. By universal I mean things anyone would recognize: flowing water, a cheering crowd. For the backend, have hundreds of dummy sounds, use a scripted backend to randomize the filename of the embedded sound clip, and always make sure its position is as random as you can get it.
I think any system can be broken, and perhaps none of my ideas would work since the obvious weakness is it must be executed client side and would therefore be susceptible to reverse engineering.
I have found this topic particularly interesting since the first time you brought it up, because it's a callback to the simplest rule in our electronic age:
Anything that can be made, can be unmade.
And frankly, there's something almost comforting about that, as crazy as that sounds.
Mike on March 5, 2008 05:10 PM
Why bother with Captcha? Just let them spam all they want and ignore them silently. This is what services like Akismet and Defensio are for. They will take care of watching over the evolution of spam messages and adjust the filtering techniques.
Defensio advertises an efficiency of 99.77%. Considering Akismet (no numbers) is at least 99.5%, you can combine both and, if their misses are independent, get roughly 99.999% accuracy. Who needs a CAPTCHA?
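Checking the arithmetic: a spam message gets through only if both filters miss it. Assuming the two services' misses are independent (optimistic, since spam that fools one filter is likely to fool the other), the combined miss rate is 0.0023 × 0.005 = 0.0000115, so about 99.9989% of spam is caught. A minimal sketch; the function name is hypothetical.

```python
def combined_catch_rate(*catch_rates: float) -> float:
    """Fraction of spam caught when a message must get past every filter.

    Assumes the filters miss independently; correlated misses make the
    real figure lower.
    """
    missed = 1.0
    for rate in catch_rates:
        missed *= 1.0 - rate  # survives only if every filter misses it
    return 1.0 - missed


print(f"{combined_catch_rate(0.9977, 0.995):.5%}")
```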
Louis-Philippe Huberdeau on March 5, 2008 05:53 PM
I write some bots myself (though not spam bots!). Just some bots to simplify certain internet tasks and I use WebKit which actually loads javascript. So to the people who are saying that using an invisible div with a javascript math problem solves CAPTCHA... it does not...
Mitchell Hashimoto on March 5, 2008 06:44 PM
The economics are heavily in favour of the spammers, aren't they? The spammers have an ongoing financial incentive to break big systems so they'll keep working on it. Which basically means entering into an arms race with spammers.
Better would be to hit the opposite side and prosecute anyone who uses spam to sell something. If they aren't legally accessible, just block name/IP of the mail/web server that does the trade until fixed. I imagine that would reduce spam by an order of magnitude more-or-less instantly.
Jim on March 5, 2008 06:53 PM
@monsur recaptcha has an audio-based captcha
ka2 on March 5, 2008 07:03 PM
How about this:
Once your CAPTCHA algorithm is broken, you obtain the solution and incorporate it into your own CAPTCHA generator:
1) Generate the image.
2) Use the solution.
3) See if the "solution" matches the actual answer.
4) If it does, discard it and repeat steps 1-4 with a new image. If it doesn't match, then your CAPTCHA is safe!
The only problem would be obtaining the solution. $$$ :P
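A minimal sketch of the four-step filter, assuming you already have the attacker's solver as a callable. `generate` and `known_breaker` are hypothetical stand-ins; the toy versions below treat lowercase text as "distortion" that fools the breaker. Each retry draws a fresh image, since re-running the same solver on the same image would just repeat the same guess.

```python
import random


def generate_safe_captcha(generate, known_breaker, rng, max_tries=100):
    """Serve only challenges that the known breaker gets wrong.

    `generate(rng)` returns (image, answer); `known_breaker(image)` returns a guess.
    """
    for _ in range(max_tries):
        image, answer = generate(rng)        # step 1: generate the image
        guess = known_breaker(image)         # step 2: run the attacker's solution
        if guess != answer:                  # step 3: compare with the real answer
            return image, answer             # step 4: breaker failed, so serve it
    raise RuntimeError("the breaker solved every candidate; change the generator")


# Toy stand-ins: lowercase "distortion" fools this breaker, clean text does not.
def toy_generate(rng):
    answer = "".join(rng.choice("ABCDEFGH") for _ in range(5))
    distorted = rng.random() < 0.3
    return (answer.lower() if distorted else answer), answer


def toy_breaker(image):
    return image  # reads the text verbatim, so it only "solves" clean images
```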
Hmm, perhaps this is why I've been getting a lot of spam from gmail users as of late...
Thic Ric on March 5, 2008 08:04 PM
Okay, it is used everywhere. Often it is very easy to see what letters and characters there are, BUT try this one: http://www.iis.se/domains/domainandcontactsearch?query=sunets0702-00001
Hkkathome on March 5, 2008 10:03 PM
I am beginning to wonder if captchas are nothing more than Turing tests.
ralph on March 6, 2008 12:04 AM
@Thic Ric: no, Gmail spam can just be sent from Gmail's smtp server, or you can just spoof the From address. There are other protections (too many sent messages will cause your IP to be blocked) but it doesn't require solving captchas.
Paolo Bonzini on March 6, 2008 01:18 AM
LOL, the ASCII art captcha can be broken in a second... the only serious one to me seems recaptcha.
Paolo Bonzini on March 6, 2008 01:26 AM
Instead of captcha, why don't you ask a question?
And have a database full of questions like "What color is the sky?"
Or even harder questions like riddles, so the computer won't be able to break them but a normal human with basic understanding will.
Another idea is to have a movie played, with the answer to the question inside the 10-second clip, which could be Flash or RealPlayer.
boya on March 6, 2008 02:20 AM
My missus liked the "cats and dogs" one but she would!!
I like what Ajaxian.com does: ask a question like "what does the X in AJAX stand for?" Of course, this has the added bonus of weeding out any human that doesn't know what they're talking about as well.
Of course, as with all systems like this, it only takes time for people to hack it. This article could just as easily be posted several years after any of the alternatives is adopted.
Are we just too reliant on computers to do things for us?
Matt Smith on March 6, 2008 03:00 AM
All of the replies above have a flaw inasmuch as they refer solely to the quality of the Captcha. If a spammer's machine fails a captcha four times then succeeds (and I realise this is not how probability works, but law of averages here) then clearly they're safe and can go on to make as many accounts as they like, right?
What's being forgotten is that it's very easy to shore up the captcha capability with automated or manual flagging of IP addresses and identities. Keep logs, alert admins. If IP address xxx.yyy.foo.bar just tried to send out x captcha requests in y minutes and got z% of them wrong, ban it - or, if you're feeling charitable, block it for a week. While we're at it, flag the email addresses they successfully made and either automatically block, disable or remove those, or else drag them to the attention of an admin. You could argue that the wave of captcha requests can happen too fast for a human administrator to respond, that it'd be relentless and your poor admins would never get any sleep; what's to stop this process being totally automatic on the part of the server, and letting admins take a look at sufficiently borderline cases?
You could further argue that letting an automated system cancel and ban accounts is too heavy-handed, but these are free email accounts on privately owned servers: in return I'd point out we are very far into 'Access is a Privilege, Not A Right' territory here. If they were charged for I would expect a far more sophisticated and authenticated system, but on what is (no offence meant) the lowest common denominator of popular webmail sites I would rather the admins be heavy-handed than too soft.
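A minimal sketch of the automatic IP flagging described above, assuming attempts are keyed by client IP and kept in memory. Every threshold (more than 20 attempts per 10 minutes, more than half failing after at least 5 tries, a one-week block) and the `CaptchaRateGuard` name are hypothetical defaults, not recommendations. As other comments point out, botnets spread attempts across many IPs, which weakens any per-IP rule.

```python
import time
from collections import defaultdict, deque


class CaptchaRateGuard:
    """Track CAPTCHA attempts per IP over a sliding window and block abusers."""

    def __init__(self, max_attempts=20, window=600.0, max_fail_ratio=0.5,
                 min_sample=5, block_for=7 * 24 * 3600.0):
        self.max_attempts = max_attempts
        self.window = window
        self.max_fail_ratio = max_fail_ratio
        self.min_sample = min_sample
        self.block_for = block_for
        self.attempts = defaultdict(deque)   # ip -> deque of (timestamp, passed)
        self.blocked_until = {}              # ip -> unblock timestamp

    def is_blocked(self, ip, now=None):
        now = time.time() if now is None else now
        return self.blocked_until.get(ip, 0.0) > now

    def record(self, ip, passed, now=None):
        """Record one attempt; return True if the IP is (now) blocked."""
        now = time.time() if now is None else now
        if self.is_blocked(ip, now):
            return True
        history = self.attempts[ip]
        history.append((now, passed))
        # Drop attempts that have slid out of the window.
        while history and history[0][0] <= now - self.window:
            history.popleft()
        failures = sum(1 for _, ok in history if not ok)
        too_many = len(history) > self.max_attempts
        too_wrong = (len(history) >= self.min_sample
                     and failures / len(history) > self.max_fail_ratio)
        if too_many or too_wrong:
            self.blocked_until[ip] = now + self.block_for
            return True
        return False
```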
codemonkey on March 6, 2008 03:03 AM
* Distinguish pictures of dogs from cats
- what if you're a dumbass and can't manage that?
* Choose a word that relates to all the images
- what if you're a dumbass and can't manage that?
* ASCII art
- what if you're a dumbass and can't manage that?
* Solve failed OCR inputs
- what if you're a dumbass and can't manage that?
* Trivia questions
- what if you're a dumbass and can't manage that?
* Math and word problems
- what if you're a dumbass and can't manage that?
You have to remember that 50% of the world's population has a lower than average IQ (obviously). It's a bit cruel asking them to answer even the simplest of questions.
p.s.
***CLICK HERE*** For hot orange babes with MASSIVE oranges who want to suck your orange - these orange sluts will make your orange 5 INCHES LONGER in just ONE WEEK!!!
Get cheap orange MEDS online from OnlineOrangePharmacy etc...
RWW on March 6, 2008 03:55 AM
I suspect cat vs. dogs will give a 50% success rate :-)
Yuval Perlov on March 6, 2008 05:53 AM
Part of the protection inherent in schemes like Guillaume's fluffy/not-fluffy is that it's there on his site, and not on very many others; same with the "who is this character you should know about?" scheme. Each individually might be easy to write a data-driven script for that gets a reasonable number of successes by chance. But to crack both of them, you need two sets of precomputed response data, at least.
Now you can get into a personal site and at most a smallish number of fan sites. Big deal.
If there are many flavours of site using these approaches, each with a different set of possible responses to subject-specialist challenges, the work required to overcome any arbitrary site increases. Yes, it would be possible to build up a site-to-plausible-vocabulary database, but it would at least incur a cost to build that up. It's just another instance of diversity vs monoculture.
The places that will always have the problem are the all-comers type site, like Google, and I don't have a solution for that, especially under the plausible assumption that the attacker is using a botnet to spread his signal.
Steve on March 6, 2008 06:20 AM
Research has noted that the best captcha combines varying character size, font type, colour, positioning, and background. Now, if you randomise these factors so that each character of the captcha word is styled uniquely, you should theoretically get a very good captcha.
This is what we are trying to accomplish, and an example can be found here:
http://twiki.org/cgi-bin/view/Plugins/CaptchaPlugin
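A minimal sketch of the per-character randomisation described above, in Python: each character gets its own independently drawn font, size, colour, offset, and rotation. The attribute pools, ranges, and the names `GlyphStyle` and `randomise_glyphs` are illustrative assumptions, and rendering the glyphs and a noisy background onto an actual image is left out entirely.

```python
import random
from dataclasses import dataclass
from typing import List

# Hypothetical attribute pools; a real generator would use actual font files.
FONTS = ["serif", "sans-serif", "monospace", "cursive"]
SIZES = range(24, 41)  # point sizes 24..40
COLOURS = ["#1f3a93", "#8e1b1b", "#1b6e2f", "#6a1b8e", "#333333"]


@dataclass
class GlyphStyle:
    char: str
    font: str
    size: int
    colour: str
    dx: int       # horizontal jitter in pixels
    dy: int       # vertical jitter in pixels
    angle: float  # rotation in degrees


def randomise_glyphs(word: str, rng: random.Random) -> List[GlyphStyle]:
    """Give every character of the word its own independently drawn style."""
    return [
        GlyphStyle(
            char=c,
            font=rng.choice(FONTS),
            size=rng.choice(SIZES),
            colour=rng.choice(COLOURS),
            dx=rng.randint(-4, 4),
            dy=rng.randint(-6, 6),
            angle=rng.uniform(-25.0, 25.0),
        )
        for c in word
    ]
```

Seeding `random.Random` makes the output reproducible for testing; in production you would more likely pass `random.SystemRandom()` so an attacker cannot predict the sequence.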
To follow up on my last comment: as for email or other 'identity' providers, why not require two other forms of 'identity'? That technique is used to get a new driver's license, a new bank account, a passport, and even a job. I think it is perfectly reasonable to expect one or more forms of identity in order to create another.
Obviously the issue with this is that a majority of people are not going to be willing to give some forms of personal identification, such as an SSN, birth certificate, last utility bill, or last paycheck. Additionally, a number of these are culturally dependent. However, there are forms of ID that anyone can get where the identity provider has done some kind of validation of the identity.
The first that comes to mind is a cell phone number. Some cell phones, such as AT&T GoPhones, do not require identity, but they do cost some money. It is doubtful that it would be worth it to a spammer to buy 1000 phones, use them to get other online identities, and then take the time to sell those phones again. However, if that is a concern, you can require that the phone be from a subset of cell phone providers that do require billing information from the user.
Does anyone else have an idea for another identity a web site can reasonably request and VALIDATE that a user would be willing to supply? Ideally, there should be at least three different identity options, letting the user decide which two they have available to supply.
This comment does NOT apply to validation of an anonymous user when posting a blog comment or forum post. Those sites should either require login, or rely solely on statistical filtering to determine spam based on content.
Michael Lang on March 6, 2008 07:58 AM
To Michael Lang:
I've got four Gmail accounts - one that is my real name (joe.q.bloggs@, as a purely random example) for jobhunting and similarly official stuff, one that is historic and I've had for ages and that people I know casually use to contact me, one that is sufficiently extant I can use it for random purposes, and one that is used solely to sign up for forums and other such services to alleviate some of the spam. This isn't even a remotely unusual use of Gmail: part of the very draw of these services is that you can throw out multiple accounts for whatever purposes you need.
In the face of that:
1) How are you going to compare-validate? If someone gives you a document reference that duplicates one already in your system, are you going to say 'no, no second email for you'? That takes away half of the usability of the system in one stroke, and you'd better hope that your single address doesn't get hit by so many spammers that it becomes unusable, since you can't get a new address to 'start over' without destroying your old one. If you're not going to compare-validate, what's the point in demanding ID that anyone could make up? (See: the address+postcode fields in the Hotmail signup.)
2) How are you going to verify-validate? If someone gives you a cellphone number are you going to call them up and ask 'hey, did you just sign up an email address?' You have to, otherwise how do you know they didn't just pluck a number out of midair? Multiply that amount of bureaucratic hassle by the number of gmail accounts there are. Then multiply that exponentially to be able to validate on things that are truly unique and have heavy information restriction in place such as SSN#s or National Insurance numbers. I'm reasonably sure the Data Protection Act would not exactly consider your SSN critical for signing up for free email. Compare that hassle to Hitting The Big Red Ban Button for too many hits from one IP or for too many suspicious captcha fails.
(As a side note, together with my 4 Gmail accounts I own 0 mobile phones.)
3) Are you going to trust bank details, billing details, stringent sets of contact details, or SSN/NI details to sites that are favorite targets for iframe / phishing / spamvirus attacks?
codemonkey on March 6, 2008 08:30 AM
In response to Steve:
"Now you can get into a personal site and at most a smallish number of fan sites. Big deal."
Big Deal indeed for those who run those sites, should they be compromised. Unless there is a suitable message throttling mechanism in place, it may lead to a DoS attack and perhaps significant overage charges from one's ISP.
You are absolutely correct in your comment that "all-comers" such as Google face the biggest issues.
The main point is that one should never be too self-assured of the security of one's CAPTCHA method. There are plenty of methods that spammers can use to break a CAPTCHA, and any CAPTCHA that can be seen and analyzed by a person can be similarly analyzed by a computer...
A big question is: if we still wish to proceed with CAPTCHAs, how does one develop a CAPTCHA that is complicated (perhaps random) enough that a computer would have a difficult time with it, without making it so complicated that it becomes an obstacle for a human?
Demi Raven on March 6, 2008 08:59 AM
Hey Now Jeff,
It sure will be interesting to see how CAPTCHA evolves over time.
Coding Horror Fan,
Catto
Why not use better stalling/probationary techniques? Allow 1 e-mail per day until enough long-term users have not flagged your e-mail as spam, then weed out the long-term accounts that approved accounts later marked as spam senders.
x (probationary user) has sent you this e-mail. Is it spam? y/n
Sending each of the first 10 e-mails requires a captcha of varying form, and failing any of them deletes the account. A 20% success rate wouldn't be good enough.
Any mass mailing activity in the first 30 days = deletion.
Have people prove their nationality by finding the bad grammar in a story, then monitor how many e-mails are sent to countries not speaking that native language. (oh noes, you may needs grammar to sends males to the internets - wouldn't that be a bonus.)
But basically, just continue to imagine new tricks that computers haven't been programmed to defeat yet, and cycle through them randomly. Then when one technique is defeated, remove it from the rotation and add 2 more unsolved techniques. Eventually you'll have a collection of hundreds or thousands of tricks, and solving any one of them will have a very low rate of return.
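The rotation scheme from that comment might look something like this in Python. `ChallengePool` and the challenge names are hypothetical; the sketch only tracks which challenge types are in rotation, not how each one is generated or graded.

```python
import random
from typing import Iterable, List, Optional


class ChallengePool:
    """Rotate among many challenge types; retire defeated ones and add replacements."""

    def __init__(self, challenge_types: Iterable[str],
                 rng: Optional[random.Random] = None) -> None:
        self.active: List[str] = list(challenge_types)
        self.retired: List[str] = []
        self.rng = rng if rng is not None else random.Random()

    def pick(self) -> str:
        # Serve a uniformly random challenge type from the active rotation.
        return self.rng.choice(self.active)

    def retire(self, defeated: str, replacements: List[str]) -> None:
        # The comment suggests adding two new techniques for every one defeated.
        if len(replacements) < 2:
            raise ValueError("add at least two replacements per retired challenge")
        self.active.remove(defeated)
        self.retired.append(defeated)
        self.active.extend(replacements)
```

With N active types chosen uniformly, a bot that has cracked only one of them can succeed on about 1/N of its attempts, which is the low rate of return the comment is after.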
Berg on March 6, 2008 10:43 AM
Prompt the user with 2 captchas... 20% x 20% = 4% chance of success for bots.
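The 4% figure holds only if the two CAPTCHAs fail independently; a bot that has already beaten one style will likely do better than 20% on a second instance of the same style. A minimal Python sketch of the arithmetic, under that independence assumption (the function names are illustrative):

```python
def combined_success(per_captcha: float, count: int) -> float:
    """Chance of passing `count` CAPTCHAs in a row, assuming independent attempts."""
    if not 0.0 <= per_captcha <= 1.0:
        raise ValueError("per_captcha must be between 0 and 1")
    return per_captcha ** count


def expected_attempts(success_rate: float) -> float:
    """Mean number of tries until the first success (geometric distribution)."""
    return 1.0 / success_rate


p = combined_success(0.20, 2)
print(round(p, 4))                  # 0.04
print(round(expected_attempts(p)))  # 25
```

Because a botnet can simply retry, even 4% still means roughly one successful sign-up per 25 attempts on average.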
Manu on March 6, 2008 11:07 AM
"You must make a 30 cent payment using PayPal. Click on the 'Pay Now' button."
How's that for a captcha or gotcha or whatever?
John A Davis on March 6, 2008 12:18 PM
Ticketmaster recently won a lawsuit under the DMCA against a company that created bots that circumvented CAPTCHAs...
If companies can go after music file-sharers in the courts, why can't they go after spammers?
LS on March 6, 2008 12:32 PM
To codemonkey:
Validating identity does not mean that policy would forbid having more than one email address. Maybe those separate addresses could be managed under a single 'account'. So I do NOT propose that people should be limited to only one email address with a given provider, just as I have two cell phone numbers: I can turn off my work cell phone when at home, and vice versa. But it should be up to that identity provider (cell phone, email, whatever) whether they want to allow a single person to have more than one. It may even prove to be another point of service for Gmail to let you manage more than one mailbox under a single login.
Validation: That depends on the form of identity. As for the case with cell phones, a text message can be sent by an automated system. To activate the account, you reply to the text message with a certain message. I've actually seen this implemented on a site, and it worked for me. I don't recall where it was, since I only had to do it once and it was over a year ago.
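One way the SMS round trip described above could work, as a minimal Python sketch: generate a short random code, remember it per phone number, and accept the account only when the reply matches before it expires. The class name, the six-digit format, and the ten-minute expiry are assumptions, and actually sending and receiving text messages through a carrier or SMS gateway is out of scope.

```python
import hmac
import secrets
import time
from typing import Dict, Optional, Tuple

CODE_TTL_SECONDS = 10 * 60  # assumed expiry window for a code


class SmsVerifier:
    """Issue one-time codes per phone number and check the user's reply.

    Delivering the code by SMS is out of scope: issue() just returns it.
    """

    def __init__(self) -> None:
        self._pending: Dict[str, Tuple[str, float]] = {}

    def issue(self, phone: str, now: Optional[float] = None) -> str:
        code = f"{secrets.randbelow(10**6):06d}"  # six random digits
        self._pending[phone] = (code, time.time() if now is None else now)
        return code

    def verify(self, phone: str, reply: str, now: Optional[float] = None) -> bool:
        entry = self._pending.get(phone)
        if entry is None:
            return False
        code, issued_at = entry
        now = time.time() if now is None else now
        if now - issued_at > CODE_TTL_SECONDS:
            del self._pending[phone]  # expired codes cannot be retried
            return False
        matched = hmac.compare_digest(code.encode(), reply.strip().encode())
        if matched:
            del self._pending[phone]  # each code works only once
        return matched
```

`secrets` gives an unpredictable code and `hmac.compare_digest` avoids leaking it through comparison timing; a real deployment would also cap the number of wrong guesses per code, which this sketch does not do.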
I left my message with an open question: what other forms of ID are acceptable? I explicitly noted that users would NOT find it acceptable to use their SSN.
Michael Lang on March 6, 2008 01:58 PM
I see where you're coming from, but what's your average Joe who needs a quick email for whichever benign reason going to think? The credo of the big sites is their accessibility - secure email providers exist for those users sufficiently paranoid, but for everyone else there's a quick gmail account and away you go. Are casual users going to remain if they have to jump through more hoops than 1 or 2 captchas?
I stand by my claim that the best solution is a dual system - captcha and administrative routines to back that captcha up. Captcha alone clearly isn't a solution, and any decent admin ought to be keeping tabs on this stuff anyhow even (or especially) on a site as huge as google. Stating that you should have to pay a micropayment or submit identity to gain a simple web-based email address seems kind of boggling, though maybe that's just the culture shock setting in.
codemonkey on March 6, 2008 02:57 PM
For the sites that are concerned with signups (as opposed to, say, Coding Horror, where it's just a login), why not just set a minimum time? Somewhere between 1 and 2 minutes?
A person entering all of their information and reading the page might not even notice that 1-2 minutes have passed since the page loaded, whereas an algorithm (or paid person) that is able to crack the captcha can only do it 720 to 1440 times a day. It's not a deal breaker, but it definitely shifts the supply curve for the people who are in it for the money.
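A minimal Python sketch of the minimum-time idea, together with the arithmetic behind the 720-1440 figure (86,400 seconds in a day divided by a 60 or 120 second floor). It assumes the time the form was served is recorded on the server or otherwise protected from tampering, since a bot could simply forge a client-side timestamp. The cap applies to one solver working sequentially; parallel sessions from a botnet each get their own clock. Function names are illustrative.

```python
import time
from typing import Optional

SECONDS_PER_DAY = 86_400
MIN_FILL_SECONDS = 60  # the comment suggests somewhere between 60 and 120


def max_signups_per_day(min_seconds: int) -> int:
    """Most sign-ups one solver can finish per day if each must take min_seconds."""
    return SECONDS_PER_DAY // min_seconds


def fill_time_ok(form_served_at: float,
                 submitted_at: Optional[float] = None,
                 min_seconds: int = MIN_FILL_SECONDS) -> bool:
    """Accept a sign-up only if at least min_seconds passed since the form was served."""
    if submitted_at is None:
        submitted_at = time.time()
    return submitted_at - form_served_at >= min_seconds


print(max_signups_per_day(60), max_signups_per_day(120))  # 1440 720
```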
Steve-O on March 6, 2008 06:09 PM
Most people know the difference between a man and a woman from looking at a picture - why not use that fact for a CAPTCHA?
Ulrik on March 7, 2008 05:49 AM
The "What's the common theme of the images" is intriguing, but might have cultural bias.
Also, as tagged pictures (Flickr, Zooomr, others) become more of a searchable database, such a system might become less effective.
Ike on March 7, 2008 09:00 AM
RWW --
"You have to remember that 50% of the world's population has a lower than average IQ (obviously). It's a bit cruel asking them to answer even the simplest of questions."
Umm, you mean 50% of the world's population has a lower than *MEDIAN* IQ. Guess we know which half you're in.. wokka wokka.
Brian on March 8, 2008 12:00 AM
What is the button below F on your keyboard?
:D
I see two keys below the F on mine... neither of which is the one key below the F on a Dvorak... also, it is not necessarily the case that everyone can receive cellphone texts...
The hotmail captcha is one of the worst that I've seen.
Syahid A. on March 9, 2008 06:26 AM
I'm reminded of XKCD's solution ( http://xkcd.com/233/ )
When Littlefoot's mother died in the original 'Land Before Time,' did you feel sad?
( ) Yes
( ) No
(Bots: No lying)
Joking aside, any solution should address human spam farms. Something like "What's the name of this site?" or "What color is this site's background? Yellow, white, or blue?", where the answer goes in a text field instead of radio buttons, and the question asks about context that would be stripped away when the challenge is relayed to a spam farm.
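A minimal Python sketch of a free-text context question along those lines: the answer is typed rather than picked from radio buttons, and it is compared after normalising case and whitespace. The site name, questions, and accepted answers below are made up for illustration; each site would supply its own.

```python
import unicodedata
from typing import Dict, Set

# Hypothetical questions and answers for a made-up site called "Example Widgets".
CONTEXT_QUESTIONS: Dict[str, Set[str]] = {
    "What's the name of this site?": {"example widgets", "examplewidgets"},
    "What color is this site's background? Yellow, white, or blue?": {"yellow"},
}


def normalise(text: str) -> str:
    # Fold Unicode compatibility forms and case, and collapse runs of whitespace.
    return " ".join(unicodedata.normalize("NFKC", text).casefold().split())


def check_context_answer(question: str, typed_answer: str) -> bool:
    """True only if the typed answer matches one accepted for this question."""
    accepted = CONTEXT_QUESTIONS.get(question)
    if accepted is None:
        return False
    return normalise(typed_answer) in accepted
```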
The problem is that for something like Google or Hotmail, the site's too well known and the reward for cracking is too high for most captchas, including context-based questions, to be effective.
Also, what would we call a CAPTCHA that is meant to thwart human spammers?
Blain on March 9, 2008 03:47 PM
At an abstract level a CAPTCHA is attempting to perform a specific Turing test to determine if an unknown participant is a human or machine.
As the variety of CAPTCHAs increases, the Turing tests change from specific to general.
A program capable of discriminating between a human and another machine on these 'general' Turing tests would be capable of passing itself off as human to itself (and possibly to humans too).
You end up with an infinite recursion: a CAPTCHA arms race. As a side effect, SPAM solves a key problem of machine intelligence (who said it was useless!).
What if you showed a picture which the user had to describe in one word, and also changed the pictures randomly?
So you could have a few thousand of each kind (i.e. dog, cat, house, market, man, woman, etc. pictures) and thousands of different words.
Then change all of the pictures every so often.
It sounds daft, I know.
JamesT on March 10, 2008 03:12 AM
Sorry, didn't read the above posts.
JamesT on March 10, 2008 03:13 AM
Here's a nice idea about captcha cracking:
http://ardoino.com/41-online-social-and-unaware-captcha-cracking/
I use a modified captcha-type Turing test on my blog. Now, I don't get the traffic that some sites get, but I had a post make the front page of digg.com recently. That post garnered well over 100 comments, without a single spam comment.
How did I modify it? Well, I don't use reading or images. I use a form of an intelligence test with questions that should be easy for a human to answer, but not for anything automated to guess easily. Some answers are text, some are numbers. It's not perfect, and it could probably be broken pretty quickly and easily by anyone with a will to do so, but really, if we are honest with ourselves, all a captcha or any other Turing test is going to do is help eliminate the nuisances. This is like putting a lock on the front door of your house, it won't prevent a thief with intent, but it will stop the casual opportunist attempting to open the door.
matthew on March 10, 2008 04:55 PM
With CAPTCHA breached, do you think that Google system issues like the meltdown Google Groups group-owners are experiencing
http://groups.google.com/group/Google-Groups-Basics/browse_thread/thread/1427ec5996001762/
are the result of Google overreacting to this security threat?
I run a web site that has a registration form that was getting bombed by spammers. I threw in two very simple tests:
1. I scan every submission against a list of "unlikely words". This list includes words that were routinely showing up in the spam ads, like "mortgage" and names of sex drugs, including a few common obfuscated spellings like "\/iagra". (Obviously if you are running a web site for a bank, blocking anyone who asks about mortgages may not be a good plan. The list of prohibited words would have to be tailored to the site.) (I see from my first attempt to submit this post that you're blocking names of sex drugs also.)
2. The funny part: One field on the form asked the user to place himself in a category with a set of radio buttons to pick. I noticed that the spammers picked the first radio button well over 90% of the time. So I added a new first choice, "I am a spammer", and if they picked that, I rejected the entry.
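The two checks above could be combined into a single filter roughly like this Python sketch. The word patterns, the form field name `category`, and the decoy value `"spammer"` are hypothetical placeholders; as the comment notes, a real word list has to be tailored to the site.

```python
import re
from typing import Dict

# Hypothetical patterns; a real list must be tailored to the site.
UNLIKELY_PATTERNS = [
    re.compile(r"mortgage", re.IGNORECASE),
    re.compile(r"(?:v|\\/)iagra", re.IGNORECASE),  # also catches "\/iagra"
]
DECOY_CHOICE = "spammer"  # value behind the new first radio button, "I am a spammer"


def looks_like_spam(fields: Dict[str, str]) -> bool:
    """Apply the decoy radio-button check, then the unlikely-words scan."""
    # Check 2: bots that blindly pick the first radio button select the decoy.
    if fields.get("category") == DECOY_CHOICE:
        return True
    # Check 1: reject if any submitted field contains an unlikely word.
    return any(p.search(value) for value in fields.values() for p in UNLIKELY_PATTERNS)
```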
Since making the above two changes several months ago, only a handful of irrelevant entries have made it through, and those look too coherent to be machine-generated spam; I think they're "manual spam".
The big caveat on this sort of strategy is that my site gets about 60,000 unique visitors a month and the only thing anyone has to gain by spamming my site is getting his ads or links to his site onto my pages. That is, I'm not a big target. I'm sure if Google or a big bank or somebody tried my tactics the spammers would see what they were up to and easily circumvent it.
But I think it stands to reason that "adequate security" for a small site with little to steal is much different from adequate security for a big site that could potentially give a successful hacker access to megabucks. Like, I lock my front door and I keep a gun handy for self-defense. I consider that adequate security. I certainly hope that First National Bank, not to mention nuclear weapons depots, have more stringent security than that. I have no illusions that the lock on my front door is going to keep a skilled team of terrorists from breaking into my house. But I also pretty much assume that no skilled team of terrorists is likely to target my house.
jay on March 13, 2008 11:40 PM
Taking a totally different direction: how about we just start compiling a big list of web sites and email addresses of spammers? It should be easy enough to collect this using spam filters in email programs. Then post many copies of this list, with hot links, all over the net. Then the spammers' robots will find it, and they'll start spamming each other! It may not do much to solve the problem, but it would certainly be poetic justice.
Idea #2: Put together an organization dedicated to tracking down the home phone numbers of spammers. Post this on the net. Encourage hundreds of thousands of people to call them at all hours of the day and night. Maybe they'd sue for harassment, but it would make for a fun day in court.
jay on March 13, 2008 11:57 PM
Jay: because blacklists don't work. Enumerating badness is like trying to count grains of sand.
http://www.codinghorror.com/blog/archives/001009.html
Jeff Atwood on March 14, 2008 01:10 AM
There's a service that hires captcha typers from Bulgaria.
To Jay: most addresses are faked or are joe jobs.
Justin Goldberg on March 14, 2008 02:30 PM
graylist
calcnerd256 on March 16, 2008 12:15 PM
Captcha is a hurdle for visitors. Why should visitors have to jump thru hoops b/c of spammers? (And still, it's not 100%).
Blacklists / greylists / whitelists are a PITA to maintain and distribute, and they make errors.
Moderation puts the onus on the blogger and a delay in the comment posting - who wants either?
Bots make oodles of assumptions and can be tested for, just need to think like a bot. :D
No hurdles, open commenting, no maintenance, no delay ... simple. ;)
stk on March 29, 2008 09:03 AM
Interesting ideas, but most won't work
1) ASCII art
-> take a png from the webpage and OCR it (piece of cake, 1 hour work)
2) Javascript
-> comments are made by HTTP GET or POST requests. No JavaScript is involved, and if it's in the browser, you can look at what it does, simulate it, and POST it. Robots don't use a webpage; they use a socket to send the HTTP request.
3) dogs/cats/ugly people
-> 9 pictures, 3 choices, that would be 1/1000 ?
Could work, but I saw some guys that I wouldn't call ugly labeled as ugly. It can't work for Google/Hotmail: spammers would just harvest the images and create 1 big database with the results ugly/not ugly. Home users can't use it either; they don't have an ugly-people database.
4) jane has 4 oranges, take away one, how many does she have left?
-> useless: you can't phrase this question in 20 different ways, so hackers will be able to calculate it very easily.
5) math
-> if there is something a computer can do, it's solving math... so useless
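To put a number on item 3, assume each of the 9 pictures independently carries one of 3 labels and a bot guesses uniformly at random. The blind-guess success rate is then far below 1/1000; a 1/1000 rate corresponds roughly to ten pictures with two choices each:

```latex
P_{9 \times 3} = \left(\tfrac{1}{3}\right)^{9} = \frac{1}{19683} \approx 5.1 \times 10^{-5}
\qquad
P_{10 \times 2} = \left(\tfrac{1}{2}\right)^{10} = \frac{1}{1024}
```

As the comment goes on to say, none of this matters once the images are harvested and labeled in a database, because the bot then stops guessing.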
Actually, no system will work. People in China get $30 a month (!) to make my NIKE running shoes. Give $50 to some friends from India and they'll solve captchas all day long... defeating all captchas.
Tricks:
- no human can enter a captcha within 1 second, so if the message is posted that fast, delete it
- no one is supposed to post more than 1 message per minute
- limit the regeneration of the captcha: 1 minute for the 2nd chance, 2 for the 3rd, 5 for the 4th, 10 for the 5th...
- if the captcha isn't solved within 10 seconds after generation (let them first solve the captcha before entering user details/comments), it fails -> solves the farming/sending to p*rnsites
- internet police: log IPs, IP + time = user
that user's internet access is blocked for 7 days; countries not cooperating: cut off from the internet
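The timing rules in that list can be sketched in Python as below. The 1-second floor, the 10-second ceiling, the 1-minute posting interval, and the 1/2/5/10-minute regeneration delays come from the comment; repeating the 10-minute delay after the 5th chance is an assumption, as are all function names. Timestamps are assumed to be recorded server-side.

```python
from typing import Optional

MIN_SOLVE_SECONDS = 1.0   # "no human can enter a captcha within 1 second"
MAX_SOLVE_SECONDS = 10.0  # must be solved within 10 seconds of generation
MIN_POST_INTERVAL = 60.0  # at most 1 message per minute
# Delay before the 2nd, 3rd, 4th and 5th chance, in minutes.
# Repeating 10 minutes for later chances is an assumption.
REGEN_DELAYS_MINUTES = [1, 2, 5, 10]


def solve_time_ok(generated_at: float, submitted_at: float) -> bool:
    """Reject answers that come back implausibly fast or too late."""
    elapsed = submitted_at - generated_at
    return MIN_SOLVE_SECONDS <= elapsed <= MAX_SOLVE_SECONDS


def can_post(last_post_at: Optional[float], now: float) -> bool:
    """Allow a post only if the previous one was at least a minute ago."""
    return last_post_at is None or now - last_post_at >= MIN_POST_INTERVAL


def regen_delay_minutes(chance: int) -> int:
    """Minutes to wait before offering the given chance (1-based); the first is free."""
    if chance <= 1:
        return 0
    index = min(chance - 2, len(REGEN_DELAYS_MINUTES) - 1)
    return REGEN_DELAYS_MINUTES[index]
```

A 10-second ceiling is tight; slower typists would likely need more slack than the comment allows.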
Also:
use a captcha to inform the user if the registration/comment was successful
-> that way, a bot doesn't know if he solved the captcha correctly ;) since knowing that would require solving a captcha :P
Tim on April 7, 2008 05:38 AM
I don't mean to spam :)
But for forums/blogs: registered users should be able to flag something as spam. Make use of the "web 2.0 social" techniques to fight spammers
http://www.theregister.co.uk/2008/04/14/msn_captcha_breaking/
MSN is truly broken...and definitely by script, not cheap labor.
Jeff on April 14, 2008 02:08 PM
First, judging by Poker's comment above, your captcha is broken ;)
Second, your captcha has made the news: http://www.news.com/8301-10784_3-9929073-7.html
Caleb on April 29, 2008 03:07 PM
Content (c) 2008 Jeff Atwood. Logo image used with permission of the author. (c) 1993 Steven C. McConnell. All Rights Reserved.