Coding Horror
programming and human factors
by Jeff Atwood

June 30, 2005

Bayesian Kryptonite - spoofed email

I use POPFile Bayesian filtering to keep email spam at bay. With a little training, this works amazingly well-- I'm at 99.8% accuracy, and that's with a little over a month of "training" precipitated by a recent server migration. But Bayesian filtering has one big weakness that I'm seeing more and more: spoofed emails.

You know what I mean-- emails titled "Your Account Has Been Violated", ostensibly from service@paypal.com. The body is a direct cut and paste from a real PayPal email:

Security Center Advisory!

We recently noticed one or more attempts to log in to your PayPal account from a foreign IP address and we have reasons to belive that your account was hacked by a third party without your authorization. If you recently accessed your account while traveling, the unusual log in attempts may have been initiated by you.

If you are the rightful holder of the account you must click the link below and then complete all steps from the following page as we try to verify your identity.

Of course, the spoofer is desperately hoping you won't notice that the crazy URLs in their email ..


http://paypaldemo.com.previewyoursite.com/source/service/ema/helpextsourcepage/PaypalISAPIruhttp3A2F2Fmyebamcom3A802Fws2FeBayISAPIdll3FMyeBay26ssPageName3DH253AH253A/
http://ebay.doubleclick.net/clk;13012399;10693575;h?http://cardsavetransfer.com/cmdr_login/index.htm
http://ebay.doubleclick.net/clk;13012399;10693575;h?http://paypalcardstraznact.com/cmdr_login/index.htm

.. aren't actually pointing to paypal.com (or ebay.com), and you'll key in your account and password on their servers.
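The mismatch is easy to check mechanically. Here is a minimal sketch, assuming the only thing we care about is whether a link's host is the claimed domain or a subdomain of it. The helper name `host_belongs_to` and the final `https://www.paypal.com/` link are illustrative and not part of the original email.

```python
from urllib.parse import urlparse

def host_belongs_to(url: str, domain: str) -> bool:
    """True only if the URL's host is `domain` itself or a subdomain of it."""
    host = (urlparse(url).hostname or "").lower()
    return host == domain or host.endswith("." + domain)

links = [
    # Two of the spoofed links quoted above.
    "http://paypaldemo.com.previewyoursite.com/source/service/ema/helpextsourcepage/PaypalISAPIruhttp3A2F2Fmyebamcom3A802Fws2FeBayISAPIdll3FMyeBay26ssPageName3DH253AH253A/",
    "http://ebay.doubleclick.net/clk;13012399;10693575;h?http://cardsavetransfer.com/cmdr_login/index.htm",
    # Illustrative genuine link, added for contrast (an assumption).
    "https://www.paypal.com/",
]

for link in links:
    host = urlparse(link).hostname
    print(host, host_belongs_to(link, "paypal.com"), host_belongs_to(link, "ebay.com"))
```

Note that a simple substring test would be fooled here: the first host contains "paypal" and the second contains "ebay", yet neither is under paypal.com or ebay.com.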

These spoof emails contain so-called "kryptonite" because they so closely mimic actual emails from PayPal with valid words and phrases. Bayesian filtering is useless against this type of spam; if the spammer knows what any email in your actual inbox looks like, he can construct one that will beat any Bayesian filter. This follows from a strict requirement at the very heart of Bayesian filtering itself: the spammer must have no knowledge of valid contents (e.g., things that "get through").
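To make the weakness concrete, here is a toy word-count classifier. It is a minimal sketch of naive Bayes with add-one smoothing and equal priors, not POPFile's actual implementation, and the training messages are invented for illustration.

```python
import math
from collections import Counter

def train(messages):
    """Count how often each word appears across a list of messages."""
    counts = Counter()
    for text in messages:
        counts.update(text.lower().split())
    return counts

def spam_probability(text, spam_counts, ham_counts):
    """Naive Bayes spam estimate with add-one smoothing and equal priors."""
    vocab = set(spam_counts) | set(ham_counts)
    spam_total = sum(spam_counts.values()) + len(vocab)
    ham_total = sum(ham_counts.values()) + len(vocab)
    log_odds = 0.0
    for word in text.lower().split():
        p_spam = (spam_counts[word] + 1) / spam_total
        p_ham = (ham_counts[word] + 1) / ham_total
        log_odds += math.log(p_spam / p_ham)
    return 1 / (1 + math.exp(-log_odds))

# Invented training data (an assumption, for illustration only).
spam_counts = train(["cheap pills buy now", "buy cheap watches now"])
ham_counts = train([
    "your paypal payment has been sent",
    "you sent a payment to your paypal account",
])

ordinary_spam = spam_probability("buy cheap pills", spam_counts, ham_counts)
spoof = spam_probability("your paypal account payment", spam_counts, ham_counts)
print(f"ordinary spam: {ordinary_spam:.2f}, spoof: {spoof:.2f}")
```

The spoof is built entirely from words the filter has learned to trust, so it scores as legitimate mail. The filter only sees word statistics; it has no idea where the links actually go.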

I usually just delete these emails from my inbox; what else can I do? One thing is for sure: popular web-based services can no longer communicate via email with their customers. That's like giving spoofers a free pass; once they have the "template" email they can copy and paste it into a spoof email that is almost guaranteed to get past Bayesian filtering for users of that service.

eBay, for example, has almost given up altogether on email communication. You have to visit eBay.com and check your web-based "message center" to communicate with them. I can't say I blame them; what other choice do they have?

Posted by Jeff Atwood

 


 

Comments

I wonder if they have filters that say more than 3 links and it's junk... because I never get emails with more than 2 unless they are junk.

Sushant Bhatia on June 30, 2005 11:51 PM
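The link-count rule suggested in this comment could be sketched as follows. This is a rough heuristic under stated assumptions: the threshold of 3 comes from the comment, while the function name and the regular expression are hypothetical.

```python
import re

# Counts every "http://" or "https://" occurrence in the message body.
LINK_PATTERN = re.compile(r"https?://", re.IGNORECASE)

def too_many_links(body: str, max_links: int = 3) -> bool:
    """Flag a message whose body contains more than `max_links` link prefixes."""
    return len(LINK_PATTERN.findall(body)) > max_links

sample = "See http://a.example/ and http://b.example/ for details."
print(too_many_links(sample))
```

One quirk: a redirect URL that embeds a second URL in its query string counts twice, so the redirect-style links in the spoof quoted in the post would trip this rule sooner than a plain count of links suggests.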

GMail actually has a decent spam filter, and it also has good spoof e-mail detection. I got a phishing e-mail the other day, that was spoofing the e-mail address, service@paypal.com, and I got a bright red message at the top that said the following:

"Warning: This message may not be from whom it claims to be. Beware of following any links in it or of providing the sender with any personal information."

Now admittedly, anybody in our line of business should know immediately whether an e-mail is legitimate or not, but it's still a good thing for the more non-technical people using e-mail.

On an amusing note, the e-mail I'm referring to was the worst attempt at phishing I've ever seen. Check out this snippet from the body of the e-mail:

"U need to update ur account once again, u forgot fill in ATM PIN at from update, come to link below and do it."

I mean, if you're going to do something, at least try a little harder.

Marty Thompson on July 1, 2005 09:34 AM

How many people send from mail servers other than the name they're actually claiming nowadays? I know for a long while I was using name@pritchetts.us but sending from mail.comcast.net. If most people don't do that, maybe blacklist mails where the sender doesn't match the server? I don't have much spam blocking experience (I just stick with whatever Thunderbird does for me) but I did have to run a spam filter on a Win2k3 server for a few months. I used GFI MailEssentials, which has several different forms of filtering that work independently:

Blacklists, whitelists, Bayesian, keyword, and some other stuff I don't remember.

The upside was that I could pretty easily add and remove key words (prozac) to the word filter and let the Bayesian filter take care of the rest. One of my favorite features was the auto whitelist, which whitelisted anyone I ever sent an outgoing mail to.

About web services (eBay etc.) not sending email, I don't know if the world is ready for that yet. It's still the only ubiquitous form of net communication that a lot of people are willing to give out their connection details for. What's next, instant messenger?

Daniel Pritchett on July 1, 2005 09:38 AM

Sorry for not bothering to read your POPFile entry until after I commented, it looks like you're on the right track looking at multiple filtering tools getting you from 98% to 100%. I certainly can't think of anything better at the moment.

I'm glad you have an RSS feed.

Daniel Pritchett on July 1, 2005 09:44 AM

> I mean, if you're going to do something, at least try a little harder.

Right. Some of the spoofs I've seen recently were quite good. Very professional looking, no misspellings, etcetera.

My rule of thumb is "could my Mom tell this was fake?" For a lot of recent spoofs, the answer is no.

Jeff Atwood on July 2, 2005 12:52 AM

> How many people send from mail servers other than the name they're actually claiming nowadays?

That is the other way to attack the problem: actually validate the identity of the sender (or at least the server sending the email). There have been some baby steps in this direction from Yahoo and Hotmail but I'm not sure if anything substantive has come from it yet.

It's definitely a good idea, but the architecture of POP3/SMTP isn't built around identity or even security-- so it's hard to retrofit.

Jeff Atwood on July 2, 2005 01:05 AM

In PayPal's case:
paypal.com. 3600 IN TXT "v=spf1 mx include:s._spf.ebay.com include:m._spf.ebay.com include:p._spf.ebay.com include:c._spf.ebay.com ~all"
paypal.com. 3600 IN TXT "spf2.0/pra mx include:s._sid.ebay.com include:m._sid.ebay.com include:p._sid.ebay.com include:c._sid.ebay.com ~all"


That's it... everything coming from PayPal must come from those ranges, so you can just block everything that fails the SPF test.

Bayesian filtering by itself can't handle problems like this one.

Rodolfo Sikora on July 5, 2005 11:52 AM
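For readers unfamiliar with SPF, the first record quoted above can be pulled apart with a few lines of code. This is a minimal sketch that only splits the record into its terms; it does not do DNS lookups, resolve `include:` targets, or handle modifiers such as `redirect=`. The function name `parse_spf` is hypothetical.

```python
def parse_spf(txt_record: str):
    """Split an SPF TXT record into (qualifier, mechanism, value) tuples."""
    version, *terms = txt_record.split()
    if version != "v=spf1":
        raise ValueError("not an SPF version 1 record")
    parsed = []
    for term in terms:
        qualifier = "+"  # "+" (pass) is the default when no qualifier is given
        if term[0] in "+-~?":
            qualifier, term = term[0], term[1:]
        mechanism, _, value = term.partition(":")
        parsed.append((qualifier, mechanism, value))
    return parsed

record = (
    "v=spf1 mx include:s._spf.ebay.com include:m._spf.ebay.com "
    "include:p._spf.ebay.com include:c._spf.ebay.com ~all"
)
parsed = parse_spf(record)
for qualifier, mechanism, value in parsed:
    print(qualifier, mechanism, value)
```

Worth noting: the record ends in `~all`, which is a "softfail" rather than a hard fail (`-all`). Rejecting every message that fails the check is therefore stricter than what the record itself asks receivers to do.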

Actually the logical choice would be for them to offer personalized RSS / ATOM / RDF feeds for their users. So I can get one feed from them that has all their general news (customized by me) and all news specific for me. Throw in some HTTPS if they want and I know I am getting the straight dope from the horse's mouth.

Jim McKeeth on July 6, 2005 02:59 PM

It's still in dev mode, but at http://www.feed-mail.com we're working on developing a Messaging over RSS application.

It sets up reciprocal RSS feeds between you and your contacts so that you can message back and forth (like email), but entirely over RSS.

Hopefully by switching from a SendTo to a PullFrom architecture it could drastically reduce the amount of spam and phishing people have to deal with.

Michael Buckbee on July 9, 2005 04:41 PM

Both spoof@paypal.com and spoof@ebay.com work fine. I always report PayPal and eBay spoof messages. Note, though, that spoof@ebay.com won't accept an attached message - you have to forward the message to them.

Mike Dimmick on July 15, 2005 03:31 PM











Content (c) 2008 Jeff Atwood. Logo image used with permission of the author. (c) 1993 Steven C. McConnell. All Rights Reserved.