I'm not a huge fan of The Daily WTF for reasons I've previously outlined. There is, however, the occasional gem – such as this one posted by ezrec:
Browsing through a web archive of some old computer club conversations, I ran across this sentence:
"Apple made the clbuttic mistake of forcing out their visionary - I mean, look at what NeXT has been up to!"
Hmm. "clbuttic".
Google "clbuttic" - thousands of hits!
There's someone who calls his car 'clbuttic'.
There are "Clbuttic Steam Engine" message boards.
Webster's dictionary - no help.
Hmm. What can this be?
To us programmers, this isn't much of a mystery; it seems every day a brand new software developer is born and immediately begins repeating all the same mistakes we made years ago. I can't resist linking to Language Log again on this topic, where a commenter disputes whether this is an actual real-world problem:
The "clbuttics" story may be a little exaggerated if not actually a web-legend. Sure, Google returns 4,000 hits – but by the time one reaches page 2 (in search of a page that isn't reporting on the silliness, or reporting on the reports, etc.) we're down to 200 hits.
Almost all of those 200 seem to have a "clbuttic mistake" by Apple at their core. Google's redundancy-compacting routines are only invoked when requested, it seems, and even then, the variety of information in 200 hits may be small.
In short, it's an echo chamber. 200 or 4,000 or however many hits today aren't as impressive as the same number last year, etc. All the more so as web sites of all kinds put randomly chosen (even Googled!) words out there just to game Google.
While I agree this particular manifestation of the mistake is probably over-reported (because, haha, butts are funny) and fairly rare on the open web, I still get this shiner on page one of my search results:
Is the song Dueling Banjos considered blue grbutt?
Poor Bluegrass World. I'm pretty sure that site is legitimate, though I have no idea how they'd post an article in that state. Obligatory link to the Dueling Banjos scene from Deliverance. I'm inclined to believe this is, in fact, still a problem. There are many, many examples besides "clbuttic" out there. Perhaps you've heard of the United States Consbreastution?
Of course, what we have here is failed obscenity filters implemented by (extremely) newbie developers with regular expressions. I could explain, but as they say, a picture is worth a thousand words, particularly when it's a picture of my very bestest friend, RegexBuddy:
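The kind of replacement such a filter performs can be sketched in a few lines of Python. This is a reconstruction under an assumption, not the actual filter from any site: it supposes the offending substring "ass" is swapped for "butt" wherever it appears, and the names `REPLACEMENTS` and `naive_filter` are hypothetical.

```python
import re

# Hypothetical blacklist: each offending substring maps to its "clean" substitute.
REPLACEMENTS = {"ass": "butt"}


def naive_filter(text: str) -> str:
    """Replace every occurrence of each blacklisted substring, with no
    regard for word boundaries (the mistake behind "clbuttic")."""
    for bad, good in REPLACEMENTS.items():
        text = re.sub(re.escape(bad), good, text)
    return text


print(naive_filter("Apple made the classic mistake"))
# Apple made the clbuttic mistake
print(naive_filter("Is the song Dueling Banjos considered blue grass?"))
# Is the song Dueling Banjos considered blue grbutt?
```

Since the pattern is a bare literal, the regular expression buys nothing here over a plain string replace.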
Oh, great, an inexperienced developer had a problem, and thought they would use regular expressions. Now they have two problems. Well, technically through Google they now have many thousands of problems, but who's counting.
I'm not sure regular expressions are to blame here. The replacement is so mind-bendingly naive that it might as well have been a simple Replace operation. We, being extra-smart-gets-things-done developers, would write a superior regular expression using the \b word-boundary anchor around the replacement, and use some capturing parens to handle both the singular and plural cases.
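A sketch of that word-boundary version in Python, assuming the blacklisted word is "tit" with the replacement "breast" (the Consbreastution case). The exact pattern, the case-insensitive flag, and the function name `boundary_filter` are illustrative assumptions, not anyone's production code.

```python
import re

# Hypothetical "improved" rule. \b only matches at the edge of a word, so
# "tit" buried inside "Constitution" is skipped; the capturing group (s?)
# grabs an optional plural "s" so it can be re-attached in the replacement.
PATTERN = re.compile(r"\btit(s?)\b", re.IGNORECASE)


def boundary_filter(text: str) -> str:
    """Replace whole-word "tit"/"tits" with "breast"/"breasts"."""
    return PATTERN.sub(r"breast\1", text)


print(boundary_filter("the United States Constitution"))
# the United States Constitution
```

It leaves "Constitution" intact while still rewriting a standalone "tit" or "tits".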
How about those Great Tits, eh?
Proving, yet again, that bad ideas are just plain bad ideas, regardless of language or implementation choices. Obscenity filters are like blacklists; using one is tantamount to admitting failure before you've even started.
But it still happens all the time. One of the most famous incidents was when the Yahoo! email developers created the accidental non-word Medireview. They weren't trying to filter obscenities, but JavaScript webmail exploits.
In 2001 Yahoo! silently introduced an email filter which changed some strings in HTML emails considered to be dangerous. While it was intended to stop spreading JavaScript viruses, no attempts were made to limit these string replacements to script sections and attributes, out of fear this would leave some loophole open. Additionally, word boundaries were not respected in the replacement. The list of replacements:
| Javascript | → | java-script |
| Jscript | → | j-script |
| Vbscript | → | vb-script |
| Livescript | → | live-script |
| Eval | → | review |
| Mocha | → | espresso |
| Expression | → | statement |
Some side-effects of this implementation:
| medieval | → | medireview |
| evaluation | → | reviewuation |
| expressionist | → | statementist |
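To see how that replacement list produces those side-effects, here is a rough Python reconstruction. This is not Yahoo!'s code; it assumes the rules were applied case-insensitively, in the listed order, as plain substring replacements anywhere in the message, with the replacement text emitted in lowercase. The name `yahoo_style_filter` is hypothetical.

```python
import re

# Rules from the list above, written in lowercase. Applying them in order,
# case-insensitively and with no word-boundary check, is an assumption
# about how the 2001 filter behaved.
YAHOO_RULES = [
    ("javascript", "java-script"),
    ("jscript", "j-script"),
    ("vbscript", "vb-script"),
    ("livescript", "live-script"),
    ("eval", "review"),
    ("mocha", "espresso"),
    ("expression", "statement"),
]


def yahoo_style_filter(html: str) -> str:
    """Apply every rule to the whole message, inside words as well."""
    for bad, good in YAHOO_RULES:
        html = re.sub(re.escape(bad), good, html, flags=re.IGNORECASE)
    return html


for word in ["medieval", "evaluation", "expressionist"]:
    print(word, "->", yahoo_style_filter(word))
# medieval -> medireview
# evaluation -> reviewuation
# expressionist -> statementist
```

Restricting the rules to script sections and attributes, or at least requiring word boundaries, would have spared ordinary prose.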
medireview.com is currently occupied by domain squatters. Perhaps that's a fitting end for this "company", though I perversely almost want the company to exist, as wholly formed from our imaginations, sort of like Jamcracker.
I can't help wondering just how freaked out the brass at Yahoo must have been about then-new JavaScript browser exploits to actually deploy such a brain-damaged "solution". To be fair, it was seven years ago, but still – did it not occur to anyone that such broad replacement criteria might have some serious side-effects? Or that replacing one thing with another, when it comes to human beings and written language, is an activity that is fraught with peril even in the best possible circumstances?
Obscenity filtering is an enduring, maybe even timeless problem. I'm doubtful it will ever be possible to solve this particular problem through code alone. But it seems some companies and developers can't stop tilting at that windmill. Which means you might want to think twice before you move to Scunthorpe.
Amusing. My one-off Great Tits page (http://www.cafeaulait.org/greattits.html) is still one of the most heavily trafficked pages on my site off the main page, even though it's barely linked from anywhere.
Elliotte Rusty Harold on October 24, 2008 11:11 AM
There was (or is) a real case of some Russian entrepreneurs who decided to steal a snapshot of the Russian Wikipedia, build their own site from it, and serve ads. To remove references to Wikipedia, they presumably did a mass replace of "wiki" with "encyclo". Russian has only one letter for both W and V, so "wiki" and "viki" are the same in directly transliterated words such as Wikipedia. This resulted in many interesting articles, the most notable of which is the northern tribe of Encyclongs, which became a small-scale meme.
Sergey Shelukhin on October 24, 2008 11:38 AM
There are some web services out there that filter profanity so you don't have to write your own. They seem to be pretty effective. I use WebPurify (www.webpurify.com) on my blog and it seems to work pretty well. It doesn't check for spelling mistakes though (-;
James Rosenstein on November 2, 2008 8:09 AM
I remember once trying to discuss the forward ptookus in a football (American) chat room... (btw, on the Battlestar Galactica example, that one actually dates back to the *original* series, and the new one retained it in tribute)
silverpie on December 16, 2008 1:56 AM
I bet they had a really embarrassed statement on their faces...
warlock on May 7, 2009 5:21 AM
Buttcopulation.
(Well, really not all that much better...)
anon on February 6, 2010 10:37 PM
United States Consbreastitution?
You left the word 'tit' in as well as your replacement :)
Word filters are always going to fail; we can't account for every variation. T1Ts won't get picked up by a regular expression, but any person will see the word right away.
I always try to steer clients towards a path of moderation and policy enforcement, but for those who must have word filters, it's best to make it clear right at the start that not everything will be caught... people will always find a way to be naughty.
I want to stick my long-necked Giraffe up your fluffy white bunny.
perfect example :)
Aaron Bassett on February 6, 2010 10:37 PM
@J. Stoever
The internet cannot be made child-friendly.
Software is no replacement for good parenting; you cannot abdicate your responsibility to ensure their browsing is safe.
Make sure you can always supervise their usage!
Simon
Simon Johnson on February 6, 2010 10:37 PM
We had a fun case when they tried to set up filters like that on our work email server some years back. A large amount of email started getting inadvertently blocked, which for the most part was from, or in reply to, anyone with Analyst in their job title (and therefore in their email signature).
Simon on February 6, 2010 10:37 PM
Look at my name and imagine how useful filtering has been for me...
Ben Sexton on February 6, 2010 10:37 PM
Then there's the ever classic dawizard incident. (http://www.everything2.com/e2node/DaWizard)
Karl von L. on February 6, 2010 10:37 PM
Just because it's hard/impossible doesn't mean it's completely wrong to do it. I'd rather see a few clbuttic mistakes than certain words all over kids' internet browsers.
Robin Day on July 27, 2010 2:22 AM
There have been a lot of changes since 2008; now, with the power of cloud servers, you can focus resources on figuring out context, etc. If you are interested in seeing this in action, check out http://www.webpurify.com
Profanityfilter on April 6, 2011 4:32 PM