May 31, 2008
Revisiting the Black Sunday Hack
One of the most impressive hacks I've ever read about has to be the Black Sunday kill. Since the original 2001 Slashdot article I read on this is 99.9% quote, I'm going to do the same. I can see why they quoted so extensively; it'd be difficult to improve on the unusually succinct, well-written summary provided by Pat from Belch:
One of the original smart cards, entitled 'H' cards for Hughes, had design flaws which were discovered by the hacking community. These flaws enabled the extremely bright hacking community to reverse engineer their design, and to create smart card writers. The writers enabled the hackers to read and write to the smart card, and allowed them to change their subscription model to receive all the channels. Since the technology of satellite television is broadcast only, meaning you cannot send information TO the satellite, the system requires a phone line to communicate with DirecTV. The hackers could re-write their smart cards and receive all the channels, and unplug their phone lines leaving no way for DirecTV to track the abuse. DirecTV had built a mechanism into their system that allowed the updating of these smart cards through the satellite stream. Every receiver was designed to 'apply' these updates when it received them to the cards. DirecTV applied updates that looked for hacked cards, and then attempted to destroy the cards by writing updates that disabled them. The hacking community replied with yet another piece of hardware, an 'unlooper,' that repaired the damage. The hacker community then designed software that trojanized the card, and removed the capability of the receivers to update the card. DirecTV could only send updates to the cards, and then require the updates be present in order to receive video. Each month or so, DirecTV would send an update. 10 or 15 minutes later, the hacking community would update the software to work around the latest fixes. This was the status quo for almost two years. 'H' cards regularly sold on eBay for over $400.00. It was apparent that DirecTV had lost this battle, relegating DirecTV to hunting down Web sites that discussed their product and using their legal team to sue and intimidate them into submission.
Four months ago, however, DirecTV began sending several updates at a time, breaking their pattern.
While the hacking community was able to bypass these batches, they did not understand the reasoning behind them. Never before had DirecTV sent 4 and 5 updates at a time, let alone send these batches every week. Many postulated they were simply trying to annoy the community into submission. The updates contained useless pieces of computer code that were then required to be present on the card in order to receive the transmission. The hacking community accommodated this in their software, applying these updates in their hacking software. Not until the final batch of updates was sent through the stream did the hacking community understand DirecTV. Like a final piece of a puzzle allowing the entire picture, the final updates made all the useless bits of computer code join into a dynamic program, existing on the card itself. This dynamic program changed the entire way the older technology worked. In a masterful, planned, and orchestrated manner, DirecTV had updated the old and ailing technology. The hacking community responded, but cautiously, understanding that this new ability for DirecTV to apply more advanced logic in the receiver was a dangerous new weapon. It was still possible to bypass the protections and receive the programming, but DirecTV had not pulled the trigger of this new weapon.
Last Sunday night, at 8:30 pm EST, DirecTV fired their new gun. One week before the Super Bowl, DirecTV launched a series of attacks against the hackers of their product. DirecTV sent programmatic code in the stream, using their new dynamic code ally, that hunted down hacked smart cards and destroyed them. The IRC DirecTV channels overflowed with thousands of people who had lost the ability to watch their stolen TV. The hacking community by and large lost not only their ability to watch TV, but the cards themselves were likely permanently destroyed. Some estimate that in one evening, 100,000 smart cards were destroyed, removing 98% of the hacking community's ability to steal their signal. To add a little pizzazz to the operation, DirecTV personally "signed" the anti-hacker attack. The first 8 computer bytes of all hacked cards were rewritten to read "GAME OVER".
Nobody knew how the satellite companies had suddenly developed such smarts. Until now. A recent Wired article exposes Christopher Tarnovsky as the mind behind the epic Black Sunday Hack.
Among the countermeasures he says he created was one known among pirates as the "Black Sunday" kill -- an elaborate scheme that destroyed tens of thousands of pirate DirecTV cards a week before Super Bowl Sunday in 2001.
Instead of being delivered all at once like other measures, the Black Sunday attack code was sent to pirate cards in about five dozen parts over the course of two months, like a tank transported piece by piece to a battlefield to be assembled in the field. "They never expected us to do this," Tarnovsky says.
The kill didn't last long before pirates found a way to jump-start the cards. But it holds an enduring position in pirate lore; for the first time, they could see a cunning mind at work on the other side.
It's fascinating to finally hear the Black Sunday kill described so intimately from the inside. It's a gripping tale of high-stakes programming, a life of electronic warfare with millions of dollars at risk on both sides. I've never been a satellite television subscriber, but apparently the war rages on even today -- at least according to the Wikipedia entry on pirate decryption.
May 29, 2008
Strong Opinions, Weakly Held
I seldom pause to answer criticism of my blog. If I did, I'd have time for little else in the course of the day, and no time for constructive work. But occasionally I'll encounter a particularly well written critique that gives me pause, such as Alastair Rankine's Blogging Horror. Since I feel that Alastair wrote it out of genuine good will, and that his criticisms are sincerely set forth, I want to try to answer his statement in what I hope will be patient and reasonable terms.
However, Coding Horror has become so popular that Atwood has quit his day job and struck out on his own. To my mind, this raises the bar somewhat. Professional bloggers deserve more scrutiny than dabblers, just as in many other fields.
Not only has Atwood gone pro with his blog, but he has recently started a venture called stackoverflow to collate accepted wisdom from the software development community. It is early days, but from what I can gather there is still likely to be Atwood's editorial hand in the output, despite intentions of adopting community-generated content.
In other words, Atwood seems to be setting himself up as an authority figure on software development and, well, I have some issues with this.
I'd like to first answer this with two slides from my January CUSEC presentation, presented here verbatim with no modifications.
Authority in our field is a strange thing. Perceived authority is stranger still.
I've always thought of myself as nothing more than a rank amateur seeking enlightenment. This blog is my attempt to invite others along for the journey. It has become a rather popular journey along the way, which has subtly altered the nature of the journey and the way I approach it, but the goal remains the same.
It troubles me greatly to hear that people see me as an expert or an authority, and not a fellow amateur:
When I got back to Boston I went to the library and discovered a book by Kimura on the subject, and much to my disappointment, all of our "discoveries" were covered in the first few pages. When I called back and told Richard what I had found, he was elated. "Hey, we got it right!" he said. "Not bad for amateurs."
In retrospect I realize that in almost everything that we worked on together, we were both amateurs. In digital physics, neural networks, even parallel computing, we never really knew what we were doing. But the things that we studied were so new that no one else knew exactly what they were doing either. It was amateurs who made the progress.
These people are industry giants, so any comparison between them and myself is accidental. It's the overall point they're making that I want to call your attention to: software is an incredibly young discipline. Everything in software is so new and so frequently being reinvented that almost nobody really knows what they are doing. It is amateurs who make all the progress.
When it comes to software development, if you profess expertise, if you pitch yourself as an authority, you're either lying to us, or lying to yourself. In our heart of hearts, we know: the real progress is made by the amateurs. They're so busy living software they don't usually have time to pontificate at length about the breadth of their legendary expertise. If I've learned anything in my career, it is that approaching software development as an expert, as someone who has already discovered everything there is to know about a given topic, is the one surest way to fail.
Experts are, if anything, more suspect than the amateurs, because they're less honest. Regardless, you absolutely should question everything I write here, in the same way you question everything you've ever read online -- or anywhere else for that matter. Your own research and data should trump any claims you read from anyone, no matter how much of an authority or expert you, I, Google, or the general community at large may believe them to be.
But if, as Alastair correctly points out, I now derive a significant part of my income from blogging, doesn't that make me a professional blogger by definition? I thought Dave Winer had a great explanation that I'll gladly co-opt:
Now if you ask me -- there never was such a thing as a pro blogger. It's a contradiction in terms. It's like calling someone a professional amateur. It's like salty orange juice, a drink whose taste is derived from its acidity. Blogging is an amateur activity. It's users writing about what they do, not professionals writing about what users do.
What Dave's describing here is the difference between a journalist writing about programmers versus a programmer writing about programming. Blogging does not mean observing from the outside; it means participation. I like to think what I do at Coding Horror is a byproduct of shipping software, not some sort of bizarre sociological experiment I'm conducting. Although sometimes, I'll admit, it does feel that way. I am a generalist with a decidedly lowbrow coding background, so I can be a little scatterbrained. But directly or indirectly, everything I've ever written on this blog is a side-effect of my deep, lifelong love of my ongoing work as a programmer.
You could argue that I'm a better writer than programmer. Perhaps that's true. I'll be the first to tell you that I am not an exceptional programmer. A competent programmer, yes. Always. On a good day, perhaps even a decent programmer. But I don't kid myself, either. I'll never be one of the best. But I have an ace up my sleeve that most don't: what I lack in talent, I make up in intensity.
Which means, mathematically speaking, I must be pretty damn intense.
The bite-sized morsels posted to Coding Horror are all very well for bite-sized topics. But things can often go awry if the topic is too complex to be distilled down easily. Oversimplification often ensues, as in the following examples, all recent:
- An attempted critique of XML ...
- A similar "it's-too-hard" reaction seems to be at the heart of an article on humane markup languages ...
- Admittedly Model-View-Controller is an increasingly vague concept these days, but I just couldn't buy Atwood's example of it ...
- A comment that software forking is "the very embodiment of freedom zero" demonstrates that Atwood has no idea what freedom zero is ...
Common to all of these is a superficial understanding of the topic at hand. In short, Atwood just isn't credible.
Maybe a little too intense, sometimes. It's almost like I'm trying to overcompensate for something, but I can't imagine what that could be.
I'm Rex, founder of the Rex Kwon Do self-defense system! After one week with me in my dojo, you'll be prepared to defend yourself with the STRENGTH of a grizzly, the reflexes of a PUMA, and the wisdom of a MAN.
Like Rex of Rex Kwon Do, perhaps I'm relying a bit too heavily on the "Smackdown" learning model here in my dojo. I use it because I personally find it incredibly effective, for all the reasons that Kathy Sierra outlines.
But I worry that, for some, it's getting in the way, that it is damaging the credibility of the underlying message. Instead of arriving at the desired learning part, all they're getting is the smackdown. I certainly hope my posts are read and understood as slightly more nuanced than "Everything About PHP Sucks", "Everything About XML Sucks", or my personal favorite, "Everything About (your favorite technology) Sucks. Seriously."
I suppose it's also an issue of personal style. To me, writing without a strong voice, writing filled with second-guessing and disclaimers, is tedious and difficult to slog through. I go out of my way to write in a strong voice because it's more effective. But whenever I post in a strong voice, it is also an implied invitation to a discussion, a discussion where I often change my opinion and invariably learn a great deal about the topic at hand. I believe in the principle of strong opinions, weakly held:
A couple years ago, I was talking with the Institute's Bob Johansen about wisdom, and he explained that -- to deal with an uncertain future and still move forward -- they advise people to have "strong opinions, which are weakly held." They've been giving this advice for years, and I understand that it was first developed by [former] Institute Director Paul Saffo. Bob explained that weak opinions are problematic because people aren't inspired to develop the best arguments possible for them, or to put forth the energy required to test them. Bob explained that it was just as important, however, to not be too attached to what you believe because, otherwise, it undermines your ability to "see" and "hear" evidence that clashes with your opinions. This is what psychologists sometimes call the problem of "confirmation bias."
So when you read one of my posts and hear this:
My name is Rex, and if you study with my eight-week program you will learn a system of self defense that I developed over two seasons of fighting in the Octagon. It's called... Rex Kwon Do!
Please consider it a strong opinion weakly held, a mock fight between fellow amateurs of equal stature, held in an Octagon where everyone retains their sense of humor, has an open mind, and enjoys a spirited debate where everyone ultimately learns something.
Now bow to your sensei! Bow to your sensei!
May 28, 2008
Designing For Evil
Have you ever used Craigslist? It's an almost entirely free, mostly anonymous classified advertising service which evolved from an early internet phenomenon into a service so powerful it is often accused of single-handedly destroying the newspaper business. Unfortunately, these same characteristics also make Craigslist a particularly juicy target for spammers and evildoers. Who knows; maybe it's karma.
I consider Craigslist a generally benevolent public service. Perhaps that's why I was so profoundly disturbed by John Nagle's wartime narrative of the raging battle between Craigslist and spammers.
Spam on Craigslist has been a minor nuisance for years. Not any more. This year, the spammers started winning and are taking over Craigslist. Here's how they did it. Craigslist tries to stop spamming by:
- Checking for duplicate submissions.
- Blocking excessive posts from a single IP address.
- Requiring users to register with a valid email address.
- Using a CAPTCHA to stop automated posting tools.
- Letting users flag postings they recognize as spam.
Several commercial products are now available to overcome those little obstacles to bulk posting. CL Auto Posting Tool is one such product. It not only posts to Craigslist automatically, it has built-in strategies to overcome each Craigslist anti-spam mechanism:
- Random text is added to each spam message to fool Craigslist's duplicate message detector.
- IP proxy sites are used to post from a wide range of IP addresses.
- E-mail addresses for reply are Gmail accounts conveniently created by Jiffy Gmail Creator (ed. note: this does not break Google's CAPTCHA, as you can see in this screenshot.)
- An OCR system reads the obscured text in the CAPTCHA.
- Automatic monitoring detects when a posting has been flagged as spam and reposts it.
CL Auto Poster isn't the only such tool. Other desktop software products are AdBomber and Ad Master. For spammers preferring a service-oriented approach, there's ItsYourPost. With these power tools, the defenses of Craigslist have been overrun. Some categories on Craigslist have become over 90% spam. The personals sections were the first to go, then the services categories, and more recently, the job postings.
Craigslist is fighting back. Its latest gimmick is phone verification. Posting in some categories now requires a callback phone call, with a password sent to the user either by voice or as an SMS message. Only one account is allowed per phone number. Spammers reacted by using VoIP numbers. Craigslist blocked those. Spammers tried using number-portability services like Grand Central and Tossable Digits. Craigslist blocked those. Spammers tried using their own free ringtone sites to get many users to accept the Craigslist verification call, then type in the password from the voice message. Craigslist hasn't countered that trick yet.
Much of the back and forth battle can be followed in various forums. It's not clear yet who will win.
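The phone verification scheme Nagle describes (a callback or SMS carrying a password, one account per phone number, VoIP and number-forwarding services blocked) boils down to a small amount of bookkeeping. Here's a minimal Python sketch of that bookkeeping. It is not Craigslist's implementation: the `LINE_TYPES` table is a hypothetical stand-in for whatever carrier lookup a real site would query, and the phone numbers are made up.

```python
import secrets

# Hypothetical stand-in for an external carrier/line-type lookup service.
LINE_TYPES = {
    "+15551230001": "mobile",
    "+15551230002": "voip",
    "+15551230003": "forwarding",  # e.g. a number-portability/forwarding service
}
BLOCKED_LINE_TYPES = {"voip", "forwarding"}


class PhoneVerifier:
    def __init__(self) -> None:
        self.pending: dict[str, str] = {}   # phone -> outstanding one-time code
        self.accounts: dict[str, str] = {}  # phone -> verified account id

    def start(self, phone: str) -> str:
        """Issue a one-time code, or raise ValueError if the phone is ineligible."""
        if phone in self.accounts:
            raise ValueError("phone already verified for another account")
        if LINE_TYPES.get(phone, "unknown") in BLOCKED_LINE_TYPES:
            raise ValueError("line type not accepted")
        code = f"{secrets.randbelow(10**6):06d}"  # random 6-digit code
        self.pending[phone] = code
        # A real site would deliver this by SMS or voice call, never to the browser.
        return code

    def confirm(self, phone: str, code: str, account_id: str) -> bool:
        """Bind the phone to the account if the code matches (ASCII input assumed)."""
        expected = self.pending.get(phone)
        if expected is None or not secrets.compare_digest(expected, code):
            return False
        del self.pending[phone]
        self.accounts[phone] = account_id
        return True
```

Notice that nothing here helps against the ringtone-site trick: in that case a real phone owner receives the call and types in the password, so every check passes.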
I've used Craigslist quite a few times in the past, mostly to sell things that are too unwieldy to ship, with generally positive results. But that's the "for sale" section, and the spammers seem to be concentrating on the personals and services. I was curious about this, so I delved into the local personals section in what I guessed to be the most popular category. (Note to my wife: this is research! Research! I swear!)
Almost immediately I found a personals ad with the following "image":
It's an encoded wartime transmission from someone battling Craigslist spammers. It ends on this dire warning:
99.9% of the ads these days are fakes. Sad but true. REALLY, ALMOST ALL THE ADS ARE FAKE!
But is it true? I saw some obvious spam in the personals section -- all of which had been flagged for removal by the time I clicked on it -- but certainly nothing to corroborate this 99.9% claim. I searched for a few distinctive phrases taken from random personals (my favorite at the moment is "no murderers please!"), and each search turned up only the original ad.
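Searching for a distinctive phrase is essentially an exact-match duplicate check, and exact matching is precisely what the "random text" trick from the auto-posting tools is built to defeat: change one character and the fingerprint changes. One common counter, swapped in here for illustration and not something Craigslist is known to use, is near-duplicate detection: split each message into overlapping word shingles and compare the sets with Jaccard similarity. A minimal Python sketch; the shingle size of 3 and the 0.5 threshold are arbitrary assumptions, not tuned values.

```python
import hashlib
import re


def exact_fingerprint(text: str) -> str:
    """Exact-match fingerprint: any appended junk produces a different hash."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Set of overlapping k-word sequences from the lowercased text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < k:
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """|A intersect B| / |A union B|, defined as 1.0 when both are empty."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def is_near_duplicate(new_text: str, old_text: str, threshold: float = 0.5) -> bool:
    # The threshold is an illustrative assumption.
    return jaccard(shingles(new_text), shingles(old_text)) >= threshold


original = "Work from home and earn thousands every week with our amazing system call now"
spam = original + " zq7 blorf quantum teapot"  # random text appended by a posting tool
```

With these two messages the exact fingerprints differ, but the shingle sets overlap 12 out of 16, a similarity of 0.75, so the padded copy is still caught. Comparing every new post against every old one gets expensive quickly; at scale, approximations such as MinHash are typically used instead.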
Clearly, there's a war on, and there have been casualties on both sides. Even if the spammers aren't winning, every inch they gain further undermines the community's trust in Craigslist and devalues everyone's participation.
This is a topic I am acutely interested in as we build stackoverflow.com out. Like Craigslist, stackoverflow will offer a rich experience for anonymous internet users. We will not require you to create an account or "login" to answer or ask questions. We'll even track your reputation and preferred settings for you, as long as you allow us to store a standard browser cookie. While it's true that we'll initially be a low-value target due to limited traffic and a specialized audience, that will inevitably change over time. So you can expect some of the same measures on stackoverflow that Craigslist and Wikipedia use to mitigate anonymous evil:
- Some form of CAPTCHA.
- The ability to temporarily "lock" controversial questions so only registered users can edit or add responses.
- An automatic throttle if we see rapid, bot-like actions from your IP address.
- Some basic heuristics to detect "spammy" content, such as too many URLs.
- An easy way for users with sufficient reputation to undo vandalism by reverting to an earlier version.
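Two of the measures in that list, the IP throttle and the too-many-URLs heuristic, are simple enough to sketch. This is a minimal Python illustration, not stackoverflow's actual code; the limits (5 actions per 60 seconds, more than 3 URLs) are arbitrary assumptions. The caller passes the current time in explicitly so the logic is easy to test; in production you would pass something like `time.monotonic()`.

```python
import re
from collections import defaultdict, deque


class IpThrottle:
    """Sliding-window throttle: at most max_actions per window_seconds per IP."""

    def __init__(self, max_actions: int = 5, window_seconds: float = 60.0) -> None:
        self.max_actions = max_actions
        self.window = window_seconds
        self.events: dict[str, deque] = defaultdict(deque)  # ip -> timestamps

    def allow(self, ip: str, now: float) -> bool:
        q = self.events[ip]
        # Forget actions that have fallen out of the window.
        while q and now - q[0] >= self.window:
            q.popleft()
        if len(q) >= self.max_actions:
            return False  # rejected attempts are not recorded
        q.append(now)
        return True


URL_PATTERN = re.compile(r"https?://\S+", re.IGNORECASE)


def looks_spammy(text: str, max_urls: int = 3) -> bool:
    """Crude heuristic: flag content containing more than max_urls links."""
    return len(URL_PATTERN.findall(text)) > max_urls
```

Not counting rejected attempts means a bot that keeps hammering away is let back in as soon as its old actions age out; counting them instead would lock out persistent bots longer, at the cost of punishing a human who double-clicks. Either choice is defensible.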
The community itself can also assist. Every question and answer on stackoverflow can be rated Digg style; if a given bit of content rapidly accrues a large number of downmods, it is likely to be spam or inappropriate content, and will be automatically removed or directed into a moderation queue.
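The "rapidly accrues a large number of downmods" rule can be sketched as another sliding window, this time over vote timestamps. "Rapidly" and "a large number" are turned into hypothetical parameters here (5 downvotes within 10 minutes), and since the rule allows either removal or a moderation queue, this sketch picks the queue. A minimal Python illustration, not a description of how stackoverflow actually implements it.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Post:
    post_id: int
    downvote_times: deque = field(default_factory=deque)
    status: str = "visible"  # "visible" or "queued"


def record_downvote(post: Post, now: float, moderation_queue: list,
                    threshold: int = 5, window_seconds: float = 600.0) -> None:
    """Queue the post for moderation once it collects `threshold` downvotes in the window."""
    post.downvote_times.append(now)
    # Drop votes older than the window; the vote just added always survives.
    while now - post.downvote_times[0] > window_seconds:
        post.downvote_times.popleft()
    if post.status == "visible" and len(post.downvote_times) >= threshold:
        post.status = "queued"
        moderation_queue.append(post.post_id)
```

A burst of downvotes sends the post to moderators, while the same number of downvotes spread out over hours does not, which is roughly the difference between "this is spam" and "people disagree with this".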
Don't get me wrong. I've been humbled by the quality -- and the sheer size -- of the community that has grown up around this blog. I expect the overwhelming majority of people who participate in stackoverflow.com will be absolutely upstanding internet citizens. Wikipedia is a living testament to the fact that goodness vastly outnumbers evil. We good guys can win, if we've had the forethought to put some controls in place first.
Allowing anonymous users write permission creates a volatile situation where a dozen sufficiently motivated spammers can easily poison the well for thousands of typical users. These spammers don't give a damn about the community we're building together. All they care about is getting paid by posting their links anywhere and everywhere they can. They'll run roughshod over as many websites and pages as possible in their frantic, abusive pursuit of money. If I didn't so desperately want to choke the life out of each and every one of them, I might actually feel sorry for the poor bastards.
But here's the problem: following the rules and being a good citizen is easy. Being evil is hard; it takes more work. Sometimes a lot more work. The bad guys get paid to learn about their exploits. Are you willing to educate yourself about the complex evil that a tiny minority of powerful users are prepared to unleash upon your site? As with so many things in life, this is best illustrated by a scene from Spaceballs:
HELMET So, Lone Starr, Yogurt has taught you well. If there is one thing I despise, it is a fair fight. But if I must then I must. May the best man win. Put 'er there. (offers to shake his hand)
LONE STARR goes to shake his hand. HELMET takes the ring off LONE STARR'S hand.
HELMET The ring. I can't believe you fell for the oldest trick in the book. What a goof. What's with you man? Come on. You know what? No, here let me give it back to you. (offers the ring back)
LONE STARR goes up to get the ring back. HELMET throws it in a grate. The ring goes in the grate. LONE STARR tries to catch it and falls to the grate.
HELMET Oh, look. You fell for that, too. I can't believe it man.
LONE STARR gets up and runs to a corner.
HELMET So, Lone Starr, now you see that evil will always triumph, because good is dumb.
As the good guys, we can't afford to be ignorant of the spammers' techniques. If that means spelunking through the grimiest corners of some scummy black hat forums, then so be it. I'll tell you this: I've never nofollowed a single link on this blog until today. The most effective way to fight the evil spammers is to understand them, and the first step toward understanding evil is openly linking to their tools and methods, exposing them to as much public scrutiny as possible.
When you design your software, work under the assumption that some of your users will be evil: out to game the system, to defeat it at every turn, to cause interruption and denial of service, to attack and humiliate other users, to fill your site with the vilest, nastiest spam you can possibly imagine. If you don't do that, you'll end up with something like blog trackbacks, which are irreparably busted at this point. Trackbacks are the source of countless untold hours of institutionalized spam pain and suffering, all because the initial designers apparently did not ask themselves one simple question: what if some of our users are evil?
When good is dumb, evil will always triumph.
Websites that allow users to post content will always be vulnerable to the actions of a handful of evil, spammy users. It's not pleasant. It is a dark mirror into the ugly underbelly of human nature. But it's also an unfortunate, unavoidable fact of life: some of your users will be evil. And when you fail to design for evil, you have failed your community.
May 26, 2008
It's Clay Shirky's Internet, We Just Live In It
I can't remember when, exactly, I discovered Clay Shirky, but I suspect it was around 2003 or so. I sent him an email about micropayments, he actually answered it, and we had a rather nice discussion on the topic. I've been a fan of Clay's writing ever since. (In case you're curious, Clay was right -- micropayments are dead -- and I was dead wrong. All the more reason to be a fan.)
I don't think you'll find a smarter, more articulate writer on the topic of internet community than Clay Shirky. His A Group Is Its Own Worst Enemy, for example, is the seminal article on the folly of addressing social software problems purely through technology. I've referenced Clay a number of times on this blog, and his writing seems more and more prescient with each passing year. It's Clay Shirky's Internet; we just live in it.
Gin, Television, and Social Surplus is a more recent example:
Did you ever see that episode of Gilligan's Island where they almost get off the island and then Gilligan messes up and then they don't? I saw that one. I saw that one a lot when I was growing up. And every half-hour that I watched that was a half an hour I wasn't posting at my blog or editing Wikipedia or contributing to a mailing list. Now I had an ironclad excuse for not doing those things, which is none of those things existed then. I was forced into the channel of media the way it was because it was the only option. Now it's not, and that's the big surprise. However lousy it is to sit in your basement and pretend to be an elf, I can tell you from personal experience it's worse to sit in your basement and try to figure if Ginger or Mary Ann is cuter.
And I'm willing to raise that to a general principle. It's better to do something than to do nothing. Even lolcats, even cute pictures of kittens made even cuter with the addition of cute captions, hold out an invitation to participation. When you see a lolcat, one of the things it says to the viewer is, "If you have some sans-serif fonts on your computer, you can play this game, too." And that message -- I can do that, too -- is a big change.
This is something that people in the media world don't understand. Media in the 20th century was run as a single race -- consumption. How much can we produce? How much can you consume? Can we produce more and you'll consume more? And the answer to that question has generally been yes. But media is actually a triathlon; it's three different events. People like to consume, but they also like to produce, and they like to share.
It's exactly this sort of deep, penetrating insight which makes me wonder if Clay Shirky will be looked back on as one of the key historical figures of the nascent internet era. Maybe I'm just a naive fanboy, but the guy seems to see a lot farther than everyone else. So you can imagine the great interest I had in Clay's new book, Here Comes Everybody: The Power of Organizing Without Organizations.
(I'm showing the UK version of the book cover because it's about a zillion times better than the US cover. Seriously, what were they thinking?)
After reading Here Comes Everybody, I'm happy to report that it does not disappoint. I'd even go so far as to say if you're developing social software of any kind, this book should be required reading. I feel so strongly about this, in fact, that I just gave my copy to my stackoverflow coding partner. And I will be following up with pop quizzes. What's that, you say? You don't develop social software? Are you sure?
So I said, narrow the focus. Your "use case" should be, there's a 22 year old college student living in the dorms. How will this software get him laid?
That got me a look like I had just sprouted a third head, but bear with me, because I think that it's not only crude but insightful. "How will this software get my users laid" should be on the minds of anyone writing social software (and these days, almost all software is social software).
"Social software" is about making it easy for people to do other things that make them happy: meeting, communicating, and hooking up.
As Jamie Zawinski once said, these days, almost all software is social software.
If you're not able to devote the time to the book, I encourage you to at least check out Clay's 42-minute presentation on "Here Comes Everybody" from earlier this year.
I found the introduction particularly inspiring; I've transcribed it here.
I've been writing principally for an audience of programmers and engineers and techies and so forth for about a dozen years. I wanted to write this book for a general audience, because the effects of the internet are now becoming broadly social enough that there is a general awareness that the internet isn't a decoration on contemporary society, but a challenge to it. A society that has an internet is a different kind of society, in the same way that a society that has a printing press was a different kind of society. We're living through the largest increase in human expressive capability in history.
It's a big claim. There are really only four revolutions that could compete for that:
- The printing press and movable type considered as one broad period of innovation.
- Telegraph and telephone considered as one broad period of innovation.
- Recorded media of all types, first images, then sound, then moving images, then moving images with sound.
- Finally, the ability to harness broadcast.
These are the media revolutions that existed as part of the landscape prior to our historical generation. There is a curious asymmetry to them, which is the ones that create groups don't create two-way communication, and the ones that create two-way communications don't create groups. Either you had something like a magazine or television, where the broadcast was from the center to the edge, but the relationship was between producer and consumer. Or you had something like the telephone, where people could engage in a two-way conversation, but the medium didn't create any kind of group.
And then there's now. What we've got is a network that is natively good at group forming. In fact, this isn't just a fifth revolution. It holds the contents of the previous revolutions, which is to say we can now distribute music and movies and conversations all in this medium. But the other thing it does is move us into a world of two-way groups. Thirty years from now, when I'm presenting this book, if I had to describe it in one bullet point -- this is what the bullet point would say:
Group Action Just Got Easier.
This is, in the context of change in our historical generation, the big deal. This isn't just a new way of broadcasting information, it isn't just a new way of having two way communication, it actually engages groups. In this medium, freedom of speech, freedom of the press, and freedom of assembly are all now the same freedom. And the spread of that capability is the big deal.
Now, it could be that blogging and working on stackoverflow is clouding my perspective, making these social software issues unusually relevant to my work. When I wrote:
I realized, that's it. That's it exactly. That is what is so intensely satisfying about writing here. My happiness only becomes real when I share it with all of you.
I didn't realize the serendipitous parallels between that sentiment and Clay's claim that the internet runs on love:
In the past, we could do little things for love, but big things, big things required money. Now, we can do big things for love.
I have no idea if stackoverflow will be a "big thing" or not. But it sure is nice to wake up in the morning and work on building a community of people who love computers and code as much as I do.
Or maybe I'm just a hopeless romantic.
May 22, 2008
OpenID: Does The World Really Need Yet Another Username and Password?
As we continue to work on the code that will eventually become stackoverflow, we belatedly realized that we'd be contributing to the glut of usernames and passwords on the web. I have fifty online logins, and I can't remember any of them! Adding that fifty-first set of stackoverflow.com credentials is unlikely to help matters.
With some urging from my friend Jon Galloway, I decided to take a look at OpenID. OpenID aims to solve the login explosion problem:
OpenID eliminates the need for multiple usernames across different websites, simplifying your online experience.
You get to choose the OpenID Provider that best meets your needs and most importantly that you trust. At the same time, your OpenID can stay with you, no matter which Provider you move to. And best of all, the OpenID technology is not proprietary and is completely free.
In the spirit of Show, Don't Tell, here's how it works:
Let's say you're visiting a new website for the first time. As you browse around, eventually you'll do something that requires more than anonymous guest access. So you'll get shunted to the "create a new account" page, in whatever form that takes. I'm sure everyone reading this knows the drill. But if the website is OpenID enabled, you don't have to go through all the typical rigamarole necessary to create a new account. Instead, you can enter your OpenID login:
I'm going to indulge in a bit of hand waving here and assume that you already have an OpenID login. It's not such a terrible stretch, honestly; every AOL and Yahoo user already has an OpenID login even if they don't know it yet.
OpenIDs are technically URLs. Here are a few examples:
- http://claimid.com/yourname
- http://yourname.signon.com
- https://me.yahoo.com/yourname
That's one usability problem with OpenID: you have to remember a relatively complete personal URL that no two OpenID providers define the same way. Which compares unfavorably to, say, remembering your email address. There are shortcuts around this that I'll describe later, but for now, there's ID selector, which provides a reasonably friendly UI for building an OpenID login URL.
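To make the usability problem concrete, here is a minimal Python sketch of roughly what a site does with whatever you type into the OpenID box. It loosely follows the identifier normalization rules in the OpenID 2.0 spec but is simplified: there is no discovery and no redirect following, and the function name normalize_openid is made up for illustration.

```python
from urllib.parse import urlsplit, urlunsplit

# First characters that mark an XRI (such as "=example") rather than a URL.
XRI_GLOBAL_SYMBOLS = ("=", "@", "+", "$", "!", "(")


def normalize_openid(user_input: str) -> tuple[str, str]:
    """Turn whatever the user typed into ("xri", id) or ("url", id).

    Simplified sketch: a real relying party would next run discovery on
    the result and follow any HTTP redirects before trusting it.
    """
    ident = user_input.strip()
    if ident.lower().startswith("xri://"):
        ident = ident[len("xri://"):]
    if ident.startswith(XRI_GLOBAL_SYMBOLS):
        return ("xri", ident)
    # A bare "yahoo.com" gets an http:// scheme added.
    if not ident.lower().startswith(("http://", "https://")):
        ident = "http://" + ident
    parts = urlsplit(ident)
    path = parts.path or "/"
    # Lowercase scheme and host, keep path and query, drop any #fragment.
    return ("url", urlunsplit((parts.scheme.lower(), parts.netloc.lower(),
                               path, parts.query, "")))
```

Under these rules a bare "yahoo.com" becomes "http://yahoo.com/". The site would then run discovery against that URL, which is presumably how a provider like Yahoo! can accept such short input.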
If you enter the right URL, you'll get redirected to your OpenID provider, where you'll enter your single set of login credentials.
You'll be prompted to add this site to your provider's list of "trusted sites" for your account. Once you do this, you can bypass all of these steps the next time you're on the site.
And, finally, you're logged in for the first time!
If that seems like extra work -- and remember, I'm not counting the time it took to set up the initial account at ClaimID, either -- well, I won't lie to you. It is more work. But it's worth noting that:
- The cost of account creation at your OpenID provider can eventually be amortized across dozens of sites which will all accept those same credentials.
- After the first OpenID login at a particular site, assuming you've added that site to your trust list, subsequent logins are literally one-click operations.
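Under the hood, the redirect to your provider is just a URL carrying some openid.* query parameters. Here is a rough Python sketch of an OpenID 2.0 checkid_setup request. The endpoint and return URLs used with it are hypothetical. A real site would get the provider endpoint from discovery, and would normally use a maintained OpenID library rather than hand-rolling this.

```python
from urllib.parse import urlencode

OPENID2_NS = "http://specs.openid.net/auth/2.0"


def build_checkid_setup_url(op_endpoint: str, claimed_id: str,
                            return_to: str, realm: str) -> str:
    """URL the relying party redirects the browser to (OpenID 2.0, checkid_setup).

    op_endpoint would normally come from discovery on claimed_id; here it
    is passed in directly to keep the sketch short.
    """
    params = {
        "openid.ns": OPENID2_NS,
        # checkid_setup lets the provider show its login and trust pages.
        "openid.mode": "checkid_setup",
        "openid.claimed_id": claimed_id,
        # Equals claimed_id when the user isn't delegating to another identifier.
        "openid.identity": claimed_id,
        # Where the provider sends the browser back after login.
        "openid.return_to": return_to,
        # The site the user is asked to add to their "trusted sites".
        "openid.realm": realm,
    }
    separator = "&" if "?" in op_endpoint else "?"
    return op_endpoint + separator + urlencode(params)
```

The realm is what the provider shows when it asks you to trust the site. Once that realm is on your trust list, the provider can skip the prompt, which is what makes later logins close to one click.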
It's not exactly frictionless, but it's a heck of an improvement over having to remember 50 different usernames and passwords for 50 different websites, wouldn't you say? I think it compares quite favorably with the current champion of frictionless communication: anonymous comment boxes. They typically have three fields to fill out: username, URL, and email. OpenID requires only one. Your provider can proxy your URL and email back to the blog automatically from your provider profile, if you choose a smart provider with attribute exchange support.
Which brings me to the other problem with OpenID. The quality of your OpenID experience is heavily influenced by the provider you choose. For example, Yahoo! is smart enough to work even if you enter nothing but "yahoo.com" as your OpenID URL. That is, assuming you've enabled OpenID support for your Yahoo! login. Providers can also offer unique functionality that sets them apart. For example, SignOn.com allows the use of Information Cards in Windows, so you can log into a website without ever typing in a password! It's a bit of work, as you have to associate the Information Card with your provider account first, but I tried it, and it works as advertised.
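Attribute exchange is how a smart provider can hand your email address back to a site. It also travels as a few extra openid.* parameters on the login redirect. Your URL doesn't need this, because it is your OpenID itself. Here is a minimal Python sketch of an Attribute Exchange fetch request. It assumes the commonly used axschema.org type URIs, and how well a given provider honors them varies.

```python
from typing import Sequence

AX_NS = "http://openid.net/srv/ax/1.0"

# Commonly used axschema.org type URIs; support varies from provider to provider.
AX_TYPES = {
    "email": "http://axschema.org/contact/email",
    "nickname": "http://axschema.org/namePerson/friendly",
}


def ax_fetch_params(required: Sequence[str],
                    if_available: Sequence[str] = ()) -> dict[str, str]:
    """Extra openid.* parameters asking the provider to send profile attributes back."""
    params = {"openid.ns.ax": AX_NS, "openid.ax.mode": "fetch_request"}
    for alias in [*required, *if_available]:
        params[f"openid.ax.type.{alias}"] = AX_TYPES[alias]
    if required:
        params["openid.ax.required"] = ",".join(required)
    if if_available:
        params["openid.ax.if_available"] = ",".join(if_available)
    return params
```

These parameters would be merged into the login redirect's query string alongside the other openid.* parameters.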
My experiments with OpenID were quite positive, but all is not wine and roses in the land of OpenID. Stefan Brands identifies some potentially large problems with OpenID, backed by exhaustive references:
- Phishing. A malicious site could visit the OpenID provider URL you gave it, screen-scrape your login form, and present it locally, intercepting your login and password. However, if you choose a quality OpenID provider, they'll use SSL and a high-grade certificate so you'll have some confidence you're not being fooled. Yahoo also offers anti-phishing image watermarks for OpenID logins.
- Privacy. Your OpenID provider will know, by definition, every site you log into using its credentials. So I hope you trust your provider.
- Centralized Risk. If your OpenID account is compromised, every site you used it to access is also compromised. I'm not sure how much riskier this is than having your email credentials compromised, as many (most?) sites allow you to send a password reset to your email address.
- Lack of Trust. The OpenID providers provide no identity checking whatsoever. It's sort of like those generic "identity cards" you can obtain online, which are pretty useless next to, say, your Driver's License, which was issued by a local governmental authority. What if Fake Steve Jobs created a fake OpenID purporting to be Steve Jobs, or a fake OpenID provider?
- Additional Complexity. Your login now involves two completely different entities: the website you're attempting to gain access to, and your OpenID provider. You have to understand this new relationship to troubleshoot any problems with your login -- and the OpenID provider has to be up and running for you to log in at all.
- Adoption Inequality. It's easy for AOL, Yahoo!, Six Apart, and Technorati to become OpenID providers -- but what good does that do you when there are very few OpenID consumers? As Dare points out, there are no financial incentives to accept credentials from your competitors, but there are certainly plenty of incentives for driving account creation on your own site. For now, I expect OpenID to be driven primarily by small applications and sites that don't have millions of dollars of skin in the game.
As I mentioned above, I feel most of these criticisms can be mitigated by picking a quality, trustworthy OpenID Provider. Particularly one that uses SSL. Since it's an open ecosystem, I'd hope the more reputable and reliable OpenID providers would rise to the top. And consider the advantages: as an application developer, you no longer have to store passwords! That's a huge advantage, because storing passwords is the last business you want to be in. Trust me on this one.
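A site could also enforce the "pick a provider that uses SSL" advice on its own end. Here is a minimal Python sketch. It assumes a hypothetical policy of accepting only https provider endpoints, optionally limited to an allowlist. This is a site's own choice, not something OpenID requires.

```python
from urllib.parse import urlsplit


def provider_is_acceptable(op_endpoint: str,
                           allowed_hosts: frozenset[str] = frozenset()) -> bool:
    """Hypothetical relying-party policy: SSL required, optional provider allowlist."""
    parts = urlsplit(op_endpoint)
    if parts.scheme.lower() != "https":
        return False  # refuse providers that don't use SSL
    # .hostname is already lowercased; an empty allowlist means "any https provider".
    return not allowed_hosts or parts.hostname in allowed_hosts
```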
I also found Jan Miksovsky's criticisms of the user experience of OpenID -- as of 6 months ago -- fairly damning:
And all this is for -- what, exactly? To save me from having to pick a user name and password? As annoying as that can be, it's just not that hard! Remembering an arbitrary user name does cause real trouble, but simply allowing email addresses to be used as IDs can solve almost all of that problem. As more and more sites allow email addresses as IDs, the need for OpenID becomes less compelling to a consumer.
For the time being, I can't imagine a sane business operator forcing their precious visitors through this gauntlet of user experience issues just for the marginal benefits that accrue to a shared form of ID. I've read numerous claims that all it will take is for someone big like Google to support OpenID to crack this problem open. Unfortunately, there's no business of any size that can afford to direct their traffic down a dead end.
Most service operators will, at best, offer users a choice between using a proprietary ID or an OpenID, creating a terrible economic proposition for a consumer. Faced with the proposition of: 1) struggling once for thirty minutes to struggle through a process they can barely understand, or 2) spending two minutes on every new site breezing through a familiar process they've done countless times before, normal busy people will choose the familiar route time and time again. I'll bet anything that most people will keep going for proprietary IDs, further deferring the network effects possible from OpenID adoption.
Perhaps the most compelling point Jan makes is this one: it is a bit odd to ask users to associate themselves with an arbitrary URL instead of an email address. I definitely saw some rough edges in today's experimentation, but I'd say the user experience has improved since Jan looked at OpenID. That's encouraging.
I realize that OpenID is far from an ideal solution. But right now, the one-login-per-website problem is so bad that I am willing to accept these tradeoffs for a partial, worse-is-better solution. There's absolutely no way I'd put my banking credentials behind an OpenID. But there are also dozens of sites that I don't need anything remotely approaching banking-grade security for, and I use these sites far more often than my bank. The collective pain of remembering all these logins -- and the way my email inbox becomes a de facto collecting point and security gateway for all of them -- is substantial.
If you're a software developer building an application that requires user accounts, please consider using OpenID rather than polluting the world with yet another login and password. I also encourage you to experiment with OpenID as a user. Create one. Try logging in somewhere with one. If you don't like the experience, or if you agree with one (or more) of the criticisms I listed above, how can we collectively fix it? We desperately need a solution to the login explosion, and right now the only thing I've seen on the horizon that has any kind of critical mass whatsoever is OpenID.
If we can't make OpenID work, at least for run of the mill, low-value credentials that litter the web in increasing numbers -- what hope do we have of ever fixing the login explosion problem?
May 20, 2008
PHP Sucks, But It Doesn't Matter
Here's a list of every function beginning with the letter "A" in the PHP function index:
I remember my first experience with PHP way back in 2001. Despite my questionable pedigree in ASP and Visual Basic, browsing an alphabetical PHP function list was enough to scare me away for years. Somehow, perusing the above list, I don't think things have improved a whole lot since then.
I'm no language elitist, but language design is hard. There's a reason that some of the most famous computer scientists in the world are also language designers. And it's a crying shame none of them ever had the opportunity to work on PHP. From what I've seen of it, PHP isn't so much a language as a random collection of arbitrary stuff, a virtual explosion at the keyword and function factory. Bear in mind this is coming from a guy who was weaned on BASIC, a language that gets about as much respect as Rodney Dangerfield. So I am not unfamiliar with the genre.
Of course, this is old news. How old? Ancient. Internet Explorer 4 old. The internet is overrun with PHP sucks articles -- I practically ran out of browser tabs opening them all. Tim Bray bravely bucked this trend and went with the title On PHP for his entry in the long-running series:
So here's my problem, based on my limited experience with PHP (deploying a couple of free apps to do this and that, and debugging a site for a non-technical friend here and there): all the PHP code I've seen in that experience has been messy, unmaintainable crap. Spaghetti SQL wrapped in spaghetti PHP wrapped in spaghetti HTML, replicated in slightly-varying form in dozens of places.
Tim's article is as good a place to start as any; he captured a flock of related links in the ensuing discussion. As you read, you'll find there's an obvious parallel between the amateurish state of PHP development and Visual Basic 6, a comparison that many developers have independently arrived at.
Every solution I've ever seen or developed in PHP feels clunky and bulky, there is no elegance or grace. Working with PHP is a bit like throwing a 10 pound concrete cube from a ten story building: You'll get where you're going fast, but it's not very elegant. ... I love PHP, and it's the right tool for some jobs. It's just an ugly, cumbersome tool that makes me cry and have nightmares. It's the new VB6 in a C dress.
From my own experience, and the countless of online tutorials and blogs, many PHP developers are guilty of the same crap code VB developers were once renowned for. OO, N-Tier, exception handling, domain modeling, refactoring and unit testing are all foreign concepts in the PHP world.
Understand that as a long time VB developer, I am completely sympathetic to the derision you'll suffer when programming in a wildly popular programming language that isn't considered "professional".
I've written both VB and PHP code, and in my opinion the comparison is grossly unfair to Visual Basic. Does PHP suck? Of course it sucks. Did you read any of the links in Tim's blog entry? It's a galactic supernova of incomprehensibly colossal, mind-bendingly awful suck. If you sit down to program in PHP and have even an ounce of programming talent in your entire body, there's no possible way to draw any other conclusion. It's inescapable.
But I'm also here to tell you that doesn't matter.
The TIOBE community index I linked above? It's written in PHP. Wikipedia, which is likely to be on the first page of anything you search for these days? Written in PHP. Digg, the social bookmarking service so wildly popular that a front page link can crush the beefiest of webservers? Written in PHP. WordPress, arguably the most popular blogging solution available at the moment? Written in PHP. YouTube, the most widely known video sharing site on the internet? Written in PHP. Facebook, the current billion-dollar zombie-poking social networking darling of venture capitalists everywhere? Written in PHP. (Update: While YouTube was originally written in PHP, it migrated to Python fairly early on, per Matt Cutts and Guido van Rossum.)
Notice a pattern here?
Some of the largest sites on the internet -- sites you probably interact with on a daily basis -- are written in PHP. If PHP sucks so profoundly, why is it powering so much of the internet?
The only conclusion I can draw is that building a compelling application is far more important than choice of language. While PHP wouldn't be my choice, and if pressed, I might argue that it should never be the choice for any rational human being sitting in front of a computer, I can't argue with the results.
You've probably heard that sufficiently incompetent coders can write FORTRAN in any language. It's true. But the converse is also true: sufficiently talented coders can write great applications in terrible languages, too. It's a painful lesson, but an important one.
Why fight it? I say learn to embrace it. Join with me, won't you, in celebrating the next fifty years of glorious PHP code driving the internet. Just don't forget to call the maintain_my_will_to_live() PHP function every so often!
May 19, 2008
Twitter: How Not To Crash Responsibly
In yesterday's post, Crash Responsibly, I outlined a few ways to improve your application's crash behavior. In the event that your application crashes -- and oh, it will -- why not turn that crash into something that:
- Records lots of diagnostic information developers can use to improve the application over time.
- Reassures users and provides them with helpful information.
With that in mind, let's take a look at the Twitter crash page. How does it serve developers and users?
I don't mean to pick on Twitter; their bouts of downtime are near legendary at this point. Frankly, it's been discussed to death.
It's unfortunate, because I love Twitter. Like Michael Lopp, I'm dangerously close to being a Twitter fanboy.
The answer comes down to value. In the time that I've been using Twitter, it's transformed from a curiosity to an essential service. What were seemingly random status updates have now become organized into organic conversational threads that bring a steady flow of relevant content across my desktop.
An "essential service" is exactly the kind of thing you don't want to see error pages on. So, then, how does the Twitter error page fare?
Not so badly at first glance. It's an attractive error page, styled to match Twitter, with some basic links and navigational elements. Let's be generous and assume that the notification and logging of errors behind the scenes is taken care of. The Twitter developers must have access to a voluminous set of error logs by now.
But Twitter's error page is conspicuously lacking any real information. As an enthusiastic Twitter user presented with this error page, I am anything but reassured. Instead, I have some nagging questions:
- Is this an ephemeral, temporary error or some kind of scheduled downtime? How do I tell the difference?
- If this is scheduled downtime, when will it be over? Can I view the maintenance schedule, or the current status of the maintenance work?
- Is Twitter down for everyone, or just me? Is there a place I can go to check Twitter's current system health?
- Twitter has a reputation for unreliability. Where can I find out about Twitter's ongoing efforts to improve their reliability?
There's absolutely no mention of any of these things on the error page, the exact place I would care the most. Clicking through to the blog provides no relief, no mention of any availability work or maintenance schedules.
Furthermore, it's difficult to take the glib claim that "we're going to fix it up and have things back to normal soon" seriously. I've seen so much of the Twitter error page in the last year that I've lost confidence that these errors mean anything to anyone -- or that they're even recorded. This is the static error page that cried wolf. Where's the improvement over time from the collection and analysis of these errors?
I understand that Twitter has scaling problems I can only dream of. I don't envy the amount of work they'll have to undertake to fix this pernicious, systemic problem of massive scale.
But I sure wish they could be a lot more transparent about it.
Isn't that what crashing responsibly is all about -- establishing an honest, open dialog between users and developers, even at the worst possible moment of that relationship?
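Several of the questions above could be answered by the error response itself. Here is a small Python sketch. It assumes a hypothetical status page URL and a known maintenance window, and the datetimes are assumed to be UTC. Scheduled downtime gets a 503 Service Unavailable with a Retry-After header. An unexpected crash gets a plain 500 that still points users at a place to check system health.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional

STATUS_PAGE = "https://status.example.com/"  # hypothetical system-health page


@dataclass
class ErrorResponse:
    status: int
    body: str
    headers: dict[str, str] = field(default_factory=dict)


def error_response(maintenance_ends: Optional[datetime] = None,
                   now: Optional[datetime] = None) -> ErrorResponse:
    """Tell users whether this is planned downtime or an unexpected failure.

    Datetimes are assumed to be timezone-aware and in UTC.
    """
    now = now or datetime.now(timezone.utc)
    if maintenance_ends is not None and maintenance_ends > now:
        # Scheduled downtime: 503 plus Retry-After (in seconds) says "come back later".
        seconds = int((maintenance_ends - now).total_seconds())
        return ErrorResponse(
            503,
            f"Scheduled maintenance, expected to finish at {maintenance_ends:%H:%M} UTC. "
            f"Progress updates: {STATUS_PAGE}",
            {"Retry-After": str(seconds)},
        )
    # Anything else is an unexpected failure: 500, but still point at the status page.
    return ErrorResponse(
        500,
        "Something broke on our end and the error has been logged. "
        f"Is it just you? Check {STATUS_PAGE}",
    )
```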
May 18, 2008
Crash Responsibly
As programmers, it is our responsibility to ensure that when something goes horribly wrong with our software, the user has a reasonable escape plan. It's an issue of fundamental safety in software error handling that I liken to those ubiquitous airline safety cards.
Which one accurately depicts the way your software treats the user in the event of an emergency?
If I've learned anything in the last thirty years, it's that I write shitty software -- with bugs. I not only need to protect my users from my errors, I need to protect myself from my errors, too. That's why the first thing I do on any new project is set up an error handling framework. Errors are inevitable, but ignorance shouldn't be. If you know about the problems, you can fix them and respond to them.
Note that when I say "errors", I don't mean mundane, workaday problems like empty form values, no results, or file not found. Those kinds of errors are covered quite well in 37 Signals' Defensive Design for the Web: How to Improve Error Messages, Help, Forms, and Other Crisis Points.
It's a great book; a quick read with lots of visual do's and don'ts side by side. Despite the giant exclamation point icon on the cover, however, it's mostly about fundamental web usability, not error handling per se.
I'm talking about catastrophic errors -- real disasters. Cases where a previously unknown bug in your code causes the application to crash and burn in spectacular fashion. It happens in all applications, whether they're websites or traditional executables.
The situation is pretty dire at this point, but some disaster recovery is possible, if you plan ahead.
- It is not the user's job to tell you about errors in your software!
If users have to tell you when your app crashes, and why, you have utterly failed your users. I cannot emphasize this enough.
It's bad enough that the user has to use our crashy software; are we really going to add insult to injury by pressing them into service as QA staff, too? If you're relying on users to tell you about problems with your software, you'll only see a tiny fraction of the overall errors. Most users won't bother telling you about problems. They'll just quietly stop using your application.
Whatever error handling solution you choose, it should automatically log everything necessary to troubleshoot the crash -- and ideally send a complete set of diagnostic information back to your server. This is fundamental. If you don't have something like this in place yet, do so immediately.
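Here is a minimal sketch of the "log everything automatically" idea for a Python program, using the standard sys.excepthook hook. The report fields are just a plausible starting set. Sending the report to a server is left as a comment, since that endpoint would be your own.

```python
import json
import logging
import platform
import sys
import traceback
from datetime import datetime, timezone

logger = logging.getLogger("crash")


def build_crash_report(exc_type, exc, tb) -> dict:
    """Gather what a developer needs to troubleshoot, without asking the user."""
    return {
        "time": datetime.now(timezone.utc).isoformat(),
        "exception": exc_type.__name__,
        "message": str(exc),
        "traceback": "".join(traceback.format_exception(exc_type, exc, tb)),
        "python": platform.python_version(),
        "os": platform.platform(),
        "argv": list(sys.argv),
    }


def crash_hook(exc_type, exc, tb):
    """Called for any uncaught exception once installed as sys.excepthook."""
    report = build_crash_report(exc_type, exc, tb)
    logger.critical(json.dumps(report))
    # A real application would also send `report` to its own collection server here.
    sys.__excepthook__(exc_type, exc, tb)  # keep the default stderr output too


sys.excepthook = crash_hook
```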
- Don't expose users to the default screen of death.
It's true that we can't do much to recover from these kinds of crashes, but relying on the underlying operating system or webserver to deliver the generic bad news to the user is rude and thoughtless. Override the default crash screen and provide something customized, something relevant to your application and your users. Here are a few ideas:
- Let users know that it's our fault, not theirs.
- Inform the user that the error was logged and dispatched.
- If possible, suggest some workarounds and troubleshooting options.
- Perhaps even provide direct contact information if they're really stuck and desperately need to get something done.
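For a web application, overriding the default crash screen can be a thin wrapper around the whole app. Here is a sketch as Python WSGI middleware. The page wording and the support@example.com address are placeholders. It only catches errors raised before the app returns its response, not errors that happen while the body is being streamed.

```python
import logging
import sys
import uuid

log = logging.getLogger("crash")

FRIENDLY_PAGE = """<html><body>
<h1>Sorry, this one is our fault, not yours.</h1>
<p>The error has been logged and sent to our developers (reference {ref}).</p>
<p>Trying again in a few minutes may work. Still stuck? Email support@example.com.</p>
</body></html>"""


class FriendlyErrorMiddleware:
    """WSGI middleware that replaces the server's default crash screen."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        try:
            # Errors raised while streaming the response body are not caught here.
            return self.app(environ, start_response)
        except Exception:
            ref = uuid.uuid4().hex[:8]
            log.exception("Unhandled error, reference %s", ref)
            body = FRIENDLY_PAGE.format(ref=ref).encode("utf-8")
            start_response(
                "500 Internal Server Error",
                [("Content-Type", "text/html; charset=utf-8"),
                 ("Content-Length", str(len(body)))],
                sys.exc_info(),  # allowed even if the app already called start_response
            )
            return [body]
```

The short reference code gives users something concrete to quote if they do get in touch, and ties their report to the logged traceback.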
- Have a detailed public record of your application's errors.
In my experience, nothing motivates a team better than a detailed public record of all crashes. There should of course be a searchable, sortable database of errors somewhere, but active notifications are also a good idea. Crashes are incredibly annoying to your users. It's only fair that the team behind the software share a little of that pain for each crash. You could broadcast an error email, text message, or instant message to everyone on the team. Or maybe have every crash automatically open a bug ticket in your bug tracking software. Tired of dealing with all those error emails and/or bug tickets? Fix the software so you don't have to!
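Automatically opening a ticket per crash only works if repeat crashes don't open duplicate tickets. Here is one way to fingerprint crashes in Python. It assumes that the exception type plus the call stack is a good enough identity, counting file and function names but ignoring line numbers and the message. The in-memory dict is a hypothetical stand-in for a real bug tracker.

```python
import hashlib
import traceback

# Stand-in for a bug tracker: crash signature -> number of occurrences.
crash_counts: dict[str, int] = {}


def crash_signature(exc: BaseException) -> str:
    """Fingerprint a crash by exception type and call stack.

    Line numbers and the message are left out on purpose, so "user 17 not found"
    and "user 42 not found" from the same code path count as one bug.
    """
    frames = traceback.extract_tb(exc.__traceback__)
    key = type(exc).__name__ + "|" + "|".join(f"{f.filename}:{f.name}" for f in frames)
    return hashlib.sha1(key.encode("utf-8")).hexdigest()[:12]


def record_crash(exc: BaseException) -> bool:
    """Count the crash; return True only the first time, i.e. when to open a ticket."""
    sig = crash_signature(exc)
    is_new = sig not in crash_counts
    crash_counts[sig] = crash_counts.get(sig, 0) + 1
    return is_new
```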
- Leverage the 80/20 rule.
Once you have a comprehensive record of every crash, you can sort that data by frequency and spend your coding effort resolving the most common problems. Microsoft, based on data from their Windows Error Reporting Service, found that fixing 20 percent of the top reported bugs solved 80 percent of customer issues, and fixing 1 percent of the top reported bugs solved 50 percent of customer issues. That's huge! Let the Pareto principle work for you, not against you.
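Once crashes are grouped by signature, the 80/20 calculation takes only a few lines. This Python sketch ranks bugs by frequency and reports what share of all crash reports the top fraction of distinct bugs accounts for. Any sample numbers fed to it are your own. The Microsoft figures above come from their data, not from this code.

```python
from collections import Counter


def rank_bugs(crash_signatures: list[str]) -> list[tuple[str, int]]:
    """Distinct bugs, most frequently reported first."""
    return Counter(crash_signatures).most_common()


def pareto_share(crash_signatures: list[str], top_fraction: float) -> float:
    """Share of all crash reports caused by the top `top_fraction` of distinct bugs."""
    ranked = rank_bugs(crash_signatures)
    k = max(1, round(len(ranked) * top_fraction))  # always consider at least one bug
    return sum(count for _, count in ranked[:k]) / len(crash_signatures)
```

With made-up data of ten distinct bugs and 100 reports, where the two most common bugs account for 70 reports, pareto_share(data, 0.2) returns 0.7.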
As software professionals, we should protect our users -- and ourselves -- from our mistakes. Crash responsibly!
May 15, 2008
Oh Yeah? Fork You!
In Where Are All The Open Source Billionaires? I used this chart as an illustration:
Because open source code is freely distributable, anyone can take that code and create their own unique mutant mashup version of it any time they feel like it. Whether anyone else in the world will care about their crazy new version of the code is not at all clear, but that's not the point. If someone wants it bad enough, they can create it -- or pay someone else to create it for them. This is known as "forking". It's the very embodiment of freedom zero, and it's an essential part of every open source license.
But there are forks, and there are forks:
What is different about a fork is intent. In a fork, the person(s) creating the fork intend for the fork to replace or compete with the original project they are forking.
That's exactly what happened to the Pidgin project recently.
In their 2.4 release they changed the GUI action of the text field where the user types their IM from a manually re-sizable window, to a fixed size window that auto-re-sizes based on the amount of text typed. On the surface, this sounds like a minor change, but it triggered a massive user revolt! Why?
This is what they're up in arms about:
The developers, for whatever reason, dug in their heels on this one and refused to budge. You can read through some of the commentary on the bug ticket to get an idea, but the general tenor was combative bordering on hostile. The bug was eventually closed as "won't fix".
The community's response was swift: Oh yeah? Fork you!
Funpidgin is a fork of the popular open source client Pidgin which allows instant messaging with over twenty different protocols.
What makes us different from the official client is that we work for you. Unlike the Pidgin developers, we believe the user should have the final say in what goes into the program.
So far five new features have been added to Funpidgin upon requests from users, and all of them are optional. It is these options that make the use of Funpidgin enjoyable to a diverse range of people.
Funpidgin is a fork in the truest sense; the developers intend to replace Pidgin. But will it? Who knows. There are four possible outcomes from any fork:
- The fork dies
Funpidgin languishes due to lack of attention from developers and users. Funpidgin eventually dies.
- The fork merges
Funpidgin and Pidgin reach a consensus. The Funpidgin changes are folded back into Pidgin.
- The original dies
Funpidgin becomes so popular that it draws developers and users away from Pidgin. Pidgin eventually dies.
- Both original and fork survive
Funpidgin and Pidgin both succeed on their own terms, perhaps by attracting different audiences or meeting different user needs.
You can find examples of all four outcomes peppered throughout the history of open source software. You might think that the adoption of open source software licenses would lead to dozens if not hundreds of incompatible, slightly-different versions of the same stuff -- bewildering users and developers alike. I'm not so sure. There's a tremendous amount of inertia around the open source projects that survive long enough to become popular. Consider the challenges the newly forked Funpidgin project now faces:
- A divided community of users and developers.
- Siphoning enough energy and attention away from an established project to remain viable.
- Differentiating themselves enough from Pidgin so that they aren't viewed as useless or irrelevant.
- The original Pidgin project is free to take whatever parts of the Funpidgin open source code they deem appropriate and fold that into Pidgin, thus undermining the fork.
Forking is incredibly difficult to pull off. It is a painful, but necessary part of the evolution of open source software. Just as in real evolution, I suspect that most forks die in vast, nameless numbers, before they become strong enough to engender any forked progeny of their own. Forking is the absolute bedrock of open source software -- but it is also not a path to be chosen lightly.
May 13, 2008
Is HTML a Humane Markup Language?
One of the things we're thinking about while building stackoverflow.com is how to let users style the questions and answers they're entering on the site. Nothing's decided at this point, but we definitely won't be giving users one of those friendly-but-irritating HTML GUI browser layout controls.
I have one iron-clad design guide: this is a site for programmers, so they should be comfortable with basic markup. None of that nancy-boy GUI toolbar handholding nonsense for us, thankyouverymuch. If you can sling code, a little bit of presentation markup is child's play.
We will support some sort of markup language to style the questions and answers. But what markup language?
I mentioned in podcast #4 that we consider Wikipedia a defining influence. Let's see how Wikipedia handles markup syntax. This is what the edit page for Joel Spolsky's Wikipedia entry looks like:
It's an effective markup language, but I think you'll agree that it's more intimidating than humane. Wikipedia's How to Edit a Page and the accompanying Wikipedia syntax cheatsheet help. Some. I'd argue that writing a Wikipedia entry is a step beyond mere presentational markup; it's almost like coding, as you weave the article into the Wikipedia gestalt. (Incidentally, if you haven't ever edited a Wikipedia article, you should. I consider it a rite of passage, a sort of internet merit badge for anyone who is serious about their online presence.)
Let's consider a simpler example. What we're looking for is some kind of middle ground, a humane text format. Let's start with some basic HTML.
Lightweight Markup Languages

According to Wikipedia:

A lightweight markup language is a markup language with a simple syntax, designed to be easy for a human to enter with a simple text editor, and easy to read in its raw form.

Some examples are:

- Markdown
- Textile
- BBCode
- Wikipedia

Markup should also extend to code:

10 PRINT "I ROCK AT BASIC!"
20 GOTO 10
Here's what that looks like expressed in a variety of lightweight markup languages. Bear in mind that each of these will produce HTML equivalent to the above.
In Textile:

h1. Lightweight Markup Languages

According to *Wikipedia*:

bq. A "lightweight markup language":http://is.gd/gns is a markup language with a simple syntax, designed to be easy for a human to enter with a simple text editor, and easy to read in its raw form.

Some examples are:

* Markdown
* Textile
* BBCode
* Wikipedia

Markup should also extend to _code_:

pre. 10 PRINT "I ROCK AT BASIC!"
20 GOTO 10

In Markdown:

Lightweight Markup Languages
============================

According to **Wikipedia**:

> A [lightweight markup language](http://is.gd/gns)
is a markup language with a simple syntax, designed
to be easy for a human to enter with a simple text
editor, and easy to read in its raw form.

Some examples are:

* Markdown
* Textile
* BBCode
* Wikipedia

Markup should also extend to _code_:

    10 PRINT "I ROCK AT BASIC!"
    20 GOTO 10

In Wikipedia markup:

==Lightweight Markup Languages==

According to '''Wikipedia''':

:A [[lightweight markup language]] is a markup language with a simple syntax, designed to be easy for a human to enter with a simple text editor, and easy to read in its raw form.

Some examples are:

* Markdown
* Textile
* BBCode
* Wikipedia

Markup should also extend to ''code'':

<source lang=qbasic>
10 PRINT "I ROCK AT BASIC!"
20 GOTO 10
</source>

In BBCode:

[size=150]Lightweight Markup Languages[/size]

According to [b]Wikipedia[/b]:

[quote]
A [url=http://is.gd/gns]lightweight markup language[/url] is a markup language with a simple syntax, designed to be easy for a human to enter with a simple text editor, and easy to read in its raw form.
[/quote]

Some examples are:

[list]
[*]Markdown
[*]Textile
[*]BBCode
[*]Wikipedia
[/list]

Markup should also extend to [i]code[/i]:

[code]
10 PRINT "I ROCK AT BASIC!"
20 GOTO 10
[/code]
None of these lightweight markup languages are particularly difficult to understand -- and they're easy on the eyes, as promised. But I still had to look up the reference syntax for each one and map it to the HTML that I already know by heart. I also found them disturbingly close to "magic" for some of the formatting rules, to the point that I wished I could just write literal HTML and get exactly what I want without guessing how the parser is going to interpret my fake-plain-text.
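The "magic" complaint is easy to reproduce. In this Python sketch, a naive regex for underscore emphasis mangles a snake_case function name. A stricter rule that ignores intraword underscores leaves the name alone, roughly in the spirit of how many Markdown implementations behave. Both regexes are illustrative, not any real parser's rules.

```python
import re

# Naive rule: anything between two underscores becomes emphasis.
NAIVE_EM = re.compile(r"_(.+?)_")
# Stricter rule: the underscores must not touch a word character on the outside
# (or whitespace on the inside), so snake_case identifiers are left alone.
STRICT_EM = re.compile(r"(?<!\w)_(?!\s)(.+?)(?<!\s)_(?!\w)")


def naive_emphasis(text: str) -> str:
    return NAIVE_EM.sub(r"<em>\1</em>", text)


def strict_emphasis(text: str) -> str:
    return STRICT_EM.sub(r"<em>\1</em>", text)
```

Run on "call maintain_my_will_to_live()", the naive rule produces "call maintain<em>my</em>will<em>to</em>live()". The stricter rule returns the text unchanged, while both still turn "_code_" into emphasis.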
Which leads directly to this question: why not just stick with what we already know and use HTML? This c2 wiki page titled Why Doesn't Wiki Do HTML? makes the case that -- at least for Wiki content -- you're better off leaving HTML behind:
- In a Wiki, the emphasis is on content, not presentation. Simple Wiki markup rules let people focus on expressing their ideas.
- Why not use a domain-specific markup language designed to do "the simplest thing that could possibly work"?
- Some HTML tags are difficult to work with and can break the flow of your thoughts. The table tag, for example.
- Does the average user really need total HTML and CSS layout power?
- Allowing the full range of HTML tags can lead to major security vulnerabilities.
- Many people don't know HTML. A simple Wiki markup language is easier to learn.
I'm not sure I agree with all of this, but it can make sense in the context of a full-blown Wiki. It's worth considering.
After all this research on humane markup languages, much to my chagrin, I've come full circle. I no longer think humane markup languages make sense for most uses. I agree with the guy at fileformat.info -- HTML is generally the better choice:
- Simplicity
If the source and destination are the web, why not use the native markup language of the web?
- Readability
HTML is a bit less readable than the lightweight markup languages, it's true. But basic HTML is not onerous to read, particularly if we hide the repetitive paragraph tags.
- Security
With a bit of careful coding, it is possible to whitelist specific HTML tags that you will allow. This way you avoid exposing yourself to risky/vulnerable tags.
- Conversion
It's not at all clear that any existing lightweight markup language has critical mass, with the possible exception of Wikipedia's flavor. On the other hand, text parsers and tools will always understand HTML.
- What people know
A lot more people know HTML than any given flavor of humane text. If you're a programmer, you damn well better know HTML. For the handful of wiki-like functions we may need, it's possible to add some optional attributes to the HTML tags. And wouldn't that be easier to learn than some weird, pseudo-ASCII derivation of HTML?
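The whitelist idea under Security can be sketched with Python's standard-library html.parser: re-emit the fragment, keeping only tags and attributes that appear on an allowed list, and escape all text. The ALLOWED table, the http/https-only rule for links, and the sanitize helper are assumptions made for illustration. This sketch does not balance unclosed tags or deal with inline CSS, so treat it as a demonstration of the approach rather than a hardened filter.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelist: allowed tag -> attributes allowed on that tag.
ALLOWED = {
    "p": set(), "b": set(), "i": set(), "strong": set(), "em": set(),
    "blockquote": set(), "ul": set(), "ol": set(), "li": set(),
    "pre": set(), "code": set(), "a": {"href"},
}


class WhitelistSanitizer(HTMLParser):
    """Re-emits an HTML fragment, keeping only whitelisted tags and attributes."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag not in ALLOWED:
            return  # drop the tag itself; any text inside it is still escaped and kept
        kept = []
        for name, value in attrs:
            if name not in ALLOWED[tag] or value is None:
                continue
            # Only plain web links survive, which rules out javascript: URLs.
            if name == "href" and not value.lower().startswith(("http://", "https://")):
                continue
            kept.append(f' {name}="{escape(value, quote=True)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")

    def handle_endtag(self, tag):
        if tag in ALLOWED:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Text is always escaped, so a stray < or & can't turn into markup.
        self.out.append(escape(data, quote=False))

    # Comments, doctypes and processing instructions fall through to
    # HTMLParser's no-op handlers and are silently dropped.


def sanitize(fragment: str) -> str:
    parser = WhitelistSanitizer()
    parser.feed(fragment)
    parser.close()
    return "".join(parser.out)
```

For example, sanitize('<a href="http://is.gd/gns" onclick="evil()">link</a>') keeps the link but strips the onclick handler. The design choice is to rebuild the output from parsed tokens rather than regex-filter the input string, so anything the parser doesn't recognize as an allowed tag simply never reaches the output.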
I do think we'll adopt some of the cleverer functions of Textile and Markdown, insofar as they remove mundane HTML markup scutwork. But in general, I'd much rather rely on a subset of trusty old HTML than expend brain cells trying to remember the fake-HTML way to make something bold, or create a hyperlink. HTML isn't perfect, but it's an eminently reasonable humane markup language.
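Hiding the repetitive paragraph tags is one of those bits of scutwork a tiny preprocessor can take over while everything else stays plain HTML. A minimal sketch, assuming blank lines separate paragraphs and that any block starting with a tag from a hypothetical BLOCK_TAGS set is already complete HTML and should pass through untouched:

```python
import re

# Hypothetical set of block-level tags that should never be wrapped in <p>.
BLOCK_TAGS = {
    "p", "div", "blockquote", "pre", "ul", "ol", "table", "hr",
    "h1", "h2", "h3", "h4", "h5", "h6",
}


def auto_paragraph(text: str) -> str:
    """Wrap blank-line-separated blocks in <p>...</p>, passing block HTML through."""
    out = []
    # One or more blank lines (possibly containing spaces) separate blocks.
    for block in re.split(r"\n\s*\n", text.strip()):
        block = block.strip()
        if not block:
            continue
        opening = re.match(r"<([a-zA-Z][a-zA-Z0-9]*)", block)
        if opening and opening.group(1).lower() in BLOCK_TAGS:
            out.append(block)  # already a complete block element; leave it alone
        else:
            out.append(f"<p>{block}</p>")
    return "\n\n".join(out)
```

Inline tags like <b> or <a> still get wrapped, since they don't start a block. There's nothing new to memorize, because the only "rule" is that a blank line means a new paragraph. The obvious gap is that a <pre> block containing blank lines would be split apart, which a real implementation would have to handle.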

