Coding Horror
programming and human factors
by Jeff Atwood

May 13, 2008

Is HTML a Humane Markup Language?

One of the things we're thinking about while building stackoverflow.com is how to let users style the questions and answers they're entering on the site. Nothing's decided at this point, but we definitely won't be giving users one of those friendly-but-irritating HTML GUI browser layout controls.

an example HTML GUI editor

I have one iron-clad design guide: this is a site for programmers, so they should be comfortable with basic markup. None of that nancy-boy GUI toolbar handholding nonsense for us, thankyouverymuch. If you can sling code, a little bit of presentation markup is child's play.

We will support some sort of markup language to style the questions and answers. But what markup language?

I mentioned in podcast #4 that we consider Wikipedia a defining influence. Let's see how Wikipedia handles markup syntax. This is what the edit page for Joel Spolsky's Wikipedia entry looks like:

Wikipedia Edit page for Joel Spolsky entry

It's an effective markup language, but I think you'll agree that it's more intimidating than humane. Wikipedia's How to Edit a Page and the accompanying Wikipedia syntax cheatsheet help. Some. I'd argue that writing a Wikipedia entry is a step beyond mere presentational markup; it's almost like coding, as you weave the article into the Wikipedia gestalt. (Incidentally, if you haven't ever edited a Wikipedia article, you should. I consider it a rite of passage, a sort of internet merit badge for anyone who is serious about their online presence.)

Let's consider a simpler example. What we're looking for is some kind of middle ground, a humane text format. Let's start with some basic HTML.

Lightweight Markup Languages

According to Wikipedia:

A lightweight markup language is a markup language with a simple syntax, designed to be easy for a human to enter with a simple text editor, and easy to read in its raw form.

Some examples are:

  • Markdown
  • Textile
  • BBCode
  • Wikipedia

Markup should also extend to code:

10 PRINT "I ROCK AT BASIC!"
20 GOTO 10

Here's what that looks like expressed in a variety of lightweight markup languages. Bear in mind that each of these will produce HTML equivalent to the above.

Textile:

h1. Lightweight Markup Languages

According to *Wikipedia*:

bq. A "lightweight markup language":http://is.gd/gns
is a markup language with a simple syntax, designed 
to be easy for a human to enter with a simple text 
editor, and easy to read in its raw form. 

Some examples are:

* Markdown
* Textile
* BBCode
* Wikipedia

Markup should also extend to _code_: 

pre. 10 PRINT "I ROCK AT BASIC!"
20 GOTO 10

Markdown:

Lightweight Markup Languages
============================

According to **Wikipedia**:

> A [lightweight markup language](http://is.gd/gns)
is a markup language with a simple syntax, designed 
to be easy for a human to enter with a simple text 
editor, and easy to read in its raw form. 

Some examples are:

* Markdown
* Textile
* BBCode
* Wikipedia

Markup should also extend to _code_: 

    10 PRINT "I ROCK AT BASIC!"
    20 GOTO 10
Wikipedia:

==Lightweight Markup Languages==

According to '''Wikipedia''':

:A [[lightweight markup language]]
is a markup language with a simple syntax, designed 
to be easy for a human to enter with a simple text 
editor, and easy to read in its raw form. 

Some examples are:

* Markdown
* Textile
* BBCode
* Wikipedia

Markup should also extend to ''code'': 

<source lang=qbasic>
10 PRINT "I ROCK AT BASIC!"
20 GOTO 10
</source>

BBCode:

[size=150]Lightweight Markup Languages[/size]

According to [b]Wikipedia[/b]:

[quote]
A [url=http://is.gd/gns]lightweight markup language[/url]
is a markup language with a simple syntax, designed 
to be easy for a human to enter with a simple text 
editor, and easy to read in its raw form. 
[/quote]

Some examples are:

[list]
[*]Markdown
[*]Textile
[*]BBCode
[*]Wikipedia
[/list]

Markup should also extend to [i]code[/i]: 

[code]
10 PRINT "I ROCK AT BASIC!"
20 GOTO 10
[/code]

None of these lightweight markup languages are particularly difficult to understand -- and they're easy on the eyes, as promised. But I still had to look up the reference syntax for each one and map it to the HTML that I already know by heart. I also found them disturbingly close to "magic" for some of the formatting rules, to the point that I wished I could just write literal HTML and get exactly what I want without guessing how the parser is going to interpret my fake-plain-text.

Which leads directly to this question: why not just stick with what we already know and use HTML? This c2 wiki page titled Why Doesn't Wiki Do HTML? makes the case that -- at least for Wiki content -- you're better off leaving HTML behind:

  1. In a Wiki, the emphasis is on content, not presentation. Simple Wiki markup rules let people focus on expressing their ideas.
  2. Why not use a domain-specific markup language designed to do "the simplest thing that could possibly work"?
  3. Some HTML tags are difficult to work with and can break the flow of your thoughts. The table tag, for example.
  4. Does the average user really need total HTML and CSS layout power?
  5. Allowing the full range of HTML tags can lead to major security vulnerabilities.
  6. Many people don't know HTML. A simple Wiki markup language is easier to learn.

I'm not sure I agree with all of this, but it can make sense in the context of a full-blown Wiki. It's worth considering.

After all this research on humane markup languages, much to my chagrin, I've come full circle. I now no longer think humane markup languages make sense for most uses. I agree with the guy at fileformat.info -- HTML is generally the better choice:

  • Simplicity

    If the source and destination are the web, why not use the native markup language of the web?

  • Readability

    HTML is a bit less readable than the lightweight markup languages, it's true. But basic HTML is not onerous to read, particularly if we hide the repetitive paragraph tags.

  • Security

    With a bit of careful coding, it is possible to whitelist specific HTML tags that you will allow. This way you avoid exposing yourself to risky/vulnerable tags.

  • Conversion

    It's not at all clear that any existing lightweight markup language has critical mass, with the possible exception of Wikipedia's flavor. On the other hand, text parsers and tools will always understand HTML.

  • What people know

    A lot more people know HTML than any given flavor of humane text. If you're a programmer, you damn well better know HTML. For the handful of wiki-like functions we may need, it's possible to add some optional attributes to the HTML tags. And wouldn't that be easier to learn than some weird, pseudo-ASCII derivation of HTML?
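
The whitelisting idea from the Security point can be sketched in a few lines. This is only a rough illustration in Python using the standard library's html.parser, not what stackoverflow.com will actually run. The tag list, the `href`-only rule for links, and the names `ALLOWED_TAGS`, `WhitelistSanitizer`, and `sanitize` are all assumptions made up for the example. Anything not on the list is escaped so it shows up as literal text instead of being interpreted.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelist: tag name -> attribute names permitted on that tag.
ALLOWED_TAGS = {
    "b": set(), "i": set(), "em": set(), "strong": set(),
    "code": set(), "pre": set(), "blockquote": set(),
    "ul": set(), "ol": set(), "li": set(), "p": set(),
    "a": {"href"},
}
# Assumption: only plain web links are accepted in href values.
SAFE_URL_PREFIXES = ("http://", "https://")


class WhitelistSanitizer(HTMLParser):
    """Re-emit whitelisted tags; escape everything else as plain text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.open_tags = []

    def handle_starttag(self, tag, attrs):
        if tag not in ALLOWED_TAGS:
            # Unknown tag: show it literally rather than trusting it.
            self.out.append(escape(self.get_starttag_text()))
            return
        kept = []
        for name, value in attrs:
            if name not in ALLOWED_TAGS[tag] or value is None:
                continue  # drops onclick, style, and friends
            if name == "href" and not value.lower().startswith(SAFE_URL_PREFIXES):
                continue  # drops javascript: and other odd schemes
            kept.append(f' {name}="{escape(value, quote=True)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")
        self.open_tags.append(tag)

    def handle_startendtag(self, tag, attrs):
        if tag in ALLOWED_TAGS:
            self.handle_starttag(tag, attrs)
            self.handle_endtag(tag)
        else:
            self.out.append(escape(self.get_starttag_text()))

    def handle_endtag(self, tag):
        if tag in self.open_tags:
            # Close anything left open inside this tag, then the tag itself.
            while self.open_tags:
                top = self.open_tags.pop()
                self.out.append(f"</{top}>")
                if top == tag:
                    break
        else:
            self.out.append(escape(f"</{tag}>"))

    def handle_data(self, data):
        self.out.append(escape(data, quote=False))

    def close(self):
        super().close()
        # Close tags the user forgot, so one post cannot break the page.
        while self.open_tags:
            self.out.append(f"</{self.open_tags.pop()}>")


def sanitize(html_text):
    sanitizer = WhitelistSanitizer()
    sanitizer.feed(html_text)
    sanitizer.close()
    return "".join(sanitizer.out)
```

Calling `sanitize('<b>bold</b> <script>alert(1)</script>')` keeps the bold tag and turns the script tag into harmless visible text. A production filter would need far more scrutiny than this; the point is only that the allowed set is short and explicit.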

I do think we'll adopt some of the cleverer functions of Textile and Markdown, insofar as they remove mundane HTML markup scutwork. But in general, I'd much rather rely on a subset of trusty old HTML than expend brain cells trying to remember the fake-HTML way to make something bold, or create a hyperlink. HTML isn't perfect, but it's an eminently reasonable humane markup language.

Posted by Jeff Atwood

Comments

The official implementation of Markdown supports HTML in the input, so you can use Markdown, and your users will still be able to use HTML if they want to.

Peter Hosey on May 14, 2008 05:05 AM

Isn't the textile language just sort of troff lite? We can leave troff in the horrible bad old days where it belongs, please.

reed on May 14, 2008 05:09 AM

The biggest feature I can see in wikipedia that would seem to be missing in basic HTML is the automatic cross referencing functionality. A user shouldn't have to look up the URL to type [a href="http://en.wikipedia.org/Markup_Languages#Light_Weight"] when the server can figure it out for them from [[lightweight markup language]].

I guess you'll be adding some special syntax to html for those sorts of issues?

Mike on May 14, 2008 05:09 AM
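
Mike's point about letting the server resolve [[lightweight markup language]] can be sketched roughly as follows. This is an illustration in Python, not MediaWiki's actual implementation. The base URL `WIKI_BASE`, the function name `link_wiki_refs`, and the optional `[[Title|label]]` form are assumptions for the example, and it is meant to run on raw text before any other escaping happens.

```python
import re
from html import escape
from urllib.parse import quote

# Hypothetical base URL; a real site would plug in its own page routes.
WIKI_BASE = "https://example.com/wiki/"

# Matches [[Title]] and [[Title|label]].
WIKI_LINK = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]]+))?\]\]")


def link_wiki_refs(text):
    """Replace [[Title]] or [[Title|label]] with an HTML link."""
    def to_anchor(match):
        title = match.group(1).strip()
        label = (match.group(2) or title).strip()
        # Wikipedia-style convention: spaces in a title become underscores.
        href = WIKI_BASE + quote(title.replace(" ", "_"))
        return f'<a href="{escape(href, quote=True)}">{escape(label)}</a>'
    return WIKI_LINK.sub(to_anchor, text)
```

The user types the title they already know, and the lookup of the real URL stays on the server, which is the part plain HTML has no shorthand for.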

We opted for Markdown in our CMS, because clients in combination with visual editors invariably screwed things up horribly. Although the output would be well-formed, it was inevitably nonsense, and it was far too easy to copy and paste the wrong bits of formatting from Word or somesuch (and lo, if we disabled that bit of functionality, there'd be complaints that they could no longer copy and paste other bits of formatting from Word).

Markdown has a double-pronged advantage for us:

1. It's simple for clients to learn how to mark stuff up properly. Because they have to think at least a tiny bit about the separation between content and formatting, it's easier for them to retrospectively tweak the markup to match what the content's supposed to convey as opposed to what Word made it look like

2. We can stick raw HTML into posts where a client's asked us to do something more complicated than they can manage themselves—Markdown's smart enough to leave the HTML as-is. Our clients, not being programmers, aren't likely to ever put in HTML themselves (and are aware that if they do, they stand a greater risk of screwing up their pages and so caveat emptor).

Works well for us.

Mo on May 14, 2008 05:10 AM

HTML is harder to learn than the others when it comes to people without any experience. It has tags and attributes, which can be hard to wrap your mind around. These lightweight types are easier to use for a beginner. Wikipedia is not a wiki for developers, it is for users who have never made a website before.

Think about your audience, if it's developers, they would be able to use html and have no problems with that. Although, they might then be able to interfere with your site code, which can be quite damaging. Leaving an <a> tag open, <table> open, javascript(!) etc.

Though, I have to say that it is a lot easier to do simple styling like bold and italic in BBCode than in HTML (especially if you are to make it XHTML Strict valid).

Thomas Winsnes on May 14, 2008 05:12 AM

I really think you ought to provide a simple wysiwyg editor, with the ability to edit code by hand. There are plenty of free, cross-browser applications available that you simply need to drop in and tell what tags to allow.

Why make people do the markup by hand just because they can? That's like making a user edit a config file instead of providing them options within the program, just because they can.

I'm glad you at least went with HTML though, so no need to learn a new markup syntax. Especially with the completely _unintuitive_ underscore to mean italics. I can't think of a worse choice. I mean, there's the slash which is s/anted like italics, or the underscore, which looks much like an underline. Ugh. =)

Sammy Larbi on May 14, 2008 05:12 AM

At Pedant's corner over here I have noticed that none of the markup examples would produce the HTML above. Replace "Some examples" with "A Few examples" at first glance.

Pedant on May 14, 2008 05:13 AM

Use ReST!

Calvin Spealman on May 14, 2008 05:15 AM

We had the same discussion when we developed a wiki-like interface in our application.

It seemed that Markdown was easier for users to understand than Textile after initial tests.

I would not go the HTML way since it allows users to break any semantic value you could find in their entry.

I would neither create my own language based on both Markdown and Textile, since users, especially blog users, are very used to one of them. You would just create confusion and mistakes.

Vincent on May 14, 2008 05:15 AM

Amen. I curse every second that I have to think about (or, God forbid, actually look up) the correct markup to link something or make whitespace non-wrapping or whatever. I already know HTML. You already know HTML. And you, over there, who doesn't already know HTML: the time you spend learning the tiny subset of HTML that you need to post a comment to a web site will be much more worthwhile than spending that same time learning one of the umpteen subtly different "lightweight" markup systems out there.

John on May 14, 2008 05:17 AM

Technically, wiki syntax should be
"[http://is.gd/gns lightweight markup language]", not [[lightweight markup language]]. But that's because I'm anal. :]

I personally disagree with you: HTML markup, while easy to understand for us coders, is considerably harder to type than Textile or wiki markup (and less pleasing to the eye, imho).

lucasbfr on May 14, 2008 05:22 AM

Isn't the point to be able to let people express their ideas quickly and easily? Why not let us use the GUI editor buttons, it's not like we're trying to prove our l33tne55 to anyone; we just want to push text into the computer efficiently.

Failing that, BBCode since it's simple and doesn't create tons of visual clutter - we're writing human readable text with markup, not Perl ;) - unless we're writing about Perl, of course.

Whatever you choose, please let it handle code in a sane way - what I mean is a little scrolling box with the code in rather than a five-screen scrolling mess, not decorated with line numbers, and in a format that can be easily copied and pasted (so no random blank lines or loss of indenting). Oh, it should handle non-wrapping lines of code correctly too without destroying your page template or making the browser have a horizontal scrollbar.

James on May 14, 2008 05:25 AM

I've recently been mulling over this very subject, because my company uses a CMS with a *horrible* GUI that completely mangles input, produces invalid markup, etc.

Markdown certainly looks the 'easiest' to learn, although I'm suspecting there's a lot more to it than presented here (off to research later ...)

In my experience, though, however simple the HTML subset, and however much training you give re: elements, attributes, valid nesting, etc., people will always struggle with that most fundamental of beasts: the humble hyperlink. Let's be fair, to a non-developer, a URL is a pretty complex string of syntax. And editors simply resort to copy+pasting. If I were addressing a 'low-tech' audience, I'd seriously consider one of:

a) Denying any out-bound links + clever wiki-style auto-linking
b) Auto-linking allowing out-bound links via a search engine (or similar)
c) Robust URL parsing looking for obvious errors

bobby on May 14, 2008 05:30 AM

+1 Markdown...

It allows HTML and does a very nice job of very easy to use formats...

Jake Good on May 14, 2008 05:30 AM

I'd like HTML (and as a result Markdown is good too).
It would be nice to do something slightly pretty with code snippets though.

Des Traynor on May 14, 2008 05:33 AM

Why invent something new when there are so many reasonable choices? I agree with those of you that say that an HTML posting syntax would be ideal. If that is not possible for security reasons, please don't invent something new. Let me leverage the time spent learning textile or markdown or whatever existing markup technology you decide to use. My time is valuable and I'd rather spend it conveying a message rather than learning a new way to format a message.

Jay on May 14, 2008 05:34 AM

addressing those wiki points -

1. HTML focuses on content, not presentation - semantic HTML lets people focus on (or sometimes even gain deeper understanding of the format of) their ideas.

2. Why use domain-specific markup when you've already got global markup that serves all your needs?

3. Tables aren't any less difficult to understand than the puzzling mixture of dashes, asterisks, and brackets that wikis employ

4. No - they don't need it, and you don't have to give it to them.

5. Only if you leave yourself open to it... "we're too lazy/busy to address security concerns" is not a good reason.

6. What makes Wiki markup easier to learn than HTML? Why would you learn a new markup language, which will just get converted back to HTML again? Isn't that a little redundant? If people need to learn a markup language, why not learn the one that is universally used in every page on the web?

... I'm not a big fan of wiki markup either - bbs tags are only marginally better.

matt on May 14, 2008 05:36 AM

Just a thing: a way to get code coloration is, I think, necessary. Seriously.

Also, bbcode blows (and 9 out of 10 bbcode parsers are purely regex-based translators, thus break down real fast), thanks for not using it.

Masklinn on May 14, 2008 05:36 AM

I generally agree that a subset of HTML is fine for formatting. If all you want to do is have lists and paragraphs and bold and italics, it's exactly as clear as almost any other markup language. If you're willing to automatically add p tags on double newlines, most people can muddle through without touching it at all.

However, Mediawiki is a special case, in that the html tags don't actually fully represent what most of the corresponding markup means. As you say, it represents the structure of the data, and the structure you give the data using wiki markup has side effects beyond the formatting you'd get from basic html.

For example, take the 'triple equal sign' - on first glance, it's just an h3. That doesn't tell the whole story, though - there's some deeper meaning to that tag. Not only does it do your h3 formatting, but it also generates a named anchor, and it automatically appends a link to it in the table of contents. It does have the same logical meaning as h3, but it does more - h3 is a subset of triple equal. You could of course impart that power upon h3, but I'd argue that's even more confusing than having a separate syntax.

This doesn't even touch on the templating language or the category system, both of which have no equivalent in html. So with mediawiki, you *know* html won't meet all of your needs - so coming up with a language that does allow for everything only makes sense.

Jeremy T on May 14, 2008 05:39 AM
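
Jeremy T's suggestion of adding p tags on double newlines is small enough to sketch. This is a rough Python illustration, not any particular site's code. The function name `auto_paragraphs` is made up, turning single newlines into br tags is an extra assumption, and the text is assumed to have already been through whatever tag filtering the site applies.

```python
import re


def auto_paragraphs(text):
    """Wrap blank-line-separated chunks of text in <p> tags."""
    # Normalise Windows line endings, then split on one or more blank lines.
    normalised = text.replace("\r\n", "\n").strip()
    chunks = re.split(r"\n\s*\n", normalised)
    paragraphs = []
    for chunk in chunks:
        if not chunk.strip():
            continue  # ignore empty input
        # Assumption: a single newline inside a paragraph becomes a <br>.
        lines = [line.strip() for line in chunk.split("\n")]
        paragraphs.append("<p>" + "<br>\n".join(lines) + "</p>")
    return "\n".join(paragraphs)
```

With something like this in place, a user who only ever types plain text with blank lines between paragraphs still gets reasonable HTML out the other end.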

I usually love your posts... but this is exactly the kind of attitude which stops developers from making good UI imo. You've obviously thought about this a lot, but you immediately ruled out all of the best approaches by making a big assumption about your target audience.

Just because you expect every good programmer to be comfortable with markup doesn't make it so... and as you often remind us, there are plenty of bad programmers out there.

Maybe I got it a bit wrong... but I don't think you should expect your users to understand your markup, or even HTML. Showing the markup and allowing the user to edit it is fine, (ala wiki) but not implementing nice buttons and an interface... you shouldn't demand anything of users that isn't necessary imo.

I'll take it all back if you plan on having the buttons as well... but that's not how the post came across. :)

Jheriko on May 14, 2008 05:39 AM

Jeff, glad to hear you've settled on HTML for the input method. As you say, we developers already (should!) know HTML.

My only issue with use of HTML versus lightweight markup is the few extra characters needed to type out HTML tags, as opposed to the comparatively fewer characters needed to do formatting in the lightweight markup languages. But that's just one more reason that developers should all know how to touch type, right?

It seems like several other commenters are advocating Markdown... one option to accommodate these folks might be to give users an option to have their posts parsed for HTML, for Markdown syntax (or whatever lightweight markup language you choose), or both. Post entry in forums based on the UBB.threads (http://www.ubbcentral.com/) package does this, for example.

Jon Schneider on May 14, 2008 05:43 AM

As much as the bugs in Blogger annoy me, the one thing they do right is to allow the user to go to source and edit the HTML. For the users that don’t understand markup, they have a WYSIWYG editor.

Reinventing a markup language is the wrong approach. I've been creating HTML pages since 1994, and every time I edit a Wikipedia page I roll my eyes because I still have to look up that URL syntax. I agree with Calvin, use REST if you want to implement "automatic cross referencing functionality" but remember, that is a server-side function, not a mark-up issue.

One recommendation, I would create a White List of HTML you will support. This way you don't have to try to manage a Black List of restricted tags.

Josh Hurley on May 14, 2008 05:43 AM

It seems to me that an assumption is being made that all developers know how to code in HTML. As a desktop developer I rarely, if ever, have to touch web code and hence will have to invest time and effort into learning a whole new 'syntax' if I am expected to format my posts correctly.

While I understand that there will always be a need to have some kind of markup I cannot see the reasoning behind forcing us to hand craft it. If I am popping onto the site to post a question (or indeed an answer) then I probably have that problem space loaded up in my brain. Having to interrupt that and find out the correct way to markup my post seems like a sure fire way of reducing the integrity of that post.

I can see no logical reasoning why you feel the need to forgo a simple GUI driven text box, that requires minimal thought while using, in favour of forcing us to learn whatever you choose to be 'my way is the best way'.

That just smells of elitism.

One of the books on your reading list says it all - "Don't Make Me Think: A Common Sense Approach to Web Usability".

Martin Wallace on May 14, 2008 05:46 AM

Take a page from MS' book. Code up some intellisense for whatever markup you use. Even if markup isn't intuitive you can hint the user to where they need to be...

Add some syntax highlighting and users won't know they aren't using something they already know...

JPunyon on May 14, 2008 05:47 AM

I've always liked 37signals' solution - give them just a few whitelisted tags for bold, italic, links, quotes. Forget about attributes. Keeps it pretty clean, and they can explain it in a sentence.

Evan on May 14, 2008 05:48 AM

In the spirit of various other articles on this very blog, wouldn't the correct answer be to allow both html _and_ simpler markup?

I know HTML inside out, but given the choice I'd rather write in textile whenever possible.

Jack on May 14, 2008 05:52 AM

"this is a site for programmers, so they should be comfortable with basic markup."

Well, at least you put your prejudices involving what a 'real programmer' is right there where everyone can see it.

Embedded programmer who's never marked up with anything other than a highlighter pen on May 14, 2008 05:54 AM

Isn't the point of software to make things easier? Just because as a programmer, I can code in HTML doesn't mean I want to code in HTML just to write a comment. WYSIWYG is not a bad word.

Although, for a developer site there is a very limited set of markup needed:

* Plain format (the default)
* Lists (ordered/unordered)
* Bold/Italic/underline
* Hyperlinks
* Sourcecode (the biggie for a programming site)

One nice thing would be color formatting sourcecode automatically based on language.

Jeff Cuscutis on May 14, 2008 05:55 AM

If I were you I'd definitely go for plain old HTML but whitelist only a bunch of tags and limited attributes per tag, and of course validate properly before accepting anything. I've done it before and it's rather simple.

Yet I strongly disagree with the whole "If you're a programmer, you damn well better know HTML" thing; I might be comfortable programming for the web but that doesn't mean all programmers are. I still know some that only do specific FoxPro based stuff, or even just a small subset of C on embedded devices. These people are definitely programmers, but don't have any reason to know anything about HTML.

Therefore, and because I am not you, I'd still go for a nicer custom WYSIWYG editor to generate nice lean and valid HTML, perhaps with a code button for advanced users. I have this love for the KISS principle, but I realize it applies to my users' point of view, not to mine.

Kris

kris on May 14, 2008 05:56 AM

I find the simple formatting^ offered by the Australian broadband forum Whirlpool to be quite good. It isn't as full featured as many others but in most instances, it's more than enough - they conveniently allow you to enter in raw HTML if that suits your purposes better as well.

^ http://whirlpool.net.au/wiki/?tag=whirlcode

Al on May 14, 2008 05:57 AM

I'll note that the blog Making Light uses a subset of html for comment markup, including urls, and the users there seem to have no trouble figuring it out. The users are typically science fiction nerds, but at least 2/3rds do not come from technical backgrounds, but when sufficiently motivated can even figure out html, with prompting.

I'll append a section of prompting, but I had to mangle the angle brackets to abide by the "no HTML" rule you have for comments. (Irony!)

HTML Tags:
[strong>Strong[/strong> = Strong
[em>Emphasized[/em> = Emphasized
[a href="http://www.url.com">Linked text[/a> = Linked text

Spelling reference:
Tolkien. Minuscule. Gandhi. Millennium. Delany. Embarrassment. Publishers Weekly. Occurrence. Asimov. Weird. Connoisseur. Accommodate. Hierarchy. Deity. Etiquette. Pharaoh. Teresa. Its. Macdonald. Nielsen Hayden. It's. Fluorosphere. More here.

sled reference on May 14, 2008 05:58 AM

Have you tried out markItUp:

http://markitup.jaysalvat.com/home/

It's an excellent javascript utility that puts a friendlier face on the standard textarea. I use the HTML version to allow my users to enter snippets of XHTML, but it also works with MarkDown, Textile, etc.

I also wanted control over what tags and attributes the users are allowed to enter, so I wrote an extension that validates the XHTML by parsing it against a list of valid tags/attributes (defined in JSON).

Ben Mills on May 14, 2008 05:58 AM

Good for you. I'm sick to death of being coaxed into joining some new forum or social networking site or wiki and finding out that I have to learn a new, totally arbitrary set of rules that are kinda sorta like the ones I already know but not quite, and always getting mixed up between the 8 different markup styles.

On the other hand, I don't think it would hurt you to have a WYSIWYG editor - some developers may know HTML but are very slow typists and having to type HTML would slow them down significantly. Just make sure you leave the option for people to use a normal textarea that's not horribly mangled (Community Server, I'm looking at you).

Aaron G on May 14, 2008 05:58 AM

I'm wondering if you have put any thought into automatically supporting syntax highlighting for code snippets? I've used several sites that do this (forums.devshed.com, for example), and while it's rarely perfect it can really help when reading through posted questions.

Joel Coehoorn on May 14, 2008 06:00 AM

Doesn't the conclusion here go against the reasoning in the "XML: The Angle Bracket Tax" post a couple of days ago? I know you can get away without all the closing tags in HTML, so it is slightly better than XML, but to willfully twist your own words:

1. Should HTML be the default choice? The authors of most of the styled text entry code ever developed would probably say NO to this.
2. Is HTML the simplest possible thing that can work for your intended use? NO.
3. Do you know what the HTML alternatives are? YES
4. Wouldn't it be nice to have easily readable, understandable posts, without all those sharp, pointy angle brackets jabbing you directly in your ever-lovin' eyeballs? Ummm, Yes?

As pointed out above, whatever you do it isn't going to actually be HTML. You're going to have to add your own stuff to it and limit it in some ways. I admit my HTML knowledge is basic, but I've no idea how to enter some syntax highlighted javascript in HTML, but I can manage it in mediawiki syntax.

Chris on May 14, 2008 06:02 AM

Your link to "why doesn't wiki do HTML" is broken. It should be

http://c2.com/cgi/wiki?WhyDoesntWikiDoHtml

Dave on May 14, 2008 06:02 AM

I vote for html format with a WYSIWYG editor, such as one of these:

http://www.geniisoft.com/showcase.nsf/WebEditors

Just because I know html, doesn't mean I always want to type it or any other markup to enter a comment. All of these editors I have looked at supply a Design (lazy) mode and a raw html mode.

Michael Lang on May 14, 2008 06:02 AM

@Jheriko: seriously?! Any *competent* (not "good") programmer must be comfortable with markup. Otherwise it's not, *by definition*, competent programmer.

As for bad programmers out there, they're not stackoverflow.com's target audience and the more of them run away screaming, the better -- their input equals noise and degrades the value of the site for people who actually do have something useful to say.

Peter on May 14, 2008 06:07 AM

You could be different and split the difference. Add a few tags to html, like a <markdown> tag or a tag for the other commonly used internet formatting options, with everything untagged defaulting to a whitespace sensitive version of html (so people don't have to type paragraph tags).

It requires some additional processing work, and that's never fun, but it seems trivial to me to implement and is adaptable to different formats in the future.

Ben on May 14, 2008 06:08 AM

So you are going to use plain-old HTML!

What about new-lines? Are we going to have to type "BR" all over the place?

Also... people are going to want to post HTML code *AS CODE*, without having to type all those escape sequences just to post some example of a DIV that isn't working or whatever.

Of course, all these things COULD be done - we ARE programmers!

But will anyone bother? Or will they just go somewhere else?

Finally, you may be making a mistake in saying that it is "secure, with careful parsing" - this sounds like pride coming before a fall to me!

Syd on May 14, 2008 06:10 AM

An additional consideration to my above thoughts occurred to me. You could add pre-processors to tags for each language <c++>, etc, and they could make an attempt to apply proper indentation and code highlighting that would be more versatile than a language agnostic version.

Ben on May 14, 2008 06:11 AM

The one thing I know that argues against using actual HTML for post styling is if you want people to be able to post markup or even code-- if someone posts an example containing a 'for' loop, the angle brackets can cause all manner of weirdness. And if they try to post some sample HTML, then look out Francis.

So you either have to define a markup pattern that passes through untouched whatever's inside ('code' is a common choice) or else move to something like Textile or Markdown and set it up to encode stuff like angle brackets so it passes through untouched.

Eric Meyer on May 14, 2008 06:11 AM
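
The first option Eric Meyer describes, a markup pattern whose contents pass through untouched, might look roughly like this in Python. It is only a sketch: the choice of a literal code tag as the delimiter follows his comment, the name `escape_code_blocks` is made up, and it assumes this step runs before any tag whitelisting so the escaped contents are never parsed as markup.

```python
import re
from html import escape

# Non-greedy match so each <code>...</code> pair is handled separately.
CODE_BLOCK = re.compile(r"<code>(.*?)</code>", re.DOTALL | re.IGNORECASE)


def escape_code_blocks(text):
    """Escape everything between <code> and </code> so it displays literally."""
    def protect(match):
        # Angle brackets and ampersands inside the block become entities.
        return "<code>" + escape(match.group(1), quote=False) + "</code>"
    return CODE_BLOCK.sub(protect, text)
```

One obvious limitation: a snippet that itself contains a closing code tag ends the block early, which is part of why some systems prefer indentation or fences to mark code instead.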

I really think you should just use (or build) a good WYSIWYG editor. As a coder I can write in assembly language, but it doesn't mean I want to.

If you want to offer the ability to alter the raw markup then you can give users that option, but I want my editor fully cooked please...

Chris E on May 14, 2008 06:12 AM

I agree with all those that just want to use HTML. It fits the target audience and likely usage model:

Audience = programmers
Usage = occasionally

It's not like I will be on stackoverflow.com for hours on end every day trying to write programs. There's no need to learn anything extra or new.

Solburn on May 14, 2008 06:16 AM

@Jeff:
> "I'd much rather rely on a subset of trusty old HTML than expend brain cells trying to remember the fake-HTML way to make something bold"

Ironically enough, there isn't a way to make something bold* in (modern semantic) HTML since, as you pointed out in a previous blog entry, HTML is the 'model', not the view :)

* The <b> tag was officially "discouraged" way back in 1999 with HTML4.
The current HTML5 working draft doesn't go as far as deprecating it (yet), but it does say "The b element should be used as a last resort".

Graham Stewart on May 14, 2008 06:17 AM

"Any *competent* (not "good") programmer must be comfortable with markup. Otherwise it's not, *by definition*, competent programmer." - Peter, about five comments ago.

Brilliant, just absolutely brilliant. You couldn't make this kind of ignorant comment up. Reminds me of the Java coders who can't believe there's still a place for C in the world ("It's, like, over a decade old, man! Move on!" - Java School grads, everywhere).

Embedded programmer who's never marked up with anything other than a highlighter pen on May 14, 2008 06:18 AM

Try to make a plain jane table.

In HTML it's sane. In light-weight markup lingo, it sucks; they've tried to reduce the tr and td tags even more; Wiki makes it completely incomprehensible.

The "rich" text editors are rich, but not robust. To write HTML fast, you have tag completion and a suggestion system for the values. RTEs fail on this point.

I find it immensely more pleasing to just get the raw HTML, dump it in my text editor of choice, and copy + paste it back again.

Rob Janssen on May 14, 2008 06:21 AM

Seeing as how most of the potential users of Stack Overflow currently reside at forums, it might make sense to cater to them and use whatever markup language is most prevalent across the inter-tubes. In my travels I've found the most widely used markup is BBCode or HTML.

BBCode is easy to use and offers the potential to add custom tags to allow special functionality. I think it's your best bet.

Matt Briancon on May 14, 2008 06:24 AM

<CleverEndQuote>Again.</CleverEndQuote>

Adam on May 14, 2008 06:24 AM

And you'd be building that in VB.NET Jeff?

Or have you jumped ship to C# now too? Is all that "love" you show about VB.NET just empty talk?

VBMan on May 14, 2008 06:25 AM

I'd still prefer a row of buttons. What, I'm a coder, so on your site I have to hand-code because I can? Why not combine them: let me code, or click a button when I'm lazy.

Mike on May 14, 2008 06:26 AM

A problem with HTML is unclosed tags. Leaving, for example, a bold tag open can cause text farther down your page (i.e. your footer) to render in bold. So if you do allow straight HTML, you'll have to create a script which finds and closes any tags left open. Considering all the different types of tags, this is no easy task. I'd recommend Textile or Markdown for this reason.
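
To give a feel for what such a tag-closing script involves, here is a rough sketch using Python's standard html.parser. The class name TagCloser, the helper close_open_tags, and the short VOID_TAGS list are all assumptions for illustration; a production version would need a fuller list of void elements and some policy for badly nested block tags.

```python
from html.parser import HTMLParser

VOID_TAGS = {"br", "hr", "img"}  # elements that never take a closing tag


class TagCloser(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=False)
        self.out = []        # pieces of the repaired fragment
        self.open_tags = []  # stack of tags that still need closing

    def handle_starttag(self, tag, attrs):
        self.out.append(self.get_starttag_text())
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_startendtag(self, tag, attrs):
        # Self-closing syntax such as <br/> opens nothing.
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag not in self.open_tags:
            return  # stray closing tag with no opener: drop it
        # Close everything opened after the matching tag, then the tag itself.
        while self.open_tags:
            top = self.open_tags.pop()
            self.out.append(f"</{top}>")
            if top == tag:
                break

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")


def close_open_tags(fragment: str) -> str:
    parser = TagCloser()
    parser.feed(fragment)
    parser.close()
    # Anything still on the stack was never closed by the author.
    while parser.open_tags:
        parser.out.append(f"</{parser.open_tags.pop()}>")
    return "".join(parser.out)


print(close_open_tags("<b>bold text and <i>more"))
```

With this in place, a comment that forgets its closing bold tag can no longer turn the rest of the page bold, because the fragment is balanced before it is stored.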

Chad Braun-Duin on May 14, 2008 06:26 AM

Darn, was meant to be <CleverEndQuote>Again.</CleverEndQuote>

Adam on May 14, 2008 06:27 AM

@Jeff
> Incidentally, if you haven't ever edited a Wikipedia article, you should. I consider it a rite of passage, a sort of internet merit badge for anyone who is serious about their online presence

I'm not serious about my online presence but I am serious about programming. You've edited wikipedia, I haven't. I know C, you're a web celebrity. Go figure.

Anon on May 14, 2008 06:30 AM

Actually, what I think matters most is whether you provide a good, accessible quick reference on the editing page.

It doesn't really matter which language it describes, as long as it's present and small but complete enough. I mean, people will need to look HTML up too eventually, editing your site.

Adam on May 14, 2008 06:30 AM

I agree totally. Of the four markups you presented, the only one that was readable enough that I didn't have to refer back to the rendered version to see what was going on was the BBcode. (For a couple I'm still not sure how the first quoted section's end is delimited). But BBcode is practically html with square brackets, so why bother?

As for developers like Martin who don't know HTML, I'd say you should be prepared to learn. This isn't Swahili we are talking about. Learning a markup language for a real developer should be trivial.

In fact, I'd say lack of basic HTML skills in posts might be a good way to spot the posers.

T.E.D. on May 14, 2008 06:34 AM

It depends on what you want to do.

HTML is fine for just formatting (That's what it is for!) but you then have the problem of cleaning the HTML, filtering broken syntax, and your pages are not a consistent format anymore....

Wiki syntax is more than formatting: it adds meaning to the text, which as a side effect might format it. For example, tables, since they are a standard format, can be read and processed by the wiki as data; internal links work both ways automatically; categories/tags aggregate data automatically; etc.

Perhaps you should use XML instead? [The Angle Bracket Tax] ;-)

BTW Internal links in mediawiki are [[article]] or [[article#section]] external links are not much harder [http://otherwebsite.net/Light_Weight.htm] but are deliberately avoided ....

Jaster on May 14, 2008 06:36 AM

I have used freetext (http://freetextbox.com/default.aspx) for a few projects and it has worked well. I think it works for novice and advanced users.

BrianK on May 14, 2008 06:40 AM

I think one of the biggest concerns with allowing raw html is all the crazy things people can put into your website.

XSS, ugly images, ads, annoying ads, spam, and the like.

Special markup has the function of limiting what people can do.

Jeff Davis on May 14, 2008 06:42 AM

so if someone wanted to come to your site to learn more about html they'd be screwed?

is the site intended for 1337 programmers to come and get more 1337 or are you intending on allowing beginners to come and learn too?

you've completely gone off the rails on this one, especially if you're considering writing your own hybrid mark up language.

burnside on May 14, 2008 06:44 AM

I have a function for my forums that strips out <script>,<img>,etc and everything in between the tags. I have a small warning for the user on what tags not to use.

BrianK on May 14, 2008 06:46 AM

> Ironically enough, there isn't a way to make something bold* in
> (modern semantic) HTML since, as you pointed out in a previous blog
> entry, HTML is the 'model', not the view :)

That's a pretty good point. Perhaps a standard style sheet could be set up, which posters could reference?

Then again, what you are *supposed* to be using is tags like em (emphasis) and strong (strong emphasis), and let the user's browser do that however the user wants such things presented (boldface, underlining, big font, yelling the word, whatever).

This is precisely why I don't use WYSIWYG editors for HTML. They invariably have tons of style buttons and almost no proper element buttons. If your development tool completely misses the point of the language, the results can't be good.

T.E.D. on May 14, 2008 06:46 AM

+1 BBCode

easy to parse. easy to remember.

Joe Beam on May 14, 2008 06:47 AM

Stupid me, I meant to list script,img tags etc in my previous post. They got filtered.

BrianK on May 14, 2008 06:47 AM

Keep it simple! Consider the most common use, probably a short post of a few paragraphs, some bold, a link, a code section, and a list. In these cases any of the markup languages result in a much simpler and easier to understand post than would be with HTML.

Yes, with HTML you get the "I can do anything" but don't focus on the edge cases and ignore what people will be using it for 99% of the time. I've been writing HTML since '95 and one place I don't want to see it is in a forum (offhand I can't think of any forums I frequent that actually use HTML).

At this point there may well be more people familiar with the Wiki syntax than with HTML...

Dave on May 14, 2008 06:47 AM

Direct HTML input is the autobahn to invalid XHTML.

http://iamacamera.org/default.aspx?section=develop&id=73

In ten years, we will look back with nostalgia at the days when we left comments on your site via direct HTML input -- the way we fondly recall bygone years when we configured our ISDN modems and put jumpers on hard drives to designate them master/slave.

Direct HTML input is at best, [i]quaint[/i], but by no means a long-term viable solution to online markup entry.

Carl Camera on May 14, 2008 06:49 AM

@James

http://code.google.com/p/syntaxhighlighter/ this JavaScript library seems to be the best way to document code with syntax highlighting, automatic line numbering and copy and paste support. I use it in a lot of my documentation.

Robert S. Robbins on May 14, 2008 06:50 AM

"As for developers like Martin who don't know HTML, I'd say you should be prepared to learn. This isn't Swahili we are talking about. Learning a markup language for a real developer should be trivial."

It may be trivial, but it should also be optional. The interface should never get in the way of usability. Jeff makes that very point himself in "Reducing User Interface Friction" (http://www.codinghorror.com/blog/archives/000866.html):

"Reduced interface friction goes a long way toward explaining the popularity of services like twitter and tumbr. What's the minimum amount of effort a user can expend to produce something? The answer could be a key competitive advantage.

That single input box on the Google homepage starts to look more and more like an optimal user experience. It might be unrealistic to reduce your application's UI to a single text box-- but you should continually strive to reduce the friction of your user interface."

Martin Wallace on May 14, 2008 06:51 AM

Please don't use wikipedia as a model markup language. It's badly defined to the extent that the only 'compliant' parser is mediawiki itself, which consists basically of a long series of regexes. It's a huge shame that one of the largest consolidated sources of information on the web is all written in a language that's extremely difficult to robustly parse.

As far as using HTML goes, it depends on your target audience. For stackoverflow, I would agree, but for more lay-person sites, HTML seems unnecessarily complicated. One forum my wife and I both post on uses a subset of HTML, and I've lost track of how many times I've had to tell my wife what the syntax for links is. "a href" is second-nature to us, but it's not intuitive if you're not already familiar with HTML.

Nick Johnson on May 14, 2008 06:57 AM

I think making it HTML or even subsetted HTML would work well. But my first thought was that people are going to mess up your layout. True, posts or pages should be content-centered, but all the more reason to limit the freedom of users to a small but large enough set of layout items. Look at Myspace! It's eye- and brainhurt, because everybody puts in their own fonts, sizes, colors, background images, etc... Of course you can limit all this, but I think you will have only 5 to 10 remaining html tags, and then consider these in lightweight ML's. Looking up markup syntax is a bitch, but if you put the syntax of the 5 or 10 most used (and probably 95% of the time, are only used) near the editing-field, it doesn't matter if it's html, BBCode, or whatever imo.

Just please don't let people change fonts, font size, add their own smileys, and css mods.

You will have a select crowd of intelligent people writing articles and solutions to problems. But the questions themselves are going to be asked by Co0dingNewb015. Maybe people with a low post count can have even a subset of your subset of layout items.

Now I'm just rambling

Ps: You're right Jeff. Nobody reads 200 blog comments, except you (Podcast 1). I tried to see I wasn't 'double posting' but stopped around 30 something.

joon on May 14, 2008 07:00 AM

I think anyone who calls himself a web developer should be proficient in HTML. Not just good or familiar with it, but proficient. Take Visual Studio, for example. I see too many developers squeak by working in Design Mode, and when work in Design Mode breaks down (as it often does) they are lost in the sea of code in source mode. I don't even use Design mode. I code entirely in source mode. It's a sad state in our profession when a good percentage of developers can't "debug" HTML code. Sorry about the detour.

BrianK on May 14, 2008 07:02 AM

Whatever markup you go for, please make sure you only offer a limited but useful set of formatting/style tags.
One of the problems with sites that have a lot of user formatted content is that they have a horrible inconsistent mix of styles, layout and structure that makes flicking through the site a constantly jarring experience.

Things I'd want when posting a question/answer/article:

- the ability to include real source code (without having to alter it to remove HTML characters etc) AND have all my formatting/indenting preserved AND have the code automatically coloured in the post.

- include images/diagrams (without having to host them myself on some other site).

- link to other articles on the same site and to relevant external sites (e.g. Sun, MSDN, W3c etc).

- attach example source files.

Graham Stewart on May 14, 2008 07:08 AM

(My point being that none of the goals above are entirely satisfied by HTML, or most of the other simple markups)

Graham Stewart on May 14, 2008 07:09 AM

See reddit's comment box. Little expandable notes on how to use markdown (very handy as a reference when you forget something). And as someone mentioned near the top, the official markdown engine supports html tags. Best of both worlds.

dude on May 14, 2008 07:09 AM

I agree with sticking with HTML, if you aren't going to toolbar the text widget.

One good example is Lifehacker (and probably other Gawker blogs). They have a live comment preview system that uses HTML markup. Nice.

Otherwise, do it with less friction. Give me some toolbar icons.

piyo on May 14, 2008 07:11 AM

As others have said, if source-code is going to be included in messages inline, and I think this is a highly desirable feature, raw HTML is not a good choice, for three reasons:

1. You have to escape special characters, which means at the very least splattering &lt; and &amp; (or should that be &amp;lt; and &amp;amp;?) everywhere.
2. You need explicit <br> or <p> tags. (Again, should I write &lt;br&gt; and &lt;p&gt;?)
3. You need painful contortions of &nbsp; &nbsp; to get indentation right. (&amp;nbsp;?) It's annoying to read C++ without indentation, but it's generally impossible to try to guess what Python code with the indentation stripped out is supposed to be.

All of this makes it hard to paste in source code, and hard to edit it in-place. Even if you allow <pre> tags, it makes it pretty nasty to embed HTML code which might contain </pre> somewhere.

Maybe you should go for 78 columns in a monospace font for everything. ;)

Weeble on May 14, 2008 07:14 AM

Meh. I vote for pre tags around the lot, and autolinking of urls. Plaintext is THE humane WYSIWYG markup language.

james on May 14, 2008 07:17 AM

I've done sites with comments allowed in HTML. It works well; the only issue I run into is that people like to screw up your layout by leaving table tags or divs open (so you should probably set up some system to make sure their tags are closed). I've also run into issues where spammers put in Javascript redirects or popups. So HTML isn't perfect either.

The traditional [B]Bulletin Board[/B] format is also widely known, so users won't have to look things up. Or limit the HTML a user can use.

Kris on May 14, 2008 07:17 AM

Jeff,

If you want a fun challenge, figure out how to make the form input color-coded a'la Visual Studio or Expression Web. HTML is easier to read and write with all the blue and red tags.

Zack on May 14, 2008 07:20 AM

Flickr allows simple HTML tags such as:
<a href="URL">link</a>
<strong>strong</strong>
<b>bold</b>
<blockquote>blockquote</blockquote>
<em>emphasis</em>
<i>italic</i>
<img src="URL">
<u>underlined</u>
<s>strike</s>
<del>deleted</del>

Ali Karbassi on May 14, 2008 07:23 AM

I read through maybe the first 15 comments, which were mostly anti-HTML (to some extent), so I'll chime in with some encouragement. HTML makes a _lot_ of sense for your purposes, and all these esoteric things are quite annoying in the end. (I have on two separate Wiki systems inadvertently created links to nonexistent pages just by using formatting marks that seemed innocuous at the time, for one example. In general, remembering _which_ fake-HTML the current textbox wants is the problem.)

I'm a big fan of just saying "these are the tags I want to allow," then maybe extending them with extra attributes or use cases as needed (e.g. <a page="lightweight markup language">LWML</a> or <a>lightweight markup language</a>). No need to have two syntaxes floating around (HTML + Markdown, I guess, is popular with lots of people.) Making sure the input comes out as well-formed XHTML is a solved problem, to be sure.
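
A rough sketch of what "these are the tags I want to allow" could look like in practice, using Python's standard html.parser. The tag list, the href-only attribute rule, the http/https check, and the names WhitelistFilter and sanitize are assumptions picked for illustration, not a vetted security policy, and the sketch does not try to balance tags.

```python
import html
from html.parser import HTMLParser

ALLOWED_TAGS = {"a", "b", "i", "em", "strong", "blockquote", "pre", "code"}
ALLOWED_ATTRS = {"a": {"href"}}          # every other attribute is dropped
SAFE_SCHEMES = ("http://", "https://")   # rejects javascript: and friends


class WhitelistFilter(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag not in ALLOWED_TAGS:
            # Show disallowed markup as literal text instead of silently eating it.
            self.out.append(html.escape(self.get_starttag_text()))
            return
        kept = []
        for name, value in attrs:
            allowed = name in ALLOWED_ATTRS.get(tag, ())
            if allowed and value and value.lower().startswith(SAFE_SCHEMES):
                kept.append(f' {name}="{html.escape(value)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")

    def handle_startendtag(self, tag, attrs):
        # No void elements are on the list, so self-closing tags become text.
        self.out.append(html.escape(self.get_starttag_text()))

    def handle_endtag(self, tag):
        closing = f"</{tag}>"
        self.out.append(closing if tag in ALLOWED_TAGS else html.escape(closing))

    def handle_data(self, data):
        # The parser has already decoded entities, so re-escape the text.
        self.out.append(html.escape(data, quote=False))


def sanitize(fragment: str) -> str:
    parser = WhitelistFilter()
    parser.feed(fragment)
    parser.close()
    return "".join(parser.out)


print(sanitize('<b onclick="x()">hi</b> <a href="http://example.com/">link</a>'))
```

Rebuilding each allowed tag from scratch, rather than passing the original text through, is what keeps stray attributes such as onclick out of the output.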

Domenic Denicola on May 14, 2008 07:36 AM

Wow, uh, your comment box strips out angle-bracketed phrases, instead of passing them through. Well, here's an ironic rephrasing of the first sentence of my second paragraph...

I'm a big fan of just saying "these are the tags I want to allow," then maybe extending them with extra attributes or use cases as needed (e.g. [a page="lightweight markup language"]LWML[/a] or [a]lightweight markup language[/a]).

Domenic Denicola on May 14, 2008 07:38 AM

I'm a fan of Markdown myself. It's easy to learn and already accepts a lot of conventions that pre-date HTML (like asterisks and underscores). I *know* HTML, but that doesn't mean I want to use it. In fact, HTML is so annoying to type, I would rather use a graphical editor and clean up any mistakes afterwards.

Also, I agree with those who said:

1. Not all programmers do any sort of markup. You should offer a graphical editor, and the option to turn it off.

2. If you offer any sort of HTML, it has to be a small subset.

Rhywun on May 14, 2008 07:42 AM

My websites use BBCode because the module I use for forums supports that. I never quite understood the point of BBCode, because you end up using much of the same syntax as HTML, except with square brackets instead of angle brackets.

<b>Bold Phrase</b>
[b]Bold Phrase[/b]

What's the difference? Why create an entirely new syntax when one is already available and well documented?

Textile was written (and it isn't from troff!) with the idea that marked up text should be readable as plain text. Underline (now italicize) by putting underscores around something. Bold by putting asterisks around it. Make a list by putting asterisks in front of each line. Simple to understand, clean, and easy. Unfortunately, not very powerful.

I personally prefer to enter things in HTML. I know it, and I don't find it all that unreadable. What I really can't stand is each site having different standards. I don't mind learning something, but I hate learning to do the same thing dozens of different ways. HTML is standard and that's good enough for me.

My suggestion: A modified HTML. One where you don't put <p> for paragraph breaks, and things in the format of http://xxxx.yyy or xxxxx@yyyy.zzz are automatically linked. But at the same time it allows you to add a bit of HTML for the more complex stuff.

That way, you can type your entire comment without a lick of markup code, but if you want to emphasize a word here or there or add a link, you know how to do it. That'll satisfy everyone.

David W. on May 14, 2008 07:43 AM

I was researching on the exact same topic today for my project and I've chosen markup specification from

http://www.wikicreole.org/

Particularly I liked their reasoning and father of wiki is behind that too I think.

lubos on May 14, 2008 07:47 AM

+1 markdown or wysiwyg.

XML derivatives were made for ease of parsing, not ease of use. The rule [Don't make me think] is superseded by [Don't make me do extra work]. Of course, optimally you'd just give a wysiwyg textbox with options to switch to markdown view. Just as 90% of your readership knows/should know HTML, 98% of your readership knows/should know how to use a wysiwyg text editor. Even if somehow 90% of your readers knew or should know emacs, it doesn't give you license to require knowledge of emacs commands at stackoverflow.

Of course, I completely understand if Joel completely overrode your objections to build a new markup language that cross-compiles to VBscript, javascript, PHP, XHTML, Markdown, ARM, SPARC, and is hot-pluggable as a Linux kernel module. Otherwise, I might have just heard 50 thousand heads exploding in the distance.

Jimmy on May 14, 2008 07:52 AM

The more I think about it the more I think a basic WYSIWYG editor is the only real way to go.

It requires minimal thought to use and allows you to properly support the various features needed for a useful coding site (e.g. press the "Insert Code Block" button and get prompted to select which language it is, so that syntax-colouring can be applied)

Graham Stewart on May 14, 2008 07:55 AM

I think it's a good idea to provide options. There's no guarantee that EVERY user of stackoverflow is going to be comfortable using HTML, especially if they just want to write a quick post. For example, even if you enforce HTML, then parse newlines as BR and P automatically; don't make me think.

Let users set up their markup preference in their profile, be it HTML, BBCode, Markdown, whatever.

I also agree that colored syntax for code blocks is a great idea, since the whole focus of this project is on code.

Erick on May 14, 2008 07:55 AM

First, why do you assume every programmer is familiar with HTML? Your site will appeal to a WIDE range of developers who may never have written HTML.

Second, HTML is very broad. Do you really want your users entering inline styles? Reusing your parent CSS classes? re-arranging your layout with relative and absolute positioning? You certainly don't want users to enter javascript of any kind.

Third, can you really whitelist HTML? Can you deal with all the clever XSS hacks (http://ha.ckers.org/xss.html)? If so, you have crippled HTML to the point that it resembles lightweight markup, except your users won't know in advance which parts of HTML will work.

I would love to offer a WYSIWYG editor + friendly editable markup that doesn't open up big XSS holes. If you make that work with HTML please let us know how you did it.

Mark Porter on May 14, 2008 07:56 AM

Can't we just type all our comments in Wasabi?

Martin Wallace on May 14, 2008 07:57 AM

Cool, let's make the users generate their own POST command too.

Enabling HTML editing is great, but requiring it for simple formatting just adds friction to the communication process. +1 for the GUI editor.

Kevin Dente on May 14, 2008 08:00 AM

I use Markdown. Clean syntax, particularly for linking, and it gives me the freedom to use HTML if I want. Works great for me.

Having done a fair amount of Wiki work, I absolutely hate how MediaWiki formats tables, though I find most of the rest of its syntax at least tolerable.

Markdown is, in my opinion, the best compromise between light-weight formatting, and the raw power of HTML.

Jeff Craig on May 14, 2008 08:00 AM

Imagine you want code-coloring. So instead of

<source lang=qbasic>
10 PRINT "I ROCK AT BASIC!"
20 GOTO 10
</source>

you have to write

<pre>
<span class="codeLineNumber">10</span> <span class="codeStatement">PRINT</span> <span class="codeString">"I ROCK AT BASIC!"</span>
<span class="codeLineNumber">20</span> <span class="codeStatement">GOTO</span> <span class="codeNumber">10</span>
</pre>
</pre>

?

That's ugly.
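
The usual way out is to let the server generate the spans from the short form. A toy sketch in Python, assuming a two-word keyword list and reusing the class names from the example above; the function names highlight_line and highlight are made up, and a real highlighter would need a proper lexer per language:

```python
import html
import re

KEYWORDS = {"PRINT", "GOTO"}  # tiny keyword list, just enough for the example

# Order matters: strings first, then numbers, identifiers, whitespace, anything else.
TOKEN = re.compile(r'"[^"]*"|\d+|[A-Za-z_]\w*|\s+|.')


def highlight_line(line: str) -> str:
    parts = []
    first = True  # the first non-space token on a line is its line number
    for match in TOKEN.finditer(line):
        tok = match.group()
        if tok.isspace():
            parts.append(tok)
            continue
        if tok.startswith('"'):
            css = "codeString"
        elif tok.isdigit():
            css = "codeLineNumber" if first else "codeNumber"
        elif tok.upper() in KEYWORDS:
            css = "codeStatement"
        else:
            css = None
        first = False
        text = html.escape(tok, quote=False)
        parts.append(f'<span class="{css}">{text}</span>' if css else text)
    return "".join(parts)


def highlight(source: str) -> str:
    lines = source.strip().splitlines()
    return "<pre>\n" + "\n".join(highlight_line(l) for l in lines) + "\n</pre>"


print(highlight('10 PRINT "I ROCK AT BASIC!"\n20 GOTO 10'))
```

The author types two lines of BASIC and the markup with all the span noise only ever exists in the rendered page.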

Matthias on May 14, 2008 08:02 AM

One benefit of using something like Markdown is you automatically get things like escaping and potentially code coloring, which is arguably a very important aspect for stack overflow. I personally use reStructuredText for most my HTML editing because it takes care of the HTML aspects for me such as escaping XML and coloring code.

Eric Larson on May 14, 2008 08:03 AM

WTF? do I have to use &amp;lt;???

Matthias on May 14, 2008 08:04 AM

We use MediaWiki on our internal intranet, and we found that the Wiki syntax is hard for non-technical users, but technical users usually "got it" after a week or so. I think it's one of the cleanest syntaxes, because of its headings (==), its tables ({|bla) and its lists (* ).
BBCode is a bad solution for a non-existent problem in my opinion, as it is essentially HTML with square brackets.

Bare HTML works fine, but keep in mind that there are multiple ways to do lists.
<ul>
<li>Bla
<li>Blu
</ul>

works, but without the closing </li> tags, you are not XHTML Compliant anymore. You could either:
* Live with that
* Write a parser that tries to fix that, with all the bug testing and fixing that goes along with that
* Use another syntax

It should be noted that Wiki Syntax != Wiki Syntax. Pretty much every Wiki software has its own syntax that is not 100% compatible with other Wiki systems.

Markdown looks like my favorite: It exactly does what is needed, with an intuitive syntax.

Michael on May 14, 2008 08:10 AM

YES! HTML markup is king! Don't make us learn another markup language! Everyone who disagrees with you (and me) is crazy and/or an idiot!

Peter on May 14, 2008 08:12 AM

I'm less worried about bold and italic text than about code. I would love to see some code coloring (keywords like int in a different color, for example). That's a lot of work, but it would be sweet.

Juan Zamudio on May 14, 2008 08:14 AM

My 15 year old learned HTML for his MySpace page.

Charles on May 14, 2008 08:15 AM

Jeff, why don't you use the wiki technology of fogbugz?

Eduardo Diaz on May 14, 2008 08:16 AM

now i'm really confused, XML has angle bracket tax and HTML doesn't. not only that but, as I type this I get: "Your comments: (no HTML)". Hm?

:/

/mp

Mauricio Pastrana on May 14, 2008 08:16 AM

Sorry, Jeff. I have to call shenanigans.

In "The Angle Bracket Tax," you had fairly harsh words about working with tags within a human-read document. You pointed out how XML tags can degrade a document's readability, because they add extraneous noise around the text. You also envisioned an ideal world where the tags are hidden, created and managed in the background.

Fast forward today and you appear to say the exact opposite, only we're talking about HTML instead of XML. The loss in readability is now worth it because the layout becomes much more precise.

You were pining after interfaces that hide tags a few days ago. In the XML argument, you said "You might argue that XML was never intended to be human readable, that XML should be automagically generated via friendly tools behind the scenes, never exposed to a single living human eye. It's a spectacularly grand vision." If I replace the word "XML" with "HTML", your vision becomes reality, as there are countless WYSIWYG HTML editors on the market today. But today's post puts you firmly in the camp of inline markup editing.

Personally, I prefer inline editing to WYSIWYG, and XML over fancy, fuzzy markup replacement. I also think that XML is a wonderful way to facilitate communication among disparate systems. It may not have been the original intent, but sure as hell is an awesome side-benefit.

I think you agree with me, but first you need to clarify your position.

Frank on May 14, 2008 08:17 AM

This, written by Jeff just this week about XML. Somehow, I don't see how requiring HTML will escape this criticism either.

"Wouldn't it be nice to have easily readable, understandable data and configuration files, without all those sharp, pointy angle brackets jabbing you directly in your ever-lovin' eyeballs?"

K|O|G|I on May 14, 2008 08:18 AM

"If you can sling code, a little bit of presentation markup is child's play."

Clearly you don't play around on forums very much. You're delusional if you think your site will be primarily good programmers. It will be 10% good programmers and 90% noobz and script-kiddies like everywhere else.

Use BBCode, HTML, or whatever, but don't expect the users to understand it. Personally, I don't care - I must know 50 different markups used on different forums - you just figure it out, and if you're not smart enough, you don't.

Jasmine on May 14, 2008 08:18 AM

Yes, you should use HTML for stackoverflow. I'm not sure if it's the best choice for CMSs in general, but for programmers it is the better choice. While I think something like <a href="http://haml.hamptoncatlin.com/">Haml</a> would be fun and interesting, HTML would provide the perfect barrier to entry: not too easy and not too difficult. Like you said, programmers should know it.

It is extremely annoying when I enter a comment somewhere, include an HTML link, and the comment is rendered with the href value as a link and the other HTML converted to angle brackets and crap in the comment. It's made worse when I can't edit it.

Lance Fisher on May 14, 2008 08:19 AM

Too funny. Your argument in favor of ubiquity and convention was exactly my point against your argument yesterday in your anti-XML post.

dinah on May 14, 2008 08:23 AM

HTML is fine by me. If you don't know it, now is as good a time as any to learn it.

PaulG. on May 14, 2008 08:25 AM

Whatever mark-up you go for, you should also allow HTML, if just to accommodate those nice IDEs which allow you to copy code as HTML (automatic highlighters suck).

[ICR] on May 14, 2008 08:28 AM

Seriously... how advanced are the comments you normally write on a forum?

I never, ever, use any more than these(bb-code).
[b]
[img]
[url] (often automatically generated from correct urls)
[code]
[quote]

For these simple things HTML is overkill. First of all you would have to create a huge whitelist; the simple [b]-tag can be written in a hundred different ways using HTML. A whitelist for CSS would be even harder to write; imagine parsing font-size:9999 etc.
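
To illustrate how small the BBCode side can be, here is a sketch of a converter for four of the tags listed above ([img] is left out). It assumes Python; the tag-to-element mapping and the name bbcode_to_html are choices made for illustration. Everything is HTML-escaped first, so the only markup in the output is what the converter itself emits.

```python
import html
import re

# BBCode tag -> HTML element (an illustrative mapping, not a standard)
SIMPLE_TAGS = {"b": "strong", "i": "em", "quote": "blockquote", "code": "pre"}

URL_TAG = re.compile(r"\[url\](https?://[^\s\[\]]+)\[/url\]", re.IGNORECASE)


def bbcode_to_html(text: str) -> str:
    # Neutralize any raw HTML (and quotes) before introducing our own tags.
    out = html.escape(text, quote=True)
    for bb, tag in SIMPLE_TAGS.items():
        pattern = re.compile(rf"\[{bb}\](.*?)\[/{bb}\]", re.IGNORECASE | re.DOTALL)
        out = pattern.sub(rf"<{tag}>\1</{tag}>", out)
    # Only http/https URLs are linked; anything else stays as literal text.
    out = URL_TAG.sub(r'<a href="\1">\1</a>', out)
    return out


print(bbcode_to_html("[b]Bold Phrase[/b] and [url]http://example.com/[/url]"))
```

An unmatched [b] is simply left as literal text, so it cannot bleed into the rest of the page. One limitation of this regex approach: mis-nested input such as [b][i]x[/b][/i] yields mis-nested HTML, which a real converter would avoid by parsing into a tree.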

Secondly, the code-tag usually does server-side syntax highlighting. Same thing with quote-tags: they can be used to link to the original message. Doing this with classic HTML tags would be really confusing.

Syntax highlighting is also a (the only) good reason to use WYSIWYG editors; these usually(?) allow you to paste pre-formatted text directly from your IDE (at least the one on the msdn-forum does this, even though that editor sucks in a thousand other ways).

crazy ivan on May 14, 2008 08:31 AM

Jeff,

What you describe sounds very similar to what Dan Brettle has written in NeatHTML. Have you heard of this?

From Dan's description:
"NeatHtml™ is a highly-portable open source website component that displays untrusted content securely, efficiently, and accessibly. Untrusted content is any content that is not trusted by the website owner. Typical examples include blog comments, forum posts, or user pages on social networking sites. NeatHtml uses an “accept only known good” (whitelist) approach to security to help prevent attacks which are not yet known."

You can read more about it @ http://www.brettle.com/neathtml

I think he strikes it right on the nail. Allow use of HTML but keep it safe.

CyteShoppe on May 14, 2008 08:33 AM

I'm with you 100% on scrubbed HTML. It's easiest to implement /and/ explain ("You can use HTML.") Most novices already know HTML. It's like learning your ABCs these days.

If you don't know HTML, does that really matter? Seriously now. People can read plain text just as well. This will be a wiki fer chrissakes. If your plain text is /that/ much of an eyesore, the other 1337 HTML h4xx0rz can pretty it up.

Furthermore, you *learn* before you *teach*. If you don't know HTML, fine. You can learn it by, I dunno, /using the site/. Read the relevant HTML literature, which is sure to be present.

Chuck Rector on May 14, 2008 08:34 AM

My vote would go to HTML with a couple of minor extensions that would handle most comments with no markup at all. First, treat a blank line as an implicit paragraph boundary. Second, treat an unadorned URL as a link.

To avoid problems with parsing more complex HTML, these extensions should only be active at the root level and deactivated inside any open HTML tags.
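A minimal sketch of those two extensions, covering only the simplest case where the comment contains no HTML tags at all (so the "deactivate inside open tags" rule never comes into play). The names `URL_RE`, `link_urls` and `render_plain_comment` are made up for illustration, and the URL pattern is deliberately naive: it will, for example, swallow a trailing period.

```python
import html
import re

# Naive pattern for an "unadorned" URL; an assumption, not a full URL grammar.
URL_RE = re.compile(r"https?://[^\s<>\"]+")


def link_urls(paragraph: str) -> str:
    """Escape the text and wrap each bare URL in an anchor tag."""
    out = []
    pos = 0
    for match in URL_RE.finditer(paragraph):
        # Escape the plain text that precedes this URL.
        out.append(html.escape(paragraph[pos:match.start()]))
        url = html.escape(match.group(0))
        out.append(f'<a href="{url}">{url}</a>')
        pos = match.end()
    out.append(html.escape(paragraph[pos:]))
    return "".join(out)


def render_plain_comment(text: str) -> str:
    """Treat blank lines as paragraph boundaries and bare URLs as links."""
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text.strip())]
    return "\n".join(f"<p>{link_urls(p)}</p>" for p in paragraphs if p)


if __name__ == "__main__":
    print(render_plain_comment("Hello\n\nSee http://example.com/a for more"))
```

Handling comments that mix these rules with real HTML would need an actual parser to track which tags are open; that part is left out here.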

Beyond this, any markup system you choose will require users to type something. HTML is much more widely understood than most of the alternative markup languages - especially amongst programmers.

Stephen C. Steel on May 14, 2008 08:34 AM

I think, as others have suggested, that you have stumbled upon some of the benefits of XML in thinking about markup that you seemed to overlook when discussing XML in the last entry. The existing tools and standards make very quick work of the types of things you want to do: white list certain tags, validate input, make sure it's well-formed, etc. Just write a simple XML Schema(or DTD or RELAX NG) and validate the input against it.
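A rough sketch of that idea using only Python's standard library. The standard library has no XML Schema, DTD or RELAX NG validator, so this stand-in only checks well-formedness and a tag whitelist; a real schema would need a third-party validator. `ALLOWED_TAGS` and `check_fragment` are hypothetical names chosen for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical whitelist; the real set of tags would be a design decision.
ALLOWED_TAGS = {"p", "b", "i", "em", "strong", "code", "blockquote", "ul", "ol", "li"}


def check_fragment(fragment: str) -> list[str]:
    """Return a list of problems; an empty list means the fragment passed."""
    try:
        # Wrap the fragment so it has a single root element to parse under.
        root = ET.fromstring(f"<comment>{fragment}</comment>")
    except ET.ParseError as exc:
        return [f"not well-formed: {exc}"]
    problems = []
    for element in root.iter():
        if element is root:
            continue
        if element.tag not in ALLOWED_TAGS:
            problems.append(f"tag not allowed: <{element.tag}>")
        if element.attrib:
            problems.append(f"attributes not allowed on <{element.tag}>")
    return problems


if __name__ == "__main__":
    print(check_fragment("<p>Hi <b>there</b></p>"))
    print(check_fragment("<script>alert(1)</script>"))
    print(check_fragment("<p>unclosed"))
```

Note the trade-off: this rejects anything that is not strict XML, including a bare `<` in a code sample, which is exactly the kind of input a programming site will see a lot of.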

Mike Ivanov on May 14, 2008 08:34 AM

What about consistency? If you use something like Markdown, every title, list and emphasized text will look the same. If you allow HTML, you're going to have bold titles, italic titles, different types of headers, maybe a few font tags and whatnot; all kinds of lists, and all kinds of emphasized text.

I used to have my own markup language, but I switched to Markdown for all of my projects.

LKM on May 14, 2008 08:34 AM

Add another solid vote for Markdown. It mimics what I'd naturally do for formatting in a text-only document (except for the headings bit, but that's rarely needed in a Q&A forum in any case). Plus, if you can't think of how to do something, HTML syntax is fully allowed as well.

I don't know about you, but the main formatting I'll ever do in a page are:

*mildly* emphasized text
**strongly** emphasized text
* Bullet lists
1. Numbered lists

The only time I hit the Markdown manual after discovering it the first time was to confirm that it really was as easy and intuitive as that. And the headings bits :)

HTML is obviously the lingua franca of the web, but that doesn't make it easy to read. If I want to read content embedded in HTML, I put it into a browser. If I want to write content embedded in HTML, I write it in markdown (multimarkdown, actually, which is a minor variant on markdown) then paste the generated HTML into the web page. HTML is good for doing all the other ancillary stuff around the content, but always gets in the way of the content itself.

Of the options presented here, Textile and Markdown are the most transparent markups, IMHO.

The only thing I'd add is that, please, as is obvious in the broken discussion about XML, make sure you just escape the HTML of unrecognized tags, not filter them out!
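To make the escape-versus-strip distinction concrete, here is a small sketch, assuming a tiny whitelist of attribute-free tags. `ALLOWED_TAGS`, `TAG_RE` and `escape_unknown` are invented names; the point is only that unrecognized tags come out as visible text instead of silently vanishing.

```python
import html
import re

# Hypothetical whitelist of tags that pass through untouched.
ALLOWED_TAGS = {"b", "i", "em", "strong", "code", "blockquote"}

# Matches only simple tags without attributes, e.g. <b> or </b>.
TAG_RE = re.compile(r"</?([a-zA-Z][a-zA-Z0-9]*)\s*>")


def escape_unknown(text: str) -> str:
    """Keep whitelisted tags, escape everything else so it stays visible."""
    out = []
    pos = 0
    for match in TAG_RE.finditer(text):
        # Text between tags is always escaped.
        out.append(html.escape(text[pos:match.start()], quote=False))
        if match.group(1).lower() in ALLOWED_TAGS:
            out.append(match.group(0))
        else:
            out.append(html.escape(match.group(0), quote=False))
        pos = match.end()
    out.append(html.escape(text[pos:], quote=False))
    return "".join(out)


if __name__ == "__main__":
    print(escape_unknown("<b>x</b> <script>alert(1)</script> a < b"))
```

Tags with attributes never match `TAG_RE`, so they are escaped along with the rest of the text. The sketch does not check that allowed tags are balanced.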

Tom Dibble on May 14, 2008 08:35 AM

Also, I'd strongly advise you to not do Yet Another Tweaked HTML Version. Going into a forum which speaks in HTML you have to read the manual every single time: are double-line-breaks automagically converted to <p>? Are stray tags automagically escaped? How does this particular site support quoting?

Going into a forum which speaks (ick!) BBCode the specifics are generally assured (although the quoting syntax changes inexplicably).

Going into a forum which speaks markdown or textile, I know precisely what I'm getting.

Remember: your site will not likely be the only one people type in throughout their day. Make the experience adhere to a common standard. Your users will thank you.

Tom Dibble on May 14, 2008 08:40 AM

I'm glad you're looking for alternatives to wikipedia and bbcode markup, but I'm not sure Joel Spolsky's Wikipedia page in basic HTML would be any less intimidating.

"<p>Spolsky grew up in <a href="/wiki/Albuquerque%2C_New_Mexico" title="Albuquerque, New Mexico">Albuquerque</a>, <a href="/wiki/New_Mexico" title="New Mexico">New Mexico</a>"

Using html as the editor language doesn't change the need for a 'help' page, at the very least showing allowable tags (<script> not being one of them), which means I'd need to look up what I can/cannot use.

If I wanted to make your basic html example, is that a <div> or a <span> being used for the code snippet? What's its id or class? As a programmer, I don't think I'd necessarily be sure on the details. Does <p> need a closing </p>? How about <li> inside <ul>? Should it be <br> or <br />? Is it <b> or the css font-style: bold;?

<p>I think for a wiki, you don't want more than one way to do things. Should the heading be <h2>, or will it end up being <font size=+5> half the time? Are you sure tables are only used for tabular data and not as a layout format everywhere? Policing content seems bad enough, I'd hate to have to police meta-content as well.

<p>Finally, for each paragraph in basic html, I need to begin it with a <p>? That's terrible for writing flow and thought process, but maybe that's just me.

Samson Yeung on May 14, 2008 08:42 AM

Still rebuilding Code Project, Jeff? Seriously, we have thousands of regular posters, some of whom have posted thousands of messages over the last nine years, and some who've posted tens or even hundreds of good articles. Yes, there's some dross and we're trying to filter out some of the crap before it gets posted; we're finding that traditional editors don't scale up to this level of activity, so we're allowing long-serving members to take a first pass on the article queue.

CP uses HTML for its articles and forum posts. Over the years the blacklist of disallowed HTML has grown as people have abused it, but generally the model has been 'trust the poster'. It may now have changed to a whitelist in the ASP.NET rewrite, I don't know.

Mike Dimmick on May 14, 2008 08:47 AM

I think you are making an assumption that everyone who goes to stackoverflow and wants to comment knows HTML.

I am a DBA but read your site quite a bit and don't know a single command in HTML. I know many developers in large companies (I work for a bank) that never use HTML either so why not make it easy on all of us with a "friendly-but-irritating HTML GUI browser layout control".. but you can use html if you want to.

CHOICE IS GOOD!!

-jfc-

JFC on May 14, 2008 08:49 AM

OMG... are you building StackOverflow by committee?

Just build it already!

j/k :)

Jonny on May 14, 2008 08:51 AM

Markdown looks easier on the eyes, but I'd need a cheat sheet handy for a while since I've never actually had to use it.

With HTML I have only one hesitation: embedded > < and &. Even programmers forget to encode these (or find them all) and if code samples are going to be a frequent occurrence, then mistakes are going to happen. A lot.

You could help this by making "<code>" tags or something which will ignore *ALL* markup between them. Assume that everything surrounded by <code> is to be taken literally.
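A minimal sketch of such a literal <code> tag, assuming the simplest possible rule: whatever sits between the opening and closing tag is escaped wholesale. `CODE_RE` and `escape_code_blocks` are hypothetical names.

```python
import html
import re

# Non-greedy match so each <code>...</code> pair is handled separately.
CODE_RE = re.compile(r"<code>(.*?)</code>", re.DOTALL)


def escape_code_blocks(text: str) -> str:
    """Escape &, < and > inside every <code>...</code> pair."""
    def replace(match: re.Match) -> str:
        return "<code>" + html.escape(match.group(1), quote=False) + "</code>"
    return CODE_RE.sub(replace, text)


if __name__ == "__main__":
    print(escape_code_blocks("Try <code>if (a < b && c > d)</code> now"))
```

One known gap: a code sample that itself contains the literal text `</code>` would end the block early, so a real implementation would need some way to quote the terminator.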

Clinton Pierce on May 14, 2008 08:52 AM

"Presentation markup" is an oxymoron. Markup is for tagging content to capture meaning, not style. If you want to give users control over presentation, then you don't want a markup language, you want a formatting language.

Modern HTML is primarily a markup language. As HTML evolved there was a push to get away from presentation and back to pure markup. Bold and italic tags persist for compatibility reasons, but in an ideal world, they'd be history. HTML is the wrong choice for specifying presentation.

Most text doesn't require any styling. Fancy formatting can enhance text, but it shouldn't be necessary to express the idea. Perhaps the best solution is not to give the user any ability to control the presentation, hoping that they'll instead focus on the content. Barring that, I'd go for some very simple formatting conventions.

Adrian on May 14, 2008 08:54 AM

Of course, my previous comment was mangled because I assumed the "no HTML" instruction meant that if I used HTML-like things in my text they'd be ignored. Text is text, right?

UI FAIL.

Embedded <, > and &amp; are going to cause all kinds of problems. Perhaps a <code> tag that causes everything inside it to be automagically encoded. Because even good programmers will make mistakes and forget to encode characters, or just not find them all in their code sample.

Clinton Pierce on May 14, 2008 08:55 AM

I support your idea of using HTML. A rather simple white-list of accepted tags (dismissing all others) and a help-page listing them all should be good enough. Instead of the typical WYSIWYG-toolbar you could provide a small list of allowed tags for quick reference on what works and what doesn't.

Additionally, you can provide predefined formatting styles via CSS classes and IDs that you allow people to use (with example on the aforementioned page on what they look like), again dismissing any other classes or IDs, as they are going to use inline-CSS anyway, if not just basic tags, hehe.

About the issue with annoying paragraph-tags, think of blogger.com; a carriage return in user input is transformed into a paragraph-tag and, when editing an entry, re-transformed into a carriage return. Now that is user-friendly. :)

Mephane on May 14, 2008 08:59 AM

Another recommendation for Markdown. As others have noted, the ability to include HTML is a huge bonus.

Also check out PHP Markdown Extra:

http://michelf.com/projects/php-markdown/extra/

some very nice additional features including Fenced Code Blocks which will be handy for stackoverflow.com.

Go on Jeff - give Markdown some prominence on stackoverflow.com - maybe it will start to gain critical mass. Your readers helped choose the name for the new site - maybe you could poll us for a choice of comment markup?

Tom A on May 14, 2008 09:01 AM

People should not have to write in HTML, but they should have the option to edit the HTML.

There are plenty of good WYSIWYG editors. Keep life simple.

Steve on May 14, 2008 09:04 AM

HTML shouldn't be used for anything like this. Lightweight markup languages let the users concentrate on the content and not on the syntax. Although this should be a programmers' site, that doesn't mean one shouldn't care about usability. And by the way: I'm pretty sure that there are some excellent C++ or Python programmers out there who have little to no experience with HTML. Just my two cents.

Florian Potschka on May 14, 2008 09:09 AM

"Flickr allows simple HTML tags such as:
<a href="URL">link</a>
<strong>strong</strong>
<b>bold</b>
<blockquote>blockquote</blockquote>
<em>emphasis</em>
<i>italic</i>
<img src="URL">
<u>underlined</u>
<s>strike</s>
<del>deleted</del>

Ali Karbassi on May 14, 2008 07:23 AM"

That, plus a '[code]' tag should be more than sufficient for everyone's needs. Who the hell uses HTML tables in comments? And if you forget anything, you can just View -> Source to remind yourself. ;-)

As for Wikipedia's markup:-- the only reason Wikipedia is so well organized is because of the constant layout editing of a few, hardcore users. Remember when the Wikipedia founder said that all those edits were for content, not layout? Meaning that around 500 users were inputting nearly all the information into Wikipedia! Hah...

P.S: doesn't having "orange" as a constant undermine the purpose of a CAPTCHA?

transciber on May 14, 2008 09:11 AM

I've got to add my voice to the "no HTML as default" crowd. I'm not a "real" developer, so maybe I'm outside your target demo, but I follow codinghorror pretty religiously, and several other developer-oriented sites as well.

HTML is just clunky. I don't see how 7 keystrokes (with repeated press/release on the shift key) to bold something vs. 1 ctrl-B keystroke is defensible. I can code it, sure, but I sure don't like to. (BBCode is scarcely better in this regard.)

Posting to forums is about speed, not precision. I can count the times I've needed to add a table to a forum post on one hand. But boldface? Italics? All the time. HTML makes me pay a hefty toll on the roads I drive every day to subsidize that bridge I only cross once in a blue moon.

Personally, I like Textile. It's based on the ersatz formatting people used for years in plain text e-mail, so it's pretty familiar. Specifying link aliases is dead simple. And it's got the edge in speed. 2 unshifted keystrokes to bold or italicize text is a reasonable compromise in a plain text editor. And that's what I'm doing 90% of the time in a forum post. (I also like TiddlyWiki's code formatting token - three open braces on their own line to start, three close braces on their own line to end. Nice and quick.)
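The three-brace convention is easy to sketch. The following assumes only what is described above (an opening `{{{` line and a closing `}}}` line) and is not TiddlyWiki's actual parser; `render_braced_code` is an invented name, and wrapping the result in <pre> is an arbitrary choice.

```python
import html


def render_braced_code(text: str) -> str:
    """Wrap lines between {{{ and }}} marker lines in an escaped <pre> block."""
    out = []
    code_lines = None  # None while we are outside a code block
    for line in text.split("\n"):
        marker = line.strip()
        if code_lines is None and marker == "{{{":
            code_lines = []  # start collecting code
        elif code_lines is not None and marker == "}}}":
            body = html.escape("\n".join(code_lines), quote=False)
            out.append(f"<pre>{body}</pre>")
            code_lines = None
        elif code_lines is not None:
            code_lines.append(line)
        else:
            out.append(html.escape(line, quote=False))
    if code_lines is not None:
        # Unterminated block: still treat the collected lines as code.
        body = html.escape("\n".join(code_lines), quote=False)
        out.append(f"<pre>{body}</pre>")
    return "\n".join(out)


if __name__ == "__main__":
    print(render_braced_code("Example:\n{{{\nif (a < b) {\n}\n}}}\nDone"))
```

Because the markers must sit alone on a line, a single closing brace inside the code does not end the block.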

Jim Doria on May 14, 2008 09:12 AM

Could you please, please, *please* escape < and > instead of stripping them in the comments? Or at the very least place a reminder that you *do* strip them instead of just "no HTML"?

Adam on May 14, 2008 09:18 AM

>P.S: doesn't having "orange" as a constant undermine the purpose of a CAPTCHA?

This has been asked a million times, and answered a million times. Go look it up!

Adam on May 14, 2008 09:19 AM

Just as with licenses (http://www.codinghorror.com/blog/archives/000833.html), just pick a markup language, any markup language. They are a necessary evil.

Hoffmann on May 14, 2008 09:23 AM

In making the case for HTML as a lingua franca, you're also making the case for using XML, something you disputed in a previous post, particularly as it relates to "Conversion" and "What People Know".

Graham on May 14, 2008 09:26 AM

One of the reasons I like Markdown so much is that you can mix it with HTML, and Markdown's parser doesn't puke.

If you choose to implement Markdown syntax verbatim, you can allow people to use a combination of Markdown and HTML with nearly no additional work over just allowing HTML.

Nifty, no?

Darren on May 14, 2008 09:33 AM

CodeProject has used HTML as an option on its forums for years. In fact, it's the *only* option if you want formatted text - the choices are HTML (escape everything yourself, with newlines and emoticons converted) or raw text (output is exactly what you type).

As you suspect, this is great for those of us who are comfortable with HTML. However, there are problems:
* Not all programmers know HTML, and those that do aren't always comfortable with it.
* It's verbose. I'm typing in a bullet list here - while i'm actually anal enough about formatting to take the time to enter the proper tags, it's nowhere near as fast as just indenting and typing asterisks.
* It makes simple things difficult. Not just bullet lists, but bold, italics, special characters like angle brackets and ampersands - pasting in code samples often requires a lot of escaping.
* It puts the responsibility for proper styling on the user. The syntax for a multi-line block of code and a single keyword are different in HTML. Which one will be used? Both? Neither? Good formatting is nice, but even among users who do have a working knowledge of HTML, expecting good semantic markup is often too much to ask.
* You can't use it raw anyway. Security concerns, broken HTML (mismatched tags, pasted in from Word or just sloppy typing, etc.), CSS that isn't compatible with the site layout... You pretty much need to have a good (== error-tolerant) HTML parser server-side.
* You aren't really accepting HTML documents anyway - you're accepting snippets, with strict rules on what's actually allowed. This, combined with the need to accept malformed markup, strip either all CSS or just the more dangerous styles, pretty much kills the idea that what a user can type in is the same, predictable, no-magic-involved HTML they might be using day-to-day.
* Verbose, unreliable linking. HTML links are very powerful, but if you expect most links to be to other resources *on the same site*, then they end up adding a lot of extra typing and being unnecessarily fragile.
* Many of us have been trained by the many blogs and forums that don't deal well with HTML to avoid it - so even when new users *could* use HTML, many of us won't out of fear that our replies will be mangled. This isn't so much an argument against HTML as it is an argument *for* including a visible syntax reference or WYSIWYG editor.

There is one big advantage though, and you mentioned it already: the rules for processing *well-formed* HTML are reasonably stable and will likely remain that way for the foreseeable future. The extra work required from users for marking up aspects of their text *can* pay off then by removing a lot of ambiguity and help to keep things stable over time. Whether this is worth the tradeoffs is another matter; i suspect that a good editor can do a lot to reduce the stress and frustration of markup, escaping, etc.

Shog9 on May 14, 2008 09:39 AM

I think you should use a control. I don't want to type in any formatting, just provide a simple control that applies the formatting for me.

Allow for:

Limited styling: (Bold, Italics, Font Size and maybe Color)
Allow for Pasting of code: (HTML PRE Tag)
Allow for hyperlinking, maybe lists

That is the basics.

Jon Raynor on May 14, 2008 09:39 AM

What happened to you being in favour of skin-ability like you argued for on the podcast? Markdown looks good to me though :)

Martin Clarke on May 14, 2008 09:43 AM

I kind of like [b]some bold text[/b] -style. It's easy and easier to remember because almost all use it. If I should try to remember something else, that would be more difficult. I should be able to select text and click B-button and the text becomes surrounded with [b][/b].

Silvercode on May 14, 2008 09:45 AM

It's worth pointing out that wikis, at least popular ones like Wikipedia, have a de facto division of labor. Some study found from the statistics that:

1) The majority of edits are very small (in terms of diff), and made by a small group of people who make a lot of small edits (call them editors).

2) The majority of content/words on the wiki came from very large edits, made by a large group of people who contribute very rarely, often just once (call them contributors).

Essentially, wikis have the same writing process as any other collaborative effort (encyclopedias, newspapers, etc.). Contributors supply large, content-rich, but less than perfect content, which is then swarmed upon by copy editors, fact checkers, decision-making editors, etc. who polish it into finished form.

Since the people who will presumably do most of the formatting, cross linking, citation-adding and the like are probably a small core of people willing to put in significant time, the biggest challenge is how to make it easy for someone with useful specialist info to easily add content without much of a learning curve, even if the result is less than ideal.

Matthew L. on May 14, 2008 09:46 AM

Personally, I like Markdown but if you're going for something simple, why not go all the way and just limit comments to plain text? What's the big deal with all the fancy formatting? People get by on newsgroups, email, chat and SMS just fine without special formatting.

Whatever you do support, I'm betting most people will not use the features so you're worrying about a feature that only the minority will ever use. Lack of formatting certainly hasn't been an issue with Coding Horror's comments.

Just a thought...

David Avraamides on May 14, 2008 09:49 AM

So if you go for HTML then how would you specify what language your code is?

You'll end up writing something like:

<h1>How to set a text on a label</h1>
<p>Here is how to do this:</p>
<code lang="x-csharp">
/* Here is teh codez */
myLabel.Text = "Hello World";
</code>

Yuk!

Graham Stewart on May 14, 2008 09:57 AM

A dropdown list with "Textile", "Markdown", "HTML", etc. would do. Parse to HTML for storage. Let users choose the preferred markup in their profiles. Set textile default. This simple :).

alex on May 14, 2008 10:15 AM

I'm less worried about bold and italic text than about code. I would love to see some code coloring (keywords like int in a different color, for example). That's a lot of work, but it would be sweet.

Juan Zamudio on May 14, 2008 10:19 AM

Sorry for the double post.

Juan Zamudio on May 14, 2008 10:21 AM

Besides being a perfect example of the *wrong* approach to designing what should be an extremely simple web component (i.e. insisting people write markup to decorate their comments) - I mean, do you want your site to be open to beginners and those that want to learn about development? Guess not - anyway:

"Incidentally, if you haven't ever edited a Wikipedia article, you should. I consider it a rite of passage, a sort of internet merit badge for anyone who is serious about their online presence."

To me this is madness. You really want the sheep of this site to go and flock to Wikipedia *randomly* and without clear purpose, and edit some poor article? At least qualify it so people who might think your special Merit Badge is worth earning realize that they might affect others with this action.

SpongeJim on May 14, 2008 10:28 AM

One of the main themes of your website has been "usability". Hell, just a few days ago you went off on how bad XML was. Now you're proposing to use a subset of it, with the exact same difficulties? Because people "should already know it"?

Sorry, Jeff, you're off your rocker on this one. WYSIWYG exists for a reason. If I have to type "a href=blahblah" every time I want to show a link, I just won't use your site. Whether we should know it or not is irrelevant. Whether it's easier to use is. If you really need HTML editing, include it as an extra option, but it should by no means be the only choice, or even the default.

Brandon on May 14, 2008 10:29 AM

Jeff,

Re "If the source and destination are the web, why not use the native markup language of the web?"... we invented higher level languages so that we *don't* have to write everything in the exact representation it's consumed. Compilers take our higher level code and translate it into machine code. How does "if the source and destination are CPU, why not use the native machine code of the CPU" sound to you? It sounds really obsolete to me.

Re "If you're a programmer, you damn well better know HTML", that's really myopic. How does "if you're a programmer, you damn well better know C" sound to you? You personally don't know C. I doubt that makes you less of a programmer in and of itself.

Anyway, I really hope you end up using some lightweight markup language (and thus adding more mass to it; nothing reaches "critical mass" without people supporting it beforehand) instead of HTML. You are making a community site at Stack Overflow so you have to stop thinking about *your* brain cells and start thinking about brain cells of *all* programmers that you would like to use your site. You want only web programmers? I wish you all the luck. But personally I was hoping for a less specific target group.

Ivan on May 14, 2008 10:30 AM

> Of the four markups you presented, the only one that was readable
> enough that I didn't have to refer back to the rendered version to
> see what was going on was the BBcode. (For a couple I'm still not
> sure how the first quoted section's end is delimited). But BBcode
> is practically html with square brackets, so why bother?

I vote for BBCode. The reason for the square brackets is that it lets you quote HTML/XML code snippets without any effort. On a programming site, that's a win.
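A small sketch of why that works, assuming a toy BBCode dialect with just [b] and [i]. Because angle brackets carry no meaning in the markup, the renderer can escape all of them up front and only then translate the square-bracket tags. `SIMPLE_TAGS` and `render_bbcode` are made-up names, and nesting or unbalanced tags are not handled.

```python
import html
import re

# Toy mapping from BBCode tag names to HTML element names (an assumption).
SIMPLE_TAGS = {"b": "strong", "i": "em"}


def render_bbcode(text: str) -> str:
    """Escape every HTML special character, then translate [b] and [i] pairs."""
    rendered = html.escape(text, quote=False)
    for bb_name, html_name in SIMPLE_TAGS.items():
        pattern = re.compile(
            rf"\[{bb_name}\](.*?)\[/{bb_name}\]", re.DOTALL | re.IGNORECASE
        )
        rendered = pattern.sub(rf"<{html_name}>\1</{html_name}>", rendered)
    return rendered


if __name__ == "__main__":
    print(render_bbcode("Use [b]<div>[/b] for blocks and <span> inline"))
```

The user's <div> and <span> come out as visible text with no extra typing on their part, which is the win being described.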

rblaa on May 14, 2008 10:31 AM

How about a WYSIWYG Silverlight or Flash control? You don't want search engines crawling that page anyway.

Zack on May 14, 2008 10:38 AM

@Zack: what about people who don't have Silverlight or Flash? There are plenty of pretty good HTML editors. TinyMCE is one of them.

Cristian Ciupitu on May 14, 2008 10:50 AM

I meant HTML editors written in HTML + JavaScript (that don't need Flash, ActiveX etc.).

Cristian Ciupitu on May 14, 2008 10:52 AM

As we are currently writing our personal blog, we have to decide which way to go as well. We prefer a lightweight markup language instead of writing our articles in HTML.

Getting used to the syntax is a process of writing one or two articles and you are familiar with it if you don't use different markup languages on different websites ;).

For user comments I second the idea of keeping it as simple as possible.

Martin Czura on May 14, 2008 10:53 AM

Even knowing html, I just find it painful to have to write all this markup. I don't understand what is so hard about Markdown? Do your users really need anything fancy?

Also, realize this about Markdown (and similar): it was not written as an easy-to-parse unambiguous markup language. It was written as a way to make writing formatted text *easy* and highly legible. You should not be stopped by some undetermined corner cases in the markup. When I write a comment, I am not writing code. These are 2 different things, and you should not apply the thinking of one to the other, I believe.

Just my 2 cents!

charles on May 14, 2008 10:55 AM

Another vote for TinyMCE/FCKEditor with "View Source" enabled, ideally remembering which mode you used last, as Blogger does.

I'd also recommend an "automatic line breaks" checkbox for source view; even Slashdot does autobreaks by default now in its Ajax comment form.

Braden on May 14, 2008 11:01 AM

Tangentially, I highly recommend HTML Purifier ( http://htmlpurifier.org ).

Braden on May 14, 2008 11:07 AM

Seems to me the conclusion you've come to is that a WYSIWYG HTML editor is what is required. No weird code and wiki functionality a button-click away. And most allow you to drop into the unadorned HTML.

Does MS-Word make you type codes? No, the focus is the content. This seems like a no-brainer to me.

And not every programmer knows HTML. Some do systems/device programming or windows forms applications, exclusively. Gasp! I know.

Robert Barth on May 14, 2008 11:16 AM

the link for 'why doesn't wiki do html' is broken, it should be http://c2.com/cgi/wiki?WhyDoesntWikiDoHtml

Wilfred on May 14, 2008 11:22 AM

I agree with your decision. I've had this fight with clients and former bosses so many times. Whether good programmers should know HTML or not is irrelevant; the fact is, a larger percentage of your readership knows HTML than knows any other markup language, I can assure you. You please the most, and the rest will have to catch up.

I would suggest adding some classes/ids for use in markup, though. Perhaps list styles etc. These can be documented briefly on the site, and anyone who knows HTML will know how to use them.

Someone mentioned the benefits of having an abbreviated link tag so that you didn't have to remember how to type an entire URI to a page; but if you plan your URIs well and use some URI mapping/rewrite magic, this shouldn't be an issue; URIs will be simple enough to remember or paste with little fuss.

Lucas Oman on May 14, 2008 11:41 AM

It's not that writing HTML for your post is hard (it's not) or that it takes a lot of time (it doesn't). You need a WYSIWYG editor for the site because it forces you to focus on the content and not the presentation. Also, this allows you to more easily apply a consistent style across the website.

Jim Greco on May 14, 2008 11:58 AM

I have to throw my weight behind Markdown (as so many of the above posters have done). Of the examples you showed it is the least verbose (wikipedia's format is horrible in my opinion, extremely verbose)

Writing html in my text editor is fine, but I don't really relish the idea of inputting straight html into a web form. If I post in your community site, I'm not trying to write my own page, just trying to enter some of my thoughts.

As one of the previous posters mentioned, Markdown supports straight html, so if they are right, then Markdown seems a flexible option.

Or I can put it another way: I would feel less inclined to post on stack overflow if I had to write html to do it. Markdown on the other hand, wouldn't bother me.

Justin Standard on May 14, 2008 12:19 PM

If you did use html, could you also have a standard no frills text only mode? I hate using break and paragraph tags when an "enter" would do nicely.

brian on May 14, 2008 12:31 PM

I agree that HTML should be allowed in forms, but the problem is XSS. When you come up with a really good way to allow XHTML (attributes, too!) and prevent XSS in a bulletproof manner, please do share. I've been wanting a solution for this problem for quite some time. I even asked Haacked to explain how he does it in Subtext quite a while ago (I'm not even sure how effective it is in Subtext). While he agreed it would make an interesting blog post, he evidently does not have the time to put it together (which I completely understand).

While I am on the topic, this is the only PHP "solution" I could find: http://shiflett.org/blog/2007/mar/allowing-html-and-preventing-xss

Josh Stodola on May 14, 2008 12:43 PM

On 4: No, they don't. Which is why HTML is a reasonable choice, since whatever HTML they need to learn to make a comment is very limited and quite simple to grasp.

On 6: Who says Wiki markup is easier to learn than the subset of HTML required to post simple comments on a blog?

Anders Sandvig on May 14, 2008 12:49 PM

If you define what can be used, I'll be happy. If you just say that "HTML is allowed" and I have to guess which tags are disabled, that will be annoying.

Joseph on May 14, 2008 12:51 PM

Looks like you just dropped down to the lowest common denominator. Just because we all know HTML doesn't make it the best choice! Are you going to let us post CSS with our HTML? What attributes will be allowed - any IE specific ones? Where are you going to draw the line?

There, I've played Devil's Adv