Coding Horror
programming and human factors
by Jeff Atwood

May 13, 2008

Is HTML a Humane Markup Language?

One of the things we're thinking about while building stackoverflow.com is how to let users style the questions and answers they're entering on the site. Nothing's decided at this point, but we definitely won't be giving users one of those friendly-but-irritating HTML GUI browser layout controls.

an example HTML GUI editor

I have one iron-clad design guide: this is a site for programmers, so they should be comfortable with basic markup. None of that nancy-boy GUI toolbar handholding nonsense for us, thankyouverymuch. If you can sling code, a little bit of presentation markup is child's play.

We will support some sort of markup language to style the questions and answers. But what markup language?

I mentioned in podcast #4 that we consider Wikipedia a defining influence. Let's see how Wikipedia handles markup syntax. This is what the edit page for Joel Spolsky's Wikipedia entry looks like:

Wikipedia Edit page for Joel Spolsky entry

It's an effective markup language, but I think you'll agree that it's more intimidating than humane. Wikipedia's How to Edit a Page and the accompanying Wikipedia syntax cheatsheet help. Some. I'd argue that writing a Wikipedia entry is a step beyond mere presentational markup; it's almost like coding, as you weave the article into the Wikipedia gestalt. (Incidentally, if you haven't ever edited a Wikipedia article, you should. I consider it a rite of passage, a sort of internet merit badge for anyone who is serious about their online presence.)

Let's consider a simpler example. What we're looking for is some kind of middle ground, a humane text format. Let's start with some basic HTML.

Lightweight Markup Languages

According to Wikipedia:

A lightweight markup language is a markup language with a simple syntax, designed to be easy for a human to enter with a simple text editor, and easy to read in its raw form.

Some examples are:

  • Markdown
  • Textile
  • BBCode
  • Wikipedia

Markup should also extend to code:

10 PRINT "I ROCK AT BASIC!"
20 GOTO 10

Here's what that looks like expressed in a variety of lightweight markup languages. Bear in mind that each of these will produce HTML equivalent to the above.

Textile:

h1. Lightweight Markup Languages

According to *Wikipedia*:

bq. A "lightweight markup language":http://is.gd/gns
is a markup language with a simple syntax, designed 
to be easy for a human to enter with a simple text 
editor, and easy to read in its raw form. 

Some examples are:

* Markdown
* Textile
* BBCode
* Wikipedia

Markup should also extend to _code_: 

pre. 10 PRINT "I ROCK AT BASIC!"
20 GOTO 10

Markdown:

Lightweight Markup Languages
============================

According to **Wikipedia**:

> A [lightweight markup language](http://is.gd/gns)
is a markup language with a simple syntax, designed 
to be easy for a human to enter with a simple text 
editor, and easy to read in its raw form. 

Some examples are:

* Markdown
* Textile
* BBCode
* Wikipedia

Markup should also extend to _code_: 

    10 PRINT "I ROCK AT BASIC!"
    20 GOTO 10

Wikipedia:

==Lightweight Markup Languages==

According to '''Wikipedia''':

:A [[lightweight markup language]]
is a markup language with a simple syntax, designed 
to be easy for a human to enter with a simple text 
editor, and easy to read in its raw form. 

Some examples are:

* Markdown
* Textile
* BBCode
* Wikipedia

Markup should also extend to ''code'': 

<source lang=qbasic>
10 PRINT "I ROCK AT BASIC!"
20 GOTO 10
</source>

BBCode:

[size=150]Lightweight Markup Languages[/size]

According to [b]Wikipedia[/b]:

[quote]
A [url=http://is.gd/gns]lightweight markup language[/url]
is a markup language with a simple syntax, designed 
to be easy for a human to enter with a simple text 
editor, and easy to read in its raw form. 
[/quote]

Some examples are:

[list]
[*]Markdown
[*]Textile
[*]BBCode
[*]Wikipedia
[/list]

Markup should also extend to [i]code[/i]: 

[code]
10 PRINT "I ROCK AT BASIC!"
20 GOTO 10
[/code]

None of these lightweight markup languages are particularly difficult to understand -- and they're easy on the eyes, as promised. But I still had to look up the reference syntax for each one and map it to the HTML that I already know by heart. I also found them disturbingly close to "magic" for some of the formatting rules, to the point that I wished I could just write literal HTML and get exactly what I want without guessing how the parser is going to interpret my fake-plain-text.
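
To make the "magic" a little less mysterious, here is a minimal sketch of what a lightweight markup converter does under the hood. It is not how Markdown or Textile are actually implemented; the function name `inline_to_html` and the three rules it supports are assumptions chosen purely for illustration.

```python
import html
import re


def inline_to_html(text: str) -> str:
    """Convert a tiny, Markdown-like subset of inline markup to HTML.

    Illustrative only: real converters handle nesting, escaping rules,
    block structure and many edge cases that this sketch ignores.
    """
    # Neutralize any literal <, > and & the author typed.
    out = html.escape(text, quote=False)
    # **strong** emphasis
    out = re.sub(r"\*\*(.+?)\*\*", r"<strong>\1</strong>", out)
    # _mild_ emphasis (would misfire on underscores inside URLs)
    out = re.sub(r"_(.+?)_", r"<em>\1</em>", out)
    # [label](url) links; the URL is not validated in this sketch
    out = re.sub(r"\[([^\]]+)\]\(([^)\s]+)\)", r'<a href="\2">\1</a>', out)
    return out


print(inline_to_html("According to **Wikipedia**, markup extends to _code_."))
```

Every one of those regular expressions is a formatting rule the writer has to guess at, which is roughly the complaint above.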

Which leads directly to this question: why not just stick with what we already know and use HTML? This c2 wiki page titled Why Doesn't Wiki Do HTML? makes the case that -- at least for Wiki content -- you're better off leaving HTML behind:

  1. In a Wiki, the emphasis is on content, not presentation. Simple Wiki markup rules let people focus on expressing their ideas.
  2. Why not use a domain-specific markup language designed to do "the simplest thing that could possibly work"?
  3. Some HTML tags are difficult to work with and can break the flow of your thoughts. The table tag, for example.
  4. Does the average user really need total HTML and CSS layout power?
  5. Allowing the full range of HTML tags can lead to major security vulnerabilities.
  6. Many people don't know HTML. A simple Wiki markup language is easier to learn.

I'm not sure I agree with all of this, but it can make sense in the context of a full-blown Wiki. It's worth considering.

After all this research on humane markup languages, much to my chagrin, I've come full circle. I now no longer think humane markup languages make sense for most uses. I agree with the guy at fileformat.info -- HTML is generally the better choice:

  • Simplicity

    If the source and destination are the web, why not use the native markup language of the web?

  • Readability

    HTML is a bit less readable than the lightweight markup languages, it's true. But basic HTML is not onerous to read, particularly if we hide the repetitive paragraph tags.

  • Security

    With a bit of careful coding, it is possible to whitelist specific HTML tags that you will allow. This way you avoid exposing yourself to risky/vulnerable tags.

  • Conversion

    It's not at all clear that any existing lightweight markup language has critical mass, with the possible exception of Wikipedia's flavor. On the other hand, text parsers and tools will always understand HTML.

  • What people know

    A lot more people know HTML than any given flavor of humane text. If you're a programmer, you damn well better know HTML. For the handful of wiki-like functions we may need, it's possible to add some optional attributes to the HTML tags. And wouldn't that be easier to learn than some weird, pseudo-ASCII derivation of HTML?
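
As a rough illustration of the whitelisting idea in the Security point, here is a sketch built on Python's standard `html.parser` module. The tag list, the attribute list, and the choice to escape (rather than silently drop) anything not on the list are all assumptions made for this example, and a sketch like this is no substitute for a hardened, well-tested sanitizer.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelist; a real site would choose its own.
ALLOWED_TAGS = {"b", "i", "em", "strong", "blockquote", "pre", "code", "a"}
ALLOWED_ATTRS = {"a": {"href"}}
SAFE_SCHEMES = ("http://", "https://")


class WhitelistSanitizer(HTMLParser):
    """Re-emit allowed tags; escape everything else so it shows as text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag not in ALLOWED_TAGS:
            # Unknown tag: show it literally instead of dropping it.
            self.out.append(escape(self.get_starttag_text()))
            return
        kept = []
        for name, value in attrs:
            allowed = name in ALLOWED_ATTRS.get(tag, ())
            if allowed and value and value.lower().startswith(SAFE_SCHEMES):
                kept.append(f' {name}="{escape(value)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")

    def handle_startendtag(self, tag, attrs):
        # Treat <tag/> as a start tag only, so nothing is emitted twice.
        self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        closing = f"</{tag}>"
        self.out.append(closing if tag in ALLOWED_TAGS else escape(closing))

    def handle_data(self, data):
        self.out.append(escape(data))


def sanitize(markup: str) -> str:
    parser = WhitelistSanitizer()
    parser.feed(markup)
    parser.close()
    return "".join(parser.out)


print(sanitize('<b onclick="x()">hi</b> <script>alert(1)</script>'))
```

Rebuilding each tag from scratch, instead of trying to delete the dangerous parts of the original string, means attributes such as `onclick` never reach the output unless they are explicitly allowed.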

I do think we'll adopt some of the cleverer functions of Textile and Markdown, insofar as they remove mundane HTML markup scutwork. But in general, I'd much rather rely on a subset of trusty old HTML than expend brain cells trying to remember the fake-HTML way to make something bold, or create a hyperlink. HTML isn't perfect, but it's an eminently reasonable humane markup language.

Comments

As others have said, if source-code is going to be included in messages inline, and I think this is a highly desirable feature, raw HTML is not a good choice, for three reasons:

1. You have to escape special characters, which means at the very least splattering &lt; and &amp; (or should that be &amp;lt; and &amp;amp;?) everywhere.
2. You need explicit <br> or <p> tags. (Again, should I write &lt;br&gt; and &lt;p&gt;?)
3. You need painful contortions of &nbsp; &nbsp; to get indentation right. (&amp;nbsp;?) It's annoying to read C++ without indentation, but it's generally impossible to try to guess what Python code with the indentation stripped out is supposed to be.

All of this makes it hard to paste in source code, and hard to edit it in-place. Even if you allow <pre> tags, it makes it pretty nasty to embed HTML code which might contain </pre> somewhere.

Maybe you should go for 78 columns in a monospace font for everything. ;)

Weeble on May 14, 2008 8:14 AM
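
One way to read the objections above is that escaping should be the site's job rather than the author's. The sketch below assumes a hypothetical dedicated code field (or code button) whose contents are escaped on the server with Python's `html.escape` and wrapped in `pre` tags; the function name `code_to_html` is made up for this example.

```python
import html


def code_to_html(source: str) -> str:
    """Wrap pasted source code for display inside an HTML page.

    Escaping turns <, > and & into entities, so the code may safely
    contain markup of its own, including a literal closing pre tag.
    Whitespace is left alone; the pre element preserves indentation.
    """
    return "<pre><code>" + html.escape(source, quote=False) + "</code></pre>"


snippet = 'if (a < b && s == "</pre>") {\n    return;\n}'
print(code_to_html(snippet))
```

With something like this in place, nobody has to type an entity or a non-breaking space by hand.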

Meh. I vote for pre tags around the lot, and autolinking of urls. Plaintext is THE humane WYSIWYG markup language.

james on May 14, 2008 8:17 AM

I've done sites with comments allowed in HTML. It works well; the only issue I run into is that people like to screw up your layout by leaving table tags or divs open (so you should probably set up some system to make sure their tags are closed). I've also run into issues where spammers put in JavaScript redirects or popups. So HTML isn't perfect either.

The traditional [B]Bulletin Board[/B] format is also widely known, so users won't have to look things up. Or limit the HTML a user can use.

Kris on May 14, 2008 8:17 AM
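
The "make sure their tags are closed" step might be sketched as follows, again with Python's standard `html.parser`. The class and function names are invented for this example, the list of void tags is deliberately short, and the sketch only appends missing closers at the end of the comment; it does not repair tags that are closed in the wrong order.

```python
from html.parser import HTMLParser

# Tags that never take a closing tag (partial list, for illustration).
VOID_TAGS = {"br", "hr", "img"}


class OpenTagTracker(HTMLParser):
    """Keep a stack of tags that have been opened but not yet closed."""

    def __init__(self):
        super().__init__()
        self.stack = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in self.stack:
            # Pop everything up to and including the matching opener.
            while self.stack.pop() != tag:
                pass


def close_open_tags(markup: str) -> str:
    """Append closing tags for anything the author left open."""
    tracker = OpenTagTracker()
    tracker.feed(markup)
    tracker.close()
    closers = "".join(f"</{tag}>" for tag in reversed(tracker.stack))
    return markup + closers


print(close_open_tags("<table><tr><td>oops"))
```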

Jeff,

If you want a fun challenge, figure out how to make the form input color-coded à la Visual Studio or Expression Web. HTML is easier to read and write with all the blue and red tags.

Zack on May 14, 2008 8:20 AM

Flickr allows simple HTML tags such as:
<a href="URL">link</a>
<strong>strong</strong>
<b>bold</b>
<blockquote>blockquote</blockquote>
<em>emphasis</em>
<i>italic</i>
<img src="URL">
<u>underlined</u>
<s>strike</s>
<del>deleted</del>

Ali Karbassi on May 14, 2008 8:23 AM

I read through maybe the first 15 comments which were mostly anti-HTML (to some extent), so I'll chime in some encouragement. HTML makes a _lot_ of sense for your purposes, and all these esoteric things are quite annoying in the end. (I have on two separate Wiki systems inadvertently created links to nonexistent pages just by using formatting marks that seemed innocuous at the time, for one example. In general, remembering _which_ fake-HTML the current textbox wants is the problem.)

I'm a big fan of just saying "these are the tags I want to allow," then maybe extending them with extra attributes or use cases as needed (e.g. a page="lightweight markup language"LWML/a or alightweight markup language/a). No need to have two syntaxes floating around (HTML + Markdown, I guess, is popular with lots of people.) Making sure the input comes out as well-formed XHTML is a solved problem, to be sure.

Domenic Denicola on May 14, 2008 8:36 AM

Wow, uh, your comment box strips out angle-bracketed phrases, instead of passing them through. Well, here's an ironic rephrasing of the first sentence of my second paragraph...

I'm a big fan of just saying "these are the tags I want to allow," then maybe extending them with extra attributes or use cases as needed (e.g. [a page="lightweight markup language"]LWML[/a] or [a]lightweight markup language[/a]).

Domenic Denicola on May 14, 2008 8:38 AM

I'm a fan of Markdown myself. It's easy to learn and already accepts a lot of conventions that pre-date HTML (like asterisks and underscores). I *know* HTML, but that doesn't mean I want to use it. In fact, HTML is so annoying to type, I would rather use a graphical editor and clean up any mistakes afterwards.

Also, I agree with those who said:

1. Not all programmers do any sort of markup. You should offer a graphical editor, and the option to turn it off.

2. If you offer any sort of HTML, it has to be a small subset.

Rhywun on May 14, 2008 8:42 AM

My websites use BBCode because the module I use for forums supports that. I never quite understood the point of BBCode, because you end up using much of the same syntax as HTML except you use square brackets instead of angle brackets.

<b>Bold Phrase</b>
[b]Bold Phrase[/b]

What's the difference? Why create an entirely new syntax when one is already available and well documented?

Textile was written (and it isn't from troff!) with the idea that marked up text should be readable as plain text. Underline (now italicize) by putting underscores around something. Bold by putting asterisks around it. Make a list by putting asterisks in front of each line. Simple to understand, clean, and easy. Unfortunately, not very powerful.

I personally prefer to enter things in HTML. I know it, and I don't find it all that unreadable. What I really can't stand is each site having different standards. I don't mind learning something, but I hate learning to do the same thing dozens of different ways. HTML is standard and that's good enough for me.

My suggestion: A modified HTML. One where you don't put <p> for paragraph breaks and things in the format of http://xxxx.yyy or xxxxx@yyyy.zzz are automatically linked. But at the same time it allows you to add a bit of HTML for the more complex stuff.

That way, you can type your entire comment without a lick of markup code, but if you know you want to emphasize a word here or there or add a link, you know how to do it. That'll satisfy everyone.

David W. on May 14, 2008 8:43 AM

I was researching on the exact same topic today for my project and I've chosen markup specification from

http://www.wikicreole.org/

I particularly liked their reasoning, and I think the father of the wiki is behind it too.

lubos on May 14, 2008 8:47 AM

+1 markdown or wysiwyg.

XML derivatives were made for ease of parsing, not ease of use. The rule [Don't make me think] is superseded by [Don't make me do extra work]. Of course, optimally you'd just give a wysiwyg textbox with options to switch to markdown view. Just as 90% of your readership knows/should know HTML, 98% of your readership knows/should know how to use a wysiwyg text editor. Even if somehow 90% of your readers knew or should know emacs, it doesn't give you license to require knowledge of emacs commands at stackoverflow.

Of course, I completely understand if Joel completely overrode your objections to build a new markup language that cross-compiles to VBscript, javascript, PHP, XHTML, Markdown, ARM, SPARC, and is hot-pluggable as a Linux kernel module. Otherwise, I might have just heard 50 thousand heads exploding in the distance.

Jimmy on May 14, 2008 8:52 AM

The more I think about it the more I think a basic WYSIWYG editor is the only real way to go.

It requires minimal thought to use and allows you to properly support the various features needed for a useful coding site (e.g. press the "Insert Code Block" button and get prompted to select which language it is, so that syntax-colouring can be applied)

Graham Stewart on May 14, 2008 8:55 AM

I think it's a good idea to provide options. There's no guarantee that EVERY user of stackoverflow is going to be comfortable using HTML, especially if they just want to write a quick post. For example, even if you enforce HTML, then parse newlines as BR and P automatically; don't make me think.

Let users set up their markup preference in their profile, be it HTML, BBCode, Markdown, whatever.

I also agree that colored syntax for code blocks is a great idea, since the whole focus of this project is on code.

Erick on May 14, 2008 8:55 AM

First, why do you assume every programmer is familiar with HTML? Your site will appeal to a WIDE range of developers who may never have written HTML.

Second, HTML is very broad. Do you really want your users entering inline styles? Reusing your parent CSS classes? re-arranging your layout with relative and absolute positioning? You certainly don't want users to enter javascript of any kind.

Third, can you really whitelist HTML? Can you deal with all the clever XSS hacks (http://ha.ckers.org/xss.html)? If so, you have crippled HTML to the point that it resembles lightweight markup, except your users won't know in advance which parts of HTML will work.

I would love to offer a WYSIWYG editor + friendly editable markup that doesn't open up big XSS holes. If you make that work with HTML please let us know how you did it.

Mark Porter on May 14, 2008 8:56 AM

Can't we just type all our comments in Wasabi?

Martin Wallace on May 14, 2008 8:57 AM

Cool, let's make the users generate their own POST command too.

Enabling HTML editing is great, but requiring it for simple formatting just adds friction to the communication process. +1 for the GUI editor.

Kevin Dente on May 14, 2008 9:00 AM

I use Markdown. Clean syntax, particularly for linking, and it gives me the freedom to use HTML if I want. Works great for me.

Having done a fair amount of Wiki work, I absolutely hate how MediaWiki formats tables, though I find most of the rest of its syntax at least tolerable.

Markdown is, in my opinion, the best compromise between light-weight formatting, and the raw power of HTML.

Jeff Craig on May 14, 2008 9:00 AM

Imagine you want code-coloring. So instead of

source lang=qbasic
10 PRINT "I ROCK AT BASIC!"
20 GOTO 10
/source

you have to write

pre
span class="codeLineNumber"10/span span class="codeStatement"PRINT/span span class="codeString"PRINT/span
span class="codeLineNumber"10/span span class="codeStatement"GOTO/span span class="codeNumber"10/span
/pre

?

That's ugly.

Matthias on May 14, 2008 9:02 AM

One benefit of using something like Markdown is you automatically get things like escaping and potentially code coloring, which is arguably a very important aspect for stack overflow. I personally use reStructuredText for most my HTML editing because it takes care of the HTML aspects for me such as escaping XML and coloring code.

Eric Larson on May 14, 2008 9:03 AM

WTF? Do I have to use &lt;???

Imagine you want code-coloring. So instead of

<source lang=qbasic>
10 PRINT "I ROCK AT BASIC!"
20 GOTO 10
</source>

you have to write

<pre>
<span class="codeLineNumber">10</span> <span class="codeStatement">PRINT</span> <span class="codeString">"I ROCK AT BASIC!"</span>
<span class="codeLineNumber">20</span> <span class="codeStatement">GOTO</span> <span class="codeNumber">10</span>
</pre>

?

That's ugly.

Matthias on May 14, 2008 9:04 AM

We use MediaWiki in our internal Intranet, and we found that the Wiki syntax is hard for non-technical users, but technical users usually "got it" after a week or so. I think it's one of the cleanest syntaxes, because of its headings (==), its tables ({|bla) and its lists (* ).
BBCode is a bad solution for a nonexistent problem in my opinion, as it is essentially HTML with square brackets.

Bare HTML works fine, but keep in mind that there are multiple ways to do lists.
<ul>
<li>Bla
<li>Blu
</ul>

works, but without the closing </li> tags, you are not XHTML compliant anymore. You could either:
* Live with that
* Write a parser that tries to fix that, with all the bug testing and fixing that goes along with that
* Use another syntax

It should be noted that Wiki Syntax != Wiki Syntax. Pretty much every Wiki software has its own syntax that is not 100% compatible with other Wiki systems.

Markdown looks like my favorite: It exactly does what is needed, with an intuitive syntax.

Michael on May 14, 2008 9:10 AM

YES! HTML markup is king! Don't make us learn another markup language! Everyone who disagrees with you (and me) is crazy and/or an idiot!

Peter on May 14, 2008 9:12 AM

I'm less worried about bold and italic text than for code, I would love to see some code coloring (keywords like int in different color for example), but that's a lot of work, but it will be sweet.

Juan Zamudio on May 14, 2008 9:14 AM

I generally avoid using markup in posts to any website (other than a wiki) simply because I have no idea what they're using for markup unless I'm familiar with the particular bb software they're using or I hit one of those idiot buttons on the text entry box to see what pops up. I think it's generally a mistake not to include at least something to give people a reminder of what they can use on the site, if you want them to use it at all.

Vizeroth on May 14, 2008 9:14 AM

My 15 year old learned HTML for his MySpace page.

Charles on May 14, 2008 9:15 AM

Jeff, why don't you use the wiki technology of FogBugz?

Eduardo Diaz on May 14, 2008 9:16 AM

now i'm really confused, XML has angle bracket tax and HTML doesn't. not only that but, as I type this I get: "Your comments: (no HTML)". Hm?

:/

/mp

Mauricio Pastrana on May 14, 2008 9:16 AM

Sorry, Jeff. I have to call shenanigans.

In "The Angle Bracket Tax," you had fairly harsh words about working with tags within a human-read document. You pointed out how XML tags can degrade a document's readability, because they add extraneous noise around the text. You also envisioned an ideal world where the tags are hidden, created and managed in the background.

Fast forward today and you appear to say the exact opposite, only we're talking about HTML instead of XML. The loss in readability is now worth it because the layout becomes much more precise.

You were pining after interfaces that hide tags a few days ago. In the XML argument, you said "You might argue that XML was never intended to be human readable, that XML should be automagically generated via friendly tools behind the scenes, never exposed to a single living human eye. It's a spectacularly grand vision." If I replace the word "XML" with "HTML", your vision becomes reality, as there are countless WYSIWYG HTML editors on the market today. But today's post puts you firmly in the camp of inline markup editing.

Personally, I prefer inline editing to WYSIWYG, and XML over fancy, fuzzy markup replacement. I also think that XML is a wonderful way to facilitate communication among disparate systems. It may not have been the original intent, but sure as hell is an awesome side-benefit.

I think you agree with me, but first you need to clarify your position.

Frank on May 14, 2008 9:17 AM

This, written by Jeff just this week about XML. Somehow, I don't see how requiring HTML will escape this criticism either.

"Wouldn't it be nice to have easily readable, understandable data and configuration files, without all those sharp, pointy angle brackets jabbing you directly in your ever-lovin' eyeballs?"

K|O|G|I on May 14, 2008 9:18 AM

"If you can sling code, a little bit of presentation markup is child's play."

Clearly you don't play around on forums very much. You're delusional if you think your site will be primarily good programmers. It will be 10% good programmers and 90% noobz and script-kiddies like everywhere else.

Use BBCode, HTML, or whatever, but don't expect the users to understand it. Personally, I don't care - I must know 50 different markups used on different forums - you just figure it out, and if you're not smart enough, you don't.

Jasmine on May 14, 2008 9:18 AM

Yes, you should use HTML for stackoverflow. I'm not sure if it's the best choice for CMSs in general, but for programmers it is the better choice. While I think something like Haml (http://haml.hamptoncatlin.com/) would be fun and interesting, HTML would provide the perfect barrier to entry - not too easy and not too difficult. Like you said, programmers should know it.

It is extremely annoying when I enter a comment somewhere, include an HTML link, and the comment is rendered with the href value as a link and the other HTML converted to angle brackets and crap in the comment. It's made worse when I can't edit it.

Lance Fisher on May 14, 2008 9:19 AM

Too funny. Your argument in favor of ubiquity and convention was exactly my point against your argument yesterday in your anti-XML post.

dinah on May 14, 2008 9:23 AM

HTML is fine by me. If you don't know it, now is as good a time as any to learn it.

PaulG. on May 14, 2008 9:25 AM

Whatever mark-up you go for you should also allow HTML, if just to accommodate those nice IDE's which allow you to copy code as HTML (automatic highlighters suck).

[ICR] on May 14, 2008 9:28 AM

Seriously... how advanced are the comments you normally write on a forum?

I never, ever, use any more than these(bb-code).
[b]
[img]
[url] (often automatically generated from correct urls)
[code]
[quote]

For these simple things HTML is overkill. First of all, you would have to create a huge whitelist; the simple [b] tag can be written in a hundred different ways using HTML. A whitelist for CSS would be even harder to write: imagine parsing font-size:9999 and so on.

Secondly, the code tag usually does server-side syntax highlighting, and the quote tag can be used to link to the original message. Doing this with classic HTML tags would be really confusing.

Syntax highlighting is also a (the only) good reason to use WYSIWYG editors; these usually(?) allow you to paste pre-formatted text directly from your IDE. (At least the one on the MSDN forum does this, even though that editor sucks in a thousand other ways.)

crazy ivan on May 14, 2008 9:31 AM
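
For comparison, a handful of BBCode tags like the ones listed above can be translated with very little code, because the input is escaped first and only known patterns are turned back into HTML. This is a sketch under assumptions: the function name `bbcode_to_html`, the tag mapping, and the restriction of links to http and https are choices made for this example, not any forum package's actual behavior.

```python
import html
import re

# BBCode tag -> HTML element (illustrative mapping).
SIMPLE_TAGS = {"b": "strong", "i": "em", "quote": "blockquote", "code": "pre"}

URL_TAG = re.compile(
    r"\[url=(https?://[^\]\s]+)\](.*?)\[/url\]",
    flags=re.DOTALL | re.IGNORECASE,
)


def bbcode_to_html(text: str) -> str:
    """Translate a small BBCode subset to HTML.

    Everything is escaped up front, so the only markup in the output
    is what the rules below generate. Nested and mismatched tags are
    not handled, and tags inside [code] are still translated.
    """
    out = html.escape(text, quote=True)
    for bb, tag in SIMPLE_TAGS.items():
        out = re.sub(
            rf"\[{bb}\](.*?)\[/{bb}\]",
            rf"<{tag}>\1</{tag}>",
            out,
            flags=re.DOTALL | re.IGNORECASE,
        )
    # Only http(s) links are converted; anything else stays as typed.
    out = URL_TAG.sub(r'<a href="\1">\2</a>', out)
    return out


print(bbcode_to_html("[b]Bold Phrase[/b] and [url=http://is.gd/gns]a link[/url]"))
```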

Jeff,

What you describe sounds very similar to what Dan Brettle has written in NeatHTML. Have you heard of this?

From Dan's description:
"NeatHtml™ is a highly-portable open source website component that displays untrusted content securely, efficiently, and accessibly. Untrusted content is any content that is not trusted by the website owner. Typical examples include blog comments, forum posts, or user pages on social networking sites. NeatHtml uses an “accept only known good” (whitelist) approach to security to help prevent attacks which are not yet known."

You can read more about it @ http://www.brettle.com/neathtml

I think he hits the nail on the head. Allow use of HTML but keep it safe.

CyteShoppe on May 14, 2008 9:33 AM

I'm with you 100% on scrubbed HTML. It's easiest to implement /and/ explain ("You can use HTML.") Most novices already know HTML. It's like learning your ABCs these days.

If you don't know HTML, does that really matter? Seriously now. People can read plain text just as well. This will be a wiki fer chrissakes. If your plain text is /that/ much of an eyesore, the other 1337 HTML h4xx0rz can pretty it up.

Furthermore, you *learn* before you *teach*. If you don't know HTML, fine. You can learn it by, I dunno, /using the site/. Read the relevant HTML literature, which is sure to be present.

Chuck Rector on May 14, 2008 9:34 AM

My vote would go to HTML with a couple of minor extensions that would handle most comments with no markup at all. First, treat a blank line as an implicit paragraph boundary. Second, treat an unadorned URL as a link.

To avoid problems with parsing more complex HTML, these extensions should only be active at the root level and deactivated inside any open HTML tags.

Beyond this, any markup system you choose will require users to type something. HTML is much more widely understood than most of the alternative markup languages - especially amongst programmers.

Stephen C. Steel on May 14, 2008 9:34 AM
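
The two extensions proposed above (blank lines become paragraphs, bare URLs become links) could be sketched like this. As a crude stand-in for "only active at the root level", this version simply leaves any block containing an angle bracket untouched; a real implementation would track open tags properly. The function name `format_comment` and the URL pattern are assumptions made for the example.

```python
import html
import re

# Deliberately simple URL pattern; trailing punctuation is not trimmed.
URL_RE = re.compile(r"https?://[^\s<>\"]+")


def format_comment(text: str) -> str:
    """Auto-paragraph and auto-link plain text, leaving HTML blocks alone."""
    blocks = re.split(r"\n\s*\n", text.strip())
    out = []
    for block in blocks:
        if "<" in block:
            # The author used explicit HTML here: pass it through as-is
            # (it would still need to go through a tag whitelist).
            out.append(block)
            continue
        escaped = html.escape(block, quote=False)
        linked = URL_RE.sub(
            lambda m: f'<a href="{m.group(0)}">{m.group(0)}</a>', escaped
        )
        out.append(f"<p>{linked}</p>")
    return "\n".join(out)


print(format_comment("See http://www.wikicreole.org/ for details\n\nSecond para"))
```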

I think, as others have suggested, that you have stumbled upon some of the benefits of XML in thinking about markup that you seemed to overlook when discussing XML in the last entry. The existing tools and standards make very quick work of the types of things you want to do: white list certain tags, validate input, make sure it's well-formed, etc. Just write a simple XML Schema(or DTD or RELAX NG) and validate the input against it.

Mike Ivanov on May 14, 2008 9:34 AM
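
A rough sketch of the "validate it as XML" idea follows. Python's standard library has no XML Schema or RELAX NG validator, so this example swaps in a plain set of allowed element names; the wrapper element `post`, the allowed set, and the function name are all assumptions. Note also that the Python documentation cautions about feeding untrusted input to its XML parsers, so treat this as an illustration of the approach rather than production code.

```python
import xml.etree.ElementTree as ET

# Hypothetical set of permitted elements, including the wrapper.
ALLOWED = {"post", "p", "b", "i", "em", "strong", "a", "pre", "code", "blockquote"}


def validate_fragment(fragment: str) -> list[str]:
    """Return a list of problems; an empty list means the fragment passed."""
    try:
        # Wrap the fragment so it has a single root element.
        root = ET.fromstring(f"<post>{fragment}</post>")
    except ET.ParseError as exc:
        return [f"not well-formed: {exc}"]
    return [
        f"tag not allowed: {el.tag}"
        for el in root.iter()
        if el.tag not in ALLOWED
    ]


print(validate_fragment("<b>ok</b>"))
print(validate_fragment("<b>oops"))
print(validate_fragment("<script>x</script>"))
```

The catch is that ordinary comments are rarely well-formed XML: a bare ampersand or an unclosed list item is enough to get a post rejected.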

What about consistency? If you use something like Markdown, every title, list and emphasized text will look the same. If you allow HTML, you're going to have bold titles, italic titles, different types of headers, maybe a few font tags and whatnot; all kinds of lists, and all kinds of emphasized texts.

I used to have my own markup language, but I switched to Markdown for all of my projects.

LKM on May 14, 2008 9:34 AM

Add another solid vote for Markdown. It mimics what I'd naturally do for formatting in a text-only document (except for the headings bit, but that's rarely needed in a QA forum in any case). Plus, if you can't think of how to do something, HTML syntax is fully allowed as well.

I don't know about you, but the main formatting I'll ever do in a page are:

*mildly* emphasized text
**strongly** emphasized text
* Bullet lists
1. Numbered lists

The only time I hit the Markdown manual after discovering it the first time was to confirm that it really was as easy and intuitive as that. And the headings bits :)

HTML is obviously the lingua franca of the web, but that doesn't make it easy to read. If I want to read content embedded in HTML, I put it into a browser. If I want to write content embedded in HTML, I write it in markdown (multimarkdown, actually, which is a minor variant on markdown) then paste the generated HTML into the web page. HTML is good for doing all the other ancillary stuff around the content, but always gets in the way of the content itself.

Of the options presented here, Textile and Markdown are the most transparent markups, IMHO.

The only thing I'd add is that, please, as is obvious in the broken discussion about XML, make sure you just escape the HTML of unrecognized tags, not filter them out!

Tom Dibble on May 14, 2008 9:35 AM

Also, I'd strongly advise you to not do Yet Another Tweaked HTML Version. Going into a forum which speaks in HTML you have to read the manual every single time: are double-line-breaks automagically converted to <p>? Are stray tags automagically escaped? How does this particular site support quoting?

Going into a forum which speaks (ick!) BBCode the specifics are generally assured (although the quoting syntax changes inexplicably).

Going into a forum which speaks markdown or textile, I know precisely what I'm getting.

Remember: your site will not likely be the only one people type in throughout their day. Make the experience adhere to a common standard. Your users will thank you.

Tom Dibble on May 14, 2008 9:40 AM

What's wrong with plain old text? It's simple, it's easy, and there are established conventions (hello Usenet!) for *bold* and /italics/ and _underline_ (oh, and RAISED VOICE as well, mustn't forget that). Don't even need to make it pretty -- just about anyone with half a brain can parse such "markup" directly.

As for programmers being able to use HTML, well, yes, but that's a long way from _liking_ it. Besides, a decent programmer ought to be able to pick up a minimal markup language. A programmer that is put off by having to pick up something new isn't really much of a programmer, so the whole "programmers won't have a problem" argument is bogus.

If you really *do* want a good presentation language, use TeX. It's established, widely known, respected, and does a better job than HTML ever will....

SJS on May 14, 2008 9:45 AM

Still rebuilding Code Project, Jeff? Seriously, we have thousands of regular posters, some of whom have posted thousands of messages over the last nine years, and some who've posted tens or even hundreds of good articles. Yes, there's some dross and we're trying to filter out some of the crap before it gets posted; we're finding that traditional editors don't scale up to this level of activity, so allowing long-serving members to take a first pass on the article queue.

CP uses HTML for its articles and forum posts. Over the years the blacklist of allowed HTML has tightened as people have abused it, but generally the model has been 'trust the poster'. It may now have changed to a whitelist in the ASP.NET rewrite, I don't know.

Mike Dimmick on May 14, 2008 9:47 AM

I think you are making an assumption that everyone who goes to stackoverflow and wants to comment knows HTML.

I am a DBA but read your site quite a bit and don't know a single command in HTML. I know many developers in large companies (I work for a bank) that never use HTML either, so why not make it easy on all of us with a "friendly-but-irritating HTML GUI browser layout control"? But you can use HTML if you want to.

CHOICE IS GOOD!!

-jfc-

JFC on May 14, 2008 9:49 AM

OMG... are you building StackOverflow by committee?

Just build it already!

j/k :)

Jonny on May 14, 2008 9:51 AM

"Presentation markup" is an oxymoron. Markup is for tagging content to capture meaning, not style. If you want to give users control over presentation, then you don't want a markup language, you want a formatting language.

Modern HTML is primarily a markup language. As HTML evolved there was a push to get away from presentation and back to pure markup. Bold and italic tags persist for compatibility reasons, but in an ideal world, they'd be history. HTML is the wrong choice for specifying presentation.

Most text doesn't require any styling. Fancy formatting can enhance text, but it shouldn't be necessary to express the idea. Perhaps the best solution is not to give the user any ability to control the presentation, hoping that they'll instead focus on the content. Barring that, I'd go for some very simple formatting conventions.

Adrian on May 14, 2008 9:54 AM

I support your idea of using HTML. A rather simple white-list of accepted tags (dismissing all others) and a help-page listing them all should be good enough. Instead of the typical WYSIWYG-toolbar you could provide a small list of allowed tags for quick reference on what works and what doesn't.

Additionally, you can provide predefined formatting styles via CSS classes and IDs that you allow people to use (with example on the aforementioned page on what they look like), again dismissing any other classes or IDs, as they are going to use inline-CSS anyway, if not just basic tags, hehe.

About the issue with annoying paragraph-tags, think of blogger.com; a carriage return in user input is transformed into a paragraph-tag and, when editing an entry, re-transformed into a carriage return. Now that is user-friendly. :)
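
To make that round trip concrete, here is a minimal Python sketch. It assumes one paragraph per non-empty line, and the function names `text_to_paragraphs` and `paragraphs_to_text` are made up for illustration; this is not a description of how blogger.com actually does it.

```python
import html
import re


def text_to_paragraphs(text: str) -> str:
    """Wrap every non-empty line of plain text in its own <p> element."""
    lines = [line.strip() for line in text.replace("\r\n", "\n").split("\n")]
    # Escape the content so typed angle brackets survive as literal text.
    return "\n".join(f"<p>{html.escape(line)}</p>" for line in lines if line)


def paragraphs_to_text(markup: str) -> str:
    """Undo text_to_paragraphs so the author edits plain lines again."""
    bodies = re.findall(r"<p>(.*?)</p>", markup, flags=re.DOTALL)
    return "\n".join(html.unescape(body) for body in bodies)


stored = text_to_paragraphs("First line\r\nSecond & third")
edited = paragraphs_to_text(stored)
```

Blank lines are dropped on the way in, so the round trip normalizes spacing rather than preserving it exactly.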

Mephane on May 14, 2008 9:59 AM

Another recommendation for Markdown. As others have noted, the ability to include HTML is a huge bonus.

Also check out PHP Markdown Extra:

http://michelf.com/projects/php-markdown/extra/

some very nice additional features including Fenced Code Blocks which will be handy for stackoverflow.com.

Go on Jeff - give Markdown some prominence on stackoverflow.com - maybe it will start to gain critical mass. Your readers helped choose the name for the new site - maybe you could poll us for a choice of comment markup?

Tom A on May 14, 2008 10:01 AM

People should not have to write in HTML, but they should have the option to edit the HTML.

There are plenty of good WYSIWYG editors. Keep life simple.

Steve on May 14, 2008 10:04 AM

HTML shouldn't be used for anything like this. Lightweight markup languages let the users concentrate on the content and not on the syntax. Although this should be a programmers' site, that doesn't mean one shouldn't care about usability. And by the way: I'm pretty sure that there are some excellent C++ or Python programmers out there who have little to no experience with HTML. Just my two cents.

Florian Potschka on May 14, 2008 10:09 AM

"Flickr allows simple HTML tags such as:
<a href="URL">link</a>
<strong>strong</strong>
<b>bold</b>
<blockquote>blockquote</blockquote>
<em>emphasis</em>
<i>italic</i>
<img src="URL">
<u>underlined</u>
<s>strike</s>
<del>deleted</del>

Ali Karbassi on May 14, 2008 07:23 AM"

That, plus a '[code]' tag should be more than sufficient for everyone's needs. Who the hell uses HTML tables in comments? And if you forget anything, you can just View - Source to remind yourself. ;-)

As for Wikipedia's markup:-- the only reason Wikipedia is so well organized is because of the constant layout editing of a few, hardcore users. Remember when the Wikipedia founder said that all those edits were for content, not layout? Meaning that around 500 users were inputting nearly all the information into Wikipedia! Hah...

P.S: doesn't having "orange" as a constant undermine the purpose of a CAPTCHA?

transciber on May 14, 2008 10:11 AM

I've got to add my voice to the "no HTML as default" crowd. I'm not a "real" developer, so maybe I'm outside your target demo, but I follow codinghorror pretty religiously, and several other developer-oriented sites as well.

HTML is just clunky. I don't see how 7 keystrokes (with repeated press/release on the shift key) to bold something vs. 1 ctrl-B keystroke is defensible. I can code it, sure, but I sure don't like to. (BBCode is scarcely better in this regard.)

Posting to forums is about speed, not precision. I can count the times I've needed to add a table to a forum post on one hand. But boldface? Italics? All the time. HTML makes me pay a hefty toll on the roads I drive every day to subsidize that bridge I only cross once in a blue moon.

Personally, I like Textile. It's based on the ersatz formatting people used for years in plain text e-mail, so it's pretty familiar. Specifying link aliases is dead simple. And it's got the edge in speed. 2 unshifted keystrokes to bold or italicize text is a reasonable compromise in a plain text editor. And that's what I'm doing 90% of the time in a forum post. (I also like TiddlyWiki's code formatting token - three open braces on their own line to start, three close braces on their own line to end. Nice and quick.)

Jim Doria on May 14, 2008 10:12 AM

Could you please, please, *please* escape < and > instead of stripping them in the comments? Or at the very least place a reminder that you *do* strip them instead of just "no HTML"?

Adam on May 14, 2008 10:18 AM

P.S: doesn't having "orange" as a constant undermine the purpose of a CAPTCHA?

This has been asked a million times, and answered a million times. Go look it up!

Adam on May 14, 2008 10:19 AM

Just as with licenses (http://www.codinghorror.com/blog/archives/000833.html), just pick a markup language, any markup language. They are a necessary evil.

Hoffmann on May 14, 2008 10:23 AM

In making the case for HTML as a lingua franca, you're also making the case for using XML, something you disputed in a previous post, particularly as it relates to "Conversion" and "What People Know".

Graham on May 14, 2008 10:26 AM

One of the reasons I like Markdown so much is that you can mix it with HTML, and Markdown's parser doesn't puke.

If you choose to implement Markdown syntax verbatim, you can allow people to use a combination of Markdown and HTML with nearly no additional work over just allowing HTML.

Nifty, no?

Darren on May 14, 2008 10:33 AM

CodeProject has used HTML as an option on its forums for years. In fact, it's the *only* option if you want formatted text - the choices are HTML (escape everything yourself, with newlines and emoticons converted) or raw text (output is exactly what you type).

As you suspect, this is great for those of us who are comfortable with HTML. However, there are problems:
* Not all programmers know HTML, and those that do aren't always comfortable with it.
* It's verbose. I'm typing in a bullet list here - while i'm actually anal enough about formatting to take the time to enter the proper tags, it's nowhere near as fast as just indenting and typing asterisks.
* It makes simple things difficult. Not just bullet lists, but bold, italics, special characters like angle brackets and ampersands - pasting in code samples often requires a lot of escaping.
* It puts the responsibility for proper styling on the user. The syntax for a multi-line block of code and a single keyword are different in HTML. Which one will be used? Both? Neither? Good formatting is nice, but even among users who do have a working knowledge of HTML, expecting good semantic markup is often too much to ask.
* You can't use it raw anyway. Security concerns, broken HTML (mismatched tags, pasted in from Word or just sloppy typing, etc.), CSS that isn't compatible with the site layout... You pretty much need to have a good (== error-tolerant) HTML parser server-side.
* You aren't really accepting HTML documents anyway - you're accepting snippets, with strict rules on what's actually allowed. This, combined with the need to accept malformed markup, strip either all CSS or just the more dangerous styles, pretty much kills the idea that what a user can type in is the same, predictable, no-magic-involved HTML they might be using day-to-day.
* Verbose, unreliable linking. HTML links are very powerful, but if you expect most links to be to other resources *on the same site*, then they end up adding a lot of extra typing and being unnecessarily fragile.
* Many of us have been trained by the many blogs and forums that don't deal well with HTML to avoid it - so even when new users *could* use HTML, many of us won't out of fear that our replies will be mangled. This isn't so much an argument against HTML as it is an argument *for* including a visible syntax reference or WYSIWYG editor.
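
As a rough illustration of the "error-tolerant parser plus strict rules" points above, the following Python sketch uses the standard library's `html.parser` to keep a small whitelist of tags, drop every attribute, and close tags the author left open. The tag list and the names `WhitelistFilter` and `sanitize` are assumptions made up for this example; it is not CodeProject's implementation, and a real site would want a vetted sanitizer instead.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelist; a real site would choose its own.
ALLOWED_TAGS = {"b", "i", "em", "strong", "code", "pre", "blockquote", "ul", "ol", "li"}


class WhitelistFilter(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []        # pieces of the cleaned snippet
        self.open_tags = []  # allowed tags that are currently open

    def handle_starttag(self, tag, attrs):
        if tag in ALLOWED_TAGS:
            self.out.append(f"<{tag}>")  # every attribute is dropped
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in self.open_tags:
            # Close anything left open inside this tag, then the tag itself.
            while self.open_tags:
                top = self.open_tags.pop()
                self.out.append(f"</{top}>")
                if top == tag:
                    break

    def handle_data(self, data):
        # Text is re-escaped so it can never turn back into markup.
        self.out.append(escape(data, quote=False))


def sanitize(snippet: str) -> str:
    parser = WhitelistFilter()
    parser.feed(snippet)
    parser.close()
    while parser.open_tags:  # close whatever the author forgot
        parser.out.append(f"</{parser.open_tags.pop()}>")
    return "".join(parser.out)


cleaned = sanitize('<div style="color:red"><b onclick="x()">bold <i>both</b> tail')
```

Tags outside the whitelist are dropped while their text content is kept, which is one of several possible policies; rejecting the whole post is another.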

There is one big advantage though, and you mentioned it already: the rules for processing *well-formed* HTML are reasonably stable and will likely remain that way for the foreseeable future. The extra work required from users for marking up aspects of their text *can* pay off then by removing a lot of ambiguity and help to keep things stable over time. Whether this is worth the tradeoffs is another matter; i suspect that a good editor can do a lot to reduce the stress and frustration of markup, escaping, etc.

Shog9 on May 14, 2008 10:39 AM

What happened to you being in favour of skin-ability like you argued for on the podcast? Markdown looks good to me though :)

Martin Clarke on May 14, 2008 10:43 AM

In the course of time, you will have to embrace the idea of having "friendly-but-irritating HTML GUI browser layout controls" because you will find out that many of your users are complaining.
Simplicity is the keyword I think. Just use some good HTML editor such as TinyMCE.
Let me tell you Jeff, most of the users (even if you think they should know HTML) will screw up the markup.

Niyaz PK on May 14, 2008 10:44 AM

I kind of like [b]some bold text[/b] -style. It's easy, and easier to remember because almost everyone uses it. If I had to remember something else, that would be more difficult. I should be able to select text and click a B button, and the text becomes surrounded with [b][/b].

Silvercode on May 14, 2008 10:45 AM

It's worth pointing out that wikis, at least popular ones like Wikipedia, have a de facto division of labor. Some study found from the statistics that:

1) The majority of edits are very small (in terms of diff), and made by a small group of people who make a lot of small edits (call them editors).

2) The majority of content/words on the wiki came from very large edits, made by a large group of people who contribute very rarely, often just once (call them contributors).

Essentially, wikis have the same writing process as any other collaborative effort (encyclopedias, newspapers, etc.). Contributors supply large, content-rich, but less than perfect content, which is then swarmed upon by copy editors, fact checkers, decision-making editors, etc. who polish it into finished form.

Since the people who will presumably do most of the formatting, cross linking, citation-adding and the like are probably a small core of people willing to put in significant time, the biggest challenge is how to make it easy for someone with useful specialist info to easily add content without much of a learning curve, even if the result is less than ideal.

Matthew L. on May 14, 2008 10:46 AM

Personally, I like Markdown, but if you're going for something simple, why not go all the way and just limit comments to plain text? What's the big deal with all the fancy formatting? People get by on newsgroups, email, chat and SMS just fine without special formatting.

Whatever you do support, I'm betting most people will not use the features so you're worrying about a feature that only the minority will ever use. Lack of formatting certainly hasn't been an issue with Coding Horror's comments.

Just a thought...

David Avraamides on May 14, 2008 10:49 AM

So if you go for HTML then how would you specify what language your code is?

You'll end up writing something like:

<h1>How to set a text on a label</h1>
<p>Here is how to do this:</p>
<code lang="x-csharp">
/* Here is teh codez */
myLabel.Text = "Hello World";
</code>

Yuk!

Graham Stewart on May 14, 2008 10:57 AM

interesting situation Jeff --

i think if you choose html you'll be trapped with that decision, because it's hard to convert from html to anything else.

if you start with markdown or textile, then you can store them as markdown or textile for now, then if you later change your mind, you can convert them to html.
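
a minimal python sketch of that idea, under loud assumptions: the `Post` class and `render` function are hypothetical names, and the renderer is a toy that only understands `**bold**` and `*emphasis*`. it is not real markdown or textile. the point is only that the stored value is what the author typed, and html is derived from it at display time.

```python
import html
import re
from dataclasses import dataclass


@dataclass
class Post:
    source: str                 # exactly what the author typed, stored verbatim
    markup: str = "toy-markup"  # label for the syntax the source is written in


def render(post: Post) -> str:
    """Derive HTML from the stored source; the source itself is never changed."""
    escaped = html.escape(post.source, quote=False)
    # Toy rules only: **bold** first, then *emphasis*.
    bold = re.sub(r"\*\*(.+?)\*\*", r"<strong>\1</strong>", escaped)
    return re.sub(r"\*(.+?)\*", r"<em>\1</em>", bold)


post = Post("**very** *nice* <tag>")
rendered = render(post)
```

swapping in a different renderer later only means replacing `render`; the stored posts stay as they are.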

i went with markdown on a skunkworks text-adventure game i'm writing.

the amount of time taken from choosing markdown, to finding available c# implementation, to having it working in my project was literally thirty minutes.

best of luck
lb

secretGeek.net on May 14, 2008 11:06 AM

A dropdown list with "Textile", "Markdown", "HTML", etc. would do. Parse to HTML for storage. Let users choose the preferred markup in their profiles. Set Textile as the default. It's that simple :).

alex on May 14, 2008 11:15 AM

REALLY agree. Those WYSIWTF editors screw up the markup even more than users. I don't even word processors.

Example: Type "hello world". Make "hello" bold. Delete world (and the space). Type some more "o"s after "hello" (making helloooo). Those extra o's may or may not be bold. I can never tell beforehand.

In a markup language, I always know if I have the cursor before or after the end tag.

Nicolas on May 14, 2008 11:15 AM

I'm less worried about bold and italic text than about code. I would love to see some code coloring (keywords like int in a different color, for example). That's a lot of work, but it would be sweet.

Juan Zamudio on May 14, 2008 11:19 AM

Sorry for the double post.

Juan Zamudio on May 14, 2008 11:21 AM

Besides being a perfect example of the *wrong* approach to designing what should be an extremely simple web component (i.e. insisting people write markup to decorate their comments) - I mean, do you want your site to be open to beginners and those that want to learn about development? Guess not - anyway:

"Incidentally, if you haven't ever edited a Wikipedia article, you should. I consider it a rite of passage, a sort of internet merit badge for anyone who is serious about their online presence."

To me this is madness. You really want the sheep of this site to go and flock to Wikipedia *randomly* and without clear purpose, and edit some poor article? At least qualify it so people who might think your special Merit Badge is worth earning realize that they might affect others with this action.

SpongeJim on May 14, 2008 11:28 AM

Jeff,

Re "If the source and destination are the web, why not use the native markup language of the web?"... we invented higher level languages so that we *don't* have to write everything in the exact representation it's consumed. Compilers take our higher level code and translate it into machine code. How does "if the source and destination are CPU, why not use the native machine code of the CPU" sound to you? It sounds really obsolete to me.

Re "If you're a programmer, you damn well better know HTML", that's really myopic. How does "if you're a programmer, you damn well better know C" sound to you? You personally don't know C. I doubt that makes you less of a programmer in and of itself.

Anyway, I really hope you end up using some lightweight markup language (and thus adding more mass to it; nothing reaches "critical mass" without people supporting it beforehand) instead of HTML. You are making a community site at Stack Overflow so you have to stop thinking about *your* brain cells and start thinking about brain cells of *all* programmers that you would like to use your site. You want only web programmers? I wish you all the luck. But personally I was hoping for a less specific target group.

Ivan on May 14, 2008 11:30 AM

Of the four markups you presented, the only one that was readable
enough that I didn't have to refer back to the rendered version to
see what was going on was the BBcode. (For a couple I'm still not
sure how the first quoted section's end is delimited). But BBcode
is practically html with square brackets, so why bother?

I vote for BBCode. The reason for the square brackets is that it lets you quote HTML/XML code snippets without any effort. On a programming site, that's a win.

rblaa on May 14, 2008 11:31 AM

How about a WYSIWYG Silverlight or Flash control? You don't want search engines crawling that page anyway.

Zack on May 14, 2008 11:38 AM

@Zack: what about people who don't have Silverlight or Flash? There are plenty of pretty good HTML editors. TinyMCE is one of them.

Cristian Ciupitu on May 14, 2008 11:50 AM

I meant HTML editors written in HTML + JavaScript (that don't need Flash, ActiveX etc.).

Cristian Ciupitu on May 14, 2008 11:52 AM

As we are currently writing our personal blog, we have to decide which way to go as well. We prefer a lightweight markup language instead of writing our articles in HTML.

Getting used to the syntax takes one or two articles, and then you are familiar with it, as long as you don't use different markup languages on different websites ;).

For user comments I second the idea of keeping it as simple as possible.

Martin Czura on May 14, 2008 11:53 AM

Even knowing html, I just find it painful to have to write all this markup. I don't understand what is so hard about Markdown? Do your users really need anything fancy?

Also, realize this about Markdown (and similar): it was not written as an easy-to-parse unambiguous markup language. It was written as a way to make writing formatted text *easy* and highly legible. You should not be stopped by some undetermined corner cases in the markup. When I write a comment, I am not writing code. These are 2 different things, and you should not apply the thinking of one to the other, I believe.

Just my 2 cents!

charles on May 14, 2008 11:55 AM

One day you rant about the "angle bracket tax" in XML and the next day you want a markup language based on... angle brackets? Have you become a fully fledged schizophrenic lately?

Chris Nahr on May 14, 2008 11:58 AM

Another vote for TinyMCE/FCKEditor with "View Source" enabled, ideally remembering which mode you used last, as Blogger does.

I'd also recommend an "automatic line breaks" checkbox for source view; even Slashdot does autobreaks by default now in its Ajax comment form.

Braden on May 14, 2008 12:01 PM

Tangentially, I highly recommend HTML Purifier ( http://htmlpurifier.org ).

Braden on May 14, 2008 12:07 PM

HTML is a data format that works specifically with data; it doesn't care about the format or the actual rendering or handling of this data, it only describes it in a way the browser decides to handle. At least originally.
You'd strip away most features that you don't need. You wouldn't have CSS in it, because HTML is about data (words) and saying what to do with it; CSS cares about how you do it, but that's what the website would do. I can also think of a few tags that are downright annoying (if you remember the horror of 90s websites you know the pain of blink and marquee).
You might also remove tags that don't make sense in the context (say, DIV doesn't make much sense in a forum comment), or remove attributes that aren't necessary.
You could also add more tags and attributes specific to your job.
To give an example, say wikipedia cross-referencing
<a cref="Red Power Ranger">Jason Lee Scott</a>
It might have its problems, but most of those would be abuses of the system, and those are fixed by the community, as wikipedia has shown. The important thing is that when I read that I realize I'm making a link, and that it's not a normal link, but a cross-reference link.

Now for those who complain about the readability of a:
A bit of trivia that might be wrong (can't find a source that says it outright). a originally stands for "anchor". You'd put an anchor on the text between a tags, and could "link" that anchor to other anchors or HTMLs through a "hyper-reference" (href). It doesn't make as much sense now because links have become much more powerful. Also it's not that human readable, but we are talking about the days where 28kbps was blazing fast, anchor is 5 bytes heavier than a, and I'm not talking about hyper-reference or anything of the sort. So really it's one of those things that happens when you grab a very specific language and use it for something it wasn't meant to handle (applications and dynamic web-sites instead of powerful hyperlinked documents).

Charlie Lobo on May 14, 2008 12:11 PM

Seems to me the conclusion you've come to is that a WYSIWYG HTML editor is what is required. No weird code and wiki functionality a button-click away. And most allow you to drop into the unadorned HTML.

Does MS-Word make you type codes? No, the focus is the content. This seems like a no-brainer to me.

And not every programmer knows HTML. Some do systems/device programming or windows forms applications, exclusively. Gasp! I know.

Robert Barth on May 14, 2008 12:16 PM

the link for 'why doesn't wiki do html' is broken, it should be http://c2.com/cgi/wiki?WhyDoesntWikiDoHtml

Wilfred on May 14, 2008 12:22 PM

I agree with your decision. I've had this fight with clients and former bosses so many times. Whether good programmers should know HTML or not is irrelevant; the fact is, a larger percentage of your readership knows HTML than knows any other markup language, I can assure you. You please the most, and the rest will have to catch up.

I would suggest adding some classes/ids for use in markup, though. Perhaps list styles etc. These can be documented briefly on the site, and anyone who knows HTML will know how to use them.

Someone mentioned the benefits of having an abbreviated link tag so that you didn't have to remember how to type an entire URI to a page; but if you plan your URIs well and use some URI mapping/rewrite magic, this shouldn't be an issue; URIs will be simple enough to remember or paste with little fuss.

Lucas Oman on May 14, 2008 12:41 PM

It's not that writing HTML for your post is hard (it's not) or that it takes a lot of time (it doesn't). You need a WYSIWYG editor for the site because it forces you to focus on the content and not the presentation. Also, this allows you to more easily apply a consistent style across the website.

Jim Greco on May 14, 2008 12:58 PM

I have to throw my weight behind Markdown (as so many of the above posters have done). Of the examples you showed it is the least verbose (wikipedia's format is horrible in my opinion, extremely verbose)

Writing html in my text editor is fine, but I don't really relish the idea of inputting straight html into a web form. If I post in your community site, I'm not trying to write my own page, just trying to enter some of my thoughts.

As one of the previous posters mentioned, Markdown supports straight html, so if they are right, then Markdown seems a flexible option.

Or I can put it another way: I would feel less inclined to post on stack overflow if I had to write html to do it. Markdown on the other hand, wouldn't bother me.

Justin Standard on May 14, 2008 1:19 PM

If you did use html, could you also have a standard no frills text only mode? I hate using break and paragraph tags when an "enter" would do nicely.

brian on May 14, 2008 1:31 PM

I agree that HTML should be allowed in forms, but the problem is XSS. When you come up with a really good way to allow XHTML (attributes, too!) and prevent XSS in a bulletproof manner, please do share. I've been wanting a solution for this problem for quite some time. I even asked Haacked to explain how he does it in Subtext quite a while ago (I'm not even sure how effective it is in Subtext). While he agreed it would make an interesting blog post, he evidently does not have the time to put it together (which I completely understand).

While I am on the topic, this is the only PHP "solution" I could find: http://shiflett.org/blog/2007/mar/allowing-html-and-preventing-xss

Josh Stodola on May 14, 2008 1:43 PM

On 4: No, they don't. Which is why HTML is a reasonable choice, since whatever HTML they need to learn to make a comment is very limited and quite simple to grasp.

On 6: Who says Wiki markup is easier to learn than the subset of HTML required to post simple comments on a blog?

Anders Sandvig on May 14, 2008 1:49 PM

If you define what can be used, I'll be happy. If you just say that "HTML is allowed" and I have to guess which tags are disabled, that will be annoying.

Joseph on May 14, 2008 1:51 PM

Looks like you just dropped down to the lowest common denominator. Just because we all know HTML doesn't make it the best choice! Are you going to let us post CSS with our HTML? What attributes will be allowed - any IE specific ones? Where are you going to draw the line?

There, I've played Devil's Advocate and posted the counter-argument first. I hate HTML, but I think it's the best choice for stackoverflow.com due to the reader-base.

Rick on May 14, 2008 1:56 PM

I agree!!
Just use HTML!!!

If you cannot code a bit of HTML then you're in the wrong business...

Jonathan on May 15, 2008 2:04 AM

HTML is a fine markup language but it does have several flaws that I think make it unfit for stack overflow:

1. If I want to type in a question or answer in English then I don’t want to mess about with the p tag, the br tag or nbsp, I just want to write it in plain text and have it saved with all whitespace intact (for example, I didn’t use angle brackets or the ampersand character in this paragraph because I don’t know how your blog software handles it).

2. Since it’s a programming forum there is a good chance the answer will contain some XML; typing ampersand, l, t, semicolon to start a tag is both tedious and will prevent me from proofreading my text.

3. And then there’s code: just try to type a medium-length code block in HTML. You have to think about whitespace, you have to take care to replace some operators with HTML entities, and if you copy-paste from an IDE you will either get misformatted plain text or HTML with more syntax highlighting markup than a forum should support.

You should make the forum easy to use, not put roadblocks in front of people trying to post questions and answers. Formatting code in HTML (and, let’s face it, formatting in HTML in general) is too much work and not a good use of the poster’s time.

And also, not every programmer today works on web applications and a lot of programmers don’t know HTML well enough to format code.

I would go with plain text, taking care to preserve whitespace, maybe with URLs automatically turned into links like Joel’s forums; everything else will just get in the way.
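
A rough Python sketch of that plain-text approach, as an illustration only: it escapes everything, turns bare http/https URLs into links, and wraps the result in `pre` so whitespace survives. The name `render_plain_text` and the URL pattern are assumptions for this example, and the pattern is deliberately naive (it would, for instance, swallow a full stop typed directly after a URL).

```python
import html
import re

# Naive pattern: a scheme followed by anything that is not whitespace,
# an angle bracket or a double quote.
URL_PATTERN = re.compile(r"https?://[^\s<>\"]+")


def render_plain_text(text: str) -> str:
    """Escape plain text, linkify URLs, and keep whitespace via <pre>."""
    parts = []
    last = 0
    for match in URL_PATTERN.finditer(text):
        # Text before the URL is escaped as-is.
        parts.append(html.escape(text[last:match.start()]))
        url = html.escape(match.group(0))
        parts.append(f'<a href="{url}">{url}</a>')
        last = match.end()
    parts.append(html.escape(text[last:]))
    return "<pre>" + "".join(parts) + "</pre>"


page = render_plain_text("if (a < b)\n    see http://example.com/x?y=1&z=2 now")
```

Because the URL is matched on the raw text and escaped afterwards, an ampersand inside it ends up correctly encoded in both the `href` and the link text.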

NirD on May 15, 2008 2:25 AM

Absolutely! - limit HTML tags to the most basic units that you want to allow, and let people get on with things. I mean, is the bold <b></b> tag that much more difficult to remember than pseudo-HTML [b][/b]?

It's about time someone did a sanity check on pseudo-html in forums/blog software etc.

Goatslayer on May 15, 2008 2:29 AM

Wiki syntax. Ugh. I hate wiki syntax with a passion.

I've still yet to meet a person who has an easier time understanding wikisyntax vs HTML.

http://internetducttape.com/2007/09/12/wiki-mistakes-building-wikis-that-dont-suck/

engtech on May 15, 2008 2:40 AM

Also, you might want to look at the kind of software ecosystem that has risen around pasties (eg: http://pastie.caboo.se ). There should be a REST API for posting up code so that people can write plugins for their text editors, shell scripts for the console, etc.

engtech on May 15, 2008 2:53 AM

I thought about this problem a while back and I reached the conclusion that regular HTML + Tidy + stripping end of lines can keep you away from most of the problems in the security category.

HN on May 15, 2008 2:54 AM

If you can type plain text and it is not reformatted reinterpreted or mangled then fine

If you can type code without the same and without having to escape characters or specially mark it then fine

HTML is a bad choice since it mangles plain text so you have to think about the text you are typing and not just the content!

Either use plain text only, or a minimal markup language (e.g. bbCode, Wiki) with known restrictions and a very simple syntax, or go for a full blown formatting language (e.g. TeX)

The comment above about Python "it's generally impossible to try to guess what Python code with the indentation stripped out is supposed to be." sums up my dislike of Python ....

Jaster on May 15, 2008 3:20 AM

@Jeff:
"I find that b, i, code, pre, blockquote, li, etc are simple to use and don't obscure the underlying content."

Mmmmm.... hardly obscured at all...

pre lang="x-csharp"
foreach (KeyValuePairamp;lt;int,stringamp;gt; kvp in messages)
{
if ( kvp.Key amp;gt; 0 amp;amp;amp;amp; kvp.Key amp;lt; 10 )
Debug.WriteLine(kvp.Key + amp;quot;-amp;amp;gt;amp;quot; + kvp.Value);
}
/pre

Assuming you want a minimal barrier to pasting in source code (absolutely essential in my opinion) then you'd have to automatically handle that HTML-encoding for us.
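
For what it's worth, the automatic encoding step itself is small. A minimal Python sketch, assuming the site already knows which part of the post is code (the function name `encode_code_block` is hypothetical, and the `lang` attribute simply mirrors the example above):

```python
import html


def encode_code_block(source: str, lang: str) -> str:
    """HTML-encode pasted source code and wrap it in a <pre> element."""
    # quote=False leaves quotation marks alone; only &, < and > are encoded.
    body = html.escape(source, quote=False)
    return f'<pre lang="{html.escape(lang)}">\n{body}\n</pre>'


block = encode_code_block('if (a > 0 && a < 10) Print("ok");', "x-csharp")
```

This only works once the code has been separated from the surrounding prose, which the sketch takes as given.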

That would be better, but then how do you tell the difference between <b> when someone is trying to bold a line of code and <b> literally appearing as a generic type or in some HTML/XML source code?

No, any way that you do HTML input it is going to involve character-escaping.
WYSIWYG is the only sensible way to go in my opinion.

Also I notice that you didn't include img on that list. So no way to illustrate articles with useful images then? (I'm thinking class/architecture/sequence diagrams, UI images, etc).
Is stackoverflow purely intended for keybashers or is it for engineers as well?

Graham Stewart on May 15, 2008 3:23 AM


Content (c) 2012 . Logo image used with permission of the author. (c) 1993 Steven C. McConnell. All Rights Reserved.