Is HTML a Humane Markup Language?

May 13, 2008

One of the things we're thinking about while building stackoverflow.com is how to let users style the questions and answers they're entering on the site. Nothing's decided at this point, but we definitely won't be giving users one of those friendly-but-irritating HTML GUI browser layout controls.

[Image: an example HTML GUI editor]

I have one iron-clad design guide: this is a site for programmers, so they should be comfortable with basic markup. None of that nancy-boy GUI toolbar handholding nonsense for us, thankyouverymuch. If you can sling code, a little bit of presentation markup is child's play.

We will support some sort of markup language to style the questions and answers. But what markup language?

I mentioned in podcast #4 that we consider Wikipedia a defining influence. Let's see how Wikipedia handles markup syntax. This is what the edit page for Joel Spolsky's Wikipedia entry looks like:

[Image: Wikipedia edit page for the Joel Spolsky entry]

It's an effective markup language, but I think you'll agree that it's more intimidating than humane. Wikipedia's How to Edit a Page and the accompanying Wikipedia syntax cheatsheet help. Some. I'd argue that writing a Wikipedia entry is a step beyond mere presentational markup; it's almost like coding, as you weave the article into the Wikipedia gestalt. (Incidentally, if you haven't ever edited a Wikipedia article, you should. I consider it a rite of passage, a sort of internet merit badge for anyone who is serious about their online presence.)

Let's consider a simpler example. What we're looking for is some kind of middle ground, a humane text format. Let's start with some basic HTML.

Lightweight Markup Languages

According to Wikipedia:

A lightweight markup language is a markup language with a simple syntax, designed to be easy for a human to enter with a simple text editor, and easy to read in its raw form.

Some examples are:

  • Markdown
  • Textile
  • BBCode
  • Wikipedia

Markup should also extend to code:

10 PRINT "I ROCK AT BASIC!"
20 GOTO 10

Here's what that looks like expressed in a variety of lightweight markup languages. Bear in mind that each of these will produce HTML equivalent to the above.

Textile:

h1. Lightweight Markup Languages

According to *Wikipedia*:

bq. A "lightweight markup language":http://is.gd/gns
is a markup language with a simple syntax, designed
to be easy for a human to enter with a simple text
editor, and easy to read in its raw form.

Some examples are:

* Markdown
* Textile
* BBCode
* Wikipedia

Markup should also extend to _code_:

pre. 10 PRINT "I ROCK AT BASIC!"
20 GOTO 10

Markdown:

Lightweight Markup Languages
============================

According to **Wikipedia**:

> A [lightweight markup language](http://is.gd/gns)
is a markup language with a simple syntax, designed 
to be easy for a human to enter with a simple text 
editor, and easy to read in its raw form. 

Some examples are:

* Markdown
* Textile
* BBCode
* Wikipedia

Markup should also extend to _code_: 

    10 PRINT "I ROCK AT BASIC!"
    20 GOTO 10

Wikipedia:

==Lightweight Markup Languages==

According to '''Wikipedia''':

:A [[lightweight markup language]]
is a markup language with a simple syntax, designed
to be easy for a human to enter with a simple text
editor, and easy to read in its raw form.

Some examples are:

* Markdown
* Textile
* BBCode
* Wikipedia

Markup should also extend to ''code'':

<source lang=qbasic>
10 PRINT "I ROCK AT BASIC!"
20 GOTO 10
</source>

BBCode:

[size=150]Lightweight Markup Languages[/size]

According to [b]Wikipedia[/b]:

[quote]
A [url=http://is.gd/gns]lightweight markup language[/url]
is a markup language with a simple syntax, designed 
to be easy for a human to enter with a simple text 
editor, and easy to read in its raw form. 
[/quote]

Some examples are:

[list]
[*]Markdown
[*]Textile
[*]BBCode
[*]Wikipedia
[/list]

Markup should also extend to [i]code[/i]: 

[code]
10 PRINT "I ROCK AT BASIC!"
20 GOTO 10
[/code]

None of these lightweight markup languages are particularly difficult to understand -- and they're easy on the eyes, as promised. But I still had to look up the reference syntax for each one and map it to the HTML that I already know by heart. I also found them disturbingly close to "magic" for some of the formatting rules, to the point that I wished I could just write literal HTML and get exactly what I want without guessing how the parser is going to interpret my fake-plain-text.

Which leads directly to this question: why not just stick with what we already know and use HTML? This c2 wiki page titled Why Doesn't Wiki Do HTML? makes the case that -- at least for Wiki content -- you're better off leaving HTML behind:

  1. In a Wiki, the emphasis is on content, not presentation. Simple Wiki markup rules let people focus on expressing their ideas.
  2. Why not use a domain-specific markup language designed to do "the simplest thing that could possibly work"?
  3. Some HTML tags are difficult to work with and can break the flow of your thoughts. The table tag, for example.
  4. Does the average user really need total HTML and CSS layout power?
  5. Allowing the full range of HTML tags can lead to major security vulnerabilities.
  6. Many people don't know HTML. A simple Wiki markup language is easier to learn.

I'm not sure I agree with all of this, but it can make sense in the context of a full-blown Wiki. It's worth considering.

After all this research on humane markup languages, much to my chagrin, I've come full circle. I now no longer think humane markup languages make sense for most uses. I agree with the guy at fileformat.info -- HTML is generally the better choice:

  • Simplicity

    If the source and destination are the web, why not use the native markup language of the web?

  • Readability

    HTML is a bit less readable than the lightweight markup languages, it's true. But basic HTML is not onerous to read, particularly if we hide the repetitive paragraph tags.

  • Security

    With a bit of careful coding, it is possible to whitelist specific HTML tags that you will allow. This way you avoid exposing yourself to risky/vulnerable tags.

  • Conversion

    It's not at all clear that any existing lightweight markup language has critical mass, with the possible exception of Wikipedia's flavor. On the other hand, text parsers and tools will always understand HTML.

  • What people know

    A lot more people know HTML than any given flavor of humane text. If you're a programmer, you damn well better know HTML. For the handful of wiki-like functions we may need, it's possible to add some optional attributes to the HTML tags. And wouldn't that be easier to learn than some weird, pseudo-ASCII derivation of HTML?
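
To make the whitelisting idea from the Security point a bit more concrete, here is a minimal sketch in Python. It is only an illustration, not a vetted sanitizer: the `ALLOWED_TAGS` set and the `sanitize` helper are hypothetical names, the tag list is just a guess at a reasonable subset, and a real site would need far more scrutiny around encodings and edge cases.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelist: an assumed "basic subset", not an official list.
ALLOWED_TAGS = {"b", "i", "code", "pre", "blockquote", "ul", "ol", "li"}


class WhitelistSanitizer(HTMLParser):
    """Re-emits only whitelisted tags, with every attribute dropped."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # Attributes are discarded entirely, so onclick= and friends vanish.
        if tag in ALLOWED_TAGS:
            self.parts.append(f"<{tag}>")

    def handle_endtag(self, tag):
        if tag in ALLOWED_TAGS:
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        # Text is always re-escaped before it goes back into the page.
        self.parts.append(escape(data))


def sanitize(text):
    """Return text with non-whitelisted tags removed and plain text escaped."""
    parser = WhitelistSanitizer()
    parser.feed(text)
    parser.close()
    return "".join(parser.parts)
```

With this sketch, `sanitize('<b onclick="evil()">hi</b>')` keeps the bold tag but loses the attribute, and a disallowed tag such as `a` is dropped while its text survives.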

I do think we'll adopt some of the cleverer functions of Textile and Markdown, insofar as they remove mundane HTML markup scutwork. But in general, I'd much rather rely on a subset of trusty old HTML than expend brain cells trying to remember the fake-HTML way to make something bold, or create a hyperlink. HTML isn't perfect, but it's an eminently reasonable humane markup language.

Posted by Jeff Atwood
295 Comments

Also consider that b and i are not the "right way" to bold and italic text in HTML.
So stackoverflow will be effectively endorsing bad practise before anyone even writes an article.

Graham Stewart on May 15, 2008 3:28 AM

I agree with your article. I'm too lazy to learn all the other lightweight markup languages.

Eng Lee on May 15, 2008 3:42 AM

Don't forget to make the textarea vi'esque, also. Please.

mike on May 15, 2008 3:50 AM

No way around it: provide a UI to format text close to the one MS Word provides. Strip down the formatting options to the ones absolutely needed for the site, and use some lightweight JavaScript text editor.

When producing quality content it's a bad idea to make ppl switch "languages" used to write down their thoughts.

Pointernil on May 15, 2008 3:56 AM

As long as GUI is provided in the editor, I will not care what is being used under the hood :).

So your design decision should focus on bettering the GUI toolbar, so that people don't have to resort to HTML to write anything. The underlying details are futile. Even a wiki-like syntax will do, but do not expect people to learn new languages just for writing in their views.

deepank on May 15, 2008 5:02 AM

Jeff, could you please clarify if you will use some sort of graphical interface?
I have nothing against HTML, but using it as the only way to format a post is not a good idea. First, it's quite ridiculous to assume every programmer knows HTML. I know several with no HTML knowledge at all, and I only know HTML because I worked two years as a PHP programmer.
Second, even if you know HTML, it's not sure that you know the needed tag. For example, I've never used i or pre before.
And how do you define which language a pasted code-snippet is? I think that's quite important for correct highlighting.

So please add a way to layout your post without knowledge of your html-dialect!

prengel on May 15, 2008 5:37 AM

I've always been a fan of Textile for writing blog entries, since it's so similar to the posting style I have for email and the good old Fidonet days.

Having to pause to write long HTML tags seriously disturbs my flow of thought when I'm writing a blog entry. Textile works much better and doesn't get in the way of my thinking.

Johan Svensson on May 15, 2008 6:04 AM

My two cents: DON'T USE TEXTILE

We (unfortunately) use Textile on one of our web applications (Sassins.com) and it causes some big problems. Namely, when someone writes text like "I -generally- use dashes", they really didn't want to strike out the words between the dashes. That's the problem with Textile.

It's happened a bunch of times on our website.

Also, thanks for the blog, Jeff! I enjoy it.

David Grayson on May 15, 2008 6:21 AM
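
The dash problem described in the comment above is easy to reproduce with a deliberately naive rule. The sketch below is not Textile's actual parser; `naive_textile_strike` is a hypothetical function that only assumes "text wrapped in single dashes becomes struck-out text", which is enough to show why ordinary prose gets mangled.

```python
import re

# Assumed rule (not Textile's real grammar): -text- becomes <del>text</del>
# when the dashes are not glued to word characters on the outside.
STRIKE = re.compile(r"(?<!\w)-(\S(?:.*?\S)?)-(?!\w)")


def naive_textile_strike(text):
    """Apply the naive dash-to-strikethrough rule to a line of text."""
    return STRIKE.sub(r"<del>\1</del>", text)
```

Under this rule "I -generally- use dashes" has its middle word struck out, while a hyphenated word like "well-known" is left alone because its dash sits between word characters.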

Can I suggest, again, that the problem here is in the links? On everything else it really doesn't matter. Pick the one with the easiest links. BBCode is pretty straightforward, in my opinion, when it comes to links. Textile requires me to re-look up exactly what order things go in because it is exactly opposite of what I expect it to be. Markdown is similarly confused. BBCode is basically HTML but lighter and without all of the tag/property/class nonsense that you're not going to want to allow in your comments anyway. Wiki code wouldn't be bad, but unless you're talking about just internal links, I don't see it as being as intuitive as BBCode.

BBCode!! do it! you are convinced!!

Shmork on May 15, 2008 6:47 AM

If you are going to make us use HTML at the very least give us an HTML GUI to help with tables and such.


Also, you missed one thing about Wiki Markup (or other simplified markup) versus HTML: Because your choices are more limited it makes pages look consistent.

If you let everyone willy-nilly toss in whatever HTML they like, your site WILL be ugly because all of the pages/articles will look different.

Rickasaurus on May 15, 2008 7:23 AM

About six months ago I wrote some thoughts of my own on the subject (http://jamesmckay.net/2007/11/is-it-time-to-kill-off-wikitext/), and I came to the conclusion that it was probably time for wikitext and markup languages to make way for rich text editors.

I still think that rich text editors could be more widely used on wikis and blogs and the like, but I must admit that there are some things that rich editors simply can't handle. Source code is the primary culprit. Finding a decent way of inputting source code on my own blog has been the source of a few headaches and in the end I resorted to writing my own WordPress plugin to handle it, and turning off the built in rich text editor entirely.

I somehow wonder if the two approaches could be combined with a bit of JavaScript jiggery-pokery. I'm quite impressed with the way it's done on the ASP.NET forums: you normally use a rich text editor, but you can click on a toolbar button to enter some plain text source code, and it somehow gets preserved when you post your response. It isn't perfect of course -- my main gripe is that there is no obvious way of telling where the source code blocks start and end -- but it's certainly an idea that's worth looking into.

Out of the four markup languages that you mention, personally I think wikitext has the cleanest syntax. I'd avoid BBCode in particular -- it is *very* heavily used by spammers -- you can block about 80% of spam messages simply by checking for URLs in BBCode format.

I'd also disagree with your assessment of considering HTML to be more secure than the others. Yes you can have a whitelist, but as with any of the other formats, you need to be very careful with your parser to avoid canonicalisation vulnerabilities from jiggery-pokery with encodings and the like.

James McKay on May 15, 2008 7:32 AM
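
The BBCode spam check mentioned in the comment above could be as small as one regular expression. This is a rough sketch under assumptions: `looks_like_bbcode_spam` is a hypothetical helper, the pattern only covers the `[url]` and `[url=...]` forms, and the 80% figure is the commenter's estimate, not something this code measures.

```python
import re

# Matches an opening BBCode url tag, with or without a target: [url] or [url=...]
BBCODE_URL = re.compile(r"\[url(=[^\]]*)?\]", re.IGNORECASE)


def looks_like_bbcode_spam(comment):
    """Return True if the comment contains a BBCode-style URL tag."""
    return bool(BBCODE_URL.search(comment))
```

On a site that does not accept BBCode at all, any hit is suspicious, since a legitimate user has no reason to type that syntax.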

Why not use a WYSIWYG editor like FCKEditor? You can write HTML source in it or use a Word-like editor (which hides tags completely and is good for people who don't know or don't care about HTML).

mart on May 15, 2008 8:33 AM

Why don't the rich text editors just support the html CODE / or PRE / tags?

Sounds to me like that would be a better way to go.

Chris Lively on May 15, 2008 8:35 AM

I don't really care what choice you make as long as:

You have a live preview of the comment.

You have a clear list of the approved codes with examples.

Code doesn't wrap.

I can create the comment in my text editor and then paste it in the comment area.

CuriousRustColoredApe on May 15, 2008 8:52 AM

I have had the exact same thoughts for years now!
At the Uni I work at, almost all people who get degrees there need to do Foundation Computing, which includes HTML. You've therefore got all coders, almost anyone with a degree, and anyone else that has had to learn it before (i personally had to learn HTML when I was 15 so i could edit a myspace-like page I had back in 2000). Whereas for any wiki language, you only have users of that particular site who will know it.

XTremeEd on May 15, 2008 9:02 AM

Pick something that works and go with it. I'm guessing that sooner rather than later, you're going to have more important things to spend your time on as you ramp up your company.

Bruce on May 15, 2008 9:48 AM

As there are already over a hundred comments before mine, this is pointless, but I'd definitely cast a vote in favor of a subset of HTML, and ideally a good RTE that can optionally be used. (The YUI rich text editor is a great choice. Those guys are maniacal about cross browser support.)

If you do support HTML, take a look at the way that Wordpress formats blog entries and comments. Line-breaks and paragraphs are inserted intelligently, "bad" things are stripped out, and everything else Just Works.

Isaac Z. Schlueter on May 15, 2008 10:26 AM

Again I repeat:
It's not about using HTML v2.0 as the w3 defines it. It's about using the HTML conventions of coding. Think about how many languages follow certain conventions, such as calling functions name(arguments) and arrays name[index]. And since the underlying code is HTML you want to show the best possible way how the transformation occurs.

Now some problems I thought about a bit: say that we have a code tag. Now say that there is an article that explains how to use the code tag, how would you write the next
"
code
... (your code)
/code
"
I mean as literally that, it would code something like
code
code
... (your code)
/code
/code

How would it know where the code ends?
But using a purely WYSIWYG editor doesn't solve it. A person could still "inject" malicious code, and make things like the above happen. Also the biggest problem is that "What You See is What You Get" != "What You See Is What You Want" which maybe should be more important. As programmers we'd feel a GUI as a limitation, like suddenly having to wear diapers to your high school graduation, it'd be not cool (unless that's your kinda stuff).
Using both doesn't solve anything either, it makes it worse, WYSIWYG editors interpret code one way, code is badly written if not by hand, etc. etc. etc.
It'd be easier to just have coding and a preview (maybe as you go) of the code. A little guide on the side, I mean the target audience is programmers, it's ok to expect them to express their ideas better as a code than as a bunch of click and selections on their code.

Charlie Lobo on May 15, 2008 10:56 AM

if you'd pick HTML I think you'll also need to study some more about XSS too, particularly if you have features that allow user to input HTML. I think they often make easy target for an XSS attack.

I've seen numerous pro-coders leave an XSS hole open. That's why XSS and phishing are so darn popular. Fire up any security audit tool and every one of 'em would offer XSS scans for a trial.

just wants you to be uber-careful with accepting HTML, that is...

chakrit on May 15, 2008 11:56 AM

I hate Markdown. The link syntax is the worst I've ever seen.

I still prefer BBcode for all of my formatting. It's too bad that not many things other than forums support it.

atomicthumbs on May 15, 2008 12:14 PM

One huge benefit of (most) lightweight markup languages is that they meet the behavior most people expect (or assume...) much better, e.g. they don't strip out whitespace or ignore (i.e. not link) raw URLs or interpret angled brackets etc. On the other hand some markup languages get too smart and start doing unexpected things...

So a suitable markup language for commenting on a public site (IMHO) handles normal, email-style formatting gracefully, provides a convenient shorthand for common, non-basic formatting, and allows (a subset of) HTML to be embedded (if you really need to be flexible).

Eric Jain on May 15, 2008 12:18 PM

Thank you for the great comments and links! I have a browser instance full of links I'll be exploring now. Yes, I do read all the comments, as I said in Podcast #1. :)

Also, I apologize for the way this old Movable Type system strips out content between HTML tags. I should upgrade one of these days.

live comment preview system

Absolutely. I think this is essential. And incidentally, an HTML-subset makes this pretty easy to achieve through JavaScript.

Your argument in favor of ubiquity and convention was exactly my point against your argument yesterday in your anti-XML post.

http://www.dehora.net/journal/2008/05/15/blubml/#comments

I realize there is some cognitive dissonance here.

I fully set out *intending* to pick one of the lightweight markup languages (Textile, Markdown, BBCode, Wikipedia, etc), but after struggling to understand their rules and peculiarities, I couldn't get past the ease and ubiquity of HTML. I kept coming back to it. I can't say that about XML after working with the alternatives. XML is everything and nothing; HTML is a very clearly defined set of tags that do very specific things.

The key words, though, are "subset of .. HTML" -- along with inferring the paragraph tag. I find that b, i, code, pre, blockquote, li, etc are simple to use and don't obscure the underlying content.

Jeff Atwood on May 15, 2008 1:43 PM

LiveJournal took that approach, actually. Their in-page editor has two modes: "just HTML" and "convenient HTML". Both modes limit the HTML tags allowed to a list of about 20 or so. The "just HTML" mode does nothing to the HTML typed except strip out disallowed tags. The "convenient HTML" mode only does a couple of "so common you don't want to have to keep typing them over and over again" things, like replacing \n characters with br/ tags. They added a couple of their own convenience tags for commonly-desired functionality (lj-cut, to collapse long / annoying / spoiler sections of text; lj user="" to reference another LJ user; etc). Pretty simple, but works absolutely _great_.

Thought on May 15, 2008 1:57 PM
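
Both "inferring the paragraph tag" and the LiveJournal-style newline handling described in the comments above can be sketched in a few lines. This is an illustration under assumptions, not how either site works: `infer_paragraphs` is a hypothetical helper, blank lines are assumed to separate paragraphs, single newlines become `br` tags, and the input is assumed to be already sanitized.

```python
import re


def infer_paragraphs(text):
    """Wrap blank-line-separated blocks in <p> tags; single newlines become <br>."""
    paragraphs = []
    for block in re.split(r"\n\s*\n", text.strip()):
        block = block.strip()
        if block:
            paragraphs.append("<p>" + block.replace("\n", "<br>\n") + "</p>")
    return "\n".join(paragraphs)
```

The writer never types a paragraph tag, but the stored output is still ordinary HTML.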

One thing not touched upon is what format you actually store this user-entered information in. That's important if you change your choice of markup language at a later date due to user feedback or security concerns.

ian_scho on May 16, 2008 2:29 AM

@Shmork:
"BBCode is basically HTML but lighter and without all of the tag/property/class nonsense"

So if you don't have attributes then how do you specify what language your source code is in BBCode?
Surely you'd need something like [code language="csharp"]?
Knowing the language will be essential if we want syntax-highlighting.
Looking at the four humane samples from Jeff, only the Wikipedia-format apparently supported that.

@Charlie Lobo:
"As programmers we'd feel a GUI as a limitation"

I'm not sure how only having bold, italic, code and pre HTML tags is any less limiting than being forced to use a GUI. And as many others point out, limitation is a good thing: the articles will have a more consistent layout.

Graham Stewart on May 16, 2008 2:50 AM

What's wrong with a friendly toolbar? Just because we're programmers doesn't mean we actually enjoy having to try to figure out where the help link is so we can learn what weird variant of 'make this word bold' markup you've picked.

Why are they irritating? Highlight text, click 'bold', job done.

Making me think to achieve something so trivial is irritating.

@Charlie Lobo
As programmers we'd feel a GUI as a limitation, like suddenly having
to wear diapers to your high school graduation, it'd be not cool
(unless that's your kinda stuff).

What are we, persistent masochists? This isn't a programming text editor religious war - it's a way of putting a comment on a website.

izb on May 16, 2008 5:09 AM

"On a programming website it is entirely possible
that someone might want to actually post some HTML
or XML. Or even run into issues with code in other
languages: e.g. "if (ab) "

True. So you have to be able to turn HTML off. Preferably, you should be able to turn it on if you want it.

"On the other hand, the only thing people really *need* is links."

"I would have thought that on a programming website
the only thing people need is the ability to post
their example source code without it getting mangled."

I'm getting pedantic here, but given a choice between posting code with no links allowed, versus posting links to code plus whatever other links I want, I'd take the second.

But there's no reason to face that choice so it's silly of me to argue which is more important.

J Thomas on May 16, 2008 5:26 AM

Go WYSIWYG so I don't need to do the preview step. There are good options for WYSIWYG editors these days. Enable just the heading level, bullets and other structural elements so that pages remain clean and sans awful bright red giant text.

wioota on May 16, 2008 5:27 AM

Whatever you do, make it easy for users.

HTML is too much typing.

Stuff people have to look up is no good either.

On the other hand, the only thing people really *need* is links. Everything else they can do without if they can't figure out how to do it. If I *really* want some bold text, I'll figure out how to do it or maybe put it in CAPS or whatever. If it isn't worth finding out how to do it then I don't need it.

So if you want to make it easy for users enough to do some extra work, then:

1. Allow limited HTML, not enough to be easy to break things but enough to provide the formatting you want.

2. Allow something like Markup, notably *bold* _italics_

* bullets

And a few others. Many of these are things people were using for emphasis before.

3. Put a toolbar on your online editor that does stuff.

The more ways that work, the larger will be the minority of your users that are satisfied. Provided the different approaches don't get in each other's way. Nobody will use HTML by accident and nobody will use a toolbar by accident, so those are two good complementary approaches. I doubt Markup would be much trouble either.

Speaking for myself, Markup is best for every Markup command I remember, because it's a keystroke or two in the middle of my typing. No stopping for a mouse, and less bother than HTML. But of course people who already know HTML and don't know any Markup will find it harder. No doubt Egyptian scribes who were used to hieroglyphics believed that was easier than writing in demotic.

J Thomas on May 16, 2008 5:30 AM

I guess I should mention, I learned how to use Markdown in a few minutes without ever seeing its name or discussing it with anybody. Far far easier than HTML and easier to use too, though of course limited.

I'm not sure I approve of this idea that nobody should ever learn to do anything effective that's new. If we'd had this attitude and this technology when the automobile was invented, we'd probably all be driving cars now with reins. You turn by pulling one or the other, you brake by pulling both, you start up by shaking them, and the car would have voice recognition circuits to respond to Gee Haw Giddyup etc. You'd switch to high gear by kicking a spot below the saddle.

J Thomas on May 16, 2008 5:57 AM

So anyway, HTML should be fine provided you can turn it off. Or better yet, have it off by default and you can turn it on. So people won't write OVER IF EXIT THEN and find out they were writing HTML and didn't know it.

J Thomas on May 16, 2008 6:10 AM

If you allow HTML, even with a carefully constructed whitelist, I guarantee someone will figure out how to do XSS.

HTML is just too complicated to secure.

Sean on May 16, 2008 6:13 AM

I agree, the key word is definitely "subset" of HTML: you don't want every post you see to be in different fonts and sizes and ... Conform generally to a standard, like Wikipedia does; then you can make design changes if need be using your style sheets. Users generally like page formatting to remain the same.

Make sure there is a preview for the HTML. My favourite types are the ones like the http://www.w3schools.com/ code examples, where it keeps it on the same page so you can view your HTML and output at the same time.

pete on May 16, 2008 7:12 AM

While I really like the simplicity of LaTeX, my personal preference is for a GUI editor that only allows the basics: link, strong, italics, underline, ordered/unordered lists and, for sites targeting developers, a "code" style.

jake on May 16, 2008 7:58 AM

@J Thomas:
"Nobody will use HTML by accident..."

On a programming website it is entirely possible that someone might want to actually post some HTML or XML. Or even run into issues with code in other languages: e.g. "if (ab) "

"On the other hand, the only thing people really *need* is links."

I would have thought that on a programming website the only thing people need is the ability to post their example source code without it getting mangled.

Graham Stewart on May 16, 2008 8:06 AM

+1 for HTML with no wysiwyg editor

Yes, there are good programmers that don't know HTML because they haven't needed to learn it. However, the programmers that can't be bothered to learn the maybe half-dozen or so tags that would be whitelisted on stackoverflow probably aren't going to get that much out of the site anyway.

Alex on May 16, 2008 8:49 AM

I am shocked. Comments usually flow minutes after each blog. This blog was posted 3 days ago and not one single comment?

Scot McPherson on May 16, 2008 10:56 AM

Oh geez, god love the cache.

Scot McPherson on May 16, 2008 10:57 AM

Some days I think the cache is actively working against me to frustrate me, so you are not alone, Scot!

Adam on May 16, 2008 11:17 AM

Whoa! Too many comments. Can't wade through them all.

I only have one concern if HTML markup is allowed (or any in-line markup is allowed).

a. Don't do smileys.

b. Provide some filtering that doesn't mistake use of , , and for tags and entity references unless they pass more scrutiny than that. There are programmers who do not know HTML and when you are not thinking in HTML it is easy to use those characters without appreciating what is happening.

c. You do need to do something about explicit new-lines, but don't fall into the trap of having newlines break recognition of an element tag (e.g., between attributes or elsewhere) or an attribute-value string. This is difficult to get right.

d. pre should work.

I have mixed feelings about all of this myself. But if you go with allowing HTML (or another scheme with HTML mixed in) I think this guidance matters.

orcmid on May 16, 2008 12:19 PM

OK, I guess I have 4 concerns. Another one is that you have a filter on these comments that deletes explicit less-than and greater-than marks; so some of the previous comment has missing text and makes no sense. So when you say no HTML it would have been smarter to escape anything that looks like HTML (escape, less-than, and greater-than are the key characters) before splicing the comment into the page. It looks like you are filtering out the characters instead, which behaves badly on false positives, aye?

orcmid on May 16, 2008 12:26 PM
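
The difference between stripping and escaping that the comment above complains about can be shown side by side. A small sketch, with assumptions: `strip_tags_naively` is a hypothetical stand-in for whatever filter a comment system might use (not this site's actual code), while `html.escape` is the standard library function.

```python
import re
from html import escape


def strip_tags_naively(text):
    """Delete anything that looks like a tag: everything from '<' to the next '>'."""
    return re.sub(r"<[^>]*>", "", text)


comment = "use if (a < b) and if (c > d) in your loop"

# Stripping treats "< b) and if (c >" as one giant tag and silently eats it.
stripped = strip_tags_naively(comment)

# Escaping keeps every character; the browser renders them as literal text.
escaped = escape(comment)
```

Here `stripped` loses the middle of the sentence, while `escaped` preserves it as `&lt;` and `&gt;` entities, which is why escaping degrades more gracefully on false positives.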

http://namb.la/popular/tech.html

Here, you can't secure html, no matter how smart you think you are. Better to escape it all.

Sean on May 16, 2008 12:47 PM

I'm with the many people who want at least the option to use a lightweight markup language. Using HTML seems like elitism and a "because we can" mentality. I don't want to have to mess with ampersand codes and manual line breaks, I just want to write my comment. If I need special HTML code, you can give me the option to include it inline like in Markdown, but don't make everyone use it for all comments all the time.

Rory on May 17, 2008 3:37 AM

I vote for no styling. (BTW, the stackoverflow.com home page has 2 paragraphs, 3 p tags, no closing paragraph tags.)

Ross on May 17, 2008 5:33 AM

I have one iron-clad design guide: this is a site for programmers, so they should be comfortable with basic markup.

There you have it. It's a site for programmers, not average users. So understanding HTML is a valid requirement.

As for whitelisting certain HTML tags and forbidding others, I think that's a good idea. Of course TABLES will be a pain, but you can't have it all.

If the site uses XHTML, it should be possible to use XSL to transform the content into other formats with comparatively little effort. Just be sure the system outputs valid code. (For some reason I keep thinking I saw someone say the lightweight markup languages had the advantage here because they could be exported to LaTeX, etc. but now I can't find the comment.)

Is there any reason you can't provide a live preview of what the final page will look like? Maybe use JavaScript to update the content of an IFRAME or something.

matt on May 17, 2008 7:17 AM

@Ross: Actually you don't need /p closing tags for valid HTML 4.01 Transitional.
But you do make a good point: even on that very simple page there are validation errors (e.g. the alt attribute is specified twice, some tags are improperly closed with /).

Apparently even those that can "sling code" and are "comfortable with basic markup" can make mistakes.

So if you do decide to enter HTML directly then not only will you have to filter it for XSS and blacklisted tags and provide character escaping, but you'd also have to run it through an HTML validator.

WYSIWYG FTW!

Graham Stewart on May 17, 2008 7:23 AM

Harley, I think I got what you're saying but let me spell it out at great length to make sure.

If I put malicious HTML into a message then it potentially affects everybody who sees it. So the server side has to be carefully written so it allows proper HTML but erases all malicious HTML. Anything that gets past the special server code will affect all users.

You're talking like there's no way for the server to cooperate with users who don't want that to happen and who don't fully trust the server to stop it, who would rather simply have HTML disabled in messages sent to them.

Isn't this starting to sound like a traditional problem? We have a powerful system, and then we want to let random users use some of the power, but we can't trust all of them to use it wisely. And the least trustworthy of them are good at wiggling past the safeguards we set up to stop them. Lots of complexity for a zero-sum game, where we try to provide good power but not bad power to everybody and the bad guys try to find ways to abuse it....

J Thomas on May 18, 2008 8:17 AM

Wow... stackoverflow.com will be some kind of old-school site I'll possibly never use. There are reasons why almost no one uses HTML for communities and why the web is evolving toward WYSIWYG solutions.

First: HTML with whitelists would be something you need to learn for one page only. The next page which tries something similar could have other whitelisted tags. Why should I think about how to write something on a site? For me it's about the content, and thinking about tags, code comments, etc. is just thinking about something which doesn't have much in common with the pure meaning of the comments.

Second: Why the hell should programmers know HTML? I know a lot of programmers who have nothing to do with the web. The only thing where HTML is needed for them is e.g. something like a CHM file, and even for that they are using a WYSIWYG editor with the applied styles of the company they are working for.

Third: Why do you want the user to take care of conversions? You want someone to use your site. So take care of things in the back. As a customer I would fire a programmer who told me that I need to do something complicated just because of data conversion.

Fourth: If you want to go oldschool and forget everything about user friendly interfaces, why don't you just use a newsgroup?

Wolfgang on May 18, 2008 8:19 AM

@Graham Stewart
So if you don't have attributes then how do you specify what language your source code is in BBCode?
Surely you'd need something like [code language="csharp"]?

[code="csharp"]

BBCode allows tags to have a single attribute like that. Tags don't seem to have specific attributes, like you posted, but some tags can have an attribute specified on the tag name (like BBCode's url tag) for a specific effect or just have a general effect without having anything defined.

In the case of a custom [code] tag, having the attribute undefined could just be like a pre tag without any highlighting.

Harley on May 18, 2008 8:25 AM
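
The single-attribute [code] tag Harley describes could be handled roughly like this. It is a sketch under assumptions: Python, a hypothetical render_code_tags function, and a made-up "lang-" class prefix for whatever highlighter sits downstream.

```python
import html
import re

# Matches [code]...[/code] and [code="csharp"]...[/code].
# The attribute is limited to a safe character set (an assumption of this sketch).
CODE_TAG = re.compile(r'\[code(?:="([A-Za-z0-9_-]+)")?\](.*?)\[/code\]', re.DOTALL)


def render_code_tags(text: str) -> str:
    """Escape the whole post, then turn [code] blocks into <pre> elements."""
    # quote=False keeps the double quotes so the tag attribute still matches.
    escaped = html.escape(text, quote=False)

    def replace(match: re.Match) -> str:
        language, body = match.group(1), match.group(2)
        if language:
            return f'<pre class="lang-{language}">{body}</pre>'
        # No attribute: behave like a plain pre tag without highlighting.
        return f"<pre>{body}</pre>"

    return CODE_TAG.sub(replace, escaped)
```

Because the post is escaped before the tags are expanded, nothing the user types inside the block can turn into live HTML.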

Are people saying that the problem with HTML is that no matter how you try you can't keep people from writing malicious HTML code that will annoy or damage other users?

If so, then you could let users turn off HTML or take their chances. And if they have HTML turned off but some other markup language turned on, then it would be good to translate a limited set of HTML to the other markup language. The malicious code that doesn't translate could be ignored or left intact for readers to puzzle over.

It makes some sense to allow a subset of HTML for people who want it, since they want it so much. But I'm starting to wonder -- people who won't spend 30 seconds to find out how to do things but insist that the layout be arranged in a complex inefficient pattern to suit what they're already used to -- would people like that actually be an asset?

Won't they tend to be the sort who do everything they can to delay progress? "We don't need any new programming languages because all real programmers already know C." Etc.

J Thomas on May 18, 2008 9:39 AM

The problem with having an option for users to "disable viewing HTML in posts" is that it forces the server to do additional parsing when it is displaying posts instead of just plugging in the post content. And since viewing happens more often than posting, it's less work overall for the server to parse things out of a post when it's being posted instead of when it's being viewed.

It also would likely dissuade users because it'd be a form of admitting that malicious code could be posted by another user and "you're responsible for your own safety on this forum". Which is the kind of thing that gets a site "This site may harm your computer" in Google searches.

Really, if formatting with HTML is desired, a whitelist is the best bet to prevent malicious HTML, if it's set up correctly, but it may leave very little in options for formatting, probably not much more than something like BBCode offers in the first place.

<hypothetical>Also, would you have to do your own line breaks or paragraph tags as HTML, or is that done for you (which isn't like writing HTML at all)?</hypothetical>

Also, it'd probably have to strip out things like inline JavaScript (like, oh, <a href="#" onclick="for(;;) { alert("Hello World"); }">Click me</a>, which is pretty tame as far as malicious code would go). In which case, it'd be even more like, say, BBCode, just with < and > instead of [ and ], so then, why not just use something that's established and just modify it slightly (and leave in the GUI editor style, for users that prefer it like that)?

Harley on May 18, 2008 10:29 AM
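
Harley's point about parsing at posting time rather than at viewing time can be sketched as follows. The names (StoredPost, create_post, view_post) are hypothetical, and render is a stand-in that simply escapes everything; a real renderer would be whatever markup parser the site settles on.

```python
import html
from dataclasses import dataclass


@dataclass
class StoredPost:
    raw: str       # what the user typed, kept for later editing
    rendered: str  # safe HTML, produced once when the post is saved


def render(raw: str) -> str:
    """Stand-in renderer: escape everything and wrap it in a paragraph."""
    return "<p>" + html.escape(raw, quote=False) + "</p>"


def create_post(raw: str) -> StoredPost:
    # The expensive parsing happens here, once per post.
    return StoredPost(raw=raw, rendered=render(raw))


def view_post(post: StoredPost) -> str:
    # Viewing just plugs in the stored output; nothing is parsed again.
    return post.rendered
```

A per-user "disable HTML" switch would break this arrangement, since the server could no longer reuse one rendered copy for every reader.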

@J Thomas
Um, the goal of malicious HTML is that it would be posted to affect anyone viewing it, not to affect the person writing it. And the goal for the forum system should be to prevent it from being posted so viewers don't get affected by any malicious HTML, whether or not they are registered. Registered users shouldn't have to protect themselves on someone else's forum, and what about unregistered users?

That's why many things that take text input from viewers, like forums and blogs, don't allow HTML. Like this blog.

Allowing HTML in a forum system means you have to parse every post with a whitelist to break any unallowed HTML and attributes, which may or may not be malicious, but since it's also a programming forum, you also have to have some way to detect HTML to post "as-is", so other users can see the HTML code and not have it actually parse.

Other systems, like BBCode, attempt to bypass that issue by just not allowing HTML at all, so users can only post with specific tags that can't be made into malicious HTML.



Personally, I wouldn't see what would hurt to have a pseudo-WYSIWYG system with the quick formatting bar that, when you click a button, visibly adds the tag to the text of your post (or puts the tags around whatever you have highlighted), which is something some vBulletin forums allow you to do (they call it the Standard Editor, which is basic text with the formatting controls).

Harley on May 18, 2008 11:02 AM
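
A whitelist of the kind Harley describes might look something like the sketch below, built on Python's standard html.parser module. The tag list is an assumption chosen for illustration, and every attribute is dropped, which is what removes things like onclick handlers.

```python
from html import escape
from html.parser import HTMLParser

# Assumed whitelist for illustration; a real site would choose its own.
ALLOWED_TAGS = {"b", "i", "em", "strong", "code", "pre", "blockquote", "ul", "ol", "li", "p"}


class WhitelistSanitizer(HTMLParser):
    """Keeps whitelisted tags (without attributes) and escapes all text."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in ALLOWED_TAGS:
            self.parts.append(f"<{tag}>")  # attributes are discarded

    def handle_endtag(self, tag):
        if tag in ALLOWED_TAGS:
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        self.parts.append(escape(data, quote=False))


def sanitize(raw: str) -> str:
    parser = WhitelistSanitizer()
    parser.feed(raw)
    parser.close()
    return "".join(parser.parts)
```

This sketch does not balance unclosed tags, so a production filter would still need a tidy-up step on top of it.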

I haven't read the comments above (just too much), but I think you should have a WYSIWYG editor, independent of the choice of markup language. OK, the contenteditable implementation results in HTML. But with some clever programming you can convert this to Textile/Markdown/etc.

This way you can offer users the choice between WYSIWYG (even most programmers prefer this) and hand coding (my favorite).

Doekman on May 19, 2008 3:28 AM

How about using vi as the text editor?

http://gpl.internetconnection.net/vi/

Evan on May 19, 2008 5:00 AM

**************

Let them eat SYMANTEC markup, hence html.

Define some standard tags that have CSS classes ready; code, nsfw, idea, spellbee, quote, good, bad, stickittotheman, hippysource etc.

Add to the list as new groups of things become apparent.

Move with the content.

***************

Phil H on May 19, 2008 5:15 AM

BBCode is significantly worse than HTML, I'll agree, and Wikipedia's is truly a brainfuck, but I'm hardpressed to see the problem with Textile and Markdown, seeing as both are handily leaning on Usenet 'formatting' conventions. Textile seems to lean on HTML tags, too.

It's also a little arrogant to think that all developers know HTML, although honestly if they know BBcode they know HTML.

Merus on May 19, 2008 12:53 PM

<p class="spellbee sarc">
I think SYMANTEC&reg; might object :)
</p>

Graham Stewart on May 19, 2008 1:37 PM

A good input box should provide both WYSIWYG and markup (either simultaneously or by switching).

WYSIWYG is so much easier to use, and it won't stupidly convert anything typed in.
The replacement for typing tags is to have explicit keyboard shortcuts.

Markup is more powerful as it shows what is not directly visible and allows pasting pre-built content.

Musaran on May 20, 2008 5:22 AM

Are you for real? 'Programmers' most important job is to get computers to do useful things for us, things that might have taken a long time or been error-prone to do manually (like typing code). I.e to get computers to work for us.

The GUI is the best thing since sliced bread for getting something done quickly and accurately. The eight simple buttons in your example GUI above will cater for 99% of everything you would need to do in this context. Forcing end-users to type a human message in a textbox using special coded tags for formatting is turning the clock back 20 years in IT.

GM on May 20, 2008 6:10 AM

I agree that HTML is not ideal but the best of a bad lot. The problem with all the other alternatives, nice as some of them are, is that none of them is a standard. I can't tell you how annoyed I am to find, every time I'm coaxed or forced into using a new forum/wiki/CMS/whatever, that it has its own syntax that overlaps enough to be confusing, but not enough to be useful.

Until there's an ISO or ANSI standard for one of these beasts, I don't see anything better than a subset of HTML.

Unless, that is, you're the one who can lock all the wiki developers into a large room and not allow any of them to go to the toilet until they have converged on a single markup language...

Robert Goldman on May 20, 2008 7:16 AM

Check out the extension to wikipedia syntax providing syntax directed source code highlighting in http://en.literateprograms.org/LiteratePrograms:How_to_write_an_article

Get such an approach to work with markdown! You'd have a great combination.

Plus, the noweb markup syntax for interweaving documentation and source code which renders formatted and can be downloaded executable/compilable is wowsah for your venture. I'm guessing.

wowsah!

malcook on May 20, 2008 10:11 AM

"I agree that HTML is not ideal but the best of a bad lot. The problem with all the other alternatives, nice as some of them are, is that none of them is a standard."

If HTML gave you a standard that let people do markup (bold, emphasis, maybe tabs, maybe headers, links, and code that isn't affected by the above) and didn't let people do weird malicious stuff, then this would be a different discussion.

We'd be talking about what we preferred, and developers would try to provide as many of our preferences as they reasonably could and let us choose among those. So you could have a menu: HTML Markdown Wikipedia BBCode Textile WYSIWYG vi Wordpad Applewriter. Pick your poison. Sheer personal preference, and they provide what they can.

But HTML is dangerous, and the question is how to reduce the danger. The consensus is to white out everything except a small subset of HTML that you want users to have. Another approach is to spike the guns -- turn other HTML code into something that displays and isn't executed, and people can see whether they think it's malicious. But then it looks ugly and people who cut-and-paste it might convert it back into a dangerous form. (If programmers can post sample HTML code that doesn't get executed, which we might want to do, that could also get executed later by people who copy it.)

If you don't give users direct access to HTML, that's easier and safer than trying to take it away from them after they use it.

So, if there's no direct access to HTML the choices are to provide plaintext (which is good enough for me, but some readers will go elsewhere). Or provide one or more lightweight editors like Markdown. Or maybe also provide WYSIWYG. The original author said he wouldn't do that last. Apparently he wanted to weed out people who insisted on markup but who weren't willing to use a markup language. It's one way to weed out users. If he only wanted people who knew cryptography he could require users to solve a simple cryptogram before they posted. Like that.

So, the way you get a standard is you provide the best mix of features in your opinion, and over time people will tend to agree about what should always be provided, and they'll get together and make a standard. Make your standard too early and it will standardise the wrong things. Make it too late and everybody already knows what they need. If it isn't time yet for a standard, why go with an HTML standard that doesn't fit the need?

J Thomas on May 21, 2008 8:24 AM

@ Jeff

The key words, though, are "subset of .. HTML" -- along with inferring the paragraph tag. I find that b, i, code, pre, blockquote, li, etc are simple to use and don't obscure the underlying content.

Be aware that b and i are saying something about how the content is formatted. One should use strong and em - the semantic equivalents.

Troels Thomsen on May 26, 2008 3:01 AM

One should learn to encode tags.

My point is, that em should be used instead of i - and strong instead of b - generally.

Troels Thomsen on May 26, 2008 3:02 AM

Why don't you use something similar to LaTeX? You can define some basic commands, and let the users (programmers, supposedly) write their favorite macros. LaTeX syntax is, I believe, very simple and focused on content. Furthermore, once the programmers have mastered the basics, they have a very powerful tool to express themselves without worrying the admin about breaking the layout. Furthermore, the thing that I hate the most in HTML (XML in general) is its wordiness and emphasis on presentation, both of which we don't really need. Last point: programmers usually strive to use the right tool for the right task. If a website by programmers for programmers uses the wrong tool (HTML) for the task (expressing ideas), isn't that a bit wrong?

Lam Luu on June 1, 2008 8:16 AM

I read this in my RSS Reader so I decided this before reading the comments above, but I just want to say - after quickly eyeballing the four cases it's pretty clearly Markdown that is the most 'humane' of all the above.

I definitely would rather write my text comments in Markdown than HTML.

The single first problem with 'pure html' source editing is the carriage returns. In a text box - like for this comment - having to add carriage returns by putting everything into <p></p> tags gets really, really old fast. The next thing is you want URLs to be automatically turned into links... once you start down that path you're just going to keep going. You might as well face it and use a standard 'easy' markup format - like Markdown. HTML is a *lot* harder than Markdown any which way you look at it. Even for programmers.

Miles Thompson on June 4, 2008 9:34 AM
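
The two conveniences Miles starts with, inferred paragraphs and automatic links, can be sketched like this. It assumes Python; plain_text_to_html and the URL pattern are hypothetical and deliberately simple (trailing punctuation, for instance, would end up inside the link).

```python
import html
import re

# Simplified URL pattern (an assumption); quotes and angle brackets end a URL.
URL = re.compile(r'https?://[^\s<>"]+')


def plain_text_to_html(text: str) -> str:
    """Turn blank-line-separated text into <p> blocks and link bare URLs."""
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    rendered = []
    for paragraph in paragraphs:
        # Escape first so nothing the user typed is treated as markup.
        escaped = html.escape(paragraph, quote=False)
        linked = URL.sub(lambda m: f'<a href="{m.group(0)}">{m.group(0)}</a>', escaped)
        rendered.append(f"<p>{linked}</p>")
    return "\n".join(rendered)
```

Each further convenience (lists, emphasis, code) is another rule of this kind, which is how a plain text box drifts toward something like Markdown.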

What's up, really? You can no longer read plain text? Are your eyes burnt by TV, comics, flash sites and colourful lights that you can't stand good old black on white? Did you all forget that content is why something is written? I LOVE old sites with a lot of information made by someone who THINKS and hate those new flash-bling-rolloverme-highlight-popup-annoythehelloutofme web2.0 "sites".

Clever guy on June 6, 2008 2:22 AM

I am sorry i don't have the stamina to read all the comments to make sure someone else's not said this before, but here's a suggestion. HTML, BBcode, RST, Textile or MediaWiki?

Solution: all of them!

Really.

Put the burden of syntax on the code, not on the community.

For example, triple single-quotes make bold in MediaWiki, double star+end asterisks make bold in Textile, [b][/b] in BBcode and <b></b> in HTML. Upon encountering ANY of those, give bold to your poor commenter.

Inconsistency. I know. But it need not be as big as problem as at first it might seem. Provide for the common case and use heuristics to decide the strange formatting. There are lots of shared assumptions between all the markups. Covering what is actually common ground will get you a long way.

In the cases where the various semantics actually diverge, use as many tricks as you can to extract what you can from that.

The bottomline case, obviously, should be to render whatever you're given as pure text -- with all the assumed markup noise removed.

Also, the precise rendering of the different markups should change with time according to trends you perceive in the data! Yes. For example, if at first you assumed blank-started lines were a common-sense way to present blockquote, but you start to see more and more people using it for code, change it! Or develop a test where the system tries to find common programming constructs within and if found assumes it to be code. Or even more clever ideas. It might break some comments, in a way, but it will still be an overall working system. On the lines of worse-is-better.

Also, some of the code entered by commenters might be altered. I know, "don't mess with my code!", i also feel like this. But for instance -- storing data in HTML might become cumbersome, so you change entered tags to a basic common ground markup. (I am somehow a fan of TXT so i think data should not be stored in an assumed structured way if it is not dealt with ONLY in structured ways, and a simple text editor or text area is not structured, but that is just me and my manias). Common mistakes could be corrected. Obviously this has a major risk of screwing up, but i also think with proper care it can be a least common denominator that allows things to just get done.

You can provide a basic recommended syntax (with what is most common-ground) to allow for "predictability" for users who really want to be in control -- and for those there is always HTML which is clearly specified anyway.

Also, notice that most of the systems out there already have this "interbred" syntax approach i'm forwarding -- at least by making double newlines equal paragraph even if all of the rest of the syntax is HTML.

Finally, take all the code, let it mature by throwing lots of data at it (as you will probably have when your site opens up), then isolate it as a library and open-source it. That would be great.

I would say just my 2 cents, but guess that was quite a big comment, uh?

MarcioRPS on June 10, 2008 12:50 PM
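
The "accept all of them" idea can be illustrated for the bold case alone. This is a sketch, assuming Python; render_bold is a hypothetical name and the four patterns are simply the four spellings listed in the comment.

```python
import html
import re

# One pattern per bold spelling mentioned in the comment.
BOLD_PATTERNS = [
    re.compile(r"'''(.+?)'''"),                               # triple single-quotes
    re.compile(r"\*\*(.+?)\*\*"),                             # double asterisks
    re.compile(r"\[b\](.+?)\[/b\]", re.IGNORECASE),           # BBCode
    re.compile(r"&lt;b&gt;(.+?)&lt;/b&gt;", re.IGNORECASE),   # HTML, after escaping
]


def render_bold(text: str) -> str:
    """Escape the input, then map every known bold syntax to <strong>."""
    rendered = html.escape(text, quote=False)
    for pattern in BOLD_PATTERNS:
        rendered = pattern.sub(r"<strong>\1</strong>", rendered)
    return rendered
```

Anything that matches none of the patterns falls through as escaped plain text, which is the bottom-line case described above.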

Count yourself lucky that you only have to deal with programming specific markup.

Mathematicians would need math markup as well:
1) mathml (no)
2) mediawiki markup (somewhere between never and acceptable)
3) custom math markup -- Latex -- image (a little better than 2). There's a php file to let you do that.

Chemists would be very bad. Especially, organic chemistry.

BTW: where's the preview button?

Timbo on June 12, 2008 4:04 AM

Since this site is a site for programmers there is no need for any pseudo-markup language which doesn't do anything other than be typed and then parsed, which adds a) load to the server and b) load to the brain (even if they are that simple, you need to look up how to do what you want, while most would already know HTML)...

I used to post on a site which is a question/answer platform (for general questions about nearly everything) which used something like that... I still don't know how I should do a list... sometimes it works, other times the list points are just all written after each other, separated by an asterisk (*) which should (but doesn't always) create a new point in that list. This is very odd and doesn't make someone look very competent when answering a web-related question ;-)

If you wanted to use an editor (you do not, I understood that and I understand why), I would recommend the editor MarkitUp! It offers great features (you can tab right out of the generated markup, for example: just write what should be inside and press tab).

It is very user friendly, easy to understand and doesn't do any pseudo WYSIWYG shit.
And you can use it for whichever markup lang you want.

Alex

Xel on July 17, 2008 4:52 AM

HTML is not a human language, but Arabic figures is not user-friendly too. 5 not look like ::.

Using of Markdown, Textile, Wikipedia instead HTML - ridiculous.
Only AGCODe need for security reasons.

fake watches on November 4, 2008 1:58 AM

One solution to this age old problem is a WYSIWYM editor that I'm developing at http://dockion.creationix.com/.

It lets you do the layout in a gui fashion, but the content is in something content oriented like markdown with toolbars to ease when you can't seem to remember the right syntax.

WMD is a great example of how my markdown editor will eventually end up.

Tim on December 6, 2008 9:04 AM

I DON'T KNOW THAN YOU ARE SPEAKING, KKKKKKKKKKKKKKKKKKKKKKKKK

FLORO on July 9, 2009 11:56 AM

I want to allow few HTML tags in textarea and restrict others in asp.net application for my website. Can any one suggest me the best way to do that???

DoYouKnow.IN on July 24, 2009 9:28 AM

I want to allow few HTML tags only in TextArea in my asp.net web application. Can anyone suggest me a proper way?

Thanks in Advance
Jimit

DoYouKnow.IN on July 24, 2009 9:32 AM

Jeff, glad to hear you've settled on HTML for the input method. As you say, we developers already (should!) know HTML.

My only issue with use of HTML versus lightweight markup is the few extra characters needed to type out HTML tags, as opposed to the comparatively fewer characters needed to do formatting in the lightweight markup languages. But that's just one more reason that developers should all know how to touch type, right?

It seems like several other commenters are advocating Markdown... one option to accommodate these folks might be to give users an option to have their posts parsed for HTML, for Markdown syntax (or whatever lightweight markup language you choose), or both. Post entry in forums based on the UBB.threads (http://www.ubbcentral.com/) package does this, for example.

Jon Schneider on February 6, 2010 10:25 PM

Take a page from MS' book. Code up some intellisense for whatever markup you use. Even if markup isn't intuitive you can hint the user to where they need to be...

Add some syntax highlighting and users won't know they aren't using something they already know...

JPunyon on February 6, 2010 10:25 PM

Good for you. I'm sick to death of being coaxed into joining some new forum or social networking site or wiki and finding out that I have to learn a new, totally arbitrary set of rules that are kinda sorta like the ones I already know but not quite, and always getting mixed up between the 8 different markup styles.

On the other hand, I don't think it would hurt you to have a WYSIWYG editor - some developers may know HTML but are very slow typists and having to type HTML would slow them down significantly. Just make sure you leave the option for people to use a normal textarea that's not horribly mangled (Community Server, I'm looking at you).

Aaron G on February 6, 2010 10:25 PM

@Jeff
Incidentally, if you haven't ever edited a Wikipedia article, you should. I consider it a rite of passage, a sort of internet merit badge for anyone who is serious about their online presence

I'm not serious about my online presence but I am serious about programming. You've edited wikipedia, I haven't. I know C, you're a web celebrity. Go figure.

Anon on February 6, 2010 10:25 PM

I think making it HTML or even subsetted HTML would work well. But my first thought was that people are going to mess up your layout. True, posts or pages should be content-centered, but all the more reason to limit the freedom of users to a small but large enough set of layout items. Look at Myspace! It's eye- and brainhurt, because everybody puts in their own fonts, sizes, colors, background images, etc... Of course you can limit all this, but I think you will have only 5 to 10 remaining html tags, and then consider these in lightweight ML's. Looking up markup syntax is a bitch, but if you put the syntax of the 5 or 10 most used (and probably 95% of the time, are only used) near the editing-field, it doesn't matter if it's html, BBCode, or whatever imo.

Just please don't let people change fonts, font size, add their own smileys, and css mods.

You will have a select crowd of intelligent people writing articles and solutions to problems. But the questions themselves are going to be asked by Co0dingNewb015. Maybe people with a low post count can have even a subset of your subset of layout items.

Now I'm just rambling

Ps: You're right Jeff. Nobody reads 200 blog comments, except you (Podcast 1). I tried to see I wasn't 'double posting' but stopped around 30 something.

joon on February 6, 2010 10:25 PM

I'm glad you're looking for alternatives to wikipedia and bbcode markup, but I'm not sure Joel Spolsky's Wikipedia page in basic HTML would be any less intimidating.

"<p>Spolsky grew up in <a href="/wiki/Albuquerque%2C_New_Mexico" title="Albuquerque, New Mexico">Albuquerque</a>, <a href="/wiki/New_Mexico" title="New Mexico">New Mexico</a>"

Using html as the editor language doesn't change the need for a 'help' page, at the very least showing allowable tags (script not being one of them), which means I'd need to look up what I can/cannot use.

If I wanted to make your basic html example, is that a div or a span being used for the code snippet? What's its id or class? As a programmer, I don't think I'd necessarily be sure on the details. Does <p> need a closing </p>? How about <li> inside <ul>? Should it be <br> or <br />? Is it <b> or the CSS font-weight: bold;?

<p>I think for a wiki, you don't want more than one way to do things. Should the heading be <h2>, or will it end up being <font size=+5> half the time? Are you sure tables are only used for tabular data and not as a layout format everywhere? Policing content seems bad enough, I'd hate to have to police meta-content as well.

<p>Finally, for each paragraph in basic html, I need to begin it with a <p>? That's terrible for writing flow and thought process, but maybe that's just me.

Samson Yeung on February 6, 2010 10:25 PM

Markdown looks easier on the eyes, but I'd need a cheat sheet handy for a while since I've never actually had to use it.

With HTML I have only one hesitation: embedded and . Even programmers forget to encode these (or find them all) and if code samples are going to be a frequent occurrence, then mistakes are going to happen. A lot.

You could help this by making "code" tags or something which will ignore *ALL* markup between them. Assume that everything surrounded by code is to be taken literally.

Clinton Pierce on February 6, 2010 10:25 PM

Of course, my previous comment was mangled because I assumed the "no HTML" instruction meant that if I used HTML-like things in my text they'd be ignored. Text is text, right?

UI FAIL.

Embedded <, > and & are going to cause all kinds of problems. Perhaps a code tag that causes everything inside it to be automagically encoded. Because even good programmers will make mistakes and forget to encode characters, or just not find them all in their code sample.

Clinton Pierce on February 6, 2010 10:25 PM
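
The "automagically encoded" code tag Clinton asks for could work along these lines. This is a sketch only, assuming Python and a hypothetical encode_code_blocks function that runs before any other processing of the post.

```python
import html
import re

# Everything between <code> and </code> is treated as literal text.
CODE_BLOCK = re.compile(r"<code>(.*?)</code>", re.DOTALL | re.IGNORECASE)


def encode_code_blocks(post: str) -> str:
    """Encode <, > and & inside code tags so authors never have to."""
    return CODE_BLOCK.sub(
        lambda m: "<code>" + html.escape(m.group(1), quote=False) + "</code>",
        post,
    )
```

One known gap in this sketch: a sample that itself contains a literal closing code tag ends the block early, so a real implementation would need an escape hatch for that case.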

I think you should use a control. I don't want to type in any formatting, just provide a simple control that applies the formatting for me.

Allow for:

Limited styling: (Bold, Italics, Font Size and Maybe Color)
Allow for Pasting of code: (HTML PRE Tag)
Allow for hyperlinking, maybe lists

That is the basics.

Jon Raynor on February 6, 2010 10:25 PM

One of the main themes of your website has been "usability". Hell, just a few days ago you went off on how bad XML was. Now you're proposing to use a subset of it, with the exact same difficulties? Because people "should already know it"?

Sorry, Jeff, you're off your rocker on this one. WYSIWYG exists for a reason. If I have to type "a href=blahblah" everytime I want to show a link, I just won't use your site. Whether we should know it or not is irrelevant. Whether it's easier to use is. If you really need HTML editing, include it as an extra option, but it should by no means be the only choice, or even the default.

Brandon on February 6, 2010 10:25 PM

I agree that HTML is the markup to use. It's as simple as following the law of least astonishment, even though it is certainly not the simplest thing that could possibly work. It is ironic that the text field that I'm typing in now has "(no HTML)" at the top, which is pretty lame, or should I say, <b>lame</b>. ;-)

Simon on February 6, 2010 10:25 PM

I think you're looking at this from the wrong point of view. You're looking at the technology (various markup syntaxes in this case) whereas you should be looking at the problem you want to solve.

What's the problem?
Problem Statement: Allow [somewhat technical] users the ability to input data which includes some basic format specifications.

What kind of data?
Human-readable text.

Does this "human-readable" text support localization (i18n)?
Hopefully. In that case you need the ability to segregate text blocks based on locale (this block is "en-US", this block is "de-DE", etc).

Is there non-localizable text?
Yes. Code. But those code blocks are themselves somewhat localizable into different computer languages (C#, C++, PHP, VB, Python, Ruby, Perl, F#, Haskell, etc). It's just a different pool of "locales" if you will.

Do you plan to support externally created content?
By and large, the web browser is a HORRIBLE data-entry application. Word, OpenOffice, whatever is much better (and faster) at producing formatted text. Will you support cut'n'paste of RTF, DOC, ODF, OOXML, etc?

There are lots more questions ;) But the bottom line, focus on the problem and then think of the solution. Don't assume the solution already exists in the form of another technology. This smacks of trying to put the square peg in the round hole.

It may be that one of the aforementioned markup languages is appropriate. It is more likely that bits of each are preferred. HTML (while a reasonable rendering language) impedes the content generation process IMHO. It may be that you need to support multiple input formats, or that you need to create your own variant.

Simon (another one) on February 6, 2010 10:25 PM

The argument could be made that if you are a programmer you should not know html. That way people like me who have done half-serious, but not critical web applications would have to do things properly and use templating engines.

Personally I would keep the paragraph tags, or tell the textarea to put them in automatically somehow. Blogger's composer widget annoys me because you can put in html, but it adds extra line breaks in the preview corresponding to the line breaks in the source, even though html is supposed to ignore whitespace other than a single space character.

John Ferguson on February 6, 2010 10:25 PM

I like your sanitized HTML plan. And I love reading 4000 comments before posting. @frederik and @pierre are crazee!!!

Jon Galloway on February 6, 2010 10:25 PM

What about now that your trilogy extends beyond programmers?

Joe Hopkins on August 4, 2010 10:01 AM

The comments to this entry are closed.