One of the things we're thinking about while building stackoverflow.com is how to let users style the questions and answers they're entering on the site. Nothing's decided at this point, but we definitely won't be giving users one of those friendly-but-irritating HTML GUI browser layout controls.
I have one iron-clad design guide: this is a site for programmers, so they should be comfortable with basic markup. None of that nancy-boy GUI toolbar handholding nonsense for us, thankyouverymuch. If you can sling code, a little bit of presentation markup is child's play.
We will support some sort of markup language to style the questions and answers. But what markup language?
I mentioned in podcast #4 that we consider Wikipedia a defining influence. Let's see how Wikipedia handles markup syntax. This is what the edit page for Joel Spolsky's Wikipedia entry looks like:
It's an effective markup language, but I think you'll agree that it's more intimidating than humane. Wikipedia's How to Edit a Page and the accompanying Wikipedia syntax cheatsheet help. Some. I'd argue that writing a Wikipedia entry is a step beyond mere presentational markup; it's almost like coding, as you weave the article into the Wikipedia gestalt. (Incidentally, if you haven't ever edited a Wikipedia article, you should. I consider it a rite of passage, a sort of internet merit badge for anyone who is serious about their online presence.)
Let's consider a simpler example. What we're looking for is some kind of middle ground, a humane text format. Let's start with some basic HTML.
Lightweight Markup Languages
According to Wikipedia:
A lightweight markup language is a markup language with a simple syntax, designed to be easy for a human to enter with a simple text editor, and easy to read in its raw form.
Some examples are:
* Markdown
* Textile
* BBCode
* Wikipedia
Markup should also extend to code:
10 PRINT "I ROCK AT BASIC!"
20 GOTO 10
Here's what that looks like expressed in a variety of lightweight markup languages. Bear in mind that each of these will produce HTML equivalent to the above.
Textile:

h1. Lightweight Markup Languages

According to *Wikipedia*:

bq. A "lightweight markup language":http://is.gd/gns is a markup language with a simple syntax, designed to be easy for a human to enter with a simple text editor, and easy to read in its raw form.

Some examples are:

* Markdown
* Textile
* BBCode
* Wikipedia

Markup should also extend to _code_:

pre. 10 PRINT "I ROCK AT BASIC!"
20 GOTO 10

Markdown:

Lightweight Markup Languages
============================

According to **Wikipedia**:

> A [lightweight markup language](http://is.gd/gns)
is a markup language with a simple syntax, designed
to be easy for a human to enter with a simple text
editor, and easy to read in its raw form.

Some examples are:

* Markdown
* Textile
* BBCode
* Wikipedia

Markup should also extend to _code_:

    10 PRINT "I ROCK AT BASIC!"
    20 GOTO 10

Wikipedia:

==Lightweight Markup Languages==
According to '''Wikipedia''':
:A [[lightweight markup language]] is a markup language with a simple syntax, designed to be easy for a human to enter with a simple text editor, and easy to read in its raw form.
Some examples are:
* Markdown
* Textile
* BBCode
* Wikipedia
Markup should also extend to ''code'':
<source lang=qbasic>
10 PRINT "I ROCK AT BASIC!"
20 GOTO 10
</source>

BBCode:

[size=150]Lightweight Markup Languages[/size]
According to [b]Wikipedia[/b]:
[quote]
A [url=http://is.gd/gns]lightweight markup language[/url] is a markup language with a simple syntax, designed to be easy for a human to enter with a simple text editor, and easy to read in its raw form.
[/quote]
Some examples are:
[list]
[*]Markdown
[*]Textile
[*]BBCode
[*]Wikipedia
[/list]
Markup should also extend to [i]code[/i]:
[code]
10 PRINT "I ROCK AT BASIC!"
20 GOTO 10
[/code]
None of these lightweight markup languages are particularly difficult to understand -- and they're easy on the eyes, as promised. But I still had to look up the reference syntax for each one and map it to the HTML that I already know by heart. I also found them disturbingly close to "magic" for some of the formatting rules, to the point that I wished I could just write literal HTML and get exactly what I want without guessing how the parser is going to interpret my fake-plain-text.
Which leads directly to this question: why not just stick with what we already know and use HTML? This c2 wiki page titled Why Doesn't Wiki Do HTML? makes the case that -- at least for Wiki content -- you're better off leaving HTML behind:
I'm not sure I agree with all of this, but it can make sense in the context of a full-blown Wiki. It's worth considering.
After all this research on humane markup languages, much to my chagrin, I've come full circle. I now no longer think humane markup languages make sense for most uses. I agree with the guy at fileformat.info -- HTML is generally the better choice:
If the source and destination are the web, why not use the native markup language of the web?
HTML is a bit less readable than the lightweight markup languages, it's true. But basic HTML is not onerous to read, particularly if we hide the repetitive paragraph tags.
With a bit of careful coding, it is possible to whitelist specific HTML tags that you will allow. This way you avoid exposing yourself to risky/vulnerable tags.
It's not at all clear that any existing lightweight markup language has critical mass, with the possible exception of Wikipedia's flavor. On the other hand, text parsers and tools will always understand HTML.
A lot more people know HTML than any given flavor of humane text. If you're a programmer, you damn well better know HTML. For the handful of wiki-like functions we may need, it's possible to add some optional attributes to the HTML tags. And wouldn't that be easier to learn than some weird, pseudo-ASCII derivation of HTML?
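To make the whitelisting point concrete, here is a minimal sketch in Python of what "allow only specific tags" could look like. Everything named here (ALLOWED_TAGS, SAFE_URL_PREFIXES, WhitelistSanitizer, sanitize) is hypothetical and chosen for illustration; the particular tag list is an assumption, not a decision. It uses only the standard library's html.parser, drops any tag or attribute that is not listed, and re-escapes all text. It does not balance unclosed tags, so a real site would still need a tidy-up pass on top of something like this.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelist: tag name -> attribute names allowed on that tag.
ALLOWED_TAGS = {
    "b": set(), "i": set(), "strong": set(), "em": set(),
    "p": set(), "pre": set(), "code": set(), "blockquote": set(),
    "ul": set(), "ol": set(), "li": set(),
    "a": {"href"},
}
# Assumption: only plain web links are acceptable in an href.
SAFE_URL_PREFIXES = ("http://", "https://")


class WhitelistSanitizer(HTMLParser):
    """Re-emit only whitelisted tags and attributes; escape all text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag not in ALLOWED_TAGS:
            return  # drop the tag itself; its text content is still kept
        kept = []
        for name, value in attrs:
            if name not in ALLOWED_TAGS[tag] or value is None:
                continue  # e.g. onclick, style
            if name == "href" and not value.lower().startswith(SAFE_URL_PREFIXES):
                continue  # e.g. javascript: URLs
            kept.append(f' {name}="{escape(value, quote=True)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")

    def handle_endtag(self, tag):
        if tag in ALLOWED_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Text arrives unescaped (convert_charrefs=True), so escape it again.
        self.out.append(escape(data, quote=False))


def sanitize(html_text):
    parser = WhitelistSanitizer()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)


if __name__ == "__main__":
    print(sanitize('<b onclick="x()">hi</b> <a href="http://is.gd/gns">link</a>'))
```

The design choice worth noting is that this is a whitelist, not a blacklist: anything the table does not mention is dropped by default, so new or obscure tags cannot slip through just because nobody thought to ban them.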
I do think we'll adopt some of the cleverer functions of Textile and Markdown, insofar as they remove mundane HTML markup scutwork. But in general, I'd much rather rely on a subset of trusty old HTML than expend brain cells trying to remember the fake-HTML way to make something bold, or create a hyperlink. HTML isn't perfect, but it's an eminently reasonable humane markup language.
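One example of the scutwork in question is the paragraph tag. The sketch below, in Python, shows one way a site could add paragraph tags automatically so that users only type blank lines. The function name auto_paragraph is hypothetical, and the sketch assumes the input has already been checked for unsafe markup by some other step; it does no escaping or filtering of its own.

```python
import re


def auto_paragraph(text):
    """Wrap each blank-line-separated block of text in <p>...</p>.

    Assumes `text` has already been sanitized elsewhere; this function
    only adds paragraph tags and leaves any inline markup untouched.
    """
    # Normalize Windows and old Mac line endings first.
    text = text.replace("\r\n", "\n").replace("\r", "\n").strip()
    # One or more blank (or whitespace-only) lines separate paragraphs.
    blocks = re.split(r"\n\s*\n", text)
    return "\n".join(f"<p>{block.strip()}</p>" for block in blocks if block.strip())


if __name__ == "__main__":
    print(auto_paragraph("First paragraph.\n\nSecond one,\nstill the second."))
```

Single newlines inside a block are left alone here, which is a deliberate simplification: converting them to line breaks tends to fight with pasted source code, so that decision is better made separately for code blocks.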
MARKUP
Whitelist of markup is a must; I can't see a way you'd avoid having to parse and validate input anyway. You'd need to consider how to Tidy ( http://www.w3.org/People/Raggett/tidy/ ) the markup, keep the site look and feel consistent and support fixing markup when the page DOCTYPE and user agent behaviours inevitably change. This is perhaps a small risk, but something you want to be able to fix in one place rather than in every back post in the site's history.
SOURCE CODE
This is a programmer's site; making code legible is important; that includes indentation and (most likely) colorization support. You can allow users to carry this burden with tools like jEdit's Code2HTML plugin if you go with HTML markup. You take on a maintenance task if this support goes onto the server - updating the parser/encoder for every syntax change in every programming language.
[As a side note: I've noticed that automatically converting carriage return/linefeed into HTML elements can result in interesting battles with the software when it comes to source code, depending on the approach chosen.]
Whatever markup you choose, I would create a minimal list of key must-have elements rather than supporting things just because you can. To me, that means supporting source code, links and the ability to paragraph text - pretty much anything else can be omitted to begin with and added as needs are identified.
McDowell on May 14, 2008 2:04 AM
Limited subset HTML (explicitly listed).
That is, just basic formatting and hypertext (, , , MAYBE . No need for font and color control and their inevitable massive abuse.).
BBCode is the devil.
Sigivald on May 14, 2008 2:07 AM
WYSIWYG with HTML, thank you.
Damian on May 14, 2008 2:53 AM
If you're a programmer, you damn well better know HTML.
WTF is that for a stupid statement?!
I contend that a big portion (if not the majority) of the world-wide programming population doesn't know HTML because they never needed it and never will.
A crapload of code is written in C. Do you know its specs like the back of your hand? No? Oh, well, you're not a programmer then!
W on May 14, 2008 2:53 AM
I, like most programmers, prefer HTML.
Also it would be really nice to have some kind of syntax highlighting for code blocks. It doesn't need to be anything complex, just enough to highlight common control structures and strings.
Well, don't go for half measures then - if GUI handholding is worthless crap for incompetent losers, demand people use telnet from a command line and type the binary network protocol themselves. That's the only way that you'll limit contributions to true uber-geeks.
People who will never get laid even if they try to pay for it are definitely the best people to ask questions about how to design software that will actually appeal to the mass market.
Bob on May 14, 2008 3:01 AM
Am I crazy, or was the URL I entered for "website" changed?
I entered this: https://twitter.com/flupkear
and got this: http://flupkear/
yep, for some reason your blog is changing the Twitter url :S
javier on May 14, 2008 3:04 AM
+1 for Markdown.
As others have stated, it's extremely intuitive (I used much of the syntax in plaintext files before even learning about it), and it allows you to drop down to HTML if you want to.
James on May 14, 2008 3:12 AM
"I have one iron-clad design guide: this is a site for programmers, so they should be comfortable with basic markup. None of that nancy-boy GUI toolbar handholding nonsense for us, thankyouverymuch. If you can sling code, a little bit of presentation markup is child's play."
jeff
Why would you think a programmer has to know markup? As somebody who does, I work in an organisation that has a range of developers who span COBOL, C, VB and Java. A good percentage of these would not use markup but would be regarded as valuable programmers.
So are you just confining yourself to web programmers?
Stephen on May 14, 2008 3:25 AM
I've been thinking this too... possibly linking to a subset of the more modern html might be a good thing (allow strong, but not b, for instance).
Stu on May 14, 2008 3:36 AM
I also prefer HTML as formatting language. However I also like "some" preprocessing. There should be an option to turn returns into br / or something...
Have you checked out the YUI Rich Text Editor (http://developer.yahoo.com/yui/editor/)? You should provide that for "simple editing". In most cases it will be more than sufficient, because in most cases you'll only need plain text with some highlighting of single words.
BlaM on May 14, 2008 3:47 AM
There are a million opinions here, but I would suggest a markup that doesn't get in the way if the user is just typing a response.
I think markdown does a remarkably good job at letting you just type. In fact I often don't realize sites use it, but then find out that my lists got formatted nicely. That's great.
Questions and answers probably don't need the vast number of special options that say Textile offers. I vote for keeping it simple :)
Adam Sanderson on May 14, 2008 3:48 AM
+1 for Markdown. It looks the cleanest to me.
Tom Robinson on May 14, 2008 3:49 AM
Go with BBcode. I'm sure everyone on here already uses it in various forums.
tim on May 14, 2008 4:30 AM
*TEXTILE* looks the best IMO.
*bold* and _underline_ are pretty standard, even GTalk uses it to format text.
Looks simple enough, however, I prefer to do everything with HTML...
kevin on May 14, 2008 4:41 AM
Do yourself a favor and use wiki markup. I used to prefer HTML or BBcode (since I was more used to them). However, at work we now use an internal specialized wiki.
I can tell you that I am a complete convert. Wiki markup takes fewer keystrokes, fewer brackets, and less non-obvious syntax.
Lots of people know it these days, it's very easy and powerful, and it isn't burdened by a ton of ANGLE BRACKETS.
It also allows you to "force" some continuity throughout the site visually.
I think you will be making a huge mistake if you just use HTML.
TM on May 14, 2008 4:50 AM
Let me share my experience in this area. I ran into this problem while creating http://dotnettipoftheday.org. The goal was to provide users with the ability to enter new tips which may contain C#/VB.NET code. And of course the code should be well formatted for easier reading. I tried JavaScript WYSIWYG editors but they are far from perfect. They don't provide enough options to format code examples. Now, taking into account that all site users are .NET developers, I think that the best solution will be a Silverlight WYSIWYG editor. Such an editor can give you a desktop-like experience and you have enough control over formatting.
kostya.ly on May 14, 2008 5:05 AM
+1 for Markup from me.
html is presentation, not content. You should be using Xml :-P
I'm only half joking. I thought this new site was supposed to be about the content. Better to have a DSL (Domain Specific Language) to handle this.
You've got to ask yourself, what do users NEED to write about? What is 'good enough' to satisfy the needs of the most. We don't care about the 1% who want to write their doctoral thesis on the site. We want people to post things quickly, easily, and be nice to read. If you look at the abhorrent mess of websites, you might soon rethink the 'all coders can do html just fine' line of reasoning. MySpace anyone? For popularity, I reckon more people are better at BBCode from boards than they are used to writing raw html, but I still don't want BBCode
Also, because of the already overused angle bracket tax, are we going to have to escape all < and > or risk that being interpreted as html also?
For coders, by coders.
1. Headings, only 2 levels necessary
2. Code blocks - necessity
- optional ability to indicate language. pretty printing is not a 'nice to have'.
3. Links. external and cross-referencing.
4. Basic markup (bold, italic, highlighted?)
We don't need colors, div's, margins, padding, javascript, alternate fonts (or do we), different size fonts, etc. You don't even need lists or bullets. That can be done well enough manually. Simple tables might be nice.
Otherwise the postings will look like a big pile of dog crap, and in web 2.0 nobody likes crappy looking websites.
Wikipedia is a good example because they can take their DSL and convert it to anything. In my mind, what you're writing is wikipedia'ish, so look to the leaders, follow their example, and improve where they have made mistakes.
Oh, and add a 'preview' function too.
fluffy on May 14, 2008 5:15 AM
Why does it have to be just one method? Let them choose html-lite, or Textile, or whatever they prefer. That way, you don't have to create a new markup method. Just let the user select one from a (hopefully short) list.
Neil Baylis on May 14, 2008 5:38 AM
I'd prefer it if you left the choice of markup up to me for each individual submission. Sometimes I need full HTML to format something properly, often I just want to use plain text. Obviously this doesn't work for a wiki page where there are many contributors, but for individual submissions, choice of markup would allow everyone to write in a format they are familiar with.
The official implementation of Markdown supports HTML in the input, so you can use Markdown, and your users will still be able to use HTML if they want to.
Jeff,
What kind of complex visual structure do you want to appear in your site that cannot be expressed in those simple markup languages?
Besides, the fact that I can program HTML, doesn't mean I don't prefer something simpler if it's available. So, I'd go with some of the other markup languages if you asked me.
Gustavo on May 14, 2008 6:05 AM
Isn't the textile language just sort of troff lite? We can leave troff in the horrible bad old days where it belongs, please.
reed on May 14, 2008 6:09 AM
The biggest feature I can see in wikipedia that would seem to be missing in basic HTML is the automatic cross referencing functionality. A user shouldn't have to look up the URL to type [a href="http://en.wikipedia.org/Markup_Languages#Light_Weight"] when the server can figure it out for them from [[lightweight markup language]].
I guess you'll be adding some special syntax to html for those sorts of issues?
Mike on May 14, 2008 6:09 AM
We opted for Markdown in our CMS, because clients in combination with visual editors invariably screwed things up horribly. Although the output would be well-formed, it was inevitably nonsense, and it was far too easy to copy and paste the wrong bits of formatting from Word or somesuch (and lo, if we disabled that bit of functionality, there'd be complaints that they could no longer copy and paste other bits of formatting from Word).
Markdown has a double-pronged advantage for us:
1. It's simple for clients to learn how to mark stuff up properly. Because they have to think at least a tiny bit about the separation between content and formatting, it's easier for them to retrospectively tweak the markup to match what the content's supposed to convey as opposed to what Word made it look like
2. We can stick raw HTML into posts where a client's asked us to do something more complicated than they can manage themselves—Markdown's smart enough to leave the HTML as-is. Our clients, not being programmers, aren't likely to ever put in HTML themselves (and are aware that if they do, they stand a greater risk of screwing up their pages and so caveat emptor).
Works well for us.
Mo on May 14, 2008 6:10 AM
HTML is harder to learn than the others when it comes to people without any experience. It has tags and attributes, which can be hard to wrap your mind around. These Lightweight types are easier to use for a beginner. Wikipedia is not a wiki for developers, it is for users who have never made a website before.
Think about your audience, if it's developers, they would be able to use html and have no problems with that. Although, they might then be able to interfere with your site code, which can be quite damaging. Leaving an a tag open, table open, javascript(!) etc.
Though, I have to say that it is a lot easier to do simple styling like bold and italic in bb code than html (especially if you are to make it xhtml strict valid)
Thomas Winsnes on May 14, 2008 6:12 AM
I really think you ought to provide a simple wysiwyg editor, with the ability to edit code by hand. There are plenty of free, cross-browser applications available that you simply need to drop in and tell what tags to allow.
Why make people do the markup by hand just because they can? That's like making a user edit a config file instead of providing them options within the program, just because they can.
I'm glad you at least went with HTML though, so no need to learn a new markup syntax. Especially with the completely _unintuitive_ underscore to mean italics. I can't think of a worse choice. I mean, there's the slash which is s/anted like italics, or the underscore, which looks much like an underline. Ugh. =)
Sammy Larbi on May 14, 2008 6:12 AM
Over here at Pedant's corner, I have noticed that none of the markup examples would produce the HTML above. Replace "Some examples" with "A Few examples" at first glance.
Pedant on May 14, 2008 6:13 AM
Use ReST!
Calvin Spealman on May 14, 2008 6:15 AM
We had the same discussion when we developed a wiki-like interface in our application.
It seemed that Markdown was easier for users to understand than Textile after initial tests.
I would not go the HTML way since it allows users to break any semantic value you could find in their entry.
I would neither create my own language based on both Markdown and Textile, since users, especially blog users, are very used to one of them. You would just create confusion and mistakes.
Vincent on May 14, 2008 6:15 AM
Amen. I curse every second that I have to think about (or, God forbid, actually look up) the correct markup to link something or make whitespace non-wrapping or whatever. I already know HTML. You already know HTML. And you, over there, who doesn't already know HTML: the time you spend learning the tiny subset of HTML that you need to post a comment to a web site will be much more worthwhile than spending that same time learning one of the umpteen subtly different "lightweight" markup systems out there.
John on May 14, 2008 6:17 AM
Technically, wiki syntax should be
"[http://is.gd/gns lightweight markup language]", not [[lightweight markup language]]. But that's because I'm anal. :]
I personally disagree with you, html markup, while easy to understand for us coders, is quite harder to type than Textile or Wikimarkup. (and less pleasing to the eye, imho)
lucasbfr on May 14, 2008 6:22 AM
Isn't the point to be able to let people express their ideas quickly and easily? Why not let us use the GUI editor buttons, it's not like we're trying to prove our l33tne55 to anyone; we just want to push text into the computer efficiently.
Failing that, BBCode since it's simple and doesn't create tons of visual clutter - we're writing human readable text with markup, not Perl ;) - unless we're writing about Perl, of course.
Whatever you choose, please let it handle code in a sane way - what I mean is a little scrolling box with the code in rather than a five-screen scrolling mess, not decorated with line numbers, and in a format that can be easily copied and pasted (so no random blank lines or loss of indenting). Oh, it should handle non-wrapping lines of code correctly too without destroying your page template or making the browser have a horizontal scrollbar.
James on May 14, 2008 6:25 AM
In the last sentence it sounds like you were going to write your own markup language, perhaps inspired by some of the above.
If you invent a new markup language, or one which uses a combination of features from other ones, you are doing it wrong.
No matter how clever you think you are, no matter how frustrated you are with existing standards, the world does not need a new markup language. I don't care about the conventions on your site. Unless you expect that I will be using your site more than any other, I want it to work with conventions I learned elsewhere.
Nor do they want to deal with the inevitable bugs your new markup language parser will have. Use an existing standard or don't use any at all. I suggest: Wikipedia, a subset of HTML, or plain old text. These are the only reasonable choices.
Perhaps I misunderstood you, because you seemed to understand this fact, but in the last sentence your "BUT I'M SO MUCH CLEVERER THAN ANYONE ELSE" brain took over.
Neil Kandalgaonkar on May 14, 2008 6:29 AM
I've recently been mulling over this very subject, because my company uses a CMS with a *horrible* GUI that completely mangles input, produces invalid markup, etc.
Markdown certainly looks the 'easiest' to learn, although I'm suspecting there's a lot more to it than presented here (off to research later ...)
In my experience, though, however simple the HTML subset, and however much training you give re: elements, attributes, valid nesting, etc., people will always struggle with that most fundamental of beasts: the humble hyperlink. Let's be fair, to a non-developer, a URL is a pretty complex string of syntax. And editors simply resort to copy+pasting. If I were addressing a 'low-tech' audience, I'd seriously consider one of:
a) Denying any out-bound links + clever wiki-style auto-linking
b) Auto-linking allowing out-bound links via a search engine (or similar)
c) Robust URL parsing looking for obvious errors
+1 Markdown...
It allows HTML and does a very nice job of very easy to use formats...
Jake Good on May 14, 2008 6:30 AM
I'd like HTML (and as a result Markdown is good too).
It would be nice to do something slightly pretty with code snippets though.
Why invent something new when there are so many reasonable choices? I agree with those of you that say that an HTML posting syntax would be ideal. If that is not possible for security reasons, please don't invent something new. Let me leverage the time spent learning textile or markdown or whatever existing markup technology you decide to use. My time is valuable and I'd rather spend it conveying a message rather than learning a new way to format a message.
Jay on May 14, 2008 6:34 AM
addressing those wiki points -
1. HTML focuses on content, not presentation - semantic html let people focus on (or sometimes even gain deeper understanding of the format of) their ideas.
2. Why use domain-specific markup when you've already got global markup that serves all your needs?
3. Tables aren't any less difficult to understand than the puzzling mixture of dashes, asterisks, and brackets that wikis employ
4. No - they don't need it, and you don't have to give it to them.
5. Only if you leave yourself open to it... "we're too lazy/busy to address security concerns" is not a good reason.
6. What makes Wiki markup easier to learn than HTML? Why would you learn a new markup language, which will just get converted back to HTML again? Isn't that a little redundant? If people need to learn a markup language, why not learn the one that is universally used in every page on the web?
... I'm not a big fan of wiki markup either - bbs tags are only marginally better.
matt on May 14, 2008 6:36 AM
Just a thing: a way to get code coloration is, I think, necessary. Seriously.
Also, bbcode blows (and 9 out of 10 bbcode parsers are purely regex-based translators, thus break down real fast), thanks for not using it.
Masklinn on May 14, 2008 6:36 AM
I generally agree that a subset of HTML is fine for formatting. If all you want to do is have lists and paragraphs and bold and italics, it's exactly as clear as almost any other markup language. If you're willing to automatically add p tags on double newlines, most people can muddle through without touching it at all.
However, Mediawiki is a special case, in that the html tags don't actually fully represent what most of the corresponding markup means. As you say, it represents the structure of the data, and the structure you give the data using wiki markup has side effects beyond the formatting you'd get from basic html.
For example, take the 'triple equal sign' - on first glance, it's just an h3. That doesn't tell the whole story, though - there's some deeper meaning to that tag. Not only does it do your h3 formatting, but it also generates a named anchor, and it automatically appends a link to it in the table of contents. It does have the same logical meaning as h3, but it does more - h3 is a subset of triple equal. You could of course impart that power upon h3, but I'd argue that's even more confusing than having a separate syntax.
This doesn't even touch on the templating language or the category system, both of which have no equivalent in html. So with mediawiki, you *know* html won't meet all of your needs - so coming up with a language that does allow for everything only makes sense.
Jeremy T on May 14, 2008 6:39 AM
I usually love your posts... but this is exactly the kind of attitude which stops developers from making good UI imo. You've obviously thought about this a lot, but you immediately ruled out all of the best approaches by making a big assumption about your target audience.
Just because you expect every good programmer to be comfortable with markup doesn't make it so... and as you often remind us, there are plenty of bad programmers out there.
Maybe I got it a bit wrong... but I don't think you should expect your users to understand your markup, or even HTML. Showing the markup and allowing the user to edit it is fine, (ala wiki) but not implementing nice buttons and an interface... you shouldn't demand anything of users that isn't necessary imo.
I'll take it all back if you plan on having the buttons as well... but thats not how the post came across. :)
Jheriko on May 14, 2008 6:39 AM
As much as the bugs in Blogger annoy me, the one thing they do right is to allow the user to go to source and edit the HTML. For the users that don’t understand markup, they have a WYSIWYG editor.
Reinventing a markup language is the wrong approach. I've been creating HTML pages since 1994, and every time I edit a Wikipedia page I roll my eyes because I still have to look up that URL syntax. I agree with Calvin, use REST if you want to implement "automatic cross referencing functionality" but remember, that is a server-side function, not a mark-up issue.
One recommendation, I would create a White List of HTML you will support. This way you don't have to try to manage a Black List of restricted tags.
Josh Hurley on May 14, 2008 6:43 AM
It seems to me that an assumption is being made that all developers know how to code in HTML. As a desktop developer I rarely, if ever, have to touch web code and hence will have to invest time and effort into learning a whole new 'syntax' if I am expected to format my posts correctly.
While I understand that there will always be a need to have some kind of markup I cannot see the reasoning behind forcing us to hand craft it. If I am popping onto the site to post a question (or indeed an answer) then I probably have that problem space loaded up in my brain. Having to interrupt that and find out the correct way to markup my post seems like a sure fire way of reducing the integrity of that post.
I can see no logical reasoning why you feel the need to forgo a simple GUI driven text box, that requires minimal thought while using, in favour of forcing us to learn whatever you choose to be 'my way is the best way'.
That just smells of elitism.
One of the books on your reading list says it all - "Don't Make Me Think: A Common Sense Approach to Web Usability".
Martin Wallace on May 14, 2008 6:46 AM
I've always liked 37signals' solution - give them just a few whitelisted tags for bold, italic, links, quotes. Forget about attributes. Keeps it pretty clean, and they can explain it in a sentence.
Evan on May 14, 2008 6:48 AM
In the spirit of various other articles on this very blog, wouldn't the correct answer be to allow both html _and_ simpler markup?
I know HTML inside out, but given the choice I'd rather write in textile whenever possible.
Jack on May 14, 2008 6:52 AM
Isn't the point of software to make things easier? Just because as a programmer, I can code in HTML doesn't mean I want to code in HTML just to write a comment. WYSIWYG is not a bad word.
Although, for a developer site there is a very limited set of markup needed:
* Plain format (the default)
* Lists (ordered/unordered)
* Bold/Italic/underline
* Hyperlinks
* Sourcecode (the biggie for a programming site)
One nice thing would be color formatting sourcecode automatically based on language.
Jeff Cuscutis on May 14, 2008 6:55 AM
If I were you I'd definitely go for plain old html but whitelist only a bunch of tags and limited attributes per tag, and of course validate properly before accepting anything. I've done it before and it's rather simple.
Yet I strongly disagree with the whole "If you're a programmer, you damn well better know HTML" thing; I might be comfortable programming for the web but that doesn't mean all programmers are. I still know some that only do specific FoxPro based stuff, or even just a small subset of C on embedded devices. These people are definitely programmers, but don't have any reason to know anything about HTML.
Therefore, and because I am not you, I'd still go for a nicer custom wysiwyg editor to generate nice lean and valid HTML with a code button for advanced users perhaps. I have this love for the KISS principle, but I realize it applies to my users' point of view, not to mine.
Kris
kris on May 14, 2008 6:56 AM
I find the simple formatting^ offered by the Australian broadband forum Whirlpool to be quite good. It isn't as full featured as many others but in most instances, it's more than enough - they conveniently allow you to enter raw HTML if that suits your purposes better as well.
^ http://whirlpool.net.au/wiki/?tag=whirlcode
Al on May 14, 2008 6:57 AM
I'll note that the blog Making Light uses a subset of HTML for comment markup, including URLs, and the users there seem to have no trouble figuring it out. The users are typically science fiction nerds, and at least 2/3rds do not come from technical backgrounds, but when sufficiently motivated they can even figure out HTML, with prompting.
I'll append a section of prompting, but I had to mangle the angle brackets to abide by the "no HTML" rule you have for comments. (Irony!)
HTML Tags:
[strong]Strong[/strong] = Strong
[em]Emphasized[/em] = Emphasized
[a href="http://www.url.com"]Linked text[/a] = Linked text
Spelling reference:
Tolkien. Minuscule. Gandhi. Millennium. Delany. Embarrassment. Publishers Weekly. Occurrence. Asimov. Weird. Connoisseur. Accommodate. Hierarchy. Deity. Etiquette. Pharaoh. Teresa. Its. Macdonald. Nielsen Hayden. It's. Fluorosphere. More here.
Have you tried out markItUp:
http://markitup.jaysalvat.com/home/
It's an excellent JavaScript utility that puts a friendlier face on the standard textarea. I use the HTML version to allow my users to enter snippets of XHTML, but it also works with Markdown, Textile, etc.
I also wanted control over what tags and attributes the users are allowed to enter, so I wrote an extension that validates the XHTML by parsing it against a list of valid tags/attributes (defined in JSON).
Ben Mills on May 14, 2008 6:58 AM
I'm wondering if you have put any thought into automatically supporting syntax highlighting for code snippets? I've used several sites that do this (forums.devshed.com, for example), and while it's rarely perfect it can really help when reading through posted questions.
Joel Coehoorn on May 14, 2008 7:00 AM
A vote for Textile. If you just want to type, and be able to enter some bold text, lists, headings or links, it works very naturally. Novices have no problem with it either (I use it as the default formatter for a CMS backend). I prefer it (slightly) over Markdown because I find the way you enter headings in Markdown clunky.
Textile also makes quotes beautiful: “Like this.” when you enter "Like this."
Joost on May 14, 2008 7:01 AM
Doesn't the conclusion here go against the reasoning in the "XML: The Angle Bracket Tax" post a couple of days ago? I know you can get away without all the closing tags in HTML, so it is slightly better than XML, but to willfully twist your own words:
1. Should HTML be the default choice? The authors of most styled text entry code would probably say NO to this.
2. Is HTML the simplest possible thing that can work for your intended use? NO.
3. Do you know what the HTML alternatives are? YES
4. Wouldn't it be nice to have easily readable, understandable posts, without all those sharp, pointy angle brackets jabbing you directly in your ever-lovin' eyeballs? Ummm, Yes?
As pointed out above, whatever you do, it isn't going to actually be HTML. You're going to have to add your own stuff to it and limit it in some ways. I admit my HTML knowledge is basic, but I have no idea how to enter some syntax-highlighted JavaScript in HTML, while I can manage it in MediaWiki syntax.
Your link to "why doesn't wiki do HTML" is broken. It should be
http://c2.com/cgi/wiki?WhyDoesntWikiDoHtml
Dave on May 14, 2008 7:02 AM
I vote for HTML format with a WYSIWYG editor, such as one of these:
http://www.geniisoft.com/showcase.nsf/WebEditors
Just because I know HTML doesn't mean I always want to type it, or any other markup, to enter a comment. All of the editors I have looked at supply a Design (lazy) mode and a raw HTML mode.
Michael Lang on May 14, 2008 7:02 AM
I argued the same thing at last year's Wikimania conference. Why spend all this work on a common wiki format when we could just use a subset of HTML? For those concerned about usability, we're finally getting some decent rich text editors for HTML textareas. These will be fine for most users and they also already produce valid XHTML. Yeah, I think you're completely on the right track.
Though, I do like to use Markdown for some documents. IMHO it's the best of the lightweight formats.
Aaron on May 14, 2008 7:02 AM
"This c2 wiki page titled Why Doesn't Wiki Do HTML?..." The link is broken.
Also, I'm sure Wikipedia loves it when you link your image directly to the "Edit" page for Joel's entry. Remember, folks, if you have to make a test edit, please make it to [[Chicken]], not [[Joel Spolsky]]!
Anonymous Cowherd on May 14, 2008 7:05 AM
Anonymous Cowherd on May 14, 2008 7:06 AM
@Jheriko: seriously?! Any *competent* (not "good") programmer must be comfortable with markup. Otherwise they are not, *by definition*, a competent programmer.
As for the bad programmers out there, they're not stackoverflow.com's target audience, and the more of them run away screaming, the better -- their input equals noise and degrades the value of the site for people who actually do have something useful to say.
Peter on May 14, 2008 7:07 AM
You could be different and split the difference. Add a few tags to HTML, like a markdown tag or a tag for the other commonly used internet formatting options, with everything untagged defaulting to a whitespace-sensitive version of HTML (so people don't have to type paragraph tags).
It requires some additional processing work, and that's never fun, but it seems trivial to me to implement and is adaptable to different formats in the future.
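The whitespace-sensitive part of this suggestion could look roughly like the Python sketch below. The function name `text_to_html` is hypothetical, and the rules (a blank line starts a new paragraph, a single newline becomes a line break) are one plausible reading of the proposal, not a defined format. It assumes the input has already been sanitized, since it passes any markup through untouched.

```python
import re


def text_to_html(text):
    """Wrap blank-line-separated blocks in <p>; turn single newlines into <br>."""
    # Normalize Windows line endings so both rules below see plain "\n".
    text = text.replace("\r\n", "\n").strip()
    paragraphs = re.split(r"\n\s*\n", text)
    html_parts = []
    for para in paragraphs:
        lines = [line.strip() for line in para.split("\n")]
        html_parts.append("<p>" + "<br>\n".join(lines) + "</p>")
    return "\n".join(html_parts)
```

With these rules nobody has to type paragraph or line-break tags by hand; the trade-off is that hard-wrapped text pasted from an email picks up a line break at every wrap.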
Ben on May 14, 2008 7:08 AM
So you are going to use plain-old HTML!
What about new-lines? Are we going to have to type "BR" all over the place?
Also... people are going to want to post HTML code *AS CODE*, without having to type all those escape sequences just to post some example of a DIV that isn't working or whatever.
Of course, all these things COULD be done - we ARE programmers!
But will anyone bother? Or will they just go somewhere else?
Finally, you may be making a mistake in saying that it is "secure, with careful parsing" - this sounds like pride coming before a fall to me!
Syd on May 14, 2008 7:10 AM
An additional consideration to my above thoughts occurred to me. You could add pre-processors to tags for each language (C++, etc.), and they could make an attempt to apply proper indentation and code highlighting that would be more versatile than a language-agnostic version.
Ben on May 14, 2008 7:11 AM
The one thing I know that argues against using actual HTML for post styling is if you want people to be able to post markup or even code-- if someone posts an example containing a 'for' loop, the angle brackets can cause all manner of weirdness. And if they try to post some sample HTML, then look out Francis.
So you either have to define a markup pattern that passes through untouched whatever's inside ('code' is a common choice) or else move to something like Textile or Markdown and set it up to encode stuff like angle brackets so it passes through untouched.
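The first option, a pass-through pattern, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the delimiter is a literal `code` tag pair, the names `CODE_BLOCK` and `protect_code` are made up for the example, and the pattern cannot cope with sample code that itself contains a closing `code` tag.

```python
import re
from html import escape

# Non-greedy match so each <code>...</code> pair is handled separately.
CODE_BLOCK = re.compile(r"<code>(.*?)</code>", re.DOTALL | re.IGNORECASE)


def protect_code(post):
    """Escape &, < and > inside <code> blocks so the browser shows them literally."""

    def _escape(match):
        return "<code>" + escape(match.group(1), quote=False) + "</code>"

    return CODE_BLOCK.sub(_escape, post)
```

A post containing `<code>for (i = 0; i < n; i++)</code>` comes out with the `<` rewritten as `&lt;`, so the loop renders as typed instead of being parsed as the start of a tag. This step would need to run before any tag filtering so the filter never sees the sample code as markup.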
Eric Meyer on May 14, 2008 7:11 AM
I really think you should just use (or build) a good WYSIWYG editor. As a coder I can write in assembly language, but it doesn't mean I want to.
If you want to offer the ability to alter the raw markup then you can give users that option, but I want my editor fully cooked please...
Chris E on May 14, 2008 7:12 AM
I agree with all those that just want to use HTML. It fits the target audience and likely usage model:
Audience = programmers
Usage = occasionally
It's not like I will be on stackoverflow.com for hours on end everyday trying to write programs. There's no need to learn anything extra or new.
Solburn on May 14, 2008 7:16 AM
@Jeff:
"I'd much rather rely on a subset of trusty old HTML than expend brain cells trying to remember the fake-HTML way to make something bold"
Ironically enough, there isn't a way to make something bold* in (modern semantic) HTML since, as you pointed out in a previous blog entry, HTML is the 'model', not the view :)
* The b tag was officially "discouraged" way back in 1999 with HTML4.
The current HTML5 working draft doesn't go as far as deprecating it (yet), but it does say "The b element should be used as a last resort".
Try to make a plain jane table.
In HTML it's sane. In light-weight markup lingo, it sucks; they've tried to reduce the tr and td tags even more; Wiki makes it completely incomprehensible.
The "rich" text editors are rich, but not robust. To write HTML fast, you have tag completion and a suggestion system for the values. RTEs fail on this point.
I find it immensely more pleasing to just get the raw HTML, dump it in my text editor of choice, and copy + paste it back again.
Rob Janssen on May 14, 2008 7:21 AM
Seeing as how most of the potential users of Stack Overflow currently reside at forums, it might make sense to cater to them and use whatever markup language is most prevalent across the inter-tubes. In my travels I've found the most widely used markup is BBCode or HTML.
BBCode is easy to use and offers the potential to add custom tags to allow special functionality. I think it's your best bet.
Matt Briancon on May 14, 2008 7:24 AMCleverEndQuoteAgain./CleverEndQuote
Adam on May 14, 2008 7:24 AM
And you'd be building that in VB.NET, Jeff?
Or have you jumped ship to C# now too? Is all that "love" you show about VB.NET just empty talk?
VBMan on May 14, 2008 7:25 AM
I'd still prefer a row of buttons. What, I'm a coder, so on your site I have to hand-code because I can? Why not combine them: let me code, or click a button when I'm lazy.
Mike on May 14, 2008 7:26 AM
A problem with HTML is unclosed tags. Leaving, for example, a bold tag open can cause text farther down your page (i.e., your footer) to render in bold. So if you do allow straight HTML, you'll have to create a script which finds and closes any tags left open. Considering all the different types of tags, this is no easy task. I'd recommend Textile or Markdown for this reason.
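For a sense of what such a tag-closing script involves, here is a rough Python sketch built on the standard library's `html.parser`. The names `TagBalancer`, `close_open_tags`, and `VOID_TAGS` are hypothetical, the list of void elements is deliberately short, and comments and doctypes are simply dropped. It keeps a stack of open tags, closes them in the right order, and discards stray closing tags; a production version would need to handle many more cases.

```python
from html.parser import HTMLParser

# Assumption: a short list of elements that never take a closing tag.
VOID_TAGS = {"br", "hr", "img"}


class TagBalancer(HTMLParser):
    """Copies a fragment through while tracking which tags are still open."""

    def __init__(self):
        super().__init__(convert_charrefs=False)
        self.out = []
        self.open_tags = []

    def handle_starttag(self, tag, attrs):
        self.out.append(self.get_starttag_text())
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_startendtag(self, tag, attrs):
        # Self-closing form such as <br/>: nothing to track.
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag not in self.open_tags:
            return  # stray closing tag: drop it
        # Close everything opened after the matching tag, then the tag itself.
        while self.open_tags:
            top = self.open_tags.pop()
            self.out.append(f"</{top}>")
            if top == tag:
                break

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")


def close_open_tags(fragment):
    balancer = TagBalancer()
    balancer.feed(fragment)
    balancer.close()
    # Anything still on the stack was never closed by the author.
    while balancer.open_tags:
        balancer.out.append(f"</{balancer.open_tags.pop()}>")
    return "".join(balancer.out)
```

Given `<b>bold <i>both`, this returns `<b>bold <i>both</i></b>`, so an unclosed bold tag in one post can no longer leak into the rest of the page.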
Chad Braun-Duin on May 14, 2008 7:26 AMDarn, was meant to be CleverEndQuoteAgain./CleverEndQuote
Adam on May 14, 2008 7:27 AM
Actually, what I think matters most is whether you provide a good, accessible quick reference on the editing page.
It doesn't really matter which language it describes, as long as it's present and small but complete enough. I mean, people will need to look HTML up too eventually, editing your site.
Adam on May 14, 2008 7:30 AM
I agree totally. Of the four markups you presented, the only one that was readable enough that I didn't have to refer back to the rendered version to see what was going on was BBCode. (For a couple I'm still not sure how the end of the first quoted section is delimited.) But BBCode is practically HTML with square brackets, so why bother?
As for developers like Martin who don't know HTML, I'd say you should be prepared to learn. This isn't Swahili we are talking about. Learning a markup language for a real developer should be trivial.
In fact, I'd say lack of basic HTML skills in posts might be a good way to spot the posers.
T.E.D. on May 14, 2008 7:34 AM
It depends on what you want to do.
HTML is fine for just formatting (that's what it is for!), but you then have the problem of cleaning the HTML and filtering broken syntax, and your pages are no longer in a consistent format.
Wiki syntax is more than formatting: it adds meaning to the text, which as a side effect might format it. For example, tables, since they are a standard format, can be read and processed by the wiki as data; internal links work both ways automatically; categories/tags aggregate data automatically; and so on.
Perhaps you should use XML instead? [The Angle Bracket Tax] ;-)
BTW, internal links in MediaWiki are [[article]] or [[article#section]]. External links are not much harder, [http://otherwebsite.net/Light_Weight.htm], but are deliberately avoided.
I have used freetext (http://freetextbox.com/default.aspx) for a few projects and it has worked well. I think it works for novice and advanced users.
BrianK on May 14, 2008 7:40 AM
I think one of the biggest concerns with allowing raw HTML is all the crazy things people can put into your website.
XSS, ugly images, ads, annoying ads, spam, and the like.
Special markup has the function of limiting what people can do.
Jeff Davis on May 14, 2008 7:42 AM
so if someone wanted to come to your site to learn more about html they'd be screwed?
is the site intended for 1337 programmers to come and get more 1337 or are you intending on allowing beginners to come and learn too?
you've completely gone off the rails on this one, especially if you're considering writing your own hybrid mark up language.
burnside on May 14, 2008 7:44 AM
I have a function for my forums that strips out script, img, etc. and everything in between the tags. I have a small warning for the user on what tags not to use.
BrianK on May 14, 2008 7:46 AM
"Ironically enough, there isn't a way to make something bold* in (modern semantic) HTML since, as you pointed out in a previous blog entry, HTML is the 'model', not the view :)"
That's a pretty good point. Perhaps a standard style sheet could be set up, which posters could reference?
Then again, what you are *supposed* to be using is tags like em (emphasis) and strong (strong emphasis), and let the user's browser do that however the user wants such things presented (boldface, underlining, big font, yelling the word, whatever).
This is precisely why I don't use WYSIWYG editors for HTML. They invariably have tons of style buttons and almost no proper element buttons. If your development tool completely misses the point of the language, the results can't be good.
A problem with Markdown is that it interprets a single underscore as an emphasis tag. A hassle if you're trying to talk about programming or something that uses underscores.
Jonathan Drain's DD Blog on May 14, 2008 7:46 AM
+1 BBCode
easy to parse. easy to remember.
Stupid me, I meant to list script,img tags etc in my previous post. They got filtered.
BrianK on May 14, 2008 7:47 AM
Keep it simple! Consider the most common use, probably a short post of a few paragraphs, some bold, a link, a code section, and a list. In these cases any of the markup languages results in a much simpler and easier-to-understand post than HTML would.
Yes, with HTML you get the "I can do anything", but don't focus on the edge cases and ignore what people will be using it for 99% of the time. I've been writing HTML since '95 and one place I don't want to see it is in a forum (offhand I can't think of any forums I frequent that actually use HTML).
At this point there may well be more people familiar with the Wiki syntax than with HTML...
Dave on May 14, 2008 7:47 AM
Direct HTML input is the autobahn to invalid XHTML.
http://iamacamera.org/default.aspx?section=developid=73
In ten years, we will look back with nostalgia at the days when we left comments on your site via direct HTML input -- the way we fondly recall bygone years when we configured our ISDN modems and put jumpers on hard drives to designate them master/slave.
Direct HTML input is, at best, [i]quaint[/i], but by no means a viable long-term solution to online markup entry.
@James
http://code.google.com/p/syntaxhighlighter/ this JavaScript library seems to be the best way to document code with syntax highlighting, automatic line numbering and copy and paste support. I use it in a lot of my documentation.
Robert S. Robbins on May 14, 2008 7:50 AM
"As for developers like Martin who don't know HTML, I'd say you should be prepared to learn. This isn't Swahili we are talking about. Learning a markup language for a real developer should be trivial."
It may be trivial, but it should also be optional. The interface should never get in the way of usability. Jeff makes that very point himself in "Reducing User Interface Friction" (http://www.codinghorror.com/blog/archives/000866.html):
"Reduced interface friction goes a long way toward explaining the popularity of services like twitter and tumblr. What's the minimum amount of effort a user can expend to produce something? The answer could be a key competitive advantage.
That single input box on the Google homepage starts to look more and more like an optimal user experience. It might be unrealistic to reduce your application's UI to a single text box-- but you should continually strive to reduce the friction of your user interface."
Martin Wallace on May 14, 2008 7:51 AM
Please don't use Wikipedia as a model markup language. It's badly defined, to the extent that the only 'compliant' parser is MediaWiki itself, which consists basically of a long series of regexes. It's a huge shame that one of the largest consolidated sources of information on the web is all written in a language that's extremely difficult to parse robustly.
As far as using HTML goes, it depends on your target audience. For stackoverflow, I would agree, but for more lay-person sites, HTML seems unnecessarily complicated. One forum my wife and I both post on uses a subset of HTML, and I've lost track of how many times I've had to tell my wife what the syntax for links is. "a href" is second-nature to us, but it's not intuitive if you're not already familiar with HTML.
Nick Johnson on May 14, 2008 7:57 AM
I think anyone who calls himself a web developer should be proficient in HTML. Not just good or familiar with it, but proficient. Take Visual Studio, for example. I see too many developers squeak by working in Design Mode, and when work in Design Mode breaks down (as it often does) they are lost in the sea of code in source mode. I don't even use Design mode. I code entirely in source mode. It's a sad state in our profession when a good percentage of developers can't "debug" HTML code. Sorry about the detour.
BrianK on May 14, 2008 8:02 AM
They all look fine for the most part except for how they handle internal and external links. There the Wiki format wins out in terms of being intuitive and easy. It's frankly the most important bit, and I think even HTML screwed that one up -- a href="" is not intuitive, it's possibly the most non-humane way I've ever seen for how to do links.
"I know, let's make the tag to link to external sites the same one as we use to make internal anchor points. And while we're at it, let's use a totally opaque acronym to designate the link element. Because LINK would have been too straightforward."
Shmork on May 14, 2008 8:04 AM
Whatever markup you go for, please make sure you only offer a limited but useful set of formatting/style tags.
One of the problems with sites that have a lot of user formatted content is that they have a horrible inconsistent mix of styles, layout and structure that makes flicking through the site a constantly jarring experience.
Things I'd want when posting a question/answer/article:
- the ability to include real source code (without having to alter it to remove HTML characters etc) AND have all my formatting/indenting preserved AND have the code automatically coloured in the post.
- include images/diagrams (without having to host them myself on some other site).
- link to other articles on the same site and to relevant external sites (e.g. Sun, MSDN, W3C etc).
- attach example source files.
Graham Stewart on May 14, 2008 8:08 AM
(My point being that none of the goals above are entirely satisfied by HTML, or most of the other simple markups)
Graham Stewart on May 14, 2008 8:09 AM
See reddit's comment box. Little expandable notes on how to use Markdown (very handy as a reference when you forget something). And as someone mentioned near the top, the official Markdown engine supports HTML tags. Best of both worlds.
dude on May 14, 2008 8:09 AM
I agree with sticking with HTML, if you aren't going to toolbar the text widget.
One good example is Lifehacker (and probably other Gawker blogs). They have a live comment preview system that uses HTML markup. Nice.
Otherwise, do it with less friction. Give me some toolbar icons.
piyo on May 14, 2008 8:11 AM