At some point in any WinForms project, you're bound to need either:
Although I am ambivalent towards HTML, there's no question that it is a far, far better solution than the nasty, crusty old Rich Text Format. RTF is HTML gone stupid. If you're ever bored and want to take on a brain-meltingly difficult project, just try writing an RTF-to-HTML converter. Oh sure, it seems easy enough... but I don't think anyone can appreciate how profoundly irrational RTF is until they actually sit down and work with it in detail. Ugly doesn't begin to cover it. Based on my limited research, RTF seems to have evolved as the de facto document storage format for early versions of Microsoft Word, apparently based on the whims of whatever development team was working on Word that week.
To be fair, the RichTextBox actually isn't that bad. It's effectively "free" in terms of distribution footprint, and it will work for most basic formatting scenarios, including URL and mailto: hyperlinks. In fact, Craig Andera just released a serviceable enhanced RichTextBox. The only problem is that it's, well, RTF. Just try inserting bulleted text to see what I mean. If you're dead set on a control that renders HTML, there's only one solution I'm aware of in .NET: IE interop. Lots of people are doing it:
And it works. Sort of. Like all heavy duty .NET COM interop, you can't escape the feeling that you're building a giant house of cards, prone to catastrophic failure at the first gentle breeze. There's also the matter of our little friend Microsoft.mshtml.dll, a primary interop assembly weighing in at 7.8 megabytes. And god help you if a user doesn't have IE installed on their system. Inconceivable!
While I'm not against interop per se, it seems like overkill to harness the entire bulk of IE to render a little HTML. What's really depressing is that there are precious few options, interop or otherwise, for getting proper HTML into a WinForms app. What I'd really like to see is a completely managed, lightweight HTML rendering control written entirely in .NET. In other words, something with the basic features of the RichTextBox, but using standard HTML conventions. I realize HTML rendering is not exactly trivial, but I think a smallish subset of standard HTML would meet my needs just fine.
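To give a rough idea of how small such a subset could be, here is a minimal sketch of the parsing half only. It is written in Python rather than .NET purely for brevity, and everything in it is an assumption made for illustration: the choice of tags (b, strong, i, em, a, br, p), the names SubsetParser and parse_runs, and the "run" tuple format. It turns markup into a flat list of styled text runs that a control could then measure and draw. Layout and painting, which are the genuinely hard parts, are not attempted.

```python
from html.parser import HTMLParser

# Hypothetical subset: the tags this sketch understands and the style each one sets.
STYLE_TAGS = {"b": "bold", "strong": "bold", "i": "italic", "em": "italic"}


class SubsetParser(HTMLParser):
    """Turns a small HTML subset into (text, styles, href) runs."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.runs = []    # output: (text, frozenset of style names, href or None)
        self.active = []  # style flags that are currently open
        self.href = None  # target of the enclosing <a>, if any

    def handle_starttag(self, tag, attrs):
        if tag in STYLE_TAGS:
            self.active.append(STYLE_TAGS[tag])
        elif tag == "a":
            self.href = dict(attrs).get("href")
        elif tag == "br":
            self.runs.append(("\n", frozenset(), None))
        # Any other tag is ignored; its text content still comes through.

    def handle_endtag(self, tag):
        if tag in STYLE_TAGS and STYLE_TAGS[tag] in self.active:
            self.active.remove(STYLE_TAGS[tag])
        elif tag == "a":
            self.href = None
        elif tag == "p":
            # Treat the end of a paragraph as a line break.
            self.runs.append(("\n", frozenset(), None))

    def handle_data(self, data):
        if data:
            self.runs.append((data, frozenset(self.active), self.href))


def parse_runs(html):
    """Parse an HTML fragment and return its list of styled runs."""
    parser = SubsetParser()
    parser.feed(html)
    parser.close()
    return parser.runs
```

Calling parse_runs on a fragment such as 'Hello <b>bold</b> and <a href="mailto:a@b.c">mail</a>' yields one run per stretch of text, each tagged with its active styles and link target. No networking, scripting, or COM is involved, which is the whole appeal.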
* Well, not in this version of .NET. The WebBrowser control will be available out of the box in VS.NET 2005, but it's the same exact hunk o' IE interop, only this time with a pretty Microsoft ribbon on the top.
Posted by Jeff Atwood
Maybe it's just me, but I've found most HTML-rendering "rich" text editing controls to be irrational to use -- they just have a flippin' mind of their own, which comes down to the fact that you just never know when you hit Enter whether the thing thinks you want a div or a p or a br or whatever. At least, if Outlook HTML editing and Outlook Express are exemplars of what wrapping MSHTML.dll is like.
mike on October 16, 2004 12:49 AM
With my luck, Microsoft's version would be some FrontPage derivative that generates and reads very sloppy HTML. I'd tell 'em not to bother making it if they were going to do that.
HTML rendering isn't too easy, but getting it down to just rendering might drop the size a little bit. Much of the interop size is COM stuff, networking, and a slight bit of useless code that doesn't really deal with HTML rendering.
If I were to go about it, I would probably use XML/XSLT to render HTML content. It's included in the framework and just as "free" as the RichTextBox. I'd then use DTD or XSD files to lay out the various HTML element definitions so that you wouldn't have to recompile your control every time a new HTML spec came out. Networking should be left out of the control so that you only pass it the raw markup. That would keep it as lightweight as possible, because you don't really have to handle the HTTP protocol stuff within the control itself. There are other portions of .NET that can handle that relatively easily.
I'm speculating here and it's late, so I reserve the right to say whatever I said probably sucks. It's an interesting idea but I have no clue how well it'd work.
Jeremy Brayton on October 16, 2004 02:14 AM
So, a few things:
1) I don't believe that Lutz and Nikhil's code requires an interop DLL. I think they just use COM interop straight up in their own code. I could be wrong there, but the download of Writer is only a few K, not several MB.
2) Pasting bullets from Word into RichTextEditor seems to work fine for me. I didn't do anything special in RichTextEditor to enable this.
3) Like you, HTML interests me much more as a format than RTF. However, the model I'm particularly interested in is the one where the user interacts with the control purely as text, and the application determines formatting. I think of this as the "IDE model".
Overall I agree with you; having an HTML control would make life much easier, as long as it allows interactive editing.
Craig Andera on October 17, 2004 05:08 PM
I can't speak for Lutz's control; I haven't looked at it in a long time. Nikhil's control, however, uses P/Invoke on mshtml. Good stuff.
I wonder if Mono is doing anything interesting in this space? There's also the Gecko engine to think about, but as you said, rendering HTML is not a trivial task.
Christopher Frazier on October 18, 2004 03:53 PM
Lutz's control uses Nikhil's interop wrapper.
Craig on October 18, 2004 08:42 PM
The only *good* thing about the IE PIA is that it somehow compresses (standard zip compression) from 7.8 to less than 1.5MB... I'm not sure if there are a lot of strings in there, or if it's just random chance.
I guess at this point we'll have to wait for (at least) Longhorn to get a really truly managed HTML renderer. Sad, really.
Eric on October 24, 2004 02:21 PM
Jeff,
Have you seen this commercial control: http://www.netrixcomponent.net?
It's similar in concept to Lutz and Nikhil's code (no PIA), but seems much more extensive in the features it supports.
I can't find much online comment about it, though...
PS. Thanks for this handy page!
Ok. Here we are in October 2006. Guess what... there still isn't a good, lightweight tool to do this yet.
I checked out the Netrixcomponent link... too expensive for a single developer in a minimal use app.
Eric D. Burdo on October 11, 2006 12:41 PM
I looked into this recently as well, and that WebBrowser control is trouble (or IE interop prior to .NET 2.0). When you interface with it, it's all-or-nothing: you get that progress bar displaying even if what you intend to display, while HTML, is nothing like a webpage (just fancy formatted text), and you get every single security bug in IE imported right into your app, scripting and all.
Basically, the WebBrowser control is another cheap hack on Microsoft's part to, for whatever reason, further avoid using their own managed code to finally write a browser in .NET. Beats me as to why they keep putting it off. After all, if they did finally get on their own horse, we'd likely have something as componentized as other parts of .NET, where you can pick and choose the pieces you need (say just HTML functionality, or just scripting, etc.) and leave out the other parts that are just performance burdens, UI burdens, and above all, security burdens.
Looks like open source is the way to solve this one. Majestic 12 proposes a solution:
http://www.majestic12.co.uk/projects/html_parser.php
I'm going to give that a shot and if it works nicely and I can add to it a bit I'll ask the guy to get a SourceForge/Subversion tree going.
Chris Moschini on January 23, 2007 11:59 PM
Seems that people are still commenting on this issue, so I'll toss in a link. Someone wrote a C# HTML renderer back in 2003 for the Compact Framework, and reading this page reminded me of it. The old homepage is at http://home.nc.rr.com/bshankle/cfhtml/index.html , although the developer has since relocated the file to his own site at http://www.bruceshankle.com/_mgxroot/page_10764.html . It's effectively open-source, and while it's not a complete HTML implementation, it looks like it covers enough to be useful.
Mark Erikson on February 18, 2007 01:49 AM
If you're after a good WYSIWYG XHTML editor, I've had great success using XStandard: http://www.xstandard.com/
It goes a long way towards helping your users generate "semantically correct" documents, as things like bold and italics are controlled through a flexible style menu. It also avoids deprecated tags like <b> and instead uses more meaningful ones like <strong>. Another big advantage, if your applications are focused on creating content for the web, is that you can supply it with a stylesheet to use in its rendering. That gives users a real-time preview of what the content will look like on the site without adding extra weight to the HTML.
I'm interested to hear your opinion of the tool! I've even considered using it as an 'advanced' text editing control in some of my web applications.
Matt Mousseau on August 22, 2007 08:29 PM
Content (c) 2008 Jeff Atwood. Logo image used with permission of the author. (c) 1993 Steven C. McConnell. All Rights Reserved.