If you've ever tried to cut and paste code from the VS.NET IDE, you may have noticed that the code generally comes across looking like crap. The root of this problem is that VS.NET copies code into your clipboard in the accursed Rich Text Format. If you were expecting something like standard HTML, think again, bucko!
Brad Abrams posted a quick and dirty workaround to convert the clipboard to HTML using Word. Cory Smith took that workaround and turned it into a VS.NET Macro. It works fairly well, but...
I experimented with Cory's macro, simplifying it slightly, and forcing a standard font. (I normally use a custom font for programming, but not everyone will have that font installed.)
I knew Word's HTML wasn't going to be optimal, but after taking a closer look at it, I was profoundly unhappy with it. The fact that copying and pasting it back into VS.NET resulted in extra line breaks was kind of a showstopper, too. Here's a little taste:
<P class=MsoNormal style="MARGIN: 0in 0in 0pt"> <SPAN style="FONT-SIZE: 9pt; FONT-FAMILY: 'Courier New'; mso-bidi-font-size: 12.0pt"> <o:p></o:p>
If this is Word's idea of "filtered" HTML, I'd hate to see the unfiltered version. And what's up with those empty <o:p> tags all over the place? After I figured out the threading issue preventing me from accessing the clipboard in a macro, I added some code to post-process Word's crazy HTML into something resembling standard, basic HTML. This worked OK.
But then I wondered-- why not convert the native RTF on the clipboard to HTML myself and cut out the middleman? I'm all for using ten ton wrecking balls, but not when they er.. wreck stuff! Fortunately, I've written RTF to HTML converters before, and even more fortunately, VS.NET only uses a tiny subset of RTF to place colored code on the clipboard. Here's the main conversion function:
```vb
'-- needs Imports System.Text.RegularExpressions and a reference to System.Web
Private Function RtfToHtml(ByVal rtf As String) As String
    '-- spaces emitted per tab; the width of four is an assumption
    Const tabSpaces As String = "    "

    '-- remove line breaks
    rtf = Regex.Replace(rtf, "[\n\r\f]", "")

    '-- parse RTF color table
    Dim colorTable As New Collections.Hashtable
    Dim i As Integer = 1
    For Each m As Match In Regex.Matches(rtf, _
        "\\red(?<red>\d+)\\green(?<green>\d+)\\blue(?<blue>\d+);")
        colorTable.Add(i, HtmlColor(m))
        i += 1
    Next

    '-- remove header and footer RTF tags
    rtf = Regex.Replace(rtf, "{\\rtf1[^\s]+\s", "")
    rtf = Regex.Replace(rtf, "}$", "")
    rtf = Regex.Replace(rtf, "\\deff0{\\fonttbl{\\f\d+[^}]+}}", "")
    rtf = Regex.Replace(rtf, "{\\colortbl;(\\red\d+\\green\d+\\blue\d+;)+}", "")

    '-- fix escaped C# brackets
    rtf = Regex.Replace(rtf, "\\{", "{")
    rtf = Regex.Replace(rtf, "\\}", "}")

    '-- replace any HTML-specific characters
    rtf = Web.HttpUtility.HtmlEncode(rtf)

    '-- convert RTF tags to HTML tags
    rtf = Regex.Replace(rtf, "\\tab\s", tabSpaces)
    rtf = Regex.Replace(rtf, "\\par\s", "<br/>" & Environment.NewLine)

    '-- remove unmapped RTF tags
    rtf = Regex.Replace(rtf, "\\fs(?<size>\d+)\s", "")
    rtf = Regex.Replace(rtf, "\\cb\d+\\highlight\d+\s", "")

    '-- map foreground color RTF tags using <span> tags
    rtf = Regex.Replace(rtf, "\\cf0\s", "</span><span style='color:black'>")
    For Each m As Match In Regex.Matches(rtf, "\\cf(?<num>\d+)\s")
        i = Convert.ToInt32(m.Groups("num").Value)
        rtf = Regex.Replace(rtf, "\\cf" & i & "\s", _
            "</span><span style='color:" & colorTable.Item(i) & "'>")
    Next

    '-- fix up orphaned spans at start and end
    rtf = Regex.Replace(rtf, "(^.*?)</span>", "$1")
    rtf = rtf & "</span>"

    '-- convert runs of spaces to HTML non-breaking spaces
    '-- (single spaces are left alone so the <span> attributes stay intact)
    rtf = Regex.Replace(rtf, "  ", "&nbsp;&nbsp;")

    '-- add wrapping div; CodeFontName and CodeFontSize are defined elsewhere in the macro
    rtf = "<div style='font-family:" & CodeFontName & _
        "; font-size: " & CodeFontSize & "pt;'>" & _
        rtf & "</div>"
    Return rtf
End Function
```
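The function above calls an HtmlColor helper that isn't shown here. As a rough sketch only, assuming the helper simply turns the red, green and blue groups of one color table match into a #RRGGBB string (the real helper in the macro may well differ), it could look something like this. The module name and the sample color table are made up for illustration.

```vb
Imports System
Imports System.Text.RegularExpressions

Module HtmlColorSketch
    '-- hypothetical helper: the post calls HtmlColor(m) but does not show its body
    Public Function HtmlColor(ByVal m As Match) As String
        Dim red As Integer = Convert.ToInt32(m.Groups("red").Value)
        Dim green As Integer = Convert.ToInt32(m.Groups("green").Value)
        Dim blue As Integer = Convert.ToInt32(m.Groups("blue").Value)
        '-- X2 formats each component as two uppercase hex digits
        Return String.Format("#{0:X2}{1:X2}{2:X2}", red, green, blue)
    End Function

    Sub Main()
        '-- made-up color table in the shape the color table regex expects
        Dim sample As String = "{\colortbl;\red0\green0\blue255;\red0\green128\blue0;}"
        Dim i As Integer = 1
        For Each m As Match In Regex.Matches(sample, _
            "\\red(?<red>\d+)\\green(?<green>\d+)\\blue(?<blue>\d+);")
            '-- prints "\cf1 -> #0000FF" and then "\cf2 -> #008000"
            Console.WriteLine("\cf" & i.ToString() & " -> " & HtmlColor(m))
            i += 1
        Next
    End Sub
End Module
```

Numbering the entries from 1 mirrors the main function, where \cf0 is handled separately as black and \cf1 onward index into the color table.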
All this RTF spelunking revealed an interesting fact. I've always been disappointed that none of the copied code had background color highlighting. Well, that's because the RTF on the clipboard doesn't contain any of the background colors! The actual background formatting codes are there, but there are absolutely no entries in the RTF color table for them. Weird.
Update 4/2006: I have a much improved RTF conversion macro. This macro is only interesting for historical reasons, or if you need the Word interop conversion.
Anyway, here's the full FormatToHtml macro (zip). It contains the direct RTF clipboard to HTML conversion, as well as the RTF clipboard to Word clipboard to HTML conversion. To get started:
Double-click to run the macro, then paste away..
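Before any conversion can happen, the macro has to get the RTF off the clipboard, which is where the threading issue mentioned earlier comes in. Here's a minimal sketch of one way to do that, assuming the issue is the single-threaded apartment (STA) requirement described in the comments below. The module and member names are hypothetical, and SetApartmentState assumes .NET 2.0 or later; this is not necessarily how the macro in the zip does it.

```vb
Imports System.Threading
Imports System.Windows.Forms

Module ClipboardSketch
    '-- hypothetical holder for the result; a thread entry point cannot return a value
    Private clipboardRtf As String

    Private Sub ReadClipboardRtf()
        Try
            Dim data As IDataObject = Clipboard.GetDataObject()
            If data IsNot Nothing AndAlso data.GetDataPresent(DataFormats.Rtf) Then
                clipboardRtf = CStr(data.GetData(DataFormats.Rtf))
            End If
        Catch ex As System.Runtime.InteropServices.ExternalException
            '-- clipboard is busy or unavailable; leave the result as Nothing
            clipboardRtf = Nothing
        End Try
    End Sub

    '-- returns the RTF text on the clipboard, or Nothing if there is none
    Public Function GetClipboardRtf() As String
        clipboardRtf = Nothing
        Dim worker As New Thread(AddressOf ReadClipboardRtf)
        '-- the clipboard API needs an STA thread; this must be set before Start
        worker.SetApartmentState(ApartmentState.STA)
        worker.Start()
        worker.Join()
        Return clipboardRtf
    End Function
End Module
```

The result of GetClipboardRtf could then be handed straight to RtfToHtml, after checking for Nothing.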
Posted by Jeff Atwood
For an even more hard-core "convert the RTF to HTML our own damn selves" solution, try the excellent VS.NET add-in CopySourceAsHtml:
http://www.jtleigh.com/people/colin/software/CopySourceAsHtml/
This is far more sophisticated and feature-rich than my little lightweight RTF to HTML function.
Jeff Atwood on June 14, 2005 09:32 PM
Thanks for the plug! I'm glad to see you had the same epiphany I did. :)
Colin on June 14, 2005 10:32 PM
Yeah, it's really nice work! I was looking at the source the other day while working on this.
Jeff Atwood on June 14, 2005 11:23 PM
dasBlog includes an Insert Code toolbar button on its implementation of FreeTextBox that does the formatting for you. You'd probably have to contact Scott Hanselman to find out who wrote it or if you could get it, though (assuming you use FreeTextBox).
Chris Wallace on June 15, 2005 06:56 AM
I happen to be looking at DasBlog right now. Looks like a colorizing regex engine, on a popup form dedicated to that purpose.
It's using AylarSolutions.Highlight.Highlighter
http://weblogs.asp.net/tjohansen/archive/2003/08/17/24291.aspx
Jeff Atwood on June 15, 2005 03:06 PM
I use the squishySyntaxHighlighter. It's very nice, preserves collapsible regions and line numbers, and is free.
Scott Schecter on June 19, 2005 11:47 AM
Hello Jeff,
I have downloaded the zip but cannot open it (corrupted).
Please advise,
Mario
Upgrade to the latest WinZip at http://www.winzip.com .. unfortunately I might have saved this with the "extreme compression" that is new to that version of WinZip.
I'll try to save it in the "compatible compression" and re-upload it.
Jeff Atwood on July 11, 2005 01:41 PM
I updated the macro tonight. Most of the improvements are in UsingRtfConversion:
- Works under VS.NET 2005 (Thread.ApartmentState.STA must be manually specified when accessing the clipboard)
- Wraps the code snippet in a DIV
- Sets the background color of the code
- Minor HTML formatting improvements
- Option to remove first TAB for heavily indented code (this could be automated, hmm..)
The Word functionality is unchanged!
Jeff Atwood on October 27, 2005 06:15 AM
I created a simpler version of this macro here:
http://www.codinghorror.com/blog/archives/000429.html
Jeff Atwood on October 27, 2005 05:28 PM
I had to restart my IDE before the macro would work. Dunno if that's something particular to my environment or what, but thought it was worth mentioning if someone else has any problems.
Scott Bellware on December 24, 2005 02:52 PM
Using the RTM of VS 2005, I found that I had to add a reference to System.Web in order to get this to work. After that point, it worked great. Thanks for the tool!
Rob Gillen on January 14, 2006 08:51 AM
Ok, I give up. I installed the macro but now how do I use it?
Thanks.
First, you should be using the new, simpler macro here:
http://www.codinghorror.com/blog/archives/000429.html
Second, once you've installed the Macro, either map it to a keyboard key (Tools, Options, Keyboard) or just double-click on it in the Macro Explorer.
Jeff Atwood on January 27, 2006 07:43 PM
I'm unable to use your tool. It is full of errors. I'm getting Regex errors (around 21 of them).
Can you provide step-by-step installation instructions?
anand on February 1, 2006 07:21 AM
It is now working, but there is an error in the code:
rtf = Web.HttpUtility.HtmlEncode(rtf)
Error 1 'HttpUtility' is not a member of 'Web'.
Is there any way to fix this error?
anand on February 1, 2006 07:36 AM
Hi Anand, you need to add a reference to the System.Web assembly in the Macro IDE-- you can do this via the "add reference" menu.
Jeff Atwood on February 1, 2006 04:55 PM
Content (c) 2008 Jeff Atwood. Logo image used with permission of the author. (c) 1993 Steven C. McConnell. All Rights Reserved.