Coding Horror
programming and human factors
by Jeff Atwood

July 12, 2006

The Visual Studio IDE and Regular Expressions

The Visual Studio IDE supports searching and replacing with regular expressions, right? Sure it does. It's right there in grey and black in the find and replace dialog. Just tick the "use Regular expressions" checkbox and we're off to the races.

The Visual Studio 2005 find dialog

However, you're in for an unpleasant surprise when you attempt to actually use regular expressions to find anything in Visual Studio. Apparently the Visual Studio IDE has its own bastardized regular expression syntax. Why? Who knows. Probably for arcane backwards compatibility reasons, although I have no idea why you'd want to perpetually carry forward insanity. Evidently it makes people billionaires, so who am I to judge.

God forbid we all learn one standard* regular expression dialect.

At any rate, some of the Visual Studio IDE regular expressions look awfully similar to standard regex:

Description                 Visual Studio IDE   Standard
Any single character        .                   .
Zero or more                *                   *
One or more                 +                   +
Beginning of line           ^                   ^
End of line                 $                   $
Beginning of word           <                   (no equivalent)
End of word                 >                   (no equivalent)
Line break                  \n                  \n
Any character in set        [ ]                 [ ]
Any character not in set    [^ ]                [^ ]
Or                          |                   |
Escape special char         \                   \
Tag expression              { }                 ( )
C/C++ identifier            :i                  ([a-zA-Z_$][a-zA-Z0-9_$]*)
Quoted string               :q                  (("[^"]*")|('[^']*'))
Space or Tab                :b                  [ \t]
Integer                     :z                  [0-9]+
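
The mapping in the table can be sketched as a tiny translator. This is only an illustration under stated assumptions: Python is used purely as a convenient host language, the `translate` function and `VS_TO_STANDARD` dictionary are hypothetical names, the shorthands are expanded with non-capturing groups so that only `{ }` tags produce numbered groups, and `<`, `>`, and braces inside character sets are not handled at all.

```python
import re

# Expansions based on the table above. :i and :q use non-capturing
# groups (an assumption of this sketch) so group numbers follow the tags.
VS_TO_STANDARD = {
    ":i": r"(?:[a-zA-Z_$][a-zA-Z0-9_$]*)",  # C/C++ identifier
    ":q": r"""(?:"[^"]*"|'[^']*')""",        # quoted string
    ":b": r"[ \t]",                          # space or tab
    ":z": r"[0-9]+",                         # integer
    "{": "(",                                # tag expression open
    "}": ")",                                # tag expression close
}

def translate(vs_pattern: str) -> str:
    """Translate a small subset of the Visual Studio dialect to standard regex."""
    out = []
    i = 0
    while i < len(vs_pattern):
        ch = vs_pattern[i]
        if ch == "\\" and i + 1 < len(vs_pattern):
            # Escaped character: copy both characters through untouched.
            out.append(vs_pattern[i:i + 2])
            i += 2
        elif vs_pattern[i:i + 2] in VS_TO_STANDARD:
            # Two-character shorthand such as :i or :z.
            out.append(VS_TO_STANDARD[vs_pattern[i:i + 2]])
            i += 2
        elif ch in VS_TO_STANDARD:
            # Tag braces become ordinary capture parentheses.
            out.append(VS_TO_STANDARD[ch])
            i += 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)

standard = translate("{:i}:b*=:b*{:z}")
m = re.search(standard, "count = 42")
print(m.group(1), m.group(2))  # count 42
```

Anything outside this handful of tokens passes through unchanged, so treat the result as a starting point to check by hand rather than a faithful conversion.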

But they certainly don't act related when you try to use them. For example, try something simple, like finding "[A-Za-z]+". That's all occurrences of one or more letters in a row. When I try this via the Visual Studio find dialog with the regex option checked, I get positively bizarre results. It finds a word made up of all letters, true, but as I click "Find Next", it then finds each subsequent letter in the word. Again. What planet are these so-called "regular expressions" from?
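
For comparison, here is a minimal sketch of what a conventional engine does with the same pattern. Python's `re` module and the sample string are assumptions chosen for illustration; nothing here runs inside Visual Studio.

```python
import re

# Sample line of code to search (made up for this example).
text = "int foo = bar42;"

# finditer resumes after the END of each match, so every run of
# letters is reported once and never re-matched letter by letter.
matches = [m.group(0) for m in re.finditer(r"[A-Za-z]+", text)]
print(matches)  # ['int', 'foo', 'bar']
```

Each run of letters is found exactly once, which is the behavior the Find Next button fails to deliver.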

The semi-abandoned Microsoft VSEditor blog has a three part tutorial (part one, part two, part three) on using the crazy Visual Studio dialect of Regex. There's a lot of emphasis on the strange < and > begin/end word match characters, which have no equivalent that I know of in the .NET and Perl dialects of regular expressions.

You might say that searching with regular expressions is such an extreme edge condition for most developers that it's not worth the Visual Studio development team's time. I won't disagree with you. It is rare, but it's hardly esoteric. Every developer should be able to grok the value of searching with the basic regular expressions that are a staple of their toolkit these days. Heck, some developers are so hard core they search through their code with Lisp expressions. Basic regex search functionality is awfully mild compared to that.

To be honest, searching with regular expressions isn't a common task for me either. But I'd be a lot more likely to use it if I didn't have to perform a lot of mental translation gymnastics on the occasions that I needed it. Don't make me think, man. But there is hope. There's a free add-in available which offers real regular expression searching in Visual Studio.

* well, mostly standard, anyway. Certainly JavaScript regex syntax could be considered standard these days.

Posted by Jeff Atwood

 


 

Comments

I'm always surprised when I come across a regex syntax that doesn't include \< and \> for matching the beginning and end of words. The vast majority of my regex writing has been for grep, sed and vi, all of which understand those constructs. It sure would be nice if tool writers could just pick one syntax and stick with it, though.

Greg Fleming on July 14, 2006 02:40 AM

Er, that should be backslash-lt and backslash-gt for begin and end. Caught by the html eating software, I guess.

Greg Fleming on July 14, 2006 02:42 AM

It's always seemed dumb to me that the F&R dialog didn't have an option to use the dotnet RegEx classes!

It looks like the IDE can handle extra ones, judging by the drop down, so why didn't MS provide one based on .NET, as most .NET apps are written in the IDE?

In the end I had to use Expresso (I tried Regulator but it wouldn't even run on my machine, something to do with having .NET 2.0 installed).

adrian on July 14, 2006 05:09 AM

It could be worse. Imagine if they'd chosen POSIX regexes, or old VB's Like operator (shudder).

Beginning of word: \W\w
Vice versa for end.

Here I'm going out on a limb and trusting the internet that .net syntax is at least pcre-compatible.

Foxyshadis on July 14, 2006 05:31 AM

It's the grouping operator (ok, 'tag expression') that always catches me - I try every combination of () and \(\) and it doesn't work - ah, that'll be because MS decided to use braces. Bah.

As to the behaviour you saw with searching for "[A-Za-z]+" - it strikes me that VS restarts the search one character after the start of the last match, rather than straight after the end of the last match.

Not sure I'd agree with "...searching with regular expressions is such an extreme edge condition..." either, but then maybe I'm just regex-happy :-) Certainly happy with regexes, anyway!!

Stuart Dootson on July 14, 2006 06:00 AM

>Not sure I'd agree with "...searching with regular expressions is such an extreme edge condition..." either

Indeed. One thing that's true though is that searching through my CODE with regular expressions is an edge condition (for me). However, I'm always looking at other documents, output from a server, etc. that I regularly need to either format really quick using regex or just grep through. So yeah, code searches are rare for me, but regular expression searches in general are an every day occurrence.

And yes, I've been bitten by VS's funky regex language as well.

Jeremy on July 14, 2006 07:11 AM

It's pretty common for \b to indicate "word boundary" when outside of a character class/set, so you can get something at the beginning or end of a word by putting a \b in the right spot.

Travis Illig on July 14, 2006 08:27 AM

> you can get something at the beginning or end of a word by putting a \b in the right spot

Right, \b is very handy, but not quite the same thing as the explicit "beginning of word" or "end of word" > < characters. I did a search on these characters and I got hits on egrep and emacs. So I guess that does exist in some flavors of regex.
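
A small sketch of the difference, assuming Python's `re` module (which has `\b` but no dedicated begin-word or end-word tokens). The lookaround combinations below are one common way to emulate them, not something taken from the Visual Studio documentation.

```python
import re

text = "foo bar"

# \b matches at every word boundary, start and end alike.
boundaries = [m.start() for m in re.finditer(r"\b", text)]

# A boundary followed by a word character: beginning of word only.
word_starts = [m.start() for m in re.finditer(r"\b(?=\w)", text)]

# A boundary preceded by a word character: end of word only.
word_ends = [m.start() for m in re.finditer(r"\b(?<=\w)", text)]

print(boundaries)   # [0, 3, 4, 7]
print(word_starts)  # [0, 4]
print(word_ends)    # [3, 7]
```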

Jeff Atwood on July 14, 2006 09:41 AM

I'm not sure why you are griping here. First, there really isn't a "standard" regex syntax. Just a whole bunch of bastardized flavors, and arbitrarily picking the javascript flavor.

To me, the additions that Visual Studio makes (with C++ keywords, quoted strings, etc.) are very useful for searching through code. I'd much rather use ':i' for matching an identifier rather than '([a-zA-Z_$][a-zA-Z0-9_$]*)'

The use of braces {} to tag patterns rather than parentheses () is pretty annoying, I'll admit.

In the meantime, I highly recommend picking up some freeware application like this one: http://weitz.de/regex-coach/
It's geared toward Perl regular expressions, but still very useful for debugging complex patterns

Steve Bush on July 14, 2006 10:05 AM

> First, there really isn't a "standard" regex syntax. Just a whole bunch of bastardized flavors, and arbitrarily picking the javascript flavor

There's no "standard" C++, or "standard" English, either. So we should just give up and stop trying? I say froozbah* to you!

> ':i' for matching an identifier

I have no problem with the shortcut additions. It's the wholesale abandonment of normal regex behavior and conventions that I have a problem with.

* This is a new word I created. Just because.

Jeff Atwood on July 14, 2006 10:26 AM

You can do some cool stuff with regex capture groups in VS.NET Find / Replace, too:
http://weblogs.asp.net/jgalloway/archive/2003/05/24/7498.aspx

Jon Galloway on July 14, 2006 10:49 AM

Hey Jeff (and assorted follow-up posters),

I'm the lead program manager for the team that owns editing and the find/replace dialog in Visual Studio. Our team agrees with your post :)

It is a very oddball regex syntax, and as best we can tell it comes from Visual C++ 2.0. We did want to add additional support for .NET 2.0-style regular expressions in the Visual Studio 2005 release, but unfortunately due to time pressures it didn't make the final list of features. We were able to make a number of bug fixes to the existing engine though, to give some improvement over VS 2003.

We do keep this on our list of things we want to fix. Ideally at some point we'll actually build in a nifty little extensibility point so you can wire up any regex engine you want for searches.

Thanks for the feedback!

Neil Enns
Lead Program Manager
Microsoft Visual Studio

Neil on July 14, 2006 11:15 AM

So, this will be in Orcas, right? Right?? ;)

But in all seriousness, thanks for letting us know you're out there and thinking about this stuff. It is appreciated.

Jeff Atwood on July 14, 2006 11:24 AM

I use regular expressions heavily in Visual Studio. Perhaps I'm missing the boat on some other interactive regular expression searcher, but I don't know what I'd do without them. I work on an older project with at least a couple hundred thousand lines of code and I need to be able to understand and change any of it. Searching is essential for understanding the nature of such systems that are too large to keep entirely in your head.

Also, iteratively doing regular expression search and replace is really handy when translating code from one language to another or generating wrapper/adapter code for largish API's.

The reason you have the weird search results is that the search starts from the cursor, and your regular expression matches since the first character after the cursor is a letter, etc. Use the beginning of word character or prefix your search with a non-identifier character: [^a-zA-Z][a-zA-Z]+
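
The effect can be modelled with a short sketch. It assumes, purely for illustration, that each Find Next resumes one character after the start of the previous match; `find_next_sequence` is a hypothetical helper written in Python, not anything Visual Studio exposes.

```python
import re

def find_next_sequence(pattern, text):
    """Hypothetical model of repeated Find Next: each search resumes one
    character after the START of the previous match, not after its end."""
    rx = re.compile(pattern)
    hits = []
    pos = 0
    while True:
        m = rx.search(text, pos)
        if m is None:
            break
        hits.append(m.group(0))
        pos = m.start() + 1
    return hits

text = "x = foo + bar"

# Plain pattern: every suffix of each word is reported again.
plain = find_next_sequence(r"[a-zA-Z]+", text)
print(plain)     # ['x', 'foo', 'oo', 'o', 'bar', 'ar', 'r']

# Prefixed pattern: a non-letter must come first, so suffixes are skipped.
prefixed = find_next_sequence(r"[^a-zA-Z][a-zA-Z]+", text)
print(prefixed)  # [' foo', ' bar']
```

Note the trade-off in this model: the prefixed pattern includes the leading non-letter in each match and misses a word at the very start of the text ("x" here), since nothing precedes it.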

David Gladfelter on July 14, 2006 01:21 PM

/searching with regular expressions is such an extreme edge condition for most developers/

What?!? I guess if the tools don't support it well then that might be the case.

But Vim's regex search and search/replace, especially combined with really easy macro recording/playback, is so powerful and easy to use (once you've got an idea of the syntax), I find myself using it fairly often. When I'm forced to use VS for work, there are times when the lack of a good regex search really jumps out at me.

Adam on July 14, 2006 04:13 PM

You must not be familiar with grep to think that < and > are funny. However, in grep, they are preceded by \

Phil on July 14, 2006 04:18 PM

So that's why I keep getting weird results with the regex find. Geez, I thought I was going crazy.

foobar on July 14, 2006 04:34 PM

> You might say that searching with regular expressions is such an extreme edge condition for most developers

You guys have to realize that if you're reading this blog (and by *this* blog, I mean *any* blog), you're already way outside the pool of "most developers". And in a good way, but remember, you're not always representative of the average developer ;)

Jeff Atwood on July 14, 2006 04:36 PM

Reading this post and the comments reminds me of the fact that I wish Visual Studio (and specifically the C# and VB.NET code editors, which I understand are not really part of Visual Studio proper, but rather more like plugins) were more like a good text editor; instead, I find it sometimes gets in my way because it's trying to be a *code* editor when I just want it to be a text editor. It would be cool if there was a way to switch out of "smart code editor" mode into "get out of my way and just be a text editor" mode.

Because VS is quite imperfect as a general purpose text editor, I find myself using my favorite text editor, UltraEdit, in combination with Visual Studio quite often. One of my favorite UltraEdit features is the regular expression search-and-replace feature (which also works across multiple files), though I couldn't say what "standard" it's using for regex syntax; like VS, it's a bit of a hybrid of "standard" perl-type conventions and proprietary UltraEdit conventions, like ^p for an end-of-line character and ^t for tab. UltraEdit has over the years added lots of features like syntax highlighting, code completion, etc. but they've never in my opinion added any of these features in a way that interferes with the program's ability to just be a good "dumb" text editor.

One example of Visual Studio getting in the way by trying to be too damn "smart" is when I want to paste in some text that I've copied from a web browser. It insists on pasting it in as HTML! It never fails that I forget about this annoyance, so I have to hit Ctrl-Z to undo, then switch over to UltraEdit, paste my text there, copy it again as plain text, and then I can paste it into the VS editor.

Thanks for the good post, per usual, Jeff,
Dan

Daniel Read on July 14, 2006 04:38 PM

>But Vim's regex search and search/replace, especially combined with really easy macro recording/playback, is so powerful and easy to use (once you've got an idea of the syntax), I find myself using it fairly often. When I'm forced to use VS for work, there are times when the lack of a good regex search really jumps out at me.

Vim is free. Why not use it at work when you need it?

This is the reason I always keep GNU Emacs handy on any machine I do coding on. I always run into cases where the power of a really good editor is needed. Vim, Emacs and the like have decades of experience behind them, by people who will get it right if it isn't already.

It appears that VS.NET's F&R was built by people who have time to read blogs, but not to use the very code library that they are asking everyone else to use.

David Avraamides on July 14, 2006 06:45 PM

is this just another example of MS not sticking to standards?

some dude on July 14, 2006 10:09 PM

> is this just another example of MS not sticking to standards?

Not really; it's a case of them hewing too closely to their old, crazy standard from Visual C++ 2.0. Backwards compatibility kills, particularly when it's backwards compatibility with.. er.. nonsensical, obsolete stuff.

And, I suspect, a very low priority for this feature compared to other more mainstream improvements in the IDE.

Still, you could wish someone had been a bit braver about scrapping the old to make way for the new.. it pains me to hear that developers at microsoft spent time bugfixing the old, weirdo VC++ regex syntax.

Jeff Atwood on July 14, 2006 10:42 PM

Daniel Read - I've never had a problem with that. Try Alt+Ctrl+V

[ICR] on July 17, 2006 01:10 AM

The implementation is buggy anyway. Who would use an unreliable regexp-replace feature, that requires manual inspection of the result afterwards?

I have many times attempted to do a regexp-replace that did not match correctly. An example is copying an array from Excel, resulting in tab-separated text in VS; using this regexp (in the hope that the HTML-ifier engine won't ruin it) to convert it into C# did not work:
Find: "^{:n}\t{:n}$"
Replace: "{ \1 , \2 },"

The result was many empty lines like "{ , },". Further, if I removed the match start and end, using "{:n}\t{:n}", then not all digits at the end of the line were matched (the best result was achieved when clicking individual "Replace" as opposed to "Replace All").
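
For reference, the intended transformation looks like this in a conventional engine. This is a sketch under assumptions: Python's `re` module stands in for the Find/Replace dialog, the sample rows are made up, and `\d+` is used in place of `:n` on the assumption that the cells hold plain non-negative integers.

```python
import re

# Made-up sample of tab-separated rows, as pasted from a spreadsheet.
rows = "1\t2\n3\t4\n10\t20"

# ^ and $ match at each line with MULTILINE; \1 and \2 are the two cells.
converted = re.sub(
    r"^(\d+)\t(\d+)$",
    r"{ \1 , \2 },",
    rows,
    flags=re.MULTILINE,
)
print(converted)
# { 1 , 2 },
# { 3 , 4 },
# { 10 , 20 },
```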

Going back to XEmacs.

Jarl

Jarl Friis on October 25, 2006 11:54 PM

Saw a cool replacement that does what you suggest.

http://channel9.msdn.com/ShowPost.aspx?PostID=181063

You can even wire it up to CTRL+H manually for the Text Edit, so it appears instead of the usual Find/Replace dialog.

Doug on December 3, 2006 07:36 AM

>>I wish Visual Studio (and specifically the C# and VB.NET code
>>editors, which I understand are not really part of Visual Studio
>>proper, but rather more like plugins) were more like a good text
>>editor; instead, I find it sometimes gets in my way because it's
>>trying to be a *code* editor when I just want it to be a text editor.
>>It would be cool if there was a way to switch out of "smart code
>>editor" mode into "get out of my way and just be a text editor" mode.

My thoughts exactly. I wish their intentions were to build a professional programming editor first and integrate their product specific stuff second. This is why I always have to have an alternate editor on standby and I wish that weren't necessary.

steve on March 5, 2007 08:44 AM

For the problem that "finds each subsequent letter in the word", you can resolve it by enabling "Match whole word" in the find/replace dialog.
Once again, M$ sux!

av on July 18, 2007 01:05 AM

There's a better regex search replacement tool for VS2005 available at:

http://channel9.msdn.com/ShowPost.aspx?PostID=181063

Adrian on September 26, 2007 01:58 AM

I believe the F&R dialog should have "Regular Expression" and "Regular Regular Expression"

Paulustrious on December 27, 2007 10:15 AM

Someone please explain this Visual Studio Find and Replace (F&R) behavior when using prevent match
in a regular expression. It appears to be a bug, but maybe I am missing something. It is easy to
reproduce. I am using Visual Studio 2005 (it happens with and without Service Pack 1).

When using the following find against the XML line shown, it correctly finds the
first occurrence of "copyright". Find again skips the second "copyright" as expected:

FIND: ~(-)copyright
<report-option name="copyright-notice" value="{//field[@name = 'item-copyright']/@value}"/>

Now try the same find, only this time put parentheses (or squiggly brackets) around the expression:

FIND: {[]~(-)copyright}

This time when find again is used the second "copyright" is incorrectly matched.

I have found other bugs too. One particularly destructive that happens with replace in files,
producing unexpected results in different files every time it is used. Run it one time and it
works correctly for file 1 and incorrectly for file 2. Restore the files and run it again.
This time it might work incorrectly for file 1 and incorrectly for file 2. A third time, it might
work incorrectly for file 1 and correctly for file 2. It is completely unpredictable every
time it is used against the same files. Now imagine it with 1000 files. It is very ugly.

With such bugs, it is really hard for me to depend on F&R.

Does anyone know of a good site documenting the known bugs, peculiar behavior, and types of finds
(and replaces) that should be avoided when using Visual Studio F&R?

P.S. I too would like to see F&R use the same expression syntax as the framework.

Thest on January 16, 2008 06:43 PM

The following line is the XML for my previous post without the opening and closing < and >. This site removed that line from my previous post.

report-option name="copyright-notice" value="{//field[@name = 'item-copyright']/@value}"/

Thest on January 16, 2008 06:47 PM

The second find should have been {~(-)copyright} rather than {[]~(-)copyright}. Note that there is a tilde preceding (-) in both cases, which this site is also removing.

Thest on January 16, 2008 06:53 PM

I've posted a complete phrasebook between the VS editor regex syntax and normal regex syntax: http://brianary.blogspot.com/2008/05/visual-studios-nih-regex-syntax.html

Brianary on May 22, 2008 11:16 AM

There is a new tool Regent (http://www.regexinference.com) that tries to infer search and replace regular expression from text example. It supports both Visual Studio regex syntax and ECMAScript/Perl syntax. Not a freeware though.

Sergey Vlasov on June 3, 2008 12:03 AM

Sometimes a slash appears in the Visual Studio IDE while working with a code-behind file. It does not affect any functionality. Can anyone explain why this happens?
Mail to sanjayvigil@yahoo.co.in

Sanjay on June 26, 2008 04:38 AM











Content (c) 2008 Jeff Atwood. Logo image used with permission of the author. (c) 1993 Steven C. McConnell. All Rights Reserved.