The Visual Studio IDE and Regular Expressions

July 12, 2006

The Visual Studio IDE supports searching and replacing with regular expressions, right? Sure it does. It's right there in grey and black in the find and replace dialog. Just tick the "use Regular expressions" checkbox and we're off to the races.

The Visual Studio 2005 find dialog

However, you're in for an unpleasant surprise when you attempt to actually use regular expressions to find anything in Visual Studio. Apparently the Visual Studio IDE has its own bastardized regular expression syntax. Why? Who knows. Probably for arcane backwards compatibility reasons, although I have no idea why you'd want to perpetually carry forward insanity. Evidently it makes people billionaires, so who am I to judge.

God forbid we all learn one standard* regular expression dialect.

At any rate, some of the Visual Studio IDE regular expressions look awfully similar to standard regex:

Expression                Visual Studio IDE  Standard
Any single character      .                  .
Zero or more              *                  *
One or more               +                  +
Beginning of line         ^                  ^
End of line               $                  $
Beginning of word         <                  (no equivalent)
End of word               >                  (no equivalent)
Line break                \n                 \n
Any character in set      [ ]                [ ]
Any character not in set  [^ ]               [^ ]
Or                        |                  |
Escape special char       \                  \
Tag expression            { }                ( )
C/C++ identifier          :i                 ([a-zA-Z_$][a-zA-Z0-9_$]*)
Quoted string             :q                 (("[^"]*")|('[^']*'))
Space or Tab              :b                 [ \t]
Integer                   :z                 [0-9]+

But they certainly don't act related when you try to use them. For example, try something simple, like finding "[A-Za-z]+". That's every run of one or more letters in a row. When I try this via the Visual Studio find dialog with the regex option checked, I get positively bizarre results. It finds a word made up of all letters, true, but as I click "Find Next", it then finds each subsequent letter in the word. Again. What planet are these so-called "regular expressions" from?
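For contrast, here is a minimal sketch of what a conventional engine does with that same pattern. Python's re module stands in for "standard" behavior here, and the sample text is made up for illustration; nothing below comes from Visual Studio itself.

```python
import re

# Hypothetical sample line; any text with a few words will do.
text = "int fooBar = baz42;"

# finditer resumes after the END of each match, so every run of
# letters is reported exactly once.
matches = [m.group(0) for m in re.finditer(r"[A-Za-z]+", text)]
print(matches)
```

Each run of letters comes back once, because the next search resumes after the end of the previous match.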

The semi-abandoned Microsoft VSEditor blog has a three part tutorial (part one, part two, part three) on using the crazy Visual Studio dialect of Regex. There's a lot of emphasis on the strange < and > begin/end word match characters, which have no equivalent that I know of in the .NET and Perl dialects of regular expressions.
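To take some of the translation work out of the table above, the shorthand rows can be folded into a small converter. This is a rough sketch in Python, not a complete translator: the name vs_to_standard is made up, it only knows the table rows that have a standard equivalent, it leaves < and > alone, and it does not special-case character sets. The shorthands are wrapped in non-capturing groups (a choice made for this sketch, not something the table shows) so that group numbers still line up with the { } tags.

```python
import re

# Shorthand rows from the table, written as non-capturing groups so that
# capture-group numbers still match the order of the { } tags.
VS_TO_STANDARD = {
    ":i": r"(?:[a-zA-Z_$][a-zA-Z0-9_$]*)",   # C/C++ identifier
    ":q": r"""(?:"[^"]*"|'[^']*')""",        # quoted string
    ":b": r"[ \t]",                          # space or tab
    ":z": r"(?:[0-9]+)",                     # integer
    "{": "(",                                # tag expression open
    "}": ")",                                # tag expression close
}


def vs_to_standard(pattern: str) -> str:
    """Translate a Visual Studio 2005-style pattern to Perl-style syntax."""
    out = []
    i = 0
    while i < len(pattern):
        two = pattern[i:i + 2]
        if pattern[i] == "\\" and len(two) == 2:
            out.append(two)                  # escaped character: copy as-is
            i += 2
        elif two in VS_TO_STANDARD:
            out.append(VS_TO_STANDARD[two])  # shorthand (or a final brace)
            i += 2
        elif pattern[i] in VS_TO_STANDARD:
            out.append(VS_TO_STANDARD[pattern[i]])  # brace to parenthesis
            i += 1
        else:
            out.append(pattern[i])           # everything else is shared
            i += 1
    return "".join(out)


translated = vs_to_standard(r"{:i}:b*=:b*{:z}")
match = re.search(translated, "count = 42;")
print(translated)
print(match.groups())
```

Here the two tagged expressions come back as the two captured groups, the identifier and the integer, which is what \1 and \2 would refer to in a replacement.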

You might say that searching with regular expressions is such an extreme edge condition for most developers that it's not worth the Visual Studio development team's time. I won't disagree with you. It is rare, but it's hardly esoteric. Every developer should be able to grok the value of searching with the basic regular expressions that are a staple of their toolkit these days. Heck, some developers are so hard core they search through their code with Lisp expressions. Basic regex search functionality is awfully mild compared to that.

To be honest, searching with regular expressions isn't a common task for me either. But I'd be a lot more likely to use it if I didn't have to perform a lot of mental translation gymnastics on the occasions that I needed it. Don't make me think, man. But there is hope. There's a free add-in available which offers real regular expression searching in Visual Studio.

* well, mostly standard, anyway. Certainly JavaScript regex syntax could be considered standard these days.

Posted by Jeff Atwood
52 Comments

I use regular expressions heavily in Visual Studio. Perhaps I'm missing the boat on some other interactive regular expression searcher, but I don't know what I'd do without them. I work on an older project with at least a couple hundred thousand lines of code and I need to be able to understand and change any of it. Searching is essential for understanding the nature of such systems that are too large to keep entirely in your head.

Also, iteratively doing regular expression search and replace is really handy when translating code from one language to another or generating wrapper/adapter code for largish API's.

The reason you have the weird search results is that the search starts from the cursor, and your regular expression matches since the first character after the cursor is a letter, etc. Use the beginning of word character or prefix your search with a non-identifier character: [^a-zA-Z][a-zA-Z]+

David Gladfelter on July 14, 2006 2:21 AM

I'm always surprised when I come across a regex syntax that doesn't include \ and \ for matching the beginning and end of words. The vast majority of my regex writing has been for grep, sed and vi, all of which understand those constructs. It sure would be nice if tool writers could just pick one syntax and stick with it, though.

Greg Fleming on July 14, 2006 3:40 AM

Er, that should be backslash-lt and backslash-gt for begin and end. Caught by the html eating software, I guess.

Greg Fleming on July 14, 2006 3:42 AM

/searching with regular expressions is such an extreme edge condition for most developers/

What?!? I guess if the tools don't support it well then that might be the case.

But Vim's regex search and search/replace, especially combined with really easy macro recording/playback, is so powerful and easy to use (once you've got an idea of the syntax), I find myself using it fairly often. When I'm forced to use VS for work, there are times when the lack of a good regex search really jumps out at me.

Adam on July 14, 2006 5:13 AM

You must not be familiar with grep to think that < and > are funny. However, in grep, they are preceded by \

Phil on July 14, 2006 5:18 AM

So that's why I keep getting weird results with the regex find. Geez, I thought I was going crazy.

foobar on July 14, 2006 5:34 AM

You might say that searching with regular expressions is such an extreme edge condition for most developers

You guys have to realize that if you're reading this blog (and by *this* blog, I mean *any* blog), you're already way outside the pool of "most developers". And in a good way, but remember, you're not always representative of the average developer ;)

Jeff Atwood on July 14, 2006 5:36 AM

Reading this post and the comments reminds me of the fact that I wish Visual Studio (and specifically the C# and VB.NET code editors, which I understand are not really part of Visual Studio proper, but rather more like plugins) were more like a good text editor; instead, I find it sometimes gets in my way because it's trying to be a *code* editor when I just want it to be a text editor. It would be cool if there was a way to switch out of "smart code editor" mode into "get out of my way and just be a text editor" mode.

Because VS is quite imperfect as a general purpose text editor, I find myself using my favorite text editor, UltraEdit, in combination with Visual Studio quite often. One of my favorite UltraEdit features is the regular expression search-and-replace feature (which also works across multiple files), though I couldn't say what "standard" it's using for regex syntax; like VS, it's a bit of a hybrid of "standard" perl-type conventions and proprietary UltraEdit conventions, like ^p for an end-of-line character and ^t for tab. UltraEdit has over the years added lots of features like syntax highlighting, code completion, etc. but they've never in my opinion added any of these features in a way that interferes with the program's ability to just be a good "dumb" text editor.

One example of Visual Studio getting in the way by trying to be too damn "smart" is when I want to paste in some text that I've copied from a web browser. It insists on pasting it in as HTML! It never fails that I forget about this annoyance, so I have to hit Ctrl-Z to undo, then switch over to UltraEdit, paste my text there, copy it again as plain text, and then I can paste it into the VS editor.

Thanks for the good post, per usual, Jeff,
Dan

Daniel Read on July 14, 2006 5:38 AM

It's always seemed dumb to me that the FR dialog didn't have an option to use the dotnet RegEx classes!

It looks like the IDE can handle extra ones, judging by the drop-down, so why didn't MS provide one based on .NET, as most .NET apps are written in the IDE?

In the end I had to use Expresso (I tried Regulator but it wouldn't even run on my machine; something to do with having .NET 2.0 installed).

adrian on July 14, 2006 6:09 AM

It could be worse. Imagine if they'd chosen POSIX regexes, or old VB's Like operator (shudder).

Beginning of word: \W\w
Vice versa for end.

Here I'm going out on a limb and trusting the internet that .net syntax is at least pcre-compatible.

Foxyshadis on July 14, 2006 6:31 AM

But Vim's regex search and search/replace, especially combined with really easy macro recording/playback, is so powerful and easy to use (once you've got an idea of the syntax), I find myself using it fairly often. When I'm forced to use VS for work, there are times when the lack of a good regex search really jumps out at me.

Vim is free. Why not use it at work when you need it?

This is the reason I always keep GNU Emacs handy on any machine I do coding on. I always run into cases where the power of a really good editor is needed. Vim, Emacs and the like have decades of experience behind them, by people who will get it right if it isn't already.

It appears that VS.NET's FR was built by people who have time to read blogs, but not to use the very code library that they are asking everyone else to use.

David Avraamides on July 14, 2006 7:45 AM

Not sure I'd agree with "...searching with regular expressions is such an extreme edge condition..." either

Indeed. One thing that's true though is that searching through my CODE with regular expressions is an edge condition (for me). However, I'm always looking at other documents, output from a server, etc. that I regularly need to either format really quick using regex or just grep through. So yeah, code searches are rare for me, but regular expression searches in general are an every day occurrence.

And yes, I've been bitten by VS's funky regex language as well.

Jeremy on July 14, 2006 8:11 AM

It's pretty common for \b to indicate "word boundary" when outside of a character class/set, so you can get something at the beginning or end of a word by putting a \b in the right spot.

Travis Illig on July 14, 2006 9:27 AM

you can get something at the beginning or end of a word by putting a \b in the right spot

Right, \b is very handy, but not quite the same thing as the explicit "beginning of word" or "end of word" characters. I did a search on these characters and I got hits on egrep and emacs. So I guess that does exist in some flavors of regex.

Jeff Atwood on July 14, 2006 10:41 AM
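The difference between \b and a directional begin/end-of-word marker is easier to see with positions printed out. A small sketch in Python (the sample strings and the BEGIN_WORD / END_WORD names are invented for illustration); the lookaround emulation is one common way to get directional boundaries in Perl-style engines, not the only one.

```python
import re

text = "cat concat catalog"

# \b is symmetric: it matches at the start AND at the end of a word.
word_starts_with_cat = [m.start() for m in re.finditer(r"\bcat", text)]
word_ends_with_cat = [m.start() for m in re.finditer(r"cat\b", text)]

# Directional boundaries, emulated with lookarounds.
BEGIN_WORD = r"\b(?=\w)"   # boundary with a word character after it
END_WORD = r"\b(?<=\w)"    # boundary with a word character before it

sample = "a bc"
any_boundary = [m.start() for m in re.finditer(r"\b", sample)]
begins = [m.start() for m in re.finditer(BEGIN_WORD, sample)]
ends = [m.start() for m in re.finditer(END_WORD, sample)]

# \W\w (suggested earlier in the thread) needs a character before the
# word, so it misses a word at the very start of the text.
nonword_then_word = re.findall(r"\W\w", sample)

print(word_starts_with_cat, word_ends_with_cat)
print(any_boundary, begins, ends, nonword_then_word)
```

So \b placed before or after a literal does the job in practice; the explicit markers only matter when the boundary has to stand on its own.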

I'm not sure why you are griping here. First, there really isn't a "standard" regex syntax. Just a whole bunch of bastardized flavors, and arbitrarily picking the javascript flavor.

To me, the additions that Visual Studio makes (with C++ keywords, quoted strings, etc.) are very useful for searching through code. I'd much rather use ':i' for matching an identifier rather than '([a-zA-Z_$][a-zA-Z0-9_$]*)'

The use of braces {} to tag patterns rather than parentheses () is pretty annoying, I'll admit.

In the meantime, I highly recommend picking up some freeware application like this one: http://weitz.de/regex-coach/
It's geared toward Perl regular expressions, but still very useful for debugging complex patterns

Steve Bush on July 14, 2006 11:05 AM

is this just another example of MS not sticking to standards?

some dude on July 14, 2006 11:09 AM

First, there really isn't a "standard" regex syntax. Just a whole bunch of bastardized flavors, and arbitrarily picking the javascript flavor

There's no "standard" C++, or "standard" English, either. So we should just give up and stop trying? I say froozbah* to you!

':i' for matching an identifier

I have no problem with the shortcut additions. It's the wholesale abandonment of normal regex behavior and conventions that I have a problem with.

* This is a new word I created. Just because.

Jeff Atwood on July 14, 2006 11:26 AM

is this just another example of MS not sticking to standards?

Not really; it's a case of them hewing too closely to their old, crazy standard from Visual C++ 2.0. Backwards compatibility kills, particularly when it's backwards compatibility with.. er.. nonsensical, obsolete stuff.

And, I suspect, a very low priority for this feature compared to other more mainstream improvements in the IDE.

Still, you could wish someone had been a bit braver about scrapping the old to make way for the new.. it pains me to hear that developers at microsoft spent time bugfixing the old, weirdo VC++ regex syntax.

Jeff Atwood on July 14, 2006 11:42 AM

Hey Jeff (and assorted follow-up posters),

I'm the lead program manager for the team that owns editing and the find/replace dialog in Visual Studio. Our team agrees with your post :)

It is a very oddball regex syntax, and as best we can tell it comes from Visual C++ 2.0. We did want to add additional support for .NET 2.0-style regular expressions in the Visual Studio 2005 release, but unfortunately due to time pressures it didn't make the final list of features. We were able to make a number of bug fixes to the existing engine though, to give some improvement over VS 2003.

We do keep this on our list of things we want to fix. Ideally at some point we'll actually build in a nifty little extensibility point so you can wire up any regex engine you want for searches.

Thanks for the feedback!

Neil Enns
Lead Program Manager
Microsoft Visual Studio

Neil on July 14, 2006 12:15 PM

So, this will be in Orcas, right? Right?? ;)

But in all seriousness, thanks for letting us know you're out there and thinking about this stuff. It is appreciated.

Jeff Atwood on July 14, 2006 12:24 PM

Daniel Read - I've never had a problem with that. Try Alt+Ctrl+V

[ICR] on July 17, 2006 2:10 AM

The implementation is buggy anyway. Who would use an unreliable regexp-replace feature, that requires manual inspection of the result afterwards?

I have many times attempted to do a regexp-replace that did not match correctly. An example is copying an array from Excel, resulting in tab-separated text in VS; using this regexp (in hope the HTML-ifier engine won't ruin it) to convert it into C# did not work:
Find: "^{:n}\t{:n}$"
Replace: "{ \1 , \2 },"

The result was many empty lines like "{ , },". Further, if I removed the match start and end, using "{:n}\t{:n}", then not all digits at the end of the line were matched (the best result was achieved when clicking "Replace" individually as opposed to "Replace All").

Going back to XEmacs.

Jarl

Jarl Friis on October 25, 2006 12:54 PM
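For reference, the same conversion is short in a Perl-style engine. A sketch in Python, with the assumptions flagged: the sample data is invented, NUMBER is a guess at what :n was meant to match (a plain decimal number), and ( ) takes the place of { } for the tagged groups.

```python
import re

# Hypothetical two-column data as it might arrive from a spreadsheet paste.
pasted = "1\t2\n3.5\t4\n10\t20.25\n"

# Stand-in for the :n shorthand; assumed to mean a simple decimal number.
NUMBER = r"[0-9]+(?:\.[0-9]+)?"

# re.MULTILINE makes ^ and $ match at every line, like an editor search.
row = re.compile(rf"^({NUMBER})\t({NUMBER})$", re.MULTILINE)

# \1 and \2 refer to the two captured numbers; braces are literal here.
converted = row.sub(r"{ \1 , \2 },", pasted)
print(converted)
```

Every input row yields exactly one initializer row, with no empty "{ , }," lines, since a row is only rewritten when both numbers matched.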

Saw a cool replacement that does what you suggest.

http://channel9.msdn.com/ShowPost.aspx?PostID=181063

You can even wire it up to CTRL+H manually for the Text Edit, so it appears instead of the usual Find/Replace dialog.

Doug on December 3, 2006 7:36 AM

I wish Visual Studio (and specifically the C# and VB.NET code
editors, which I understand are not really part of Visual Studio
proper, but rather more like plugins) were more like a good text
editor; instead, I find it sometimes gets in my way because it's
trying to be a *code* editor when I just want it to be a text editor.
It would be cool if there was a way to switch out of "smart code
editor" mode into "get out of my way and just be a text editor" mode.

My thoughts exactly. I wish their intentions were to build a professional programming editor first and integrate their product specific stuff second. This is why I always have to have an alternate editor on standby and I wish that weren't necessary.

steve on March 5, 2007 8:44 AM

For the problem that "finds each subsequent letter in the word", you can resolve it by enabling "Match whole word" in the find/replace dialog.
Once again, M$ sux!

av on July 18, 2007 2:05 AM

There's a better regex search replacement tool for VS2005 available at:

http://channel9.msdn.com/ShowPost.aspx?PostID=181063

Adrian on September 26, 2007 2:58 AM

I believe the FR dialog should have "Regular Expression" and "Regular Regular Expression"

Paulustrious on December 27, 2007 10:15 AM

Someone please explain this Visual Studio Find and Replace (FR) behavior when using prevent match
in a regular expression. It appears to be a bug, but maybe I am missing something. It is easy to
reproduce. I am using Visual Studio 2005 (it happens with and without Service Pack 1).

When using the following find against the XML line shown, it correctly finds the
first occurrence of "copyright". Find again skips the second "copyright" as expected:

FIND: ~(-)copyright
report-option name="copyright-notice" value="{//field[@name = 'item-copyright']/@value}"/

Now try the same find, only this time put parentheses (or curly braces) around the expression:

FIND: {[]~(-)copyright}

This time when find again is used the second "copyright" is incorrectly matched.

I have found other bugs too. One particularly destructive that happens with replace in files,
producing unexpected results in different files every time it is used. Run it one time and it
works correctly for file 1 and incorrectly for file 2. Restore the files and run it again.
This time it might work incorrectly for file 1 and incorrectly for file 2. A third time, it might
work incorrectly for file 1 and correctly for file 2. It is completely unpredictable every
time it is used against the same files. Now imagine it with 1000 files. It is very ugly.

With such bugs, it is really hard for me to depend on FR.

Does anyone know of a good site documenting the known bugs, peculiar behavior, and types of finds
(and replaces) that should be avoided when using Visual Studio FR?

P.S. I too would like to see FR use the same expression syntax as the framework.

Thest on January 16, 2008 6:43 AM

The following line is the XML for my previous post without the opening < and closing >. This site removed that line from my previous post.

report-option name="copyright-notice" value="{//field[@name = 'item-copyright']/@value}"/

Thest on January 16, 2008 6:47 AM

The second find should have been {~(-)copyright} rather than {[]~(-)copyright}. Note that there is a tilde preceding (-) in both cases, which this site is also removing.

Thest on January 16, 2008 6:53 AM
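If the goal is "copyright" not preceded by a hyphen, Perl-style engines usually express that with a negative lookbehind. A small Python sketch against the XML line from the comment (angle brackets omitted, as posted); whether this is exactly what ~(-) was meant to do in the Visual Studio dialect is an assumption.

```python
import re

line = (
    'report-option name="copyright-notice" '
    "value=\"{//field[@name = 'item-copyright']/@value}\"/"
)

# Plain search: both occurrences are found.
plain = [m.start() for m in re.finditer(r"copyright", line)]

# (?<!-) rejects any "copyright" that has a hyphen directly before it.
guarded = [m.start() for m in re.finditer(r"(?<!-)copyright", line)]

print(plain, guarded)
```

Wrapping the guarded pattern in a capturing group, as in (?<!-)(copyright), does not change which occurrences match in this engine.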

I've posted a complete phrasebook between the VS editor regex syntax and normal regex syntax: http://brianary.blogspot.com/2008/05/visual-studios-nih-regex-syntax.html

Brianary on May 22, 2008 12:16 PM

There is a new tool, Regent (http://www.regexinference.com), that tries to infer search and replace regular expressions from text examples. It supports both Visual Studio regex syntax and ECMAScript/Perl syntax. Not freeware, though.

Sergey Vlasov on June 3, 2008 1:03 PM

Sometimes a slash appears in the Visual Studio IDE while working with a code-behind file. It does not affect any functionality. Can anyone explain why this happens?
Mail to sanjayvigil@yahoo.co.in

Sanjay on June 26, 2008 5:38 AM

Yeah, your article does say it clearly, but as I'm new to both standard regular expressions and the VS IDE's regular expressions, I really don't have a problem with which one I use to get my work done... Both are new to me anyway

Shashank on November 17, 2008 10:53 AM

I have provided a link to this page from my blog, hope you don't mind. http://pcriddler.blogspot.com/2008/11/using-vs2005-ide-regular-expressions.html

Shashank on November 17, 2008 10:56 AM

Is there any hope?

I've been lamenting the Visual Studio regex syntax for years, and I've seen no sign that it will ever support even the .net regular expression syntax.

aboy021 on January 9, 2009 5:14 AM

Great post! If you're looking for a free plugin that provides actual, standard-syntax regex support, I've found this one to work pretty well:
http://www.codeproject.com/KB/macros/VS2005RegexAddIn.aspx?fid=1405582&df=90&mpp=25&noise=3&sort=Position&view=Quick&select=2950706&fr=1

Brook on March 5, 2009 6:37 AM

What would the regular expression be for the RAISERROR function of SQL Server?

Raman on March 6, 2009 5:22 AM

I can't believe they're still doing this. At the least they could provide an option that uses the System.Text.RegularExpressions library.

The idiocy of their proprietary regexs has been carried over into Sql Management Studio as well. Why? It's not like they have backwards compatibility issues there.


Brian R on April 22, 2009 11:12 AM

Once again

xlnv on May 4, 2009 10:24 AM

Standards? For RegEx? You standards-whiners crack me up. Why doesn't your Web site follow the standards for good navigation layout? Why don't you follow the standards for whining all the time? Because there are no standards for those things, just like there are none for RegEx. You need to come to grips with the fact that the freedom to create and express results in differences. Overall, that is great. Accept that and you'll free up a ton of whining time that you can use to do something productive.

Neol on May 28, 2009 2:00 AM

@Jeff,
'There's no "standard" C++'

Huh?
So what is ISO/IEC_14882?

Of course the existence of a standard doesn't mean that everyone adheres to it. But still, there is a standard.

Lars

Lars on June 5, 2009 1:13 PM

Stop writing about things that you clearly have no clue about. How is (("[^"]*")|('[^']*')) supposed to match a quoted string, especially one that contains \"? Or a @-verbatim string that contains ""? And how quickly can you type (("[^"]*")|('[^']*')) compared to :q?

And why do you keep reiterating that there is no equivalent to in standard regexes when clearly there is \b? Read up on some basics before making stupid claims that are obviously false.

Arrogant on June 5, 2009 1:54 PM

Stop writing about things that you clearly have no clue about. How is (("[^"]*")|('[^']*')) supposed to match a quoted string, especially one that contains \"? Or a @-verbatim string that contains ""? And how quickly can you type (("[^"]*")|('[^']*')) compared to :q?

And why do you keep reiterating that there is no equivalent to < and > in standard regexes when clearly there is \b? Read up on some basics before making stupid claims that are obviously false.

Arrogant on June 5, 2009 1:57 PM
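The complaint about escaped quotes is easy to demonstrate. The sketch below, in Python, compares the quoted-string pattern from the table with a common escape-aware variant; the sample line and the names NAIVE and ESCAPE_AWARE are made up, and the variant still does not handle C# verbatim strings with doubled quotes.

```python
import re

# Pattern from the table: stops at the first closing quote it sees.
NAIVE = re.compile(r"""(("[^"]*")|('[^']*'))""")

# Escape-aware variant: a backslash consumes the character after it.
ESCAPE_AWARE = re.compile(r"""("(?:[^"\\]|\\.)*")|('(?:[^'\\]|\\.)*')""")

# Hypothetical line of code containing escaped quotes.
sample = r'say("he said \"hi\" to me");'

naive_hit = NAIVE.search(sample).group(0)
aware_hit = ESCAPE_AWARE.search(sample).group(0)

print(naive_hit)
print(aware_hit)
```

The naive pattern ends the string at the first escaped quote, while the escape-aware one runs to the real closing quote.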

I have often considered using VI or EMACS as an editor for C++ code, in conjunction with the Visual Studio compiler, linker, etc.

The Visual Studio "text editor" really makes my blood boil at times, e.g. when I accidentally hit F1, or when it re-formats some of my code in an effort to "help," or when its schizophrenic threading model wanders off into the sunset with the CPU, leaving me alone with my thoughts.

But I fear one big problem: that any concept of a "solution" or "project" will likely be lost when working in another editor. I work on projects with huge numbers of files spread all over the place, so it's nice to be able to use the "Entire Solution" option of "Find In Files" to really search globally. It's also nice to be able to easily access (via the Solution Explorer) all the project files.

All of this comes with the caveat that #include trumps the Visual Studio concept of "project files" when it comes to determining exactly what constitutes your project. So the Visual Studio features I mention are really not well-implemented any way, and if EMACS or VIM can offer a plausible alternative, I want to explore that.

Do VIM and EMACS allow anything like project- or solution-level searching? Perhaps something can be simulated using the file system. Otherwise, these programs are probably excellent for single-file projects (or those with only a few files).

Incidentally, it seems to me that most Visual Studio projects contain an inordinate, even ridiculous number of files. I documented this in a thread on DailyWTF.com once and got flamed mercilessly for it. Apparently, the collective opinion is that requiring 14 files with 6 different extensions just to print "Hello, World" is de rigueur in Today's Modern World, and I should just get with the program, deal with it, and Install Silverlight Or Else.

Nevertheless, I wonder if this "file logorrhea" exhibited by Visual Studio reflects an effort (at whatever level of consciousness) to make this product less amenable to the use of non-Microsoft text editors.

Porpo Villacuonzo on July 8, 2009 1:22 PM

It's the grouping operator (ok, 'tag expression') that always catches me - I try every combination of () and \(\) and it doesn't work - ah, that'll be because MS decided to use braces. Bah.

As to the behaviour you saw with searching for "[A-Za-z]+" - it strikes me that VS restarts the search one character after the start of the last match, rather than straight after the end of the last match.

Not sure I'd agree with "...searching with regular expressions is such an extreme edge condition..." either, but then maybe I'm just regex-happy :-) Certainly happy with regexes, anyway!!

Stuart Dootson on February 6, 2010 9:46 PM
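The restart-after-start theory is easy to simulate. This Python sketch is only a model of the guess above, not of Visual Studio's actual search code; find_all and its resume_after_start flag are invented names.

```python
import re


def find_all(pattern, text, resume_after_start=False):
    """Collect matches, resuming either after each match's end (the usual
    rule) or one character past its start (the hypothesized rule)."""
    regex = re.compile(pattern)
    found, pos = [], 0
    while pos <= len(text):
        m = regex.search(text, pos)
        if m is None:
            break
        found.append(m.group(0))
        if resume_after_start:
            pos = m.start() + 1
        else:
            pos = max(m.end(), m.start() + 1)  # always make progress
    return found


usual = find_all(r"[A-Za-z]+", "foo bar")
hypothesized = find_all(r"[A-Za-z]+", "foo bar", resume_after_start=True)
print(usual)
print(hypothesized)
```

Under the hypothesized rule each word is followed by its own shrinking tails, which looks a lot like "Find Next" stepping through the letters of a word.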

You can do some cool stuff with regex capture groups in VS.NET Find / Replace, too:
http://weblogs.asp.net/jgalloway/archive/2003/05/24/7498.aspx

Jon Galloway on February 6, 2010 9:46 PM

Jeff, thanks for this post. Saved my butt.

Brandon on February 6, 2010 9:46 PM

Hi Jeff,

I know this is off the wall, but the correct way to perform your first example would be: "<[A-Za-z]+>" (minus the quotes), instead of "[A-Za-z]+". The '<' and '>' characters match the whole word only instead of the word *and* the individual characters in it. Certainly, less than obvious : )

What I'm trying to figure out is how to use a regex in the "Replace with" instead of a literal. I don't even know if this can be done...

Dave Black

Dave Black on March 5, 2010 6:58 AM

I should have added in my previous post that I'm trying to turn all uppercase params/variables into lowercase with a preceding underscore...

Dave Black on March 5, 2010 7:58 AM
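Whether the Visual Studio "Replace with" box can do this is left open above. In engines that accept a replacement function it is straightforward; a sketch in Python, where the sample line and the helper name to_private_name are invented for illustration:

```python
import re

# Hypothetical source line with all-uppercase names.
source = "int TOTAL = COUNT + offset;"


def to_private_name(match):
    # Lowercase the matched name and add a leading underscore.
    return "_" + match.group(0).lower()


# \b...\b restricts the match to whole words made only of capitals.
renamed = re.sub(r"\b[A-Z]+\b", to_private_name, source)
print(renamed)
```

Names that mix in digits or underscores are left alone by this pattern, so it would need widening for identifiers like MAX_SIZE.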

Thanks for a very useful post. I was hoping to search for particular search terms which had NOT been commented out.

To exclude comments from your regex search, use

^~(:b*')

which effectively means at the beginning of the line exclude any amount of white space (to allow for auto-indented comments) followed by the apostrophe comment character.

To find conditional stops that have not been commented out use

^~(:b*').*:b+Stop(:b+|\n)

Shelia Smithson on August 17, 2010 11:45 PM
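In Perl-style syntax the same idea is usually written with a negative lookahead. A sketch in Python, assuming VB-style code where an apostrophe starts a comment; the sample lines are invented, and [ \t] stands in for :b as in the table.

```python
import re

# Hypothetical VB-style source; ' starts a comment.
code = "\n".join([
    "    If x > 0 Then Stop",
    "    ' If y > 0 Then Stop",
    "' Stop here",
    "    Debug.Print x : Stop ",
])

# (?![ \t]*') plays the role of ~(:b*'): reject a line that begins with
# optional whitespace followed by the apostrophe comment character.
live_stop = re.compile(r"^(?![ \t]*').*[ \t]+Stop(?:[ \t]+|$)", re.MULTILINE)

hits = [m.group(0).strip() for m in live_stop.finditer(code)]
print(hits)
```

Like the original, this only screens out whole-line comments; a Stop inside a trailing comment on a line of live code would still be reported.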

Curiously, VIM uses '\<' for beginning of word, '\>' for end of word, in its enormous language-parser regex syntax. So now there are at least two instances of using the same (well, very close) symbols for the start/end word match.
(http://vimdoc.sourceforge.net/htmldoc/pattern.html#pattern)

Jeff Bevis on September 30, 2010 8:31 AM
