Coding Horror
programming and human factors
by Jeff Atwood

October 23, 2005

Excluding Matches With Regular Expressions

Here's an interesting regex problem:

I seem to have stumbled upon a puzzle that evidently is not new, but for which no (simple) solution has yet been found. I am trying to find a way to exclude an entire word from a regular expression search. The regular expression should find and return everything EXCEPT the text string in the search expression.

For example, if the word fox was what I wanted to exclude, and the searched text was:

The quick brown fox jumped over the lazy dog.

... and I used a regular expression of [^"fox"] (which I know is incorrect) (why this doesn't work I don't understand; it would make life SO much easier), then the returned search results would be:

The quick brown jumped over the lazy dog.

Regular expressions are great at matching. It's easy to formulate a regex using what you want to match. Stating a regex in terms of what you don't want to match is a bit harder.
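It helps to see what [^"fox"] actually does. Square brackets define a character class, which matches exactly one character; the leading ^ negates it. So [^"fox"] means "any single character that is not a double quote, f, o, or x". It has no notion of the word fox at all. A quick Python sketch using the sample sentence:

```python
import re

text = "The quick brown fox jumped over the lazy dog."

# [^"fox"] is a negated character class: each match is ONE character
# that is not '"', 'f', 'o', or 'x'. Joining the matches back together
# shows which characters survive.
kept = "".join(re.findall(r'[^"fox"]', text))
print(kept)  # The quick brwn  jumped ver the lazy dg.
```

Every o, f, and x disappears, wherever it occurs, while the word boundaries around fox are left behind as a double space.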

One easy way to exclude text from a match is negative lookbehind:

\w+\b(?<!\bfox)
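One way to try this is with Python's built-in re module, which accepts this lookbehind because \bfox has a fixed width of three characters. A minimal sketch against the sample sentence, plus a couple of fox-like words:

```python
import re

text = "The quick brown fox jumped over the lazy dog."

# Match a run of word characters up to a word boundary, then look BACK
# from that point: reject the match if the text just consumed ends in
# the whole word "fox" (the \b requires a boundary before the 'f').
words = re.findall(r"\w+\b(?<!\bfox)", text)
print(words)  # ['The', 'quick', 'brown', 'jumped', 'over', 'the', 'lazy', 'dog']

# Words that merely contain "fox" are kept.
others = re.findall(r"\w+\b(?<!\bfox)", "fox firefox foxes")
print(others)  # ['firefox', 'foxes']
```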

But not all regex flavors support negative lookbehind. And those that do typically place severe restrictions on it; e.g., the lookbehind must be a simple fixed-length expression. To avoid these incompatibilities, we can restate our solution using negative lookahead, which is much more widely supported:
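Python's built-in re module is one flavor with exactly this fixed-length restriction. In the sketch below, the fox|foxes alternation is just an illustrative example of a lookbehind whose alternatives have different lengths (3 and 5 characters), which re refuses to compile:

```python
import re

# A fixed-width lookbehind compiles fine...
fixed = re.compile(r"\w+\b(?<!\bfox)")

# ...but a lookbehind whose alternatives differ in length does not.
try:
    re.compile(r"\w+\b(?<!\b(?:fox|foxes))")
    variable_ok = True
except re.error as err:
    variable_ok = False
    print("re.error:", err)
```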

(?!fox\b)\b\w+
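The same check in Python, as a sketch. Here the lookahead runs first, at the start of each candidate word, and only lets the match proceed if the upcoming text is not the whole word fox:

```python
import re

text = "The quick brown fox jumped over the lazy dog."

# At each position, first assert (without consuming anything) that the
# upcoming text is not the whole word "fox", then match a word that
# starts at a word boundary.
pattern = re.compile(r"(?!fox\b)\b\w+")

words = pattern.findall(text)
print(" ".join(words))  # The quick brown jumped over the lazy dog

# The \b inside the lookahead excludes only the whole word; words that
# merely start or end with "fox" still match.
variants = pattern.findall("fox foxes outfox")
print(variants)  # ['foxes', 'outfox']
```

Note that joining the matched words drops the final period, since \w+ only matches word characters.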

You can test this regex in the cool online JavaScript Regex evaluator. Unfortunately, JavaScript doesn't support negative lookbehind (it wasn't added to the language until ECMAScript 2018), so if you want to test that one, I recommend RegexBuddy. It's not free, but it's the best regex tool out there by far, and it keeps getting better with every incremental release.

Posted by Jeff Atwood

Comments

What about a RegEx replace? You could replace all of your matches with an empty string (or a "filtered" notification if you prefer) and achieve the same functionality.

Marty Thompson on October 24, 2005 01:31 PM
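A sketch of the replace approach in Python, assuming we want to remove the whole word fox along with the space that follows it (otherwise the result keeps a double space). The "[filtered]" marker is just an illustrative placeholder:

```python
import re

text = "The quick brown fox jumped over the lazy dog."

# Delete the whole word "fox" plus any whitespace after it.
filtered = re.sub(r"\bfox\b\s*", "", text)
print(filtered)  # The quick brown jumped over the lazy dog.

# Or swap in a marker instead of deleting it.
marked = re.sub(r"\bfox\b", "[filtered]", text)
print(marked)  # The quick brown [filtered] jumped over the lazy dog.
```

Unlike the lookaround patterns, this keeps the original spacing and punctuation of everything that isn't removed.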

I faced a problem similar to this a few months ago, where I had a template and I needed to look for anything in the string matching the template (ex. {something}: look for something between the two braces). I searched high and low, to no avail, for a regex solution. It would seem as though a similar solution to the above problem could be used? Maybe this could be done with a regex, I dunno (it goes way past my knowledge of regex); in the end I had to write a string-parsing algorithm.

matt on October 25, 2005 01:19 PM

> I needed to look for anything in the string matching the template (ex. {something} look for something between the two braces).

Hmm, something between the two braces:

"{[^}]+}"

Just watch out for a) non-escaped {} characters and b) line breaks inside the braces.

Jeff Atwood on October 25, 2005 03:26 PM
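Applied to the template problem, wrapping the negated class in a capture group returns just the text between the braces, and re.sub can then fill the template in. A small Python sketch; the template string and the values dictionary are hypothetical. The braces are escaped here as a safe habit, since some flavors treat an unescaped { as the start of a quantifier:

```python
import re

template = "Dear {name}, your order {order_id} has shipped."

# \{ and \} are literal braces; ([^}]+) captures everything up to the
# next closing brace.
fields = re.findall(r"\{([^}]+)\}", template)
print(fields)  # ['name', 'order_id']

# Hypothetical values used to fill in the placeholders.
values = {"name": "Matt", "order_id": "42"}
filled = re.sub(r"\{([^}]+)\}", lambda m: values[m.group(1)], template)
print(filled)  # Dear Matt, your order 42 has shipped.
```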

I took advantage of this just last month. http://blog.eriklane.com/archive/2005/09/28/2105.aspx and I was pretty excited when I finally got it to work. Woohoo!

Erik Lane on October 26, 2005 11:08 AM

Old comments, I know. But if you just split on "fox" you'll get an array of strings split on the occurrences of "fox". For example, "foo fox bar" would return an array of two strings, "foo " and " bar".

ORANGE!

Will Sullivan on February 19, 2007 10:04 AM
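Both variants in Python, as a sketch: the plain string split described above, and a regex split that only splits on the whole word and also absorbs the surrounding spaces:

```python
import re

# Plain string split: the pieces around each occurrence of "fox".
pieces = "foo fox bar".split("fox")
print(pieces)  # ['foo ', ' bar']

# Regex split on the whole word, swallowing adjacent whitespace.
parts = re.split(r"\s*\bfox\b\s*", "The quick brown fox jumped over the lazy dog.")
print(parts)  # ['The quick brown', 'jumped over the lazy dog.']
print(" ".join(parts))  # The quick brown jumped over the lazy dog.
```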

Your regexes appear to use lookaheads/lookbehinds in a different order from those on http://www.regular-expressions.info.

I thought lookaheads were positioned after the text whilst lookbehinds preceded the text.

James on November 7, 2007 02:58 AM
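For what it's worth, a lookahead checks the text to the right of the position where it appears in the pattern, and a lookbehind checks the text to its left; either can be written at the start or the end of a pattern. In the post, the lookahead comes first because it inspects the word about to be matched, and the lookbehind comes last because it inspects the word just matched. A Python sketch of the opposite placements:

```python
import re

text = "The quick brown fox jumped over the lazy dog."

# Lookbehind at the START: words not immediately preceded by "fox ".
not_after_fox = re.findall(r"(?<!fox )\b\w+", text)
print(not_after_fox)
# ['The', 'quick', 'brown', 'fox', 'over', 'the', 'lazy', 'dog']

# Lookahead at the END: words immediately followed by " fox".
before_fox = re.findall(r"\b\w+\b(?= fox)", text)
print(before_fox)  # ['brown']
```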

I've been using these too, but I still can't solve this problem.
For example:
"<span>this is span</span>and <span>this is another"

I need to get just the "<span>this is another" part.

Using /(<span>.*(!?</span>))$/ is not working.

Waleed GadElKareem on January 7, 2008 08:14 PM
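Two things trip up that pattern: a negative lookahead is spelled (?! rather than (!?, and even when spelled correctly, a single lookahead placed after .* doesn't stop .* from running straight across </span>. One common fix is the so-called tempered greedy token, which applies the lookahead before every character consumed. A Python sketch:

```python
import re

html = "<span>this is span</span>and <span>this is another"

# (?:(?!</span>).)* consumes one character at a time, but only when
# "</span>" does not start at that position, so a match can never run
# past a closing tag. Anchoring with $ leaves only the final, unclosed span.
m = re.search(r"<span>(?:(?!</span>).)*$", html)
print(m.group())  # <span>this is another
```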

This is great. I needed something to exclude a tag. I will test it to see if it works for my application.

Lorre on August 2, 2008 08:04 AM

If I had a string of numbers (9660199), how would I exclude the 4th and 5th digits (01)?

My match should return 96699. Any ideas on this?

Arron Tow on August 6, 2008 08:36 AM
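A single regex match is always one contiguous piece of text, so it can't return 96699 directly. A sketch of two workarounds in Python, assuming the goal is simply to drop characters 4 and 5: substitute them away, or capture the parts to keep and join them:

```python
import re

digits = "9660199"

# Keep the first three digits (group 1) and discard the next two.
result = re.sub(r"^(\d{3})\d{2}", r"\1", digits)
print(result)  # 96699

# Equivalent: capture the parts to keep, then join them.
m = re.fullmatch(r"(\d{3})\d{2}(\d*)", digits)
print(m.group(1) + m.group(2))  # 96699
```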

Very helpful when I needed to match anything but [Please Select]:
^(?!\[Please Select\]).*$

David on September 4, 2008 06:11 AM
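One subtlety worth knowing: that pattern rejects any value that merely starts with [Please Select], not just the exact placeholder. If only the exact value should be rejected, moving a $ inside the lookahead does it. A Python sketch comparing the two; the sample values are made up for illustration:

```python
import re

# Original pattern: rejects anything that STARTS with "[Please Select]".
starts_with = re.compile(r"^(?!\[Please Select\]).*$")
# With $ inside the lookahead: rejects only the exact placeholder.
exact = re.compile(r"^(?!\[Please Select\]$).*$")

for value in ["Apples", "[Please Select]", "[Please Select] Apples"]:
    print(repr(value), bool(starts_with.match(value)), bool(exact.match(value)))
# 'Apples' True True
# '[Please Select]' False False
# '[Please Select] Apples' False True
```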











Content (c) 2008 Jeff Atwood. Logo image used with permission of the author. (c) 1993 Steven C. McConnell. All Rights Reserved.