Add power to Word searches with regular expressions

Power User Corner

By Colin Wilcox, Graham Mayor, and Klaus Linke

Have you ever wanted to do more than use the basic find-and-replace functions in Word? Wildcard characters and regular expressions can make those operations much more flexible and powerful.

Applies to
Microsoft Word 97, 2000, and 2002

See all Power User columns
See all columns


Have you ever had to make a large number of repetitive changes to a document by hand? For example, have you ever had to find and remove duplicate rows from a large table, or transpose a list of names (change them from "Colin Wilcox" to "Wilcox, Colin")? That type of repetitive find-and-replace work gets old in a big hurry, doesn't it?

You can automate many of those find-and-replace tasks. Microsoft Word provides a set of wildcard characters that you can use to build regular expressions, which are combinations of literal text and wildcard characters. You can use regular expressions to find text that matches a given pattern and then replace those matches with new text.

If this all sounds complex, don't worry. We'll introduce it in easy steps, explain things as we go, and provide several working examples. You can use the information in this column with Word 97, 2000, and 2002. The user interfaces vary slightly between the versions, but you can accomplish the tasks described here with each version.

A quick spin through the jargon

To start, let's define a couple of terms:

  • A wildcard character is a keyboard character that you can use to represent one or many characters. For example, the asterisk (*) typically represents any string of characters (including none at all), and the question mark (?) typically represents a single character.
  • In our case, a regular expression is a combination of literal and wildcard characters that you use to find and replace patterns of text. The literal text characters indicate text that must exist in the target string of text. The wildcard characters indicate the text that can vary in the target string.

That may seem a bit abstract, but you've seen (and most likely used) wildcard characters and regular expressions since you first began computing. For example, the Open dialog box (on the File menu, click the Open command) uses the asterisk wildcard character extensively:

[Figure: Wildcard characters in the Open dialog box]

And, if you ever used the MS-DOS operating system, you probably used a command and a simple regular expression to copy files:

copy *.doc a:

That command uses the asterisk wildcard character and the .doc literal text string to copy a set of Word documents to the disk in drive A (traditionally a floppy disk drive). If you look around a bit, you'll see that Microsoft Windows® and the Microsoft Office applications use wildcard characters everywhere.
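The same idea, a wildcard combined with literal text, can be sketched in Python with the standard fnmatch module. This is only an illustration of how the *.doc pattern selects file names; the file names in the list are made up for the example.

```python
from fnmatch import fnmatch

# Hypothetical file names, chosen only for illustration.
file_names = ["report.doc", "notes.txt", "budget.doc", "doc.xls"]

# "*.doc" means: any run of characters, followed by the literal text ".doc".
word_documents = [name for name in file_names if fnmatch(name, "*.doc")]

print(word_documents)
```

Only the names that end in .doc are selected; doc.xls is skipped because the literal text must appear where the pattern puts it.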

Try it!

The steps in this section explain how to use a regular expression that transposes names. Keep in mind that you always use the Find and Replace dialog box to run your regular expressions. Also, remember that if an expression doesn't work as expected, you can always press CTRL+Z to undo your changes, and then try another expression.

To transpose names
  1. Start Word and open a new, blank document.
  2. Copy this table and paste it into the document.
Josh Barnhill
Doris Hartwig
Tamara Johnston
Daniel Shimshoni
  3. Press CTRL+F to open the Find and Replace dialog box.
  4. If you don't see the Use wildcards check box, click More. Then select the Use wildcards check box; otherwise, Word treats the wildcard characters as literal text.
  5. Click the Replace tab, and then enter the following characters in the Find what box. Make sure you include the space between the two sets of parentheses: (<*>) (<*>)
  6. In the Replace with box, enter the following characters. Make sure you include the space between the comma and the second backslash: \2, \1
  7. Select the table, and then click Replace All. Word transposes the names and separates them with a comma, like so:
Barnhill, Josh
Hartwig, Doris
Johnston, Tamara
Shimshoni, Daniel

At this point, you may wonder what to do if some or all of your names contain middle initials. See the first example in Putting regular expressions to work in Word for more information.

The next section explains how those regular expressions work.

What makes the expression tick

From here on, keep this principle in mind: The content of a document controls most (but not all) of the design of your regular expressions. For example, in the sample table you used earlier, each cell contained two words. If the cell contained two words and a middle initial, you'd use a different expression.
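To make that principle concrete, here is a minimal Python sketch. It uses Python's re module, which is a different engine from Word's wildcard search, and the sample name and both patterns are assumptions made for illustration. It shows only why a pattern designed for two words gives the wrong result when the content has three parts.

```python
import re

# Hypothetical sample: a name with a middle initial.
name = "Josh M. Barnhill"

# A pattern designed for two-word content such as "Josh Barnhill".
two_words = re.sub(r"(\w+) (\w+)", r"\2, \1", name)

# A pattern designed for first name, middle initial, last name.
three_parts = re.sub(r"(\w+) (\w\.) (\w+)", r"\3, \1 \2", name)

print(two_words)    # M, Josh. Barnhill
print(three_parts)  # Barnhill, Josh M.
```

The first pattern treats the initial as the second word and scrambles the name; the second pattern matches the actual shape of the content.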

Let's examine each expression from the inside out:

In the first expression, (<*>) (<*>):

  • The asterisk (*) returns all the text in the word.
  • The less-than and greater-than symbols (< >) mark the start and end of each word, respectively. They ensure that the search returns a single word.
  • The parentheses and the space between them divide the words into distinct groups: (first word) (second word). The parentheses also indicate the order in which you want search to evaluate each expression.

In other words, the expression says: "Find both words."

Note: Searching on this expression, (*) (*>), produces the same results. However, the expression in the example is easier to describe, and you should use restricting characters whenever you can, because doing so ensures greater accuracy in your results.

In the second expression, \2, \1:

  • The backslash (\) works with the numbers to serve as a placeholder. (You can also use the backslash to find other wildcard characters. See the next section for more information.)
  • The comma after the first placeholder inserts the correct punctuation between the transposed names.

In other words, the expression says: "Write the second word, add a comma, write the first word."
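For readers who know a scripting language, the same find-and-replace pair can be approximated in Python. This is a sketch, not Word's own engine: it assumes that \w+ is a fair stand-in for Word's (<*>) on simple names like the ones in the sample table. The placeholders \2 and \1 behave the same way in both.

```python
import re

# The sample names from the table used earlier.
names = ["Josh Barnhill", "Doris Hartwig", "Tamara Johnston", "Daniel Shimshoni"]

# Each (\w+) captures one whole word, playing the role of Word's (<*>).
find_what = r"(\w+) (\w+)"

# \2 and \1 are placeholders for the second and first captured words.
replace_with = r"\2, \1"

transposed = [re.sub(find_what, replace_with, name) for name in names]

for name in transposed:
    print(name)
```

Each line of output has the last name first, then a comma and a space, then the first name, which matches the result that Word produces.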

Next, let's take a look at the full set of wildcard characters and what they do.

Wildcard character reference

The following table lists and describes the wildcard characters that are available for use in Word. Keep one fact in mind as you go: Wildcard characters become more powerful when you combine them.

To find this | Type this character | Examples
Any single character | ? | s?t finds "sat" and "set." This character also finds the chosen combination of characters within a word. For example, it could locate "set" within "inset."
Any string of characters | * | s*d finds "sad" and "started." The asterisk returns all characters and spaces that lie between the literal characters. For example, use the s*t expression to search for the phrase "analysis system." [Figures: the first, second, and final strings of text found by the wildcard search.] Notice that the asterisk returns st as a match. That is default behavior. Word does not limit the number of characters that the asterisk can match, and it does not require that characters or spaces reside between the literal characters that you use with the asterisk. So, be careful when using the asterisk, because it can return a lot of unwanted results.
The beginning of a word | < | <(inter) finds all the words that start with "inter," such as "interesting" and "intercept," but not "splintered."
The end of a word | > | (in)> finds all the words that end with "in," such as "in" and "within," but not "interesting."
One or more specified characters | [ ] | w[io]n finds "win" and "won" but not "worn," because the "r" is not specified. Always use brackets in pairs. If you use an opening bracket, you also use the closing bracket.
Any single character in a given range of characters | [x-z] | [r-t]ight finds "right" and "sight." The ranges you specify must be in ascending order. In other words, you can specify [a-m], but not [m-a].
Any single character except the characters in the range inside the brackets | [!x-z] | t[!a-m]ck finds "tock" and "tuck," but not "tack" or "tick."
Exactly n occurrences of the previous character or expression | {n} | fe{2}d finds "feed" but not "fed." f[a-z]{2}d finds "find," "feed," and "food," but not "fed." f([a-z]){2}d finds "feed" and "food," but not "find" or "fed." Always use braces in pairs. If you use an opening brace, you also use the closing brace.
At least n occurrences of the previous character or expression | {n,} | fe{1,}d finds "fed" and "feed."
From n to m occurrences of the previous character or expression | {n,m} | 10{1,3} finds "10," "100," and "1000."
One or more occurrences of the previous character or expression | @ | lo@t finds "lot" and "loot."
Any wildcard character | \wildcard_character | [\?] finds all question mark wildcard characters, [\*] finds all asterisk wildcard characters, and so on.
To group characters and establish orders of evaluation | () | Use parentheses (also called round brackets) to create complex regular expressions. The example earlier in this column, and the reference article Putting regular expressions to work in Word, demonstrate some of the ways you can use parentheses.
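
As a rough cross-check of the table, several of its patterns can be written as Python regular expressions and tested against the example words. Treat this as a sketch only: Python's re module is a different engine from Word's wildcard search, and the EQUIVALENTS mapping and the two helper functions are hand-written assumptions for illustration, not an official conversion.

```python
import re

# Hand-written Python counterparts of Word wildcard patterns from the table.
# Assumption: this mapping is illustrative, not an official conversion.
EQUIVALENTS = {
    "s?t": r"s.t",              # ? becomes . (any single character)
    "s*d": r"s.*d",             # * becomes .* (any string, possibly empty)
    "w[io]n": r"w[io]n",        # same notation in this case
    "[r-t]ight": r"[r-t]ight",  # same notation in this case
    "t[!a-m]ck": r"t[^a-m]ck",  # [!...] becomes [^...]
    "fe{2}d": r"fe{2}d",        # same notation in this case
    "fe{1,}d": r"fe{1,}d",      # same notation in this case
    "10{1,3}": r"10{1,3}",      # same notation in this case
    "lo@t": r"lo+t",            # @ becomes + (one or more)
    "<(inter)": r"\binter",     # < becomes \b at the start of a word
    "(in)>": r"in\b",           # > becomes \b at the end of a word
}


def whole_words(word_pattern, candidates):
    """Return the candidates that the pattern matches from start to end."""
    regex = re.compile(EQUIVALENTS[word_pattern])
    return [word for word in candidates if regex.fullmatch(word)]


def words_containing(word_pattern, candidates):
    """Return the candidates in which the pattern matches anywhere."""
    regex = re.compile(EQUIVALENTS[word_pattern])
    return [word for word in candidates if regex.search(word)]


print(whole_words("s?t", ["sat", "set", "seat"]))
print(whole_words("t[!a-m]ck", ["tock", "tuck", "tack", "tick"]))
print(whole_words("lo@t", ["lot", "loot", "lt"]))
print(words_containing("<(inter)", ["interesting", "intercept", "splintered"]))
print(words_containing("(in)>", ["in", "within", "interesting"]))
```

The helpers return the same words that the table lists as matches, and they leave out the words that the table lists as non-matches, such as "tack," "tick," and "splintered."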

Examples of regular expressions at work

Admittedly, the regular expression syntax is a bit cryptic. So, we created Putting regular expressions to work in Word, a page of examples that demonstrates some of the ways you can use regular expressions. If you'd like to read some of the source material for this article, see Finding and replacing characters using wildcards on the Microsoft Word MVP FAQ site.


About the authors

  • Graham Mayor and Klaus Linke are Microsoft Word Most Valuable Professionals (MVPs). For more information about MVPs and the MVP program, see the Microsoft MVP Site and MVPs.org.
  • Colin Wilcox writes for the Office Help team. In addition to contributing to the Office Power User Corner column, he writes articles and tutorials for Microsoft Data Analyzer.
