
By Colin Wilcox,
Graham Mayor, and
Klaus Linke
Have you ever wanted to do more than use the basic find-and-replace functions in
Word? Wildcard characters and regular expressions can make those operations much more flexible and
powerful.
| Applies to |
| Microsoft Word 97, 2000, and 2002 |
See all Power User columns
See all columns
Have you ever had to make a large number of repetitive changes to a document by hand?
For example, have you ever had to find and remove duplicate rows from a large
table, or transpose a list of names (change them from "Colin Wilcox" to "Wilcox,
Colin")? That type of repetitive find-and-replace work gets old in a big hurry, doesn't it?
You can automate many of those find-and-replace tasks. Microsoft Word
provides a set of wildcard characters that you can use to build regular
expressions, combinations of literal text and wildcard characters. You can
use regular expressions to find text that matches a given pattern and then replace those
matches with new text.
If this all sounds complex, don't worry. We'll introduce it in easy steps,
explain things as we go, and provide
several working examples. You can use the information in this column with Word 97, 2000, and 2002. The user
interfaces vary slightly between the versions, but you can accomplish the tasks
described here with each version.
A quick spin through the jargon
To start, let's define a couple of terms:
- A wildcard character is a keyboard character that you can use to
represent one or many characters. For example, the asterisk (*) typically
represents any string of characters, and the question mark (?) typically
represents a single character.
- In our case, a regular expression is a combination of literal and
wildcard characters that you use to find and replace patterns of text. The
literal text characters indicate text that must exist in the target string of text.
The wildcard characters indicate the text that can vary in the target string.
That may seem a bit abstract, but you've seen (and most likely used) wildcard
characters and regular expressions since you first began computing. For example, the Open dialog
box (on the File menu, click the Open command) uses the asterisk wildcard character
extensively.
And, if you ever used the MS-DOS operating system, you probably used a command
and a simple regular expression to copy files:
copy *.doc a:
That command uses the asterisk wildcard character and the .doc
literal text string to copy a set of Word documents to the floppy disk in drive A.
If you look around a bit, you'll see that Microsoft Windows® and the Microsoft Office
applications use wildcard characters everywhere.
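If you'd like to see the same idea outside of MS-DOS, here is a small sketch in Python. It is only an illustration and has nothing to do with Word itself: it assumes Python's standard fnmatch module, and the file names are invented for the example. The asterisk stands in for the part of the name that can vary, while .doc is literal text that must be present.

```python
from fnmatch import fnmatchcase

# Hypothetical file names, invented for this illustration.
file_names = ["report.doc", "notes.txt", "letter.doc", "budget.xls"]

# "*" matches any run of characters; ".doc" is literal text that must match.
word_documents = [name for name in file_names if fnmatchcase(name, "*.doc")]

print(word_documents)
```

Only the names that end in the literal text .doc survive the filter, which is the same selection the copy command makes.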
Try it!
The steps in this section explain how to use a regular expression that
transposes names. Keep in mind that you always use the Find
and Replace dialog box to run your regular expressions.
Also, remember that if an
expression doesn't work as expected, you can always press CTRL+Z to undo your
changes, and then try another expression.
To transpose names
- Start Word and open a new, blank document.
- Copy this table and paste it into the document.
| Josh Barnhill |
| Doris Hartwig |
| Tamara Johnston |
| Daniel Shimshoni |
- Press CTRL+F to open the Find and Replace dialog box.
- If you don't see the Use wildcards check box, click More, and then select the check box. If
you don't select the check box, Word treats the wildcard characters as text.
- Click the Replace tab, and then enter the following characters in the Find what box.
Make sure you include the space between the two sets of parentheses:
(<*>) (<*>)
- In the Replace with box, enter the following characters. Make sure
you include the space between the comma and the second backslash:
\2, \1
- Select the table, and then click Replace All. Word transposes the names and separates them
with a comma, like so:
| Barnhill, Josh |
| Hartwig, Doris |
| Johnston, Tamara |
| Shimshoni, Daniel |
At this point, you may wonder what to do if some or all of your names contain
middle initials. See the first example in
Putting regular expressions to work in Word for more information.
The next section explains how those regular expressions work.
What makes the expression tick
From here on, keep this principle in mind: The content of a document controls
most (but not all) of the design of your regular expressions. For
example, in the sample table you used earlier, each cell contained two words. If
the cell contained two words and a middle initial, you'd use a different
expression.
Let's examine each expression from the inside out:
In the first expression,
(<*>) (<*>):
- The asterisk (*) returns all the text in the word.
- The less-than and greater-than symbols (< >) mark the start and end
of each word, respectively. They ensure that the search returns a single word.
- The parentheses and the space between them
divide the words into distinct groups: (first word) (second word). The
parentheses also indicate the order in which you want search to evaluate each expression.
In other words, the expression says: "Find
both words."
Note Searching on this expression, (*) (*>), produces the same
results. However, the expression in the example is easier to describe, and you should
use restricting characters whenever you can, because doing so ensures greater
accuracy in your results.
In the second expression,
\2, \1:
- The backslash (\) works with the numbers to serve as a placeholder. (You
can also use the backslash to find other wildcard characters. See the next section for
more information.)
- The comma after the first placeholder inserts the correct
punctuation between the transposed names.
In other words, the expression says: "Write the second word, add a comma, write the first word."
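If you know a scripting language, it may help to see the same find-and-replace pair written there. The following Python sketch is a rough analogy, not Word's own search engine: Python's re module writes word boundaries as \b rather than < and >, and we assume that \w+ is a fair stand-in for <*> because each name in the sample table is a single run of letters.

```python
import re

# The names from the sample table.
names = ["Josh Barnhill", "Doris Hartwig", "Tamara Johnston", "Daniel Shimshoni"]

# Two captured words separated by one space, much like (<*>) (<*>) in Word.
find_what = re.compile(r"\b(\w+)\b \b(\w+)\b")

# \2 and \1 are placeholders for the second and first captured groups.
replace_with = r"\2, \1"

transposed = [find_what.sub(replace_with, name) for name in names]

for name in transposed:
    print(name)
```

As in Word, the parentheses define the groups and the numbered placeholders decide the order in which those groups are written back.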
Next, let's take a look at the full set of wildcard characters
and what they do.
Wildcard character reference
The following table lists and describes the wildcard characters that are available for use in Word. Keep
one fact in mind as you go: Wildcard characters become more powerful when you
combine them.
| To find this | Type this character | Examples |
| Any single character | ? | s?t finds "sat" and "set." This character also finds the chosen combination of characters within a word. For example, it could locate "set" within "inset." |
| Any string of characters | * | s*d finds "sad" and "started." The asterisk returns all characters and spaces that lie between the literal characters. For example, use the s*t expression to search the phrase "analysis system." Notice that the asterisk returns st as a match. That is default behavior. Word does not limit the number of characters that the asterisk can match, and it does not require that characters or spaces reside between the literal characters that you use with the asterisk. So, be careful when using the asterisk, because it can return a lot of unwanted results. |
| The beginning of a word | < | <(inter) finds all the words that start with "inter," such as "interesting" and "intercept," but not "splintered." |
| The end of a word | > | (in)> finds all the words that end with "in," such as "in" and "within," but not "interesting." |
| One of the specified characters | [ ] | w[io]n finds "win" and "won" but not "worn," because the "r" is not specified. Always use brackets in pairs. If you use an opening bracket, you also use the closing bracket. |
| Any single character in a given range of characters | [x-z] | [r-t]ight finds "right" and "sight." The ranges you specify must be in ascending order. In other words, you can specify [a-m], but not [m-a]. |
| Any single character except the characters in the range inside the brackets | [!x-z] | t[!a-m]ck finds "tock" and "tuck," but not "tack" or "tick." |
| Exactly n occurrences of the previous character or expression | {n} | fe{2}d finds "feed" but not "fed." f[a-z]{2}d finds "find," "feed," and "food," but not "fed." f([a-z]){2}d finds "feed" and "food," but not "find" or "fed." Always use braces in pairs. If you use an opening brace, you also use the closing brace. |
| At least n occurrences of the previous character or expression | {n,} | fe{1,}d finds "fed" and "feed." |
| From n to m occurrences of the previous character or expression | {n,m} | 10{1,3} finds "10," "100," and "1000." |
| One or more occurrences of the previous character or expression | @ | lo@t finds "lot" and "loot." |
| Any wildcard character | \wildcard_character | [\?] finds all question mark wildcard characters, [\*] finds all asterisk wildcard characters, and so on. |
| To group characters and establish orders of evaluation | () | Use parentheses (also called round brackets) to create complex regular expressions. The example earlier in this column, and the reference article Putting regular expressions to work in Word, demonstrate some of the ways you can use parentheses. |
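Many of these wildcard characters have close cousins in the regular expression syntax of general-purpose programming languages. The following Python sketch pairs several of the Word patterns from the table with what we assume are reasonable equivalents in Python's re module, and then checks them against the example words from the table. Treat the pairings as approximations: they are our own hand translations, not an official mapping, and Word's search engine is not guaranteed to behave like Python's in every case.

```python
import re

# Word pattern -> (assumed Python equivalent, words it finds, words it skips).
# The example words come from the reference table; the Python patterns are
# hand-written approximations, not an official translation.
EQUIVALENTS = {
    "s?t": (r"s.t", ["sat", "set"], []),
    "w[io]n": (r"w[io]n", ["win", "won"], ["worn"]),
    "[r-t]ight": (r"[r-t]ight", ["right", "sight"], []),
    "t[!a-m]ck": (r"t[^a-m]ck", ["tock", "tuck"], ["tack", "tick"]),
    "fe{2}d": (r"fe{2}d", ["feed"], ["fed"]),
    "f[a-z]{2}d": (r"f[a-z]{2}d", ["find", "feed", "food"], ["fed"]),
    "fe{1,}d": (r"fe{1,}d", ["fed", "feed"], []),
    "10{1,3}": (r"10{1,3}", ["10", "100", "1000"], []),
    "lo@t": (r"lo+t", ["lot", "loot"], []),
}


def matches_whole(python_pattern, text):
    """Return True if all of text matches python_pattern."""
    return re.fullmatch(python_pattern, text) is not None


results = {}
for word_pattern, (python_pattern, finds, skips) in EQUIVALENTS.items():
    found_all = all(matches_whole(python_pattern, word) for word in finds)
    skipped_all = not any(matches_whole(python_pattern, word) for word in skips)
    results[word_pattern] = found_all and skipped_all

# Word's < and > (start and end of a word) both correspond roughly to \b.
starts_with_inter = re.compile(r"\binter")  # like <(inter)
ends_with_in = re.compile(r"in\b")          # like (in)>

boundary_checks = {
    "interesting": starts_with_inter.search("interesting") is not None,
    "splintered": starts_with_inter.search("splintered") is not None,
    "within": ends_with_in.search("within") is not None,
}

for word_pattern, ok in results.items():
    print(word_pattern, "behaves like the table says:", ok)
print(boundary_checks)
```

The main differences to watch for are the ones the sketch has to translate: Word's ? becomes a period, the ! inside brackets becomes ^, @ becomes +, and < and > both become \b.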
Examples of regular expressions at work
Admittedly, the regular expression syntax is a bit cryptic. So, we created
Putting regular expressions to work in Word, a page of examples
that demonstrates some of the ways you can use regular expressions. If
you'd like to read some of the source material for this article, see
Finding and replacing characters using wildcards on the Microsoft Word MVP FAQ site.
About the authors
- Graham Mayor and Klaus Linke are Microsoft Word Most Valuable
Professionals (MVPs). For more information about MVPs and the MVP program, see the
Microsoft MVP Site and MVPs.org.
- Colin Wilcox writes for the Office Help team. In addition to contributing to
the Office Power User Corner column, he writes articles and tutorials for
Microsoft Data Analyzer.