Putting regular expressions to work in Word

Power User Corner

 By Colin Wilcox, Graham Mayor, and Klaus Linke

Now that you understand the basics of using wildcard characters to create regular expressions, here are some examples that you can put to work.

Applies to
Microsoft Word 97, 2000, and 2002


The examples on this reference page show you some of the ways that you can use wildcard characters and regular expressions in Microsoft Word. For an introduction to this subject, see Add power to Word searches with regular expressions.

Example 1: Transpose names with middle initials

The article Add power to Word searches with regular expressions explained how to use wildcard characters and a regular expression to transpose names—for example, to change "Colin Wilcox" to "Wilcox, Colin." But what do you do if some or all of the names contain middle initials or middle names? This example uses a combination of wildcard characters and character codes to transpose names that contain middle initials. If you're unfamiliar with character codes, see the Word Help topic titled "Find and replace text or other items."

Keep these facts in mind as you proceed:

  • Whenever you use this expression on names that reside in a table, you must first convert that table to text.
  • If the table contains more than one column, copy the column containing the names to a blank document and convert it to text there.
  • After you transpose the names, convert the text back to a table. You can then delete the original column and replace it with your changed data.

Follow the steps in these examples to walk through the entire process.

To prepare sample data
  1. If you haven't already done so, start Word and create a new, blank document.
  2. Insert a blank table into the document. Make the table one column wide by four rows high.
  3. Copy these names individually, and paste each one into a blank table cell:
    Joshua Quentin Barnhill
    Doris X. Hartwig
    Tamara Y. Johnston
    Daniel Shimshoni
    Your table should look something like this:
    Joshua Quentin Barnhill
    Doris X. Hartwig
    Tamara Y. Johnston
    Daniel Shimshoni
  4. Select the table, and on the Table menu, point to Convert, and then click Table to Text.
  5. Select Paragraph marks as the text separator, and then click OK.
To transpose names with initials
  1. On the Edit menu, click Replace to open the Find and Replace dialog box.
  2. Select the Use wildcards check box (you may need to click More to see the check box), and then enter the following expression in the Find what box: (*) ([! ]@)^13 Make sure you enter a space between the two sets of parentheses and after the exclamation point. If you haven't seen the ^13 character code before, we explain what it does in the next section.
  3. Enter the following expression in the Replace with box: \2, \1^p Make sure you enter a space after the comma.
  4. Select the list of names, and then click Replace All. Word transposes the names and either middle initials or middle names, like so:

Barnhill, Joshua Quentin
Hartwig, Doris X.
Johnston, Tamara Y.
Shimshoni, Daniel

To convert the changed text back to a table
  1. Select the list of transposed names.
  2. On the Table menu, point to Convert, and then click Text to Table. The Convert Text to Table dialog box opens.
  3. Under Separate text at, click Paragraphs, and then click OK.

The expressions dissected

Let's look at the individual pieces of the expression to see how they work, starting with the second half of the expression in the Find what box.

The entire expression looks for two groups of patterns: a first name with a middle initial (or a middle name) and a last name. This part of the expression matches the last names:

([! ]@)^13

We need to know where the last name ends, so we also use the ^13 character code to search for the paragraph mark at the end of each line. However, since we don't plan to reuse the paragraph mark, we leave it outside the parentheses and enclose only the last name.

You can try this by copying the names to your test document again (make sure you separate them with paragraph marks), and then search using ([! ]@)^13 in the Find what box. Search matches each last name.

Because search starts again at the beginning of the next line, we use the asterisk wildcard character (*) to match everything from there to the beginning of the next last name.

Since we don't plan to reuse the space in front of the last name, we leave it outside both sets of parentheses so that it belongs to neither group:

(*) ([! ]@)^13

 Important   Be careful when using the ^13 character code. Normally, you can use the ^p character code to search for paragraph marks. However, that code does not work in wildcard searches. Instead, you need to use the substitute code ^13. Although the ^p character code does not work in wildcard searches, you should use it in wildcard replace operations. Why? The ^p character includes formatting information, and the ^13 character does not. In addition, you cannot assign style information to the ^13 character at all. Misusing the ^13 code in a replace operation can essentially convert your document into a file that you cannot format.

In the Replace with box, the \2, characters tell search to write the second pattern first and to add a comma after the pattern. The \1^p characters tell search where to write the first pattern and to write a paragraph mark after that pattern.
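
For readers who want to test the logic outside Word, the same transposition can be sketched in Python. This is only an analogue: Python's re module uses a different syntax from Word's wildcards, so the pattern below is our translation, and the names NAME_PATTERN and transpose_names are hypothetical. Here [^ \n]+ stands in for [! ]@, and the $ anchor stands in for the ^13 paragraph mark.

```python
import re

# Python analogue of the Word wildcard search (*) ([! ]@)^13.
# Group 1 is everything before the final space; group 2 is the last name.
# With re.MULTILINE, $ matches at the end of each line, much as ^13
# marks the end of each paragraph in Word.
NAME_PATTERN = re.compile(r"^(.*) ([^ \n]+)$", re.MULTILINE)


def transpose_names(text: str) -> str:
    """Rewrite each 'First Middle Last' line as 'Last, First Middle'."""
    # \2, \1 writes the second group first, then a comma and the first group.
    return NAME_PATTERN.sub(r"\2, \1", text)


names = (
    "Joshua Quentin Barnhill\n"
    "Doris X. Hartwig\n"
    "Tamara Y. Johnston\n"
    "Daniel Shimshoni\n"
)
print(transpose_names(names))
```

Because the line break is never consumed by the Python pattern, the replacement does not need to write it back, which is the one place where this sketch differs from the \2, \1^p replacement used in Word.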

Example 2: Transposing dates

You can use the regular expressions shown here to convert dates in European format to dates in the U.S. format.

To transpose dates
  1. Copy and paste the following date into your document: 28th May 2003
  2. Open the Find and Replace dialog box, and enter the following expression in the Find what box: ([0-9]{1,2})([dhnrst]{2}) (<[ADFJMNOS]*>) ([0-9]{4}) Make sure you insert a space between the following opening and closing parentheses: 2}) (<[ and *>) ([0.
  3. Enter the following expression in the Replace with box (make sure you insert a space between each set of characters), and then click Replace All: \3 \1, \4

Search replaces 28th May 2003 with May 28, 2003.

The expression, piece by piece

Let's start with the expression in the Find what box. The expression works by breaking dates down into four patterns, denoted by the sets of parentheses. Each pattern contains the components that you find in all dates written in the style that you used in the example. Working from left to right:

  • The number range [0-9] matches any single digit. Because the day can consist of one or two digits, we tell search to return either one-digit or two-digit numbers: {1,2}. The result is the first pattern: ([0-9]{1,2}).
  • Ordinal suffixes make up the second pattern. The suffixes are "th," "nd," "st," and "rd," so we add those letters to a range [dhnrst]. Because the suffixes always consist of two letters, we restrict the letter count to two: ([dhnrst]{2}).
  • Next comes a space, followed by literal and wildcard characters that find month names. All month names begin with one of these capital letters: ADFJMNOS. We don't know how many characters follow each capital letter, so we follow them with the asterisk (*). We're only interested in the month name itself, so we use the less-than (<) and greater-than (>) characters to limit the results to the individual word. The result is the third pattern: (<[ADFJMNOS]*>).
  • Finally, we search for the year. We use the same number range, but this time we restrict the count to four digits: ([0-9]{4}).

In the Replace with box, notice that we wrote only three of the four patterns. We omitted the ordinal suffix (the "th") from the date because dates in the U.S. format don't use ordinals. If you want to leave the ordinal in the date, enter \3 \1\2, \4 in the Replace with box. In this case, you enter a space after the 3 and after the comma, and nowhere else.
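
The four patterns can be tried out in Python as a rough check of the logic. This is a sketch in a different regular expression dialect, not Word's own syntax: \b takes the place of the < and > word markers, [a-z]* takes the place of the asterisk, and the names DATE_PATTERN and to_us_date are hypothetical.

```python
import re

# Analogue of ([0-9]{1,2})([dhnrst]{2}) (<[ADFJMNOS]*>) ([0-9]{4}).
# Group 1: day, group 2: ordinal suffix, group 3: month name, group 4: year.
DATE_PATTERN = re.compile(
    r"([0-9]{1,2})([dhnrst]{2}) (\b[ADFJMNOS][a-z]*\b) ([0-9]{4})"
)


def to_us_date(text: str, keep_ordinal: bool = False) -> str:
    """Rewrite '28th May 2003' as 'May 28, 2003' (or 'May 28th, 2003')."""
    # Leaving \2 out of the replacement drops the ordinal suffix.
    replacement = r"\3 \1\2, \4" if keep_ordinal else r"\3 \1, \4"
    return DATE_PATTERN.sub(replacement, text)


print(to_us_date("28th May 2003"))                     # May 28, 2003
print(to_us_date("28th May 2003", keep_ordinal=True))  # May 28th, 2003
```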

At this point, you may ask how to handle dates in which the name of the month isn't spelled out, such as 28/05/03. You search using this expression:

([0-9]{1,2})/([0-9]{1,2})/([0-9]{2})

You replace using this expression:

\2/\1/\3

If the date takes the format of 28/05/2003, you use {4} in the last pattern instead of {2}.
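
As a quick check of the group order, here is a minimal Python sketch of the numeric case. It assumes the goal stated at the start of this example, converting day/month/year to the U.S. month/day/year order, so the replacement writes the month group first, then the day, then the year. The names NUMERIC_DATE and swap_day_month are hypothetical, and the year group accepts either two or four digits.

```python
import re

# Group 1: day, group 2: month, group 3: two-digit or four-digit year.
# The \b anchors keep the pattern from matching inside a longer number.
NUMERIC_DATE = re.compile(r"\b([0-9]{1,2})/([0-9]{1,2})/([0-9]{2}|[0-9]{4})\b")


def swap_day_month(text: str) -> str:
    """Rewrite day/month/year as month/day/year."""
    return NUMERIC_DATE.sub(r"\2/\1/\3", text)


print(swap_day_month("28/05/03"))    # 05/28/03
print(swap_day_month("28/05/2003"))  # 05/28/2003
```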

About using list separators in regular expressions

The previous example uses the following argument to find either one-digit or two-digit dates: {1,2}. In this case, a comma separates the two values. However, remember that your regional settings in Microsoft Windows® control the list separator that you use. If your regional settings specify the use of semicolons as list separators, you must use them instead of commas.
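
If you keep a collection of expressions and share them with people who use other regional settings, a small helper can rewrite the separator inside the braces. The following Python sketch is only an illustration: the function name localize_quantifiers is hypothetical, and it touches commas only when they appear between braces, so a comma in a Replace with string is left alone.

```python
import re


def localize_quantifiers(pattern: str, list_separator: str = ";") -> str:
    """Swap the comma inside {n,m} and {n,} for the given list separator."""
    return re.sub(
        r"\{([0-9]+),([0-9]*)\}",
        lambda m: "{" + m.group(1) + list_separator + m.group(2) + "}",
        pattern,
    )


print(localize_quantifiers("([0-9]{1,2})/([0-9]{1,2})/([0-9]{2})"))
# ([0-9]{1;2})/([0-9]{1;2})/([0-9]{2})
```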

To find out which list separator your operating system specifies
  1. Click Start, point to Settings, and then click Control Panel.
  2. Double-click Regional Settings (if you use Windows Me, Windows 98, or Windows NT®), Regional Options (if you use Windows 2000), or Regional and Language Options (if you use Windows XP).

 Note   If you use Category view in Windows XP, you only need to click Regional and Language Options once.

The following table describes how to find the list separator setting for the supported versions of Windows.

Operating System Steps to find your list separator
Windows XP
  1. After you click or double-click the Regional and Language Options command, the Regional and Language Options dialog box opens. On the Regional tab, click Customize. The Customize Regional Options dialog box opens.
  2. Click the Numbers tab, and then locate the List separator entry.
Windows 2000
  1. After you double-click the Regional Options command, the Regional Options dialog box opens.
  2. Click the Numbers tab, and then locate the List separator entry.
Windows Me, 98, and NT
  1. After you double-click the Regional Settings command, the Regional Settings dialog box opens.
  2. Click the Numbers tab and locate the List separator entry.

Example 3: Add periods to, or remove them from, salutations

In some countries, honorific titles (Mr., Mrs., and so on) do not include periods. This example shows you how to add periods to or remove them from honorifics. From this point on, we assume that you know how to use the Find and Replace dialog box.

This expression finds Mr, Ms, Mrs, and Dr without periods:

<([DM][rs]{1,2})( )

Notice that the expression uses a second pattern containing a blank space. That space normally follows the honorific when the period is missing. This expression adds the period:

\1.\2

To do the reverse, search using this expression:

<([DM][rs]{1,2}).

And replace using this expression:

\1
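
Both directions can be sketched in Python to see which strings the patterns catch. This is an analogue in Python's regular expression syntax rather than Word's wildcard syntax: \b replaces the < word marker, the period is escaped because it is a special character in Python, and the function names are hypothetical.

```python
import re

# Analogue of <([DM][rs]{1,2})( ) : an honorific followed by a space.
ADD_PERIOD = re.compile(r"\b([DM][rs]{1,2})( )")
# Analogue of <([DM][rs]{1,2}). : an honorific followed by a period.
REMOVE_PERIOD = re.compile(r"\b([DM][rs]{1,2})\.")


def add_periods(text: str) -> str:
    """Insert a period between the honorific and the space after it."""
    return ADD_PERIOD.sub(r"\1.\2", text)


def remove_periods(text: str) -> str:
    """Drop the period that follows the honorific."""
    return REMOVE_PERIOD.sub(r"\1", text)


print(add_periods("Mr Smith met Dr Jones and Mrs Lee"))
# Mr. Smith met Dr. Jones and Mrs. Lee
print(remove_periods("Mr. Smith met Dr. Jones"))
# Mr Smith met Dr Jones
```

Note that the range [rs]{1,2} is loose: it also matches combinations such as "Drs" or "Mss", so it is worth reviewing the matches before replacing them all.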

Example 4: Find duplicate paragraphs or rows

When you use this expression, you may want to sort the list first to place duplicate rows next to each other. Also, you need to remove all empty paragraphs. In other words, if you use empty paragraphs to separate blocks of text, like so:

Joshua Quentin Barnhill¶

Joshua Quentin Barnhill¶

Doris X. Hartwig¶

you need to remove those paragraphs, like so:

Joshua Quentin Barnhill¶
Joshua Quentin Barnhill¶
Doris X. Hartwig¶

You can use your favorite method to remove the empty paragraphs, but since we're talking about regular expressions, here's one that finds two or more consecutive paragraph marks. Search using this expression (the @ character matches one or more occurrences of the preceding item, so a single pass removes any run of empty lines):

(^13)\1@

You replace the results with this expression:

^p
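
The same cleanup can be sketched in Python, where a line break plays the role of the paragraph mark. This is an analogue, not Word syntax: \n{2,} corresponds to a paragraph mark followed by one or more further paragraph marks, and the function name remove_empty_paragraphs is hypothetical.

```python
import re


def remove_empty_paragraphs(text: str) -> str:
    """Collapse every run of two or more line breaks into a single one."""
    return re.sub(r"\n{2,}", "\n", text)


spaced = "Joshua Quentin Barnhill\n\nJoshua Quentin Barnhill\n\nDoris X. Hartwig\n"
print(remove_empty_paragraphs(spaced))
```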

Now let's look at ways to replace text. This expression finds any sequence of two consecutive identical paragraphs:

(*^13)\1

This expression also matches longer repetitions of text that end in paragraphs. For example, run the expression against the following list:

Joshua Quentin Barnhill¶
Doris X. Hartwig¶
Joshua Quentin Barnhill¶
Doris X. Hartwig¶
Tamara Y. Johnston¶

Search finds the first four lines and stops only when the overall pattern changes. In contrast, if you run the expression against this list:

Joshua Quentin Barnhill¶
Joshua Quentin Barnhill¶
Doris X. Hartwig¶
Doris X. Hartwig¶

the expression finds only the first two paragraphs.
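
The two results above can be reproduced in Python as a rough model of the matching. Two assumptions are built in and are ours, not Word's documented behavior: re.DOTALL lets the wildcard cross line breaks the way the asterisk can cross paragraph marks, and the lazy *? makes the group as short as possible. The names REPEATED_BLOCK and first_repeat are hypothetical.

```python
import re

# Analogue of (*^13)\1 : a block ending in a line break, then the same block.
REPEATED_BLOCK = re.compile(r"(.*?\n)\1", re.DOTALL)


def first_repeat(text: str):
    """Return the first stretch of text that consists of a block repeated twice."""
    match = REPEATED_BLOCK.search(text)
    return match.group(0) if match else None


alternating = (
    "Joshua Quentin Barnhill\nDoris X. Hartwig\n"
    "Joshua Quentin Barnhill\nDoris X. Hartwig\n"
    "Tamara Y. Johnston\n"
)
adjacent = (
    "Joshua Quentin Barnhill\nJoshua Quentin Barnhill\n"
    "Doris X. Hartwig\nDoris X. Hartwig\n"
)

print(len(first_repeat(alternating).splitlines()))  # 4
print(len(first_repeat(adjacent).splitlines()))     # 2
```

In the first list no single line is followed by a copy of itself, so the group grows to two lines before the backreference can match, which is why four lines are found.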

To search for a greater number of identical items, add more placeholders. For example, this expression finds three consecutive identical paragraphs:

(*^13)\1\1

You can also use braces to do the same thing. The following examples find two and three identical paragraphs, respectively:

(*^13){2}
(*^13){3}

Or, you can find either two or three identical paragraphs:

(*^13){2,3}

You can also find two or more identical paragraphs:

(*^13){2,}

To remove the duplicates, replace the text that any of those expressions finds with the following string:

\1

In addition, you can repeat the find-and-replace operation as needed to replace all the duplicate paragraphs in your document, or you can add the @ wildcard character and have the expression repeat the operation for you:

(*^13)\1@
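
A minimal Python sketch of the same idea follows. It is an analogue under one stated restriction: the pattern is limited to single lines (the dot does not cross line breaks here), so it removes only adjacent identical lines, not repeated multi-line blocks. The names DUPLICATE_LINES and remove_adjacent_duplicates are hypothetical.

```python
import re

# Analogue of (*^13)\1@ : one line, followed by one or more copies of itself.
# The ^ anchor (with re.MULTILINE) makes sure a match starts at a line start.
DUPLICATE_LINES = re.compile(r"^(.*\n)\1+", re.MULTILINE)


def remove_adjacent_duplicates(text: str) -> str:
    """Keep one copy of each run of identical adjacent lines."""
    return DUPLICATE_LINES.sub(r"\1", text)


sorted_list = (
    "Doris X. Hartwig\n"
    "Doris X. Hartwig\n"
    "Joshua Quentin Barnhill\n"
    "Joshua Quentin Barnhill\n"
    "Joshua Quentin Barnhill\n"
    "Tamara Y. Johnston\n"
)
print(remove_adjacent_duplicates(sorted_list))
```

As in Word, the list must be sorted first: duplicates that are separated by other lines are not adjacent and are left in place.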

You can also use this method to remove duplicate rows in a table. To do so, first remove any merged cells, and then sort the table to place duplicate rows next to each other. Next, convert your table to text. (On the Table menu, point to Convert, and then click Table to Text; when prompted, use tabs as the separator.) After you make your replacements, convert the text back to a table.

About the authors

  • Graham Mayor and Klaus Linke are Microsoft Word Most Valuable Professionals (MVPs). For more information about MVPs and the MVP program, see the Microsoft MVP Site and MVPs.org.
  • Colin Wilcox writes for the Office Help team. In addition to contributing to the Office Power User Corner column, he writes articles and tutorials for Microsoft Data Analyzer.
