Find and replace text by using regular expressions (Advanced)

You can automate many find-and-replace tasks by using wildcard characters to build regular expressions, which are combinations of literal text and wildcard characters.

For example, you can use regular expressions to find and remove duplicate rows from a large table or to transpose a list of names (change them from "First Last" to "Last, First").

Terms

To start, let's define a couple of terms:

  • A wildcard character is a keyboard character that you can use to represent one or many characters. For example, the asterisk (*) typically represents any string of characters, and the question mark (?) typically represents a single character.
  • In our case, a regular expression is a combination of literal and wildcard characters that you use to find and replace patterns of text. The literal text characters indicate text that must exist in the target string of text. The wildcard characters indicate the text that can vary in the target string.

Try it!

The steps in this section explain how to use a regular expression that transposes names. Keep in mind that you always use the Find and Replace dialog box to run your regular expressions. Also, remember that if an expression doesn't work as expected, you can always press CTRL+Z to undo your changes, and then try another expression.

To transpose names
  1. Start Word and open a new, blank document.
  2. Copy this table and paste it into the document.
Josh Barnhill
Doris Hartwig
Tamara Johnston
Daniel Shimshoni
  3. On the Home tab, in the Editing group, click Replace to open the Find and Replace dialog box.
  4. If you don't see the Use wildcards check box, click More, and then select the check box. If you don't select the check box, Word treats the wildcard characters as text.
  5. Type the following characters in the Find what box. Make sure you include the space between the two sets of parentheses:

(<*>) (<*>)

  6. In the Replace with box, type the following characters. Make sure you include the space between the comma and the second backslash:

\2, \1

  7. Select the table, and then click Replace All. Word transposes the names and separates them with a comma, like so:
Barnhill, Josh
Hartwig, Doris
Johnston, Tamara
Shimshoni, Daniel

At this point, you may wonder what to do if some or all of your names contain middle initials. See the first example in Putting regular expressions to work for more information.

How regular expressions work

From here on, keep this principle in mind: The document’s content determines most (but not all) of the design of your regular expressions. For example, in the sample table you used earlier, each cell contained two words. If the cell contained two words and a middle initial, you'd use a different expression.

Let's examine each expression from the inside out:

In the first expression, (<*>) (<*>):

  • The asterisk (*) returns all the text in the word.
  • The less than and greater than symbols (< >) mark the start and end of each word, respectively. They ensure that the search returns a single word.
  • The parentheses and the space between them divide the words into distinct groups: (first word) (second word). The parentheses also indicate the order in which you want search to evaluate each expression.

In other words, the expression says: "Find both words."

In the second expression, \2, \1:

  • The backslash (\) works with the numbers to serve as a placeholder. (You can also use the backslash to find other wildcard characters. See the next section for more information.)
  • The comma after the first placeholder inserts the correct punctuation between the transposed names.

In other words, the expression says: "Write the second word, add a comma, write the first word."
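
If you already know conventional regular expressions, it may help to see the same transposition sketched in Python. This is only an analogy, not how Word runs the search: Python's re module uses a different syntax, so \b\w+\b is assumed here as a stand-in for Word's <*>, and the transpose function name is hypothetical.

```python
import re

# Rough Python analogue of Word's (<*>) (<*>) wildcard expression.
# Assumption: \b\w+\b (one whole word) approximates Word's <*>.
FIND = re.compile(r"(\b\w+\b) (\b\w+\b)")
REPLACE = r"\2, \1"  # second group, comma, space, first group


def transpose(text: str) -> str:
    """Swap each 'First Last' pair into 'Last, First'."""
    return FIND.sub(REPLACE, text)


names = "Josh Barnhill\nDoris Hartwig\nTamara Johnston\nDaniel Shimshoni"
print(transpose(names))
```

As in Word, the parentheses define the groups and the numbered placeholders decide the order in which they are written back.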

Wildcards for items you want to find and replace

You can use wildcards to search for text. For example, you can use the asterisk (*) wildcard to search for a string of characters (for example, "s*d" finds "sad" and "started").

Use wildcards to find and replace text

  1. On the Home tab, in the Editing group, click the arrow next to Find, and then click Advanced Find.
  2. Select the Use wildcards check box.

If you don't see the Use wildcards check box, click More.

  3. Do one of the following:
    • To choose a wildcard character from a list, click Special, click a wildcard character, and then type any additional text in the Find what box. For more information, see the table Available wildcards.
    • Type a wildcard character directly in the Find what box. For more information, see the table Available wildcards.
  4. If you want to replace the item, click the Replace tab, and then type what you want to use as a replacement in the Replace with box.
  5. Click Find Next, Find All, Replace, or Replace All.

To cancel a search in progress, press ESC.

Available wildcards

 Notes 

  • When the Use wildcards check box is selected, Word finds only the exact text that you specify. Notice that the Match case and Find whole words only check boxes are unavailable (dimmed) to indicate that these options are automatically turned on. You can't turn off these options.
  • To search for a character that's defined as a wildcard, type a backslash (\) before the character. For example, type \? to find a question mark.
  • You can use parentheses to group the wildcard characters and text and to indicate the order of evaluation. For example, type <(pre)*(ed)> to find "presorted" and "prevented".
  • You can use the \n wildcard to search for an expression and then replace it with the rearranged expression. For example, type (Ashton) (Chris) in the Find what box and \2 \1 in the Replace with box. Word will find Ashton Chris and replace it with Chris Ashton.
To find | Type | Example
Any single character | ? | s?t finds sat and set.
Any string of characters | * | s*d finds sad and started.
The beginning of a word | < | <(inter) finds interesting and intercept, but not splintered.
The end of a word | > | (in)> finds in and within, but not interesting.
One of the specified characters | [ ] | w[io]n finds win and won.
Any single character in this range | [-] | [r-t]ight finds right and sight. Ranges must be in ascending order.
Any single character except the characters in the range inside the brackets | [!x-z] | t[!a-m]ck finds tock and tuck, but not tack or tick.
Exactly n occurrences of the previous character or expression | {n} | fe{2}d finds feed but not fed.
At least n occurrences of the previous character or expression | {n,} | fe{1,}d finds fed and feed.
From n to m occurrences of the previous character or expression | {n,m} | 10{1,3} finds 10, 100, and 1000.
One or more occurrences of the previous character or expression | @ | lo@t finds lot and loot.
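
For comparison with conventional regular expressions, here is a small Python sketch that pairs several of the table's examples with approximate equivalents. The mapping and the finds helper are assumptions made for illustration; the two syntaxes do not agree in every detail, so treat this as a rough guide rather than a translation table.

```python
import re

# Word wildcard example -> approximate Python regex (an assumed mapping for
# illustration; the two syntaxes do not match in every detail).
EQUIVALENTS = {
    "s?t": r"s.t",              # ? (any single character) ~ .
    "s*d": r"s.*d",             # * (any string of characters) ~ .*
    "<(inter)": r"\b(inter)",   # < (beginning of a word) ~ \b
    "(in)>": r"(in)\b",         # > (end of a word) ~ \b
    "t[!a-m]ck": r"t[^a-m]ck",  # [!x-z] (excluded range) ~ [^x-z]
    "lo@t": r"lo+t",            # @ (one or more) ~ +
}


def finds(word_pattern: str, text: str) -> bool:
    """Report whether the Python stand-in for a Word pattern matches text."""
    return re.search(EQUIVALENTS[word_pattern], text) is not None


print(finds("<(inter)", "splintered"))  # False: "inter" is not at a word start
print(finds("t[!a-m]ck", "tuck"))       # True: "u" is outside the range a-m
```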

Putting regular expressions to work

These examples show you some of the ways that you can use wildcard characters and regular expressions in Microsoft Word.

Example 1: Transpose names with middle initials

This example uses a combination of wildcard characters and character codes to transpose names that contain middle initials. If you're unfamiliar with character codes, see the Word Help topic Find and replace text and other data in your Word 2010 files.

Keep these facts in mind as you proceed:

  • Whenever you use this expression on names that reside in a table, you must first convert that table to text.
  • If the table contains more than one column, copy the column containing the names to a blank document and convert it to text there.
  • After you transpose the names, convert the text back to a table. You can then delete the original column and replace it with your changed data.
To prepare sample data
  1. If you haven't already done so, start Word and create a new, blank document.
  2. Insert a blank table into the document. Make the table one column wide by four rows high.
  3. Copy these names individually, and paste each one into a blank table cell:

Joshua Quentin Barnhill
Doris X. Hartwig
Tamara Y. Johnston
Daniel Shimshoni

Your table should look something like this:

Joshua Quentin Barnhill
Doris X. Hartwig
Tamara Y. Johnston
Daniel Shimshoni
  4. Select the table, and on the Table Tools Layout tab, in the Data group, click Convert to Text.
  5. Select Paragraph marks as the text separator, and then click OK.
To transpose names with initials
  1. On the Home tab, in the Editing group, click Replace to open the Find and Replace dialog box.
  2. Select the Use wildcards check box (you may need to click More to see the check box), and then type the following expression in the Find what box:

(*) ([! ]@)^13

Make sure you enter a space between the two sets of parentheses and after the exclamation point. If you haven't seen the ^13 character before, we explain what it does in the next section.

  3. In the Replace with box, type the following expression:

\2, \1^p

  4. Select the list of names, and then click Replace All. Word transposes the names and either middle initials or middle names, like so:

Barnhill, Joshua Quentin
Hartwig, Doris X.
Johnston, Tamara Y.
Shimshoni, Daniel

To convert the changed text back to a table
  1. Select the list of transposed names.
  2. On the Insert tab, in the Tables group, click Table, and then click Convert Text to Table.

The Convert Text to Table dialog box opens.

  3. Under Separate text at, click Paragraphs, and then click OK.

The expressions, piece by piece

Let's look at the individual pieces of the expression to see how they work, starting with the expression in the Find what box.

The entire expression looks for two groups of patterns: a first name with a middle initial (or a middle name) and a last name. The (*) finds all first names. Notice that there's a space after it.

This part of the expression matches the last names:

([! ]@)^13

The exclamation point excludes any character specified in the brackets. In this case, [! ] means "find everything but spaces." Its effect is to trim the space from in front of the last names.

The @ character finds one or more occurrences of the previous character or expression, so [! ]@ matches an unbroken run of characters that aren't spaces. That run is the whole last name, with no spaces in front of it.

We need to know where the last name ends, so we also use the ^13 character to search for the paragraph mark at the end of each line. However, since we don't plan to reuse the paragraph mark, we surround everything else with parentheses.

You can try this by copying the names to your test document again (make sure you separate them with paragraph marks), and then search using ([! ]@)^13 in the Find what box. Search matches each last name.

Because search starts again at the beginning of the next line, we use the asterisk wildcard character (*) to match everything from there to the beginning of the next last name.

Since we don't plan to reuse the space in front of the last name, we use parentheses to exclude it from the two groups:

(*) ([! ]@)^13

Important: Be careful when using the ^13 character code. Normally, you can use the ^p character code to search for paragraph marks. However, that code does not work in wildcard searches. Instead, you need to use the substitute code ^13. Although the ^p character code does not work in wildcard searches, you should use it in wildcard replace operations. Why? The ^p character includes formatting information, and the ^13 character does not. In addition, you cannot assign style information to the ^13 character at all. Misusing the ^13 code in a replace operation can essentially convert your document into a file that you cannot format.

The "replace" expression (\2, \1^p) does the actual transposition. In the Replace with box, the \2, characters tell search to write the second pattern first and to add a comma after the pattern. The \1^p characters tell search where to write the first pattern and to write a paragraph mark after that pattern.
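
The same idea can be sketched in Python for comparison. This is an analogy under stated assumptions, not Word's own engine: [^ \n]+ plays the role of [! ]@, the $ anchor in MULTILINE mode plays the role of ^13, and the function name is hypothetical. Because $ does not consume the line break, the replacement needs no counterpart to ^p.

```python
import re

# Rough analogue of Word's (*) ([! ]@)^13 with \2, \1^p as the replacement.
# Assumptions: [^ \n]+ stands in for [! ]@, and $ (MULTILINE) stands in for
# ^13, so the line break itself is never consumed or rewritten.
FIND = re.compile(r"^(.*) ([^ \n]+)$", re.MULTILINE)


def transpose_with_middle(text: str) -> str:
    """Move the last word of each line to the front, followed by a comma."""
    return FIND.sub(r"\2, \1", text)


names = "Joshua Quentin Barnhill\nDoris X. Hartwig\nDaniel Shimshoni"
print(transpose_with_middle(names))
```

The first group is greedy, so it takes everything up to the last space on the line; whatever follows that space becomes the second group.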

Example 2: Transposing dates

You can use the regular expressions shown here to convert dates in European format to dates in the U.S. format.

To transpose dates
  1. Copy and paste the following date into your document: 28th May 2003
  2. Open the Find and Replace dialog box, and type the following expression in the Find what box:

([0-9]{1,2})([dhnrst]{2}) (<[ADFJMNOS]*>) ([0-9]{4})

Make sure you insert a space between the following opening and closing parentheses: 2}) (<[ and *>) ([0.

  3. Enter the following expression in the Replace with box:

\3 \1, \4

Make sure you insert a space between each set of characters.

  4. Click Replace All.

Search replaces 28th May 2003 with May 28, 2003.

The expressions, piece by piece

Let's start with the expression in the Find what box. The expression works by breaking dates down into four patterns, denoted by the sets of parentheses. Each pattern contains the components that you find in all dates written in the style that you used in the example. Working from left to right:

  • The number range [0-9] matches the single digits in the first pattern. Because the day of the month can have two digits, we tell search to return either one-digit or two-digit numbers: {1,2}. The result is the first pattern: ([0-9]{1,2}).
  • Ordinals make up the second pattern. The ordinal suffixes are "th," "nd," "st," and "rd," so we add those letters to a set: [dhnrst]. Because the suffixes always consist of two letters, we restrict the letter count to two: ([dhnrst]{2}).
  • Next comes a space, followed by literal and wildcard characters that find month names. All month names begin with one of these capital letters: ADFJMNOS. We don't know how many characters follow each capital letter, so we follow them with the asterisk (*). We're only interested in the month name itself, so we use less-than and greater-than characters to limit the results to the individual word. The result is the third pattern: (<[ADFJMNOS]*>).
  • Finally, we search for the year. We use the same number range, but this time we restrict the count to four digits: ([0-9]{4}).

Notice that in the Replace with box we wrote only three of the four patterns. We omitted the ordinal (the "th") from the date because dates in the U.S. format don't use ordinals. If you want to leave the ordinal in the date, enter \3 \1\2, \4 in the Replace with box. In this case, you enter a space both after the 3 and after the comma, but nowhere else.
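
For readers who want to check the four groups outside Word, here is a rough Python sketch of the same date rewrite. It assumes \b[ADFJMNOS][a-z]*\b as a stand-in for Word's <[ADFJMNOS]*>; the to_us_date function and its keep_ordinal parameter are hypothetical names introduced for this illustration.

```python
import re

# Rough analogue of ([0-9]{1,2})([dhnrst]{2}) (<[ADFJMNOS]*>) ([0-9]{4}).
# Assumption: \b[ADFJMNOS][a-z]*\b approximates Word's <[ADFJMNOS]*>.
FIND = re.compile(r"([0-9]{1,2})([dhnrst]{2}) (\b[ADFJMNOS][a-z]*\b) ([0-9]{4})")


def to_us_date(text: str, keep_ordinal: bool = False) -> str:
    """Rewrite '28th May 2003' as 'May 28, 2003' (or 'May 28th, 2003')."""
    # Group 2 (the ordinal) is written back only when keep_ordinal is set.
    replacement = r"\3 \1\2, \4" if keep_ordinal else r"\3 \1, \4"
    return FIND.sub(replacement, text)


print(to_us_date("28th May 2003"))                     # May 28, 2003
print(to_us_date("28th May 2003", keep_ordinal=True))  # May 28th, 2003
```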

At this point, you may ask how to handle dates in which the name of the month isn't spelled out, such as 28/05/03. You search using this expression:

([0-9]{1,2})/([0-9]{1,2})/([0-9]{2})

You replace using this expression:

\2/\1/\3

If the date takes the format of 28/05/2003, you use {4} in the last pattern instead of {2}.

About using list separators in regular expressions

The previous example uses the following argument to find either one-digit or two-digit dates: {1,2}. In this case, a comma separates the two values. However, remember that your regional settings in Windows control the list separator that you use. If your regional settings specify the use of semicolons as list separators, you must use them instead of commas.

To find out which list separator your operating system specifies, do the following:

Windows 7

  1. Click the Start button, and then click Control Panel.
  2. Click Clock, Language, and Region.
  3. Click Change the date, time, or number format, and then click Additional settings.
  4. Click the Numbers tab, and then locate the List separator entry.

Windows Vista

  1. Click the Start button, and then click Control Panel.
  2. Click Clock, Language, and Region.
  3. Click Regional and Language Options.
  4. On the Formats tab, under Current format, click Customize this format.
  5. Click the Numbers tab, and then locate the List separator entry.

Windows XP

  1. Click Start, and then click Control Panel.
  2. Double-click Regional and Language Options.
  3. On the Regional tab, click Customize.
  4. Click the Numbers tab, and then locate the List separator entry.

Example 3: Add periods to, or remove them from, salutations

In some countries, honorific titles (Mr., Mrs., and so on) do not include periods. This example shows you how to add periods to or remove them from honorifics. From this point on, we assume that you know how to use the Find and Replace dialog box.

This expression finds Mr, Ms, Mrs, and Dr without periods:

<([DM][rs]{1,2})( )

Notice that the expression uses a second pattern containing a blank space. That space normally follows the honorific when the period is not there. This expression adds the period:

\1.\2

To do the reverse, search using this expression:

<([DM][rs]{1,2}).

And replace using this expression:

\1
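
Both directions can be sketched in Python for comparison. This is an approximation under stated assumptions: \b stands in for Word's < (start of a word), and the function names are hypothetical. A period is a metacharacter in Python regular expressions, so it is escaped here as \. when it must match literally.

```python
import re

# Rough analogues of <([DM][rs]{1,2})( ) and <([DM][rs]{1,2}).
# Assumption: \b stands in for Word's < (start of a word). In Python a
# period is a metacharacter, so the literal period is written as \.
ADD_PERIOD = re.compile(r"\b([DM][rs]{1,2})( )")
DROP_PERIOD = re.compile(r"\b([DM][rs]{1,2})\.")


def add_periods(text: str) -> str:
    """Insert a period between the honorific and the space after it."""
    return ADD_PERIOD.sub(r"\1.\2", text)


def drop_periods(text: str) -> str:
    """Remove the period that directly follows an honorific."""
    return DROP_PERIOD.sub(r"\1", text)


print(add_periods("Mr Barnhill met Dr Hartwig and Mrs Johnston"))
print(drop_periods("Mr. Barnhill met Dr. Hartwig"))
```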

Example 4: Find duplicate paragraphs or rows

When you use this expression, you may want to sort the list first to place duplicate rows next to each other. Also, you need to remove all blank paragraph marks. In other words, if you use blank paragraphs to separate blocks of text, like so:

Joshua Quentin Barnhill¶

Joshua Quentin Barnhill¶

Doris X. Hartwig¶

you need to remove those paragraphs, like so:

Joshua Quentin Barnhill¶
Joshua Quentin Barnhill¶
Doris X. Hartwig¶

You can use your favorite method to remove the blank paragraphs, but since we're talking about regular expressions, here's one that finds two consecutive paragraph characters. Search using this expression (the @ character repeats the find-and-replace operation and removes all multiple empty lines):

(^13)\1@

You replace the results with this expression:

^p
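
As a point of comparison, the same cleanup can be sketched in Python. The sketch assumes that each paragraph mark is modeled as a "\n" character; the function name is hypothetical.

```python
import re

# Rough analogue of (^13)\1@ replaced with ^p.
# Assumption: each paragraph mark is modeled as a single "\n".
BLANK_RUN = re.compile(r"(\n)\1+")


def remove_blank_paragraphs(text: str) -> str:
    """Collapse any run of two or more line breaks into a single one."""
    return BLANK_RUN.sub("\n", text)


text = "Joshua Quentin Barnhill\n\nJoshua Quentin Barnhill\n\nDoris X. Hartwig\n"
print(remove_blank_paragraphs(text))
```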

Now let's look at ways to replace text. This expression finds any sequence of two consecutive identical paragraphs:

(*^13)\1

This expression also matches longer repetitions of text that end in paragraphs. For example, run the expression against the following list:

Joshua Quentin Barnhill¶
Doris X. Hartwig¶
Joshua Quentin Barnhill¶
Doris X. Hartwig¶
Tamara Y. Johnston¶

Search finds the first four lines and stops only when the overall pattern changes. In contrast, if you run the expression against this list:

Joshua Quentin Barnhill¶
Joshua Quentin Barnhill¶
Doris X. Hartwig¶
Doris X. Hartwig¶

the expression finds only the first two paragraphs.

To search for a greater number of identical items, add more placeholders. For example, this expression finds three consecutive identical paragraphs:

(*^13)\1\1

You can also use braces to do the same thing. The following examples find two and three identical paragraphs, respectively:

(*^13){2}
(*^13){3}

Or, you can find either two or three identical paragraphs:

(*^13){2,3}

You can also find two or more identical paragraphs:

(*^13){2,}

You can replace any of those expressions with the following string:

\1

In addition, you can repeat the find-and-replace operation as needed to replace all the duplicate paragraphs in your document, or you can add the @ wildcard character and have the expression repeat the operation for you:

(*^13)\1@

You can also use this method to replace duplicate rows in a table. To do so, first remove any merged cells, and then sort the table to place duplicate rows adjacent to each other. Next, convert your table to text. (Select the table, and on the Table Tools Layout tab, in the Data group, click Convert to Text; when prompted, use tabs as the text separator.) After you make your replacements, convert the text back to a table.
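
For comparison, removing adjacent duplicate paragraphs can be sketched in Python. The sketch makes several assumptions: "\n" models the paragraph mark, the ^ anchor (in MULTILINE mode) pins each match to the start of a line, and the function name is hypothetical. Python's . never crosses a line break, so unlike the Word expression described above, this version compares single paragraphs only and does not match longer repeating blocks.

```python
import re

# Rough analogue of (*^13)\1@ replaced with \1.
# Assumptions: "\n" models the paragraph mark, and ^ (MULTILINE) pins the
# match to the start of a line so a partial line cannot count as a duplicate.
DUPLICATES = re.compile(r"^(.*\n)\1+", re.MULTILINE)


def remove_adjacent_duplicates(text: str) -> str:
    """Keep one copy of each run of identical consecutive paragraphs."""
    return DUPLICATES.sub(r"\1", text)


rows = (
    "Joshua Quentin Barnhill\n"
    "Joshua Quentin Barnhill\n"
    "Doris X. Hartwig\n"
    "Doris X. Hartwig\n"
)
print(remove_adjacent_duplicates(rows))
```

As with the Word expression, duplicates are removed only when they sit next to each other, which is why sorting the list first matters.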

More examples

For more examples of how to use regular expressions in Word, see Finding and replacing characters using wildcards on the MVP FAQ site.


About the authors

This article assembles content previously authored with the help of Graham Mayor and Klaus Linke, former Microsoft Word Most Valuable Professionals (MVPs). For more information about MVPs and the MVP program, see the Microsoft MVP Site and MVPs.org.

Applies to:
Word 2010