About optical character recognition (OCR)

Optical character recognition (OCR) translates images of text, such as scanned documents, into actual text characters. Also known as text recognition, OCR makes it possible to edit and reuse the text that is normally locked inside scanned images. OCR works using a form of artificial intelligence known as pattern recognition to identify individual text characters on a page, including punctuation marks, spaces, and ends of lines.

OCR can be performed in three scenarios: automatic OCR, manual OCR, and indexing.

Setting OCR options

In each of the three OCR scenarios, you can control the following settings. You must select these options in the Preset Options dialog box before you scan or perform a manual OCR operation.

  • Auto rotate: If the paper was scanned sideways or upside down, selecting Auto rotate turns the page upright. Keeping this option turned on helps to maximize OCR accuracy. When indexing text, Auto rotate allows text to be read in all orientations, but no changes are made to the file.
  • Auto straighten: If the paper was scanned slightly out of alignment, selecting Auto straighten corrects the skew. Keeping this option turned on helps to maximize OCR accuracy. This option is not applicable to indexing.
  • OCR language: By selecting a different language from this list, you can perform OCR on a document written in a language other than your computer's default language.
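The three options and their scenario-specific behavior can be summarized in a short sketch. This is purely illustrative: the names `Scenario`, `OcrSettings`, and `applicable_options` are hypothetical and do not correspond to any Microsoft Office API, and the language value is a placeholder.

```python
from dataclasses import dataclass
from enum import Enum


class Scenario(Enum):
    AUTOMATIC = "automatic"  # settings come from a scanning preset
    MANUAL = "manual"
    INDEXING = "indexing"


@dataclass(frozen=True)
class OcrSettings:
    auto_rotate: bool = True
    auto_straighten: bool = True
    language: str = "English"  # placeholder value, not a real default


def applicable_options(scenario: Scenario, settings: OcrSettings) -> dict:
    """Return only the options that have an effect in the given scenario."""
    options = {
        "auto_rotate": settings.auto_rotate,
        "language": settings.language,
    }
    if scenario is not Scenario.INDEXING:
        # Auto straighten is not applicable to indexing.
        options["auto_straighten"] = settings.auto_straighten
    return options
```

For example, `applicable_options(Scenario.INDEXING, OcrSettings())` leaves out `auto_straighten`, mirroring the bullet above.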

Adjust settings for automatic OCR

  1. On the File menu, click Scan New Document, and then click Preset options.
  2. Select either Create new preset or Edit selected preset.
  3. Change OCR options on the Processing tab of the Preset Options dialog box.

Adjust settings for manual OCR

  1. On the Tools menu, click Options, and then click the OCR tab.
  2. Change OCR options under Manual OCR Settings.

Adjust settings for indexing

  1. On the Tools menu, click Options, and then click the OCR tab.
  2. Click Indexing Service and change OCR options in the Indexing Service dialog box.

Optimizing OCR accuracy

The accuracy of text recognition depends on several factors, including the type of document scanned and the scanner settings used. You can choose scanning presets to help improve OCR accuracy.

Note: Documents scanned at resolutions greater than 900 dots per inch (dpi) are converted to 900 dpi if Auto rotate or Auto straighten is selected.
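The rule in the note can be written as a small function. This is only a sketch of the stated behavior: `effective_dpi` and its parameters are hypothetical names, not part of any Microsoft Office API.

```python
MAX_OCR_DPI = 900  # cap stated in the note


def effective_dpi(scan_dpi: int, auto_rotate: bool, auto_straighten: bool) -> int:
    """Return the resolution a page ends up with after processing.

    Models only the rule in the note: the cap applies when at least one
    of the two options is selected and the scan exceeds 900 dpi.
    """
    if (auto_rotate or auto_straighten) and scan_dpi > MAX_OCR_DPI:
        return MAX_OCR_DPI
    return scan_dpi
```

Under this sketch, a 1200 dpi scan with Auto rotate selected comes out at 900 dpi, while the same scan with both options cleared stays at 1200 dpi.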

To select a preset, click Scan New Document on the File menu and then select a preset from the list. The following table lists suggestions for optimizing OCR when scanning different document types.

Original document | Scanner preset
Black or colored text on colored paper, or documents with busy backgrounds |
Photographs or artwork |
Colored text on white paper, or continuous-tone black and white images (such as photographs) |
 | Black and white from color page
Black text on white paper, or line art | Black and white

You can also create or edit presets. In the Microsoft Office Document Scanning dialog box, click Preset options, and then click either Create new preset or Edit selected preset.

Tip

For maximum OCR accuracy, make sure that the Auto rotate and Auto straighten check boxes are selected on the Processing tab of the Preset Options dialog box. This is the default setting.

If you are using manual OCR, select the Auto rotate and Auto straighten check boxes on the OCR tab in the Options dialog box.

For maximum OCR accuracy when indexing, click Options on the Tools menu, click the OCR tab, and then click Indexing Service to adjust these settings.

Selecting text

After OCR has been performed on a document, you can select text by dragging, just as you can in other Microsoft Office programs. You can do this in the page pane, which displays the page that is selected in the thumbnail pane at a larger size and lets you select and copy portions of the page, or in Reading view, which presents a full-screen view of a single page or portion of a page for easy on-screen reading. When you drag the mouse to select text, only whole words are selected. If you drag across the middle of a word, the entire word is selected, even if it falls partially outside the selection rectangle.

Drag the mouse to select text

The selected text is actually contained in a hidden layer, separate from the document image. You can select and copy text in this layer just as you can in a text document.
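One way to picture the hidden text layer and whole-word selection is the sketch below. It assumes, purely for illustration, that the layer stores each recognized word with a bounding box on the page image. The names `Box`, `Word`, and `select_words` are hypothetical, and this is not a description of how the program is actually implemented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    left: float
    top: float
    right: float
    bottom: float

    def intersects(self, other: "Box") -> bool:
        # True when the two rectangles overlap at all, even partially.
        return (self.left < other.right and other.left < self.right
                and self.top < other.bottom and other.top < self.bottom)


@dataclass(frozen=True)
class Word:
    text: str
    box: Box  # position of the recognized word on the page image


def select_words(text_layer: list[Word], selection: Box) -> str:
    """Return every word touched by the selection, in layer order."""
    return " ".join(w.text for w in text_layer if w.box.intersects(selection))


# Hypothetical text layer for one line of a scanned page.
layer = [
    Word("Optical", Box(0, 0, 60, 12)),
    Word("character", Box(65, 0, 140, 12)),
    Word("recognition", Box(145, 0, 235, 12)),
]

# The drag starts in the middle of "character" and ends inside "recognition".
print(select_words(layer, Box(100, 2, 150, 10)))  # character recognition
```

Because a word is included as soon as the rectangle overlaps any part of its box, a drag that only clips a word still selects all of it, which matches the behavior described above.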