About indexing text in TIFF and MDI files

When Microsoft Office Document Imaging performs optical character recognition (OCR) on a scanned document, the recognized text is stored within the Tagged Image File Format (TIFF) or Microsoft Document Imaging Format (MDI) file when you save it. OCR, also known as text recognition, translates images of text, such as scanned documents, into actual text characters. TIFF is a high-resolution, tag-based graphics format used for the universal interchange of digital graphics; MDI is a high-resolution, tag-based graphics format based on TIFF.

This text is available not only when you open the document in Office Document Imaging, but also when you search for files using other Microsoft Office programs, or when you use Microsoft Windows search features.

The indexing service

Indexing is a special service that enables fast file searches on your computer. Text found in files on your computer is added to the index, which also stores a reference to the file where the text was found.
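
The idea of an index that maps words to the files containing them can be illustrated with a minimal Python sketch. This is not how the Windows indexing service is implemented; the function names (build_index, search), the file names, and the sample text are all hypothetical and chosen only for illustration.

```python
import re
from collections import defaultdict


def build_index(documents):
    """Map each lowercase word to the set of file names that contain it."""
    index = defaultdict(set)
    for file_name, text in documents.items():
        # Split the text into simple alphanumeric words.
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(file_name)
    return index


def search(index, word):
    """Return the sorted file names whose text contains the word."""
    return sorted(index.get(word.lower(), set()))


# Hypothetical file names and recognized text, for illustration only.
documents = {
    "invoice.tif": "Invoice total due in thirty days",
    "memo.mdi": "Memo about the invoice schedule",
}
index = build_index(documents)
print(search(index, "invoice"))
```

Because each word is looked up directly in the index, a search does not need to reopen and rescan every file, which is why indexed searches are fast.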

Without indexing, you can search for words only in TIFF or MDI files that have had OCR performed on them, either automatically at scan time or manually from the Tools menu. With indexing, you can search for any TIFF or MDI file based on the words it contains.

The indexing service is part of the operating system in both Windows 2000 and Windows XP, and the resulting index is used by the search features in both Windows and Office. For information about controlling the indexing service in Windows, see Microsoft Windows Help.

If the indexing service is turned off, only the file name and any available file properties can be searched for TIFF and MDI files that have not had OCR performed on them.

Indexing TIFF and MDI files without embedded OCR information

During indexing, OCR is automatically performed on TIFF and MDI files on your computer that do not already contain OCR text, which makes their text available to the indexing service for file searches. In this case, the OCR text is stored only in the index, not within the TIFF or MDI files. This process takes several seconds for each TIFF or MDI file encountered.
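
The behavior described above can be sketched in Python as a simple decision: use the text already stored in the file if there is any, and otherwise recognize the text at indexing time and keep it only in the index. This is a sketch under stated assumptions, not the actual indexing service; text_for_index and fake_ocr are hypothetical names, and fake_ocr is a stand-in that does not perform real recognition.

```python
def text_for_index(file_name, embedded_text, run_ocr):
    """Return (text, source) for one file being indexed.

    embedded_text is the OCR text already saved in the file, or None.
    run_ocr is a callable that recognizes text for a file name.
    """
    if embedded_text:
        return embedded_text, "embedded"
    # No saved OCR text: recognize it now, for the index only.
    return run_ocr(file_name), "ocr-at-index-time"


def fake_ocr(file_name):
    # Hypothetical stand-in for a real OCR engine.
    return f"recognized text of {file_name}"


# Hypothetical files: one saved with OCR text, one without.
files = {"scanned.tif": "Quarterly report", "fax.tif": None}

index_text = {}
for name, embedded in files.items():
    text, source = text_for_index(name, embedded, fake_ocr)
    index_text[name] = text  # stored in the index, not written to the file

# The file without embedded text is left unchanged.
print(files["fax.tif"], "|", index_text["fax.tif"])
```

The sketch keeps the recognized text in a separate dictionary to mirror the point that the files themselves are not modified.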

To turn off automatic indexing

  1. Click Options on the Tools menu, and then click the OCR tab.
  2. Click Indexing Service, and then clear the Use OCR to recognize the text in TIFF and MDI files when indexing check box.

Indexing in other languages

If you want to index documents in languages other than your computer's default language, you can select a dictionary from the OCR Language list in the Indexing Service dialog box.