When Microsoft Office Document Imaging performs optical character recognition (OCR) on a scanned document, the recognized text is stored within the Tagged Image File Format (TIFF) or Microsoft Document Imaging Format (MDI) file when you save it. OCR, also known as text recognition, translates images of text, such as scanned documents, into actual text characters. TIFF is a high-resolution, tag-based graphics format used for the universal interchange of digital graphics; MDI is a high-resolution, tag-based graphics format based on TIFF.
This text is available not only when you open the document in Office
Document Imaging, but also when you search for files using other Microsoft
Office programs, or when you use Microsoft Windows search features.
The indexing service
Indexing is a special service that enables fast file searches on your
computer. Text found in files on your computer is added to the index, which
also stores a reference to the file where the text was found.
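The idea of an index that maps text to the files containing it can be illustrated with a minimal sketch. This is not how the Windows indexing service is implemented; the function names (`build_index`, `search`) and the sample file names and text are hypothetical and exist only for illustration.

```python
from collections import defaultdict


def build_index(documents):
    """Map each lowercase word to the set of file names that contain it."""
    index = defaultdict(set)
    for file_name, text in documents.items():
        for word in text.lower().split():
            # Strip surrounding punctuation so "supplies." matches "supplies".
            index[word.strip(".,;:!?\"'()")].add(file_name)
    index.pop("", None)  # Discard tokens that were punctuation only.
    return index


def search(index, word):
    """Return the sorted file names whose text contains the word."""
    return sorted(index.get(word.lower(), set()))


# Hypothetical file names and recognized text, for illustration only.
documents = {
    "invoice.tif": "Invoice for office supplies.",
    "memo.mdi": "Memo about office hours",
}
index = build_index(documents)
```

In this sketch, a search consults only the index and never reopens the files, which is why an indexed search can be fast.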
Without indexing, you can search for words only in TIFF or MDI files that have had OCR performed on them, either automatically at scan time or manually from the Tools menu. With indexing, you can search for any TIFF or MDI file based on the words it contains.
The indexing service is part of the operating system in both Windows 2000 and Windows XP, and the resulting index is used by the search features in both Windows and Office. For information about controlling the indexing service in Windows, see Microsoft Windows Help.
If the indexing service is turned off, the only searchable text for TIFF and MDI files that have not had OCR performed on them is the file name itself and any file properties that might be available.
Indexing TIFF files without embedded OCR information
OCR is automatically performed on TIFF and MDI files on your computer that do not contain embedded OCR information, making their text available to the indexing service for file searches. In this case, the OCR text is stored only in the index, not within the TIFF or MDI files. This process takes several seconds for each TIFF or MDI file encountered.
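The behavior described above can be sketched as a simple decision: use the embedded text if the file has it, otherwise run OCR at indexing time and keep the result only in the index. This is an illustrative sketch, not the actual implementation; `run_ocr`, `text_for_indexing`, the `use_ocr_when_indexing` flag, and the sample files are all hypothetical, and the OCR step is a stub that returns canned text.

```python
def run_ocr(file_name):
    """Hypothetical stand-in for an OCR engine; returns canned text."""
    canned = {"scan2.tif": "quarterly budget report"}
    return canned.get(file_name, "")


def text_for_indexing(file_name, embedded_text, use_ocr_when_indexing=True):
    """Pick the text the indexer would see for one TIFF or MDI file."""
    if embedded_text is not None:
        # OCR was already performed and its text was saved in the file.
        return embedded_text
    if use_ocr_when_indexing:
        # No embedded text: recognize it now, without modifying the file.
        return run_ocr(file_name)
    # Option cleared: the file contributes no recognized text.
    return ""


# Hypothetical files: None means the file has no embedded OCR text.
files = {"scan1.mdi": "meeting notes", "scan2.tif": None}

# The recognized text lives only in this separate mapping (the "index").
index_text = {
    name: text_for_indexing(name, embedded) for name, embedded in files.items()
}
```

Note that `files` is never modified, mirroring the statement that the OCR text is stored in the index rather than in the TIFF or MDI file.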
To turn off automatic indexing
- Click Options on the Tools menu, and then click the OCR tab.
- Click Indexing Service, and then clear the Use OCR to recognize the text in TIFF and MDI files when indexing check box.
Indexing in other languages
If you want to index documents in languages other than your computer's default language, you can select a dictionary from the OCR Language list in the Indexing Service dialog box.