Microsoft Office 2003
The Microsoft Office Document Imaging program has two components: a scanning component and an imaging component. The scanning component controls the scanning of documents into your computer by using any installed scanner. Optical character recognition (OCR), also known as text recognition, translates images of text, such as scanned documents, into actual text characters. OCR is performed automatically on documents immediately after scanning, and you can easily scan multiple pages into a single file. The imaging component makes it easy to view and work with scanned documents on the screen.
Scanner drivers: TWAIN drivers and WIA drivers
Most scanners use TWAIN drivers to communicate status and settings between the scanner and the computer. In Microsoft Windows® XP and later, scanners may also include Windows Image Acquisition (WIA) drivers. Microsoft Office Document Imaging uses these drivers to determine your scanner's capabilities and to obtain the image from the scanner.
Document Imaging includes a database of recognized scanners and drivers. If Document Imaging recognizes your scanner, you can choose to display your scanner driver dialog box each time you scan and adjust the scanning settings yourself, or you can let Document Imaging manage the scanning process for you.
WIA drivers may not work for scanners that have automatic document feeders. If your scanner has an automatic document feeder, you should install and use the TWAIN driver for best results.
Some scanners use a TWAIN emulation driver. Document Imaging does not work with emulation drivers. If your scanner uses a TWAIN emulation driver, contact the manufacturer for the most recent version of a compatible TWAIN or WIA driver.
Scanned file formats and compression
Document Imaging stores scanned images in Microsoft Document Imaging Format (MDI) or Tagged Image File Format (TIFF). Both formats can store text recognized by OCR along with the images. A file in either format is larger than the same image stored in Joint Photographic Experts Group (JPEG) format.
Microsoft Document Imaging Format (MDI)
The MDI file format is based on TIFF, which is designed to store images by page layout. In Document Imaging, you can open and save scanned files in either MDI or TIFF format.
In general, images that are saved in the MDI file format take up less disk space than the same images saved as TIFF files. The image quality of an image saved in the MDI file format is comparable to that of the same image saved in TIFF.
Tagged Image File Format (TIFF)
TIFF is a commonly used format for various imaging programs, including those that scan and fax. You can open and edit TIFF files created with Document Imaging by using many other graphics programs. Note that you can open or edit MDI files only in Microsoft Office Document Imaging 2003. If you want to share files with people who are using Microsoft Office Document Imaging 2002 or another document imaging program, save your files in TIFF format for compatibility.
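When you exchange files with other programs, it can help to confirm that a file really is TIFF (or JPEG) regardless of its extension. The following is a minimal sketch, not part of Document Imaging: the function names are hypothetical, and the byte signatures come from the public TIFF and JPEG specifications. MDI is deliberately left out because this article does not describe its file signature.

```python
# Leading bytes defined by the TIFF 6.0 and JPEG specifications.
TIFF_LITTLE_ENDIAN = b"II*\x00"      # "II" byte order mark, then the number 42
TIFF_BIG_ENDIAN = b"MM\x00*"         # "MM" byte order mark, then the number 42
JPEG_START_OF_IMAGE = b"\xff\xd8\xff"  # SOI marker followed by the next marker prefix


def sniff_image_format(header: bytes) -> str:
    """Classify a file by its first bytes (hypothetical helper)."""
    if header.startswith((TIFF_LITTLE_ENDIAN, TIFF_BIG_ENDIAN)):
        return "tiff"
    if header.startswith(JPEG_START_OF_IMAGE):
        return "jpeg"
    return "unknown"


def sniff_file(path: str) -> str:
    """Read only the first four bytes of a file and classify them."""
    with open(path, "rb") as handle:
        return sniff_image_format(handle.read(4))
```

A file that reports "unknown" here is not necessarily unusable. The check only tells you that the file does not begin with a TIFF or JPEG signature.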
Joint Photographic Experts Group (JPEG)
JPEG is the most common file format for Web graphics. JPEG files are compressed to a fraction of their original file size, so they can be downloaded more quickly. Because of the way JPEG files are compressed, there is a loss of detail and clarity. That loss of detail affects OCR quality, and it becomes much more difficult to get good OCR results. This is why Document Imaging files are stored in MDI or TIFF formats and not in JPEG format.
Document Imaging uses JPEG format to preserve images that you copy to another program, including when you use the Copy or Copy Image command (Edit menu) or the Send Text To Word command (Tools menu). A copied image can be at most 2 megabytes (MB) in file size. This means that if you copy a large, high-resolution (600 dpi) image, the pasted image may have lower fidelity, because Document Imaging preserves the original image dimensions while keeping the file size within the 2 MB limit.
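To see why a large copied image loses fidelity, it helps to compare its uncompressed size with the 2 MB limit. The sketch below is an illustration only. It assumes a letter-size page (8.5 by 11 inches), 24-bit color, and that 2 MB means 2 × 1024 × 1024 bytes. None of these assumptions come from Document Imaging itself, and the function names are hypothetical.

```python
# Assumption: "2 MB" is interpreted as 2 * 1024 * 1024 bytes.
COPY_LIMIT_BYTES = 2 * 1024 * 1024


def raw_image_bytes(width_in: float, height_in: float, dpi: int,
                    bits_per_pixel: int) -> int:
    """Uncompressed size of a scanned page, ignoring headers and row padding."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    total_bits = width_px * height_px * bits_per_pixel
    return (total_bits + 7) // 8  # round up to whole bytes


def required_compression_ratio(raw_bytes: int,
                               limit_bytes: int = COPY_LIMIT_BYTES) -> float:
    """How many times smaller the data must become to fit within the limit."""
    return raw_bytes / limit_bytes


if __name__ == "__main__":
    raw = raw_image_bytes(8.5, 11, dpi=600, bits_per_pixel=24)
    ratio = required_compression_ratio(raw)
    print(f"raw size: {raw:,} bytes, required ratio: about {ratio:.0f}:1")
```

Under these assumptions, a 600 dpi color page is roughly 101 million bytes uncompressed, so it must shrink by a factor of about 48 to fit. With the pixel dimensions held fixed, the only way to reach that ratio is stronger JPEG compression, which is where the loss of fidelity comes from.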
Tips about compression and OCR results
Document Imaging can be used to view standard TIFF files from other programs. Faxes received in e-mail or through a fax program on your computer are often stored as TIFF files. To use the text from a fax (for example, to copy the text into another program), you can open the TIFF file in Document Imaging and then run OCR by clicking Recognize Text Using OCR on the Tools menu. When you save an MDI or TIFF file, Document Imaging adds the OCR information to the file and compresses the original image. After you run OCR and save the file, the text within the fax becomes available for searching.
Best OCR results are obtained for scanned images that are around 300 dpi. Most scanned images are 300 dpi, and the default scanning presets in Document Imaging are set to 200 dpi or 300 dpi. In Document Imaging, you can scan a document at resolutions of up to 1,200 dpi. However, setting the resolution greater than 300 dpi can greatly increase scan time and file size, and the higher setting does not necessarily improve OCR results.
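The cost of higher resolutions follows from simple arithmetic: the number of pixels grows with the square of the resolution. The following sketch shows that relationship for a letter-size page (8.5 by 11 inches). The page size and function names are assumptions chosen for illustration, and the figures describe pixel counts only, not the size of a compressed MDI or TIFF file.

```python
def pixel_count(width_in: float, height_in: float, dpi: int) -> int:
    """Total pixels in a page scanned at the given resolution."""
    return round(width_in * dpi) * round(height_in * dpi)


def relative_pixel_count(dpi: int, baseline_dpi: int = 300) -> float:
    """Pixel count relative to the baseline; scales with the square of dpi."""
    return (dpi / baseline_dpi) ** 2


if __name__ == "__main__":
    for dpi in (200, 300, 600, 1200):
        pixels = pixel_count(8.5, 11, dpi)
        factor = relative_pixel_count(dpi)
        print(f"{dpi:>5} dpi: {pixels:>12,} pixels ({factor:.2f}x the 300 dpi scan)")
```

Doubling the resolution from 300 dpi to 600 dpi produces four times as many pixels, and 1,200 dpi produces sixteen times as many, which is why scan time and file size rise so quickly above 300 dpi.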
Adding pages to multiple-page scanned files
In Document Imaging, you can drag pages from several files into a single file to create one larger file. For example, you can scan all the receipts from your last business trip or a home building project and keep them in one file for easy reference. If you forget a receipt, you can easily add it to the file:
- In Document Imaging, open the file that contains all the previously scanned receipts.
- On the File menu, click Scan New Document, and then scan the additional receipt.
- On the Window menu, click Arrange All to show both the document with the newly scanned receipt and the document with the previously scanned receipts.
- In the thumbnail pane for the newly scanned receipt, select the thumbnail, and then drag it to the thumbnail pane for the document with the previously scanned receipts. (The thumbnail pane presents the current scanned document as one or more small images.)
- Save the file that contains all of the receipts.
You can also use the thumbnail pane to rearrange the pages within a multiple-page file.