Taking Advantage of Unicode Support in Office 2000

Planning an International Move to Office 2000

In a code page–based environment, each script has its own table of characters. Documents based on the code page of one operating system rarely travel well to an operating system that uses another code page. In some cases, a document cannot contain text that uses characters from more than one script.

For example, if a user running the English version of Windows 95 with the Latin code page opens a plain text file created in the Japanese version of Windows 95, the code points of the Japanese code page are mapped to unexpected or nonexistent characters in the Western script, and the resulting text is unintelligible.
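The mismatch described above can be reproduced in a few lines of Python. This is only an illustrative sketch: it assumes the Japanese file was written in code page 932 and read with code page 1252, and the sample string is made up for the example.

```python
# Text as a Japanese system might store it in a plain text file.
# Assumption: the file uses cp932, the Japanese Windows code page.
original = "\u65e5\u672c\u8a9e"  # the three characters of "Japanese language"
raw_bytes = original.encode("cp932")

# The same bytes read with the Western European code page (cp1252).
# errors="replace" substitutes U+FFFD for any byte cp1252 leaves undefined.
misread = raw_bytes.decode("cp1252", errors="replace")

# Reading the bytes with the code page that wrote them recovers the text.
recovered = raw_bytes.decode("cp932")

print(len(raw_bytes), len(misread), recovered == original)
```

Each of the six bytes is treated as a separate Western character, so the misread text is twice as long as the original and shares no characters with it.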

The universal character set provided by Unicode overcomes this problem. Office 97 was the first version of Office to support Unicode in all applications except Microsoft Access and Microsoft Outlook®. In Office 2000, Access and Microsoft Publisher gain Unicode support. Microsoft FrontPage® 2000 also supports Unicode on Web pages, but text typed into dialog boxes and other elements of the user interface is limited to characters defined by the user’s code page.


 Note    Outlook 2000 supports Unicode in the body of mail messages. However, Outlook data — such as Contacts, Tasks, and the To and Subject lines of messages — are limited to characters defined by the user’s code page.


Office 2000 also provides the conversion tables necessary to convert code page–based data to Unicode and back again for interaction with previous applications. Because Office 2000 provides fonts to support many languages, users can create multilingual documents with text from multiple scripts.

Unicode support in Office 2000 means that users can copy multilingual text from most Office 97 documents and paste it into any Office 2000 document, and the text is displayed correctly. Conversely, multilingual text copied from any Office 2000 document can be pasted into a document created in any Office 97 application (except Access).

In addition to document text, Office 2000 supports Unicode in other areas, including document properties, bookmarks, style names, footnotes, and user information. Unicode support in Office 2000 also means that you can edit and display multilingual text in dialog boxes. For example, you can search for a file by a Greek author’s name in the Open dialog box.


 Note    Windows NT 4.0 and Windows 2000 provide full support for Unicode. Some support is provided in Windows 95/98.


Using Unicode values in Visual Basic for Applications

The Microsoft Visual Basic® environment does not support Unicode. Only text supported by the operating system can be used in the Visual Basic Editor or displayed in custom dialog boxes or message boxes.

You can use the ChrW() function to manipulate text outside the code page. The ChrW() function accepts a number that represents the Unicode value of a character and returns that character as a string.
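The idea behind ChrW(), a Unicode value in and a character out, can be sketched outside Visual Basic as well. The example below is written in Python, not Visual Basic for Applications: Python's built-in chr() plays a comparable role here, and the helper fits_code_page is a hypothetical name introduced only for this illustration.

```python
# Hypothetical example: two Greek capital letters built from their
# Unicode values, similar in spirit to calling ChrW() for each value.
code_points = [0x0391, 0x03A9]  # GREEK CAPITAL LETTER ALPHA, OMEGA

text = "".join(chr(cp) for cp in code_points)


def fits_code_page(s, code_page):
    """Return True if every character in s exists in the given code page."""
    try:
        s.encode(code_page)
        return True
    except UnicodeEncodeError:
        return False


# The letters exist in the Greek code page but not in the Western one.
print(fits_code_page(text, "cp1253"), fits_code_page(text, "cp1252"))
```

Building the string from numbers keeps the source code itself within plain ASCII, which is the reason for using a function of this kind when the editor cannot display the characters.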

Using ASCII characters in shared file names

Windows NT and Windows 2000 support Unicode characters in file names, but Windows 95/98 does not. In Windows 95/98, file names must use characters that exist in the code page of the operating system.

If users in your organization share files between language versions of Windows, they can use ASCII characters (unaccented Latin script) to ensure that the file names can be used in any language version of the operating system.
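A shared-folder naming rule like this one is easy to check mechanically. The following Python sketch assumes that "ASCII" means code points below 128; the function names and sample file names are hypothetical and not part of Office or Windows.

```python
def is_ascii_file_name(name):
    """Return True if every character in name is plain ASCII (below 128)."""
    return all(ord(ch) < 128 for ch in name)


def non_ascii_characters(name):
    """Return the distinct non-ASCII characters in name, sorted."""
    return sorted({ch for ch in name if ord(ch) >= 128})


# Hypothetical file names a team might want to screen before sharing.
candidates = ["report_2000.doc", "r\u00e9sum\u00e9.doc"]
unsafe = [name for name in candidates if not is_ascii_file_name(name)]
print(len(unsafe))
```

The check covers only the character set. It does not validate other file name rules, such as reserved characters or length limits.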

Printing and displaying Unicode text

Not all printers can print characters from more than one code page. In particular, printers that have built-in fonts might not have characters for other scripts in those fonts. Also, new characters such as the euro currency symbol might be missing from a particular font.

Although the Office applications contain many workarounds to enable printing on such printers, correct output is not possible in all cases. If text is not printing correctly, updating the printer driver might fix the problem. If the latest driver doesn’t fix the problem, you can create a registry entry that works around the printing problems of most printers; the printing quality, however, might be lowered.

To set the registry so that extended characters are printed correctly

  1. Go to the following registry subkey:

HKEY_CURRENT_USER\Software\Microsoft\Office\9.0\Word\Options

  2. Add a new entry to the subkey, consisting of the value name NoWideTextPrinting and the binary value data of 1.

Screen display can be affected as well, because not all video display drivers support Unicode. Even when your text prints correctly, it might not display correctly on the screen. If your documents are displaying unintelligible characters, upgrade to a display driver that supports Unicode.

Compressing files that contain Unicode text

Unicode characters are encoded in two bytes each, whereas code page–based text uses a single byte per character, or a mixture of one and two bytes in some Asian languages. Generally, Office 2000 files with multilingual text are 30 to 50 percent larger than files created in previous, non-Unicode versions of Office.
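The byte counts behind this comparison can be checked with a short Python sketch. It measures raw text only, not Office file formats, and it assumes UTF-16 (little-endian) as the two-byte Unicode form and cp1252 and cp932 as the Western and Japanese code pages. The sample strings are made up for the example.

```python
def encoded_size(text, encoding):
    """Return the number of bytes text occupies in the given encoding."""
    return len(text.encode(encoding))


western = "Office"                          # 6 Latin characters
mixed = "Office \u65e5\u672c\u8a9e"         # 7 single-byte + 3 Japanese characters

# Western text: one byte per character in cp1252, two in UTF-16.
western_sizes = (encoded_size(western, "cp1252"),
                 encoded_size(western, "utf-16-le"))

# Japanese code page: one byte for Latin, two for each Japanese character.
mixed_sizes = (encoded_size(mixed, "cp932"),
               encoded_size(mixed, "utf-16-le"))

print(western_sizes, mixed_sizes)
```

The ratios for these two strings are not a prediction of real file growth, which also depends on formatting data and on any compression the application applies.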


 Note    If a file contains text from only English or Western European languages, there is little or no increase in file size because Office 2000 applications can compress the text.


When Microsoft Word 2000 users open and save an English or Western European file from a previous, non-Unicode version of Word, Word converts the contents to Unicode. The first time the file is saved, Word analyzes the file and notes regions that can be compressed, resulting in a file that is temporarily twice the size of the original file. The next time the file is saved, Word performs the compression, and file size returns to normal.

For Microsoft PowerPoint® files, text is typically a small percentage of file size, so Unicode does not significantly increase file size. In fact, PowerPoint 2000 employs the same graphics compression used in PowerPoint 97, so PowerPoint 2000 files are smaller than PowerPoint 95 files of equivalent content.

Copying multilingual text

You can use the Clipboard to copy multilingual text from one Office application to another. Text in RTF, HTML, and Unicode formats can be pasted successfully into Office applications.

Multilingual text in RTF, HTML, and Unicode

When you copy text from an Office 2000 document, the RTF or HTML formatting data, as well as the Unicode text data, is stored on the Clipboard. This allows applications that do not support Unicode to use font information to identify a code page for interpreting the content. For example, both Word 95 and Word 6.0 accept Word 2000 text from the Clipboard in RTF format.

All language versions of Word 95 and Word 6.0 can display text in most European languages. However, Asian and right-to-left language versions cannot display other Asian or right-to-left languages.

Word 97 can accept RTF and Unicode text from the Clipboard and display content in all European and most Asian languages.

Microsoft Access 2000 and Microsoft Excel 2000 support copying multilingual Unicode, RTF, or HTML text to the Clipboard. Access and Excel cannot accept RTF content from the Clipboard, although they can accept HTML-formatted text or Unicode text.

Multilingual code page–based single-byte text

If users paste single-byte (ANSI) text into an Office 2000 document from a code page that is different from the one their operating system uses, they are likely to get unintelligible characters in their text. This problem occurs because Office cannot determine which code page to use to interpret the single-byte text.

For example, you might paste text from a non-Unicode text editor that uses fonts to indicate which code page to use. If the text editor supplies only RTF and single-byte text, the font (and code page) information is lost when the text is pasted into an application that does not accept RTF. Instead, the application uses the operating system’s code page, which maps some characters’ code points to unexpected or nonexistent characters.
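The ambiguity of single-byte text can be seen by decoding one byte under several code pages. The Python sketch below is illustrative only. The byte value and the three code pages (Western European, Greek, and Cyrillic) were chosen for the example and are not taken from Office.

```python
# One single-byte value with no code page information attached.
raw = bytes([0xE1])

# The same byte maps to a different character in each code page.
code_pages = ("cp1252", "cp1253", "cp1251")
interpretations = {cp: raw.decode(cp) for cp in code_pages}

# Show the Unicode code point each code page assigns to the byte.
for cp, ch in interpretations.items():
    print(cp, "U+%04X" % ord(ch))
```

Without the font or code page information that RTF carries, a receiving application has no way to tell which of these characters was intended, so it falls back on the operating system's code page.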

 
 
Applies to:
Deployment Center 2003