Choose text encoding when you open and save files

Typically, you can share text files without worrying about the underlying details of how the text is stored. However, if you share text files with people who work in other languages, download text files across the Internet, or share text files with other computer systems, you may need to choose an encoding standard when you open or save a file.

When you or someone else opens a text file in Microsoft Office Word or in another program — perhaps on a computer that has system software in a language that is different from the language that was used to create the file — the encoding standard helps that program determine how to represent the text so that it is readable.


Understand text encoding

What appears to you as text on the screen is actually stored as numeric values in the text file. Your computer translates the numeric values into visible characters. It does this by using an encoding standard.

An encoding standard is a numbering scheme that assigns each text character in a character set to a numeric value. A character set can include alphabetical characters, numbers, and other symbols. Different languages commonly consist of different sets of characters, so many different encoding standards exist to represent the character sets that are used in different languages.
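As a rough illustration of this numbering scheme, the following Python sketch prints the numeric values under which a few characters are stored. The sample string and the choice of Python's cp1252 codec (Western European, Windows) are assumptions made for the example. They are not something Word exposes.

```python
# Illustrative sketch: text is stored as numbers, and an encoding
# standard defines which number stands for which character.
sample = "Año"  # hypothetical sample text

# Encode with Python's "cp1252" codec (Western European, Windows).
stored = sample.encode("cp1252")

# In this single-byte encoding, each character becomes one numeric value.
values = list(stored)
for char, value in zip(sample, values):
    print(f"{char!r} is stored as {value}")

# Decoding with the same encoding turns the numbers back into text.
restored = stored.decode("cp1252")
```

Decoding with the same encoding that was used to store the text returns the original characters. The trouble described in the next section starts when the two encodings differ.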

Different encoding standards for different alphabets

The encoding standard that is saved with a text file provides the information that your computer needs to display the text on the screen. For example, in the Cyrillic (Windows) encoding, the character Й has the numeric value 201. When you open a file that contains this character on a computer that uses the Cyrillic (Windows) encoding, the computer reads the 201 numeric value and displays Й on the screen.

However, if you open the same file on a computer that uses a different encoding, the computer displays whatever character corresponds to the 201 numeric value in the encoding standard that the computer uses by default. For example, if your computer uses the Western European (Windows) encoding standard, the character in the original Cyrillic-based file will be displayed as É rather than Й because in Western European (Windows) encoding, the value 201 maps to É.
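The mismatch described above can be reproduced in a few lines of Python. This is only a sketch: it assumes that Python's cp1251 and cp1252 codecs correspond to the Cyrillic (Windows) and Western European (Windows) encodings named in the text.

```python
# The same stored value read under two different encoding standards.
stored = bytes([201])  # the numeric value 201 from the example above

as_cyrillic = stored.decode("cp1251")  # Cyrillic (Windows)
as_western = stored.decode("cp1252")   # Western European (Windows)

print(as_cyrillic)  # Й
print(as_western)   # É
```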

Unicode: One encoding standard for many alphabets

To avoid problems with encoding and decoding text files, you can save files with Unicode encoding. Unicode accommodates most character sets across all the languages that are commonly used among computer users today.

Because Word is based on Unicode, Word automatically saves files encoded as Unicode. You can open and read Unicode-encoded files on your English-language computer system regardless of the language of the text. Likewise, when you use your English-language system to save files encoded as Unicode, the file can include characters not found in Western European alphabets, such as Greek, Cyrillic, Arabic, or Japanese characters.
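To make this concrete, the sketch below stores one string that mixes Greek, Cyrillic, Arabic, and Japanese text in three Unicode encoding forms and reads it back. The sample string is made up for the example, and the codec names are Python's, not Word's.

```python
# One Unicode string can mix scripts that no single legacy code page covers.
mixed = "Ελληνικά Кириллица العربية 日本語"  # hypothetical sample text

round_trips = {}
for codec in ("utf-8", "utf-16-le", "utf-16-be"):
    data = mixed.encode(codec)
    # Decoding with the same Unicode form returns the original text.
    round_trips[codec] = data.decode(codec) == mixed
    print(f"{codec}: {len(data)} bytes, round trip ok = {round_trips[codec]}")

# Byte order matters for the 16-bit forms: the same character, Й (U+0419),
# is stored with its two bytes in opposite order.
little = "Й".encode("utf-16-le")
big = "Й".encode("utf-16-be")
print(little.hex(), big.hex())
```

The little-endian and big-endian forms hold the same characters and differ only in byte order, which is why a program that reads the file must know which form was used.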

Choose an encoding standard when you open a file

If, when you open a file, text appears garbled or as question marks or boxes, Microsoft Office Word may not have accurately detected the encoding standard of text in the file. You can specify the encoding standard to use to display (decode) the text:

  1. Click the Microsoft Office Button, and then click Word Options.
  2. Click Advanced.
  3. Scroll to the General section, and then select the Confirm file format conversion on open check box.

 Note   When this check box is selected, Word displays the Convert File dialog box every time you open a file in a format other than a Word format (Word formats include .doc, .dot, .docx, .docm, .dotx, or .dotm files). If you frequently work with such files but you rarely want to choose an encoding standard, remember to switch this option off to prevent having this dialog box open unnecessarily.

  4. Close and then reopen the file.
  5. In the Convert File dialog box, select Encoded Text.
  6. In the File Conversion dialog box, select Other encoding, and then select the encoding standard that you want from the list.

You can preview the text in the Preview area to check whether all the text is readable in the encoding standard that you selected.
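Outside Word, the same "try an encoding and look at the preview" idea can be sketched in Python. The helper name preview_decodings, the candidate list, and the sample bytes are all hypothetical. This is not how Word detects encodings, only a way to see why one choice reads correctly and another does not.

```python
from typing import Dict, List, Optional


def preview_decodings(raw: bytes, candidates: List[str]) -> Dict[str, Optional[str]]:
    """Decode raw bytes with each candidate codec; None marks a failure."""
    previews: Dict[str, Optional[str]] = {}
    for name in candidates:
        try:
            previews[name] = raw.decode(name)
        except UnicodeDecodeError:
            # These bytes are not valid in this encoding at all.
            previews[name] = None
    return previews


# Hypothetical file content: Russian text saved with Cyrillic (Windows).
raw = "Привет".encode("cp1251")

previews = preview_decodings(raw, ["utf-8", "cp1252", "cp1251"])
for name, text in previews.items():
    print(name, "->", text if text is not None else "(cannot decode)")
```

A wrong encoding does not always fail. Here cp1252 decodes without error but yields unreadable accented Latin letters, which is why a visual check of the preview is still needed.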

If almost all the text looks the same (for example, all boxes or all dots), the font required for displaying the characters may not be installed. If the font that you need is not available, you can install additional fonts.

To install additional fonts:

  1. In Microsoft Windows, click the Start button, and then click Control Panel.
  2. Do one of the following:

In Windows Vista

    1. In Control Panel, click Uninstall a program.
    2. In the list of programs, click the listing for Microsoft Office or Microsoft Word, depending on whether you installed Word as part of Office or as an individual program, and then click Change.

In Microsoft Windows XP

    1. In Control Panel, click Add or Remove Programs.
    2. In the Currently installed programs box, click the listing for Microsoft Office or Microsoft Word, depending on whether you installed Word as part of Office or as an individual program, and then click Change.

  3. Under Change your installation of Microsoft Office, click Add or Remove Features, and then click Continue.
  4. Under Installation Options, expand Office Shared Features, and then expand International Support.
  5. Select the font set that you need, click the arrow next to your selection, and then select Run from My Computer.

 Tip   When you open an encoded text file, Word applies the fonts that are defined in the Web Options dialog box. (To reach the Web Options dialog box, click the Microsoft Office Button, click Word Options, and then click Advanced. In the General section, click Web Options.) You can select the options on the Fonts tab in the Web Options dialog box to customize the font for each character set.

Choose an encoding standard when you save a file

If you do not choose an encoding standard when you save a file, Word encodes the file as Unicode. Usually, you can use the default Unicode encoding, because it supports most characters in most languages.

If your document will be opened in a program that does not support Unicode, you can choose an encoding standard that matches that of the target program. For example, Unicode enables you to create a Traditional Chinese language document on your English-language system. However, if the document will be opened in a Traditional Chinese language program that does not support Unicode, you can save the document with Chinese Traditional (Big5) encoding. When the document is opened in the Traditional Chinese language program, all the text is displayed properly.
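The Big5 scenario can be sketched as follows. The sample text is hypothetical, and Python's big5 codec is assumed to stand in for the Chinese Traditional (Big5) option in Word.

```python
text = "繁體中文"  # hypothetical Traditional Chinese sample

# Save the same text two ways: as Unicode (UTF-8) and as Big5.
as_utf8 = text.encode("utf-8")
as_big5 = text.encode("big5")

# A program that only understands Big5 reads the Big5 bytes correctly...
read_big5 = as_big5.decode("big5")

# ...but misreads the Unicode bytes. errors="replace" stands in for
# whatever that program shows when it meets bytes it cannot interpret.
misread = as_utf8.decode("big5", errors="replace")

print(read_big5 == text)
print(misread == text)
```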

 Note   Because Unicode is the most comprehensive standard, saving text in any other encoding may result in some characters that can no longer be displayed. For example, a document encoded in Unicode can contain Hebrew and Cyrillic text. If this document is saved with Cyrillic (Windows) encoding, the Hebrew text can no longer be displayed, and if the document is saved with Hebrew (Windows) encoding, the Cyrillic text can no longer be displayed.

If you choose an encoding standard that doesn't support the characters that you used in the file, Word marks in red the characters that it cannot save. You can preview the text in the encoding standard that you choose before you save the file.
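A check similar in spirit to the red marking can be written in Python. The function name unsupported_characters and the sample document are hypothetical, and the codecs cp1251 and cp1255 are assumed to match the Cyrillic (Windows) and Hebrew (Windows) encodings.

```python
from typing import List


def unsupported_characters(text: str, encoding: str) -> List[str]:
    """Return the characters in text that the given encoding cannot store."""
    missing = []
    for char in text:
        try:
            char.encode(encoding)
        except UnicodeEncodeError:
            missing.append(char)
    return missing


document = "Привет שלום"  # hypothetical text mixing Cyrillic and Hebrew

lost_in_cyrillic = unsupported_characters(document, "cp1251")
lost_in_hebrew = unsupported_characters(document, "cp1255")
lost_in_unicode = unsupported_characters(document, "utf-8")

print(lost_in_cyrillic)  # the Hebrew letters
print(lost_in_hebrew)    # the Cyrillic letters
print(lost_in_unicode)   # nothing is lost
```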

Text formatted in the Symbol font or in field codes is removed from the file when you save a file as encoded text.

Choose an encoding standard

  1. Click the Microsoft Office Button, and then click Save As.

If you want to save the file in a different folder, locate and open the folder.

  2. In the File name box, type a new name for the file.
  3. In the Save as type box, select Plain Text.
  4. Click Save.
  5. If the Microsoft Office Word Compatibility Checker dialog box appears, click Continue.
  6. In the File Conversion dialog box, select the option for the encoding standard that you want to use:
    • To use the default encoding standard for your system, click Windows (Default).
    • To use the MS-DOS encoding standard, click MS-DOS.
    • To choose a specific encoding standard, click Other encoding, and then select the encoding standard that you want from the list. You can preview the text in the Preview area to check whether all the text is readable in the encoding standard that you selected.

 Note   You can resize the File Conversion dialog box so that you can preview more of your document.

  7. If you receive a message that states, "Text marked in red will not save correctly in the chosen encoding," you can try to choose a different encoding, or you can select the Allow character substitution check box.

When you allow character substitution, Word replaces a character that cannot be displayed with the closest equivalent character in the encoding that you chose. For example, three dots replace an ellipsis, and straight quotation marks replace curly quotation marks.

If the encoding that you chose has no equivalent character for a character marked in red, the character marked in red will be saved as an out-of-context character, such as a question mark.

  8. If the document will be opened in a program that does not wrap text from one line to the next, you can include hard line breaks in the document by selecting the Insert line breaks check box and then specifying whether you want the line breaks to be delineated with a carriage return (CR), line feed (LF), or both, in the End lines with box.
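The last two options in the procedure above can be approximated in Python. Everything named here is an assumption for illustration: the substitution table CLOSEST_EQUIVALENT is a small hypothetical stand-in for whatever table Word uses, and the functions save_with_substitution and insert_line_breaks only mimic the described behavior.

```python
import textwrap

# Hypothetical substitution table; Word's actual table is not given in this article.
CLOSEST_EQUIVALENT = {
    "\u2026": "...",  # ellipsis -> three dots
    "\u201c": '"',    # left curly double quote -> straight quote
    "\u201d": '"',    # right curly double quote -> straight quote
    "\u2018": "'",    # left curly single quote -> straight apostrophe
    "\u2019": "'",    # right curly single quote -> straight apostrophe
}

# The three choices offered in the End lines with box.
LINE_ENDINGS = {"CR": "\r", "LF": "\n", "CR/LF": "\r\n"}


def save_with_substitution(text: str, encoding: str) -> bytes:
    """Encode text, substituting characters the encoding cannot store."""
    pieces = []
    for char in text:
        try:
            char.encode(encoding)
            pieces.append(char)
        except UnicodeEncodeError:
            # Use the closest equivalent if one is known, else a question mark.
            pieces.append(CLOSEST_EQUIVALENT.get(char, "?"))
    return "".join(pieces).encode(encoding)


def insert_line_breaks(paragraph: str, width: int, end_lines_with: str = "CR/LF") -> str:
    """Wrap one paragraph at width and join the lines with hard breaks."""
    lines = textwrap.wrap(paragraph, width=width)
    return LINE_ENDINGS[end_lines_with].join(lines)


saved = save_with_substitution("\u201cWait\u2026\u201d said Й", "latin-1")
wrapped = insert_line_breaks("one two three four", 9, "LF")
print(saved)
print(repr(wrapped))
```

In this sketch the curly quotation marks and the ellipsis have straight equivalents, while the Cyrillic letter has none in the chosen encoding and is saved as a question mark.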

Look up encoding standards that are available in Word

Word recognizes a number of encoding standards, and it supports the encoding standards that are provided with the system software on your computer.

The following list of writing systems shows the encoding standards (also called code pages) that are associated with each writing system.

| Writing system | Encoding standards | Font applied |
| --- | --- | --- |
| Multilingual | Unicode (UCS-2 little-endian and big-endian, UTF-8, UTF-7) | Default font based on the Normal style for your language version of Word |
| Arabic | Windows 1256, ASMO 708 | Courier New |
| Simplified Chinese | GB2312, GBK, EUC-CN, ISO-2022-CN, HZ | SimSun |
| Traditional Chinese | BIG5, EUC-TW, ISO-2022-TW | MingLiU |
| Cyrillic | Windows 1251, KOI8-R, KOI8-RU, ISO8859-5, DOS 866 | Courier New |
| English, Western European, or other Latin script | Windows 1250, 1252-1254, 1257, ISO8859-x | Courier New |
| Greek | Windows 1253 | Courier New |
| Hebrew | Windows 1255 | Courier New |
| Japanese | Shift-JIS, ISO-2022-JP (JIS), EUC-JP | MS Mincho |
| Korean | Wansung, Johab, ISO-2022-KR, EUC-KR | Malgun Gothic |
| Thai | Windows 874 | Tahoma |
| Vietnamese | Windows 1258 | Courier New |
| Indic: Tamil | ISCII 57004 | Latha |
| Indic: Nepali | ISCII 57002 (Devanagari) | Mangal |
| Indic: Konkani | ISCII 57002 (Devanagari) | Mangal |
| Indic: Hindi | ISCII 57002 (Devanagari) | Mangal |
| Indic: Assamese | ISCII 57006 | |
| Indic: Bengali | ISCII 57003 | |
| Indic: Gujarati | ISCII 57010 | |
| Indic: Kannada | ISCII 57008 | |
| Indic: Malayalam | ISCII 57009 | |
| Indic: Oriya | ISCII 57007 | |
| Indic: Marathi | ISCII 57002 (Devanagari) | |
| Indic: Punjabi | ISCII 57011 | |
| Indic: Sanskrit | ISCII 57002 (Devanagari) | |
| Indic: Telugu | ISCII 57005 | |

 Notes 

  • Use of Indic languages requires system support and the appropriate OpenType fonts.
  • Only limited support is available for Nepali, Assamese, Bengali, Gujarati, Malayalam, and Oriya.
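If you process such files outside Word, many of the code pages listed above have counterparts among Python's standard codecs. The mapping below is an assumption made for illustration: the left-hand labels come from the list above, the right-hand names are Python codec names, and the pairing is not taken from Word. The ISCII code pages are omitted because this sketch does not assume Python ships codecs for them.

```python
import codecs
from typing import Dict, Optional

# Assumed pairing of code page labels with Python codec names.
PYTHON_CODECS = {
    "Windows 1256": "cp1256",
    "GB2312": "gb2312",
    "BIG5": "big5",
    "Windows 1251": "cp1251",
    "KOI8-R": "koi8_r",
    "Windows 1252": "cp1252",
    "Windows 1253": "cp1253",
    "Windows 1255": "cp1255",
    "Shift-JIS": "shift_jis",
    "EUC-KR": "euc_kr",
    "Windows 874": "cp874",
    "Windows 1258": "cp1258",
}


def available_codecs(mapping: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Look up each codec; None means this Python installation lacks it."""
    found: Dict[str, Optional[str]] = {}
    for label, codec_name in mapping.items():
        try:
            found[label] = codecs.lookup(codec_name).name
        except LookupError:
            found[label] = None
    return found


found = available_codecs(PYTHON_CODECS)
for label, name in found.items():
    print(f"{label}: {name or 'not available'}")
```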

Applies to:
Word 2007