Microsoft Office HTML Filter selectively removes Office-specific markup that is created when Microsoft Word 2000 or Microsoft Excel 2000 is used to create Web pages. Office-specific markup does not usually affect the browser display of the HTML file; rather, it stores the Office document settings that are used when the HTML document is opened in Office. You can download the HTML Filter 2.0 for Office 2000 self-extracting executable file from Microsoft Office Online.
Removing Office-specific markup from an HTML file affects your ability to edit some Office features when you edit the Web page in an Office program. The following table lists the markup that Office HTML Filter version 2.0 removes and provides some examples of the Office features that are affected after you use the filter.
| Office-specific markup removed | Examples of what is affected |
Downlevel conditional comments: content within <!--[ and ]-->
Example
<!--[if gte mso 9]>...<![endif]-->
|
WordArt cannot be edited; only the resulting static image is left. These comments make some HTML markup invisible to browsers earlier than Microsoft Internet Explorer 5.
For example, Office inserts XML blocks containing WordArt document properties inside these comments so that the contents of these XML elements do not show up as text in browsers earlier than Internet Explorer 5.
|
Uplevel conditional comments within <![ and ]>
Example
<![if !vml]>
|
These comments make some HTML markup visible in browsers earlier than Internet Explorer 5 but invisible in Internet Explorer 5 or later. When the comments are removed, the markup indicating that static images should not be loaded in Internet Explorer 5 or later is lost.
For example, WordArt is saved as HTML in two parts. One part is an XML block that describes the image. The other part is an actual image that makes the picture visible in older browsers that don't interpret XML. The static image is put inside uplevel comments to hide it from Internet Explorer 5 or later.
|
| XML tags in the "o", "v", "w", "x", and "p" namespaces
Example
<o:p></o:p>
|
Paragraph mark formatting (if different from the paragraph) is lost. The <o:p></o:p> tags represent the character that Word treats as the paragraph mark. |
| @-rule definitions
Example
@page Section1
{size: 8.5in 11in }
|
Page settings, such as page dimensions and orientation, are lost:
- @page contains document page setup information
- @font-face contains document font definitions
- @list contains Office-specific bulleted and numbered list style definitions
To keep the standard @-rule definitions (@page and @font-face), use the -a switch at the command prompt.
|
| CSS comments containing /* and */
Example
/* List Definitions */
|
Minimal impact on HTML document. |
| VML attributes, or any attribute with a colon ( : ) in the attribute name
Example
v:shapes="_x000_i1025"
|
WordArt, clip art, and AutoShapes cannot be edited; only the resulting static image is left. |
ProgID <meta> tags
Example
<meta name=ProgID content=Word.Document>
|
Minimal impact on HTML document. ProgID identifies the application the file was created in.
You can also remove the GENERATOR and ORIGINATOR META tags, which contain information about the HTML document's originating program (for example, Word or Excel) and the latest generating program (Office HTML Filter). To remove these tags, use the -m switch at the command prompt.
|
Link elements with the rel attribute set to any of the following:
- "file-list"
- "edit-time-data"
- "ole-object-data"
- "original-file"
- "preview"
Example
<link rel=File-List href="./mydoc_files/filelist.xml">
|
The association with all the special extra files that contain Office-specific data, such as OLE object binaries, is lost. |
| The following XML namespace declarations - that is, the xmlns attribute setting:
Example
xmlns:v="urn:schemas-microsoft-com:vml"
|
The ability to render WordArt and clip art as vector images in the browser is lost. Instead, they become static images.
To keep VML in the file, use the -v switch at the command prompt.
If either -o or -v is used at the command prompt, the XML namespace declarations remain in the file.
|
| Empty style attributes, especially when they become empty as a result of processing their values
Example
style=""
|
Minimal impact on HTML document. |
| "mso-" prefix properties
Example
mso-margin-top-alt: 12pt;
|
Office-specific formatting is lost. These properties store Office document settings, which are used when the HTML document is opened in Office. Some features, such as footnotes and customized bullets and numbering, are lost. Word legacy frames become tables, and some edit-time language and font-formatting information is lost.
To keep mso- prefix properties and other Office-specific properties, use the -o switch at the command prompt.
|
Other non-standard properties such as:
- "tab-stops"
- "tab-interval"
- "language"
- "text-underline"
- "text-effect"
- "text-line-through"
- "font-color"
- "horiz-align"
- "list-image-1"
- "list-image-2"
- "list-image-3"
- "separator-image"
- "table-border-color-dark"
- "table-border-color-light"
- "vert-align"
- "vnd.ms-excel.numberformat"
Example
tab-interval: .5in;
|
Tab settings are lost. All text underline styles become single underline. All underline colors become black. Engraved text and embossed text are lost. |
| Empty inline HTML elements: FONT, EM, STRONG, SAMP, ACRONYM, CITE, CODE, DFN, KBD, TT, B, I, U, S, SUB, SUP, INS, DEL, VAR, SPAN. An element is considered empty if it contains no displayable contents.
Example
<FONT COLOR=blue><B></B></FONT>
|
No impact on the display of the HTML document. |
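
The following Python sketch illustrates, in simplified form, a few of the removals listed in the table: downlevel conditional comments, uplevel comment markers, tags in the Office XML namespaces, "mso-" prefix properties, and empty style attributes. It is not the implementation used by Office HTML Filter. The function name `strip_office_markup`, the `keep_office_properties` parameter (which loosely mirrors the -o switch), and the regular expressions are assumptions made for illustration, and the sketch assumes double-quoted style attributes.

```python
import re

# Illustrative patterns only; the actual filter's matching rules may differ.
DOWNLEVEL_COMMENT = re.compile(
    r"<!--\[if[^\]]*\]>.*?<!\[endif\]-->", re.DOTALL | re.IGNORECASE
)
UPLEVEL_MARKER = re.compile(r"<!\[(?:if[^\]]*|endif)\]>", re.IGNORECASE)
OFFICE_TAG = re.compile(r"</?(?:o|v|w|x|p):[^>]*>", re.IGNORECASE)
STYLE_ATTR = re.compile(r'\sstyle="([^"]*)"', re.IGNORECASE)


def _clean_style(match):
    """Drop mso- declarations; remove the attribute if nothing is left."""
    declarations = [d.strip() for d in match.group(1).split(";")]
    kept = [d for d in declarations if d and not d.lower().startswith("mso-")]
    if not kept:
        return ""  # an empty style attribute is removed entirely
    joined = "; ".join(kept)
    return f' style="{joined}"'


def strip_office_markup(html, keep_office_properties=False):
    """Remove a subset of Office-specific markup from an HTML string."""
    # Downlevel comments are removed together with their content.
    html = DOWNLEVEL_COMMENT.sub("", html)
    # Only the uplevel markers are removed; the enclosed markup
    # (for example, a static image) stays in the document.
    html = UPLEVEL_MARKER.sub("", html)
    # Tags in the o, v, w, x, and p namespaces, such as <o:p></o:p>.
    html = OFFICE_TAG.sub("", html)
    if not keep_office_properties:
        html = STYLE_ATTR.sub(_clean_style, html)
    return html


sample = (
    '<p style="mso-margin-top-alt: 12pt;">Hello<o:p></o:p></p>'
    "<!--[if gte mso 9]><xml><w:WordDocument/></xml><![endif]-->"
    '<![if !vml]><img src="image001.gif"><![endif]>'
)
print(strip_office_markup(sample))
```

In this sketch the static image survives because only the uplevel markers around it are deleted, which matches the behavior described in the uplevel conditional comments row.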