Microsoft Office Online
Using Office HTML Filter to remove Office-specific markup
 
Applies to
Microsoft Outlook® 2000

Microsoft Office HTML Filter selectively removes Office-specific markup that is created when Microsoft Word 2000 or Microsoft Excel 2000 is used to create Web pages. Office-specific markup does not usually affect the browser display of the HTML file; rather, it stores the Office document settings that are used when the HTML document is opened in Office. Download the HTML Filter 2.0 for Office 2000 self-extracting executable file from Microsoft Office Online.

What happens when you use Office HTML Filter

Removing Office-specific markup from an HTML file affects your ability to edit some Office features when you edit the Web page in an Office program. The following table lists the markup that Office HTML Filter version 2.0 removes and provides some examples of the Office features that are affected after you use the filter.

Office-specific markup removed, followed by examples of what is affected
Downlevel conditional comments: content within <!--[ and ]-->

Example

<!--[if gte mso 9]>...<![endif]-->
WordArt cannot be edited; only the resulting static image is left. These comments make some HTML markup invisible to browsers earlier than Microsoft Internet Explorer 5.

For example, Office inserts XML blocks containing WordArt document properties inside these comments so that the contents of these XML elements do not show up as text in browsers earlier than Internet Explorer 5.
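
The effect of removing a downlevel conditional comment can be illustrated with a short Python sketch. This is not how Office HTML Filter is implemented; it is a regular-expression approximation, and the function name and sample markup are made up for illustration.

```python
import re

# Matches a downlevel conditional comment such as
# <!--[if gte mso 9]> ... <![endif]--> together with everything inside it.
# The match is non-greedy and may span several lines.
DOWNLEVEL_COMMENT = re.compile(
    r"<!--\[if[^\]]*\]>.*?<!\[endif\]-->",
    re.DOTALL | re.IGNORECASE,
)

def strip_downlevel_comments(html: str) -> str:
    """Remove downlevel conditional comments and their content (illustrative)."""
    return DOWNLEVEL_COMMENT.sub("", html)

# Hypothetical sample: an XML block hidden inside a conditional comment.
sample = (
    "<p>Title</p>"
    "<!--[if gte mso 9]><xml>placeholder</xml><![endif]-->"
    "<p>Body</p>"
)
print(strip_downlevel_comments(sample))
```

Because the XML block is discarded along with the comment, an Office program that later opens the file has nothing left to rebuild the editable object from.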

Uplevel conditional comments within <![ and ]>

Example


<![if !vml]>
These comments make some HTML markup visible in browsers earlier than Internet Explorer 5 but invisible in Internet Explorer 5 or later. When the comments are removed, the markup indicating that static images should not be loaded in Internet Explorer 5 or later is lost.

For example, WordArt is saved as HTML in two parts. One part is an XML block that describes the image. The other part is an actual image that makes the picture visible in older browsers that don't interpret XML. The static image is put inside uplevel comments to hide it from Internet Explorer 5 or later.
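
A rough Python sketch of this case follows. It assumes that only the comment markers are removed and that the enclosed markup (the static image) stays in the file, which is one reading of the description above. The function name and the image file name are hypothetical.

```python
import re

# Matches uplevel markers such as <![if !vml]> and <![endif]>.
# A downlevel comment ends in "]-->" rather than "]>", so it is not matched.
UPLEVEL_MARKER = re.compile(r"<!\[[^\]]*\]>")

def strip_uplevel_markers(html: str) -> str:
    """Remove uplevel conditional comment markers, keeping enclosed markup."""
    return UPLEVEL_MARKER.sub("", html)

# Hypothetical sample: a static fallback image wrapped in uplevel markers.
sample = '<![if !vml]><img src="image001.gif"><![endif]>'
print(strip_uplevel_markers(sample))
```

With the markers gone, every browser loads the static image, because nothing tells Internet Explorer 5 or later to skip it.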

XML tags in the "o", "v", "w", "x", and "p" namespaces

Example


<o:p></o:p>
Paragraph mark formatting (if different from the paragraph) is lost. The <o:p></o:p> tags represent the character that Word treats as the paragraph mark.
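
As an illustration only, the following Python sketch drops opening and closing tags whose prefix is one of the five namespaces listed above. It is a simplified approximation, not the filter's actual logic; text between the tags is kept, and the function name is made up.

```python
import re

# Matches opening or closing tags with an o:, v:, w:, x: or p: prefix,
# for example <o:p> or </o:p>. A plain <p> tag has no colon and is not matched.
NAMESPACED_TAG = re.compile(r"</?[ovwxp]:[A-Za-z_][^>]*>", re.IGNORECASE)

def strip_namespaced_tags(html: str) -> str:
    """Remove Office namespace tags, leaving any enclosed text in place."""
    return NAMESPACED_TAG.sub("", html)

sample = "<p class=MsoNormal>Hello<o:p></o:p></p>"
print(strip_namespaced_tags(sample))
```
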
@-rule definitions

Example


@page Section1 {size: 8.5in 11in}
Page settings, such as page dimensions and orientation, are lost:
  • @page contains document page setup information
  • @font-face contains document font definitions
  • @list contains Office-specific bulleted and numbered list styles definitions

To keep the standard @-rule definitions (@page and @font-face), use the -a switch at the command prompt.
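
The difference between the default behavior and the -a switch can be sketched in Python. This is an approximation under simple assumptions: each @-rule has a single, non-nested pair of braces, and the keep_standard parameter is a hypothetical stand-in for the -a switch. The sample style sheet is made up.

```python
import re

# Matches an @-rule with one non-nested block, e.g. "@page Section1 {...}",
# plus any whitespace that follows it.
AT_RULE = re.compile(r"@([\w-]+)[^{}]*\{[^{}]*\}\s*")

# The two standard @-rules that the article says the -a switch keeps.
STANDARD_RULES = {"page", "font-face"}

def strip_at_rules(css: str, keep_standard: bool = False) -> str:
    """Remove @-rule definitions; optionally keep @page and @font-face."""
    def replace(match):
        name = match.group(1).lower()
        if keep_standard and name in STANDARD_RULES:
            return match.group(0)  # leave the standard rule untouched
        return ""
    return AT_RULE.sub(replace, css)

# Hypothetical sample style sheet.
css = (
    "@page Section1 {size: 8.5in 11in}\n"
    "@list l0 {mso-list-id: 1}\n"
    "p {margin: 0}\n"
)
print(strip_at_rules(css))
print(strip_at_rules(css, keep_standard=True))
```

In both calls the Office-specific @list rule is removed; only the second call keeps the page setup information.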

CSS comments containing /* and */

Example


/* List Definitions */
Minimal impact on HTML document.
VML attributes, or any attribute with a colon ( : ) in the attribute name

Example


v:shapes="_x000_i1025"
WordArt, clip art, and AutoShapes cannot be edited; only the resulting static image is left.
ProgID <meta> tags

Example


<meta name=ProgID content=Word.Document>
Minimal impact on HTML document. ProgID identifies the application the file was created in.

You can also remove GENERATOR and ORIGINATOR META tags, which contain the information about the HTML document's originating program (for example, Word or Excel) and the latest generating program (Office HTML Filter). To remove the GENERATOR and ORIGINATOR META tags, use the -m switch at the command prompt.

Link elements with the rel attribute set to any of the following:
  • "file-list"
  • "edit-time-data"
  • "ole-object-data"
  • "original-file"
  • "preview"

Example

<link rel=File-List href="./mydoc_files/filelist.xml">
The association with all the special extra files that contain Office-specific data, such as OLE object binaries, is lost.
The following XML namespace declarations (that is, xmlns attribute settings):
  • "o"
  • "w"
  • "x"
  • "p"
  • "v"

Example


xmlns:v="urn:schemas-microsoft-com:vml"
The ability to render WordArt and clip art as vector images in the browser is lost. Instead, they become static images.

To keep VML in the file, use the -v switch at the command prompt.

If either -o or -v is used at the command prompt, the XML namespace declarations remain in the file.

Empty style attributes, especially when they become empty as a result of processing their values

Example


style=""
Minimal impact on HTML document.
"mso-" prefix properties

Example


mso-margin-top-alt: 12pt;
Office-specific formatting, which stores the Office document settings that are used when the HTML document is opened in Office, is removed. Some features, such as footnotes and customized bullets and numbering, are lost. Word legacy frames become tables, and some edit-time language and font-formatting information is lost.

To keep mso- prefix properties and other Office-specific properties, use the -o switch at the command prompt.
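
The following Python sketch shows the general idea of dropping "mso-" declarations from inline style attributes, and of dropping a style attribute that becomes empty as a result. It is a simplified illustration, not the filter's implementation: it assumes double-quoted style values and splits declarations on semicolons, so values that themselves contain semicolons are not handled. The function names and sample markup are made up.

```python
import re

# Matches a double-quoted style attribute, including its leading whitespace.
STYLE_ATTR = re.compile(r'\sstyle="([^"]*)"', re.IGNORECASE)

def _clean_style(match):
    """Rebuild one style attribute without its mso- declarations."""
    declarations = [d.strip() for d in match.group(1).split(";")]
    kept = [d for d in declarations if d and not d.lower().startswith("mso-")]
    if not kept:
        return ""  # the attribute became empty, so drop it entirely
    return ' style="' + "; ".join(kept) + '"'

def strip_mso_properties(html: str) -> str:
    """Remove mso- prefixed properties from inline styles (illustrative)."""
    return STYLE_ATTR.sub(_clean_style, html)

sample = (
    '<p style="mso-margin-top-alt: 12pt; color: red">A</p>'
    '<p style="mso-margin-top-alt: 12pt;">B</p>'
)
print(strip_mso_properties(sample))
```

The first paragraph keeps its standard color declaration; the second loses its style attribute altogether because nothing standard remains in it.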

Other non-standard properties such as:
  • "tab-stops"
  • "tab-interval"
  • "language"
  • "text-underline"
  • "text-effect"
  • "text-line-through"
  • "font-color"
  • "horiz-align"
  • "list-image-1"
  • "list-image-2"
  • "list-image-3"
  • "separator-image"
  • "table-border-color-dark"
  • "table-border-color-light"
  • "vert-align"
  • "vnd.ms-excel.numberformat"

Example


tab-interval: .5in;
Tab settings are lost. All text underline styles become single underline. All underline colors become black. Engraved text and embossed text are lost.
Empty inline HTML elements: FONT, EM, STRONG, SAMP, ACRONYM, CITE, CODE, DFN, KBD, TT, B, I, U, S, SUB, SUP, INS, DEL, VAR, SPAN. An element is considered empty if it contains no displayable contents.

Example


<FONT COLOR=blue><B></B></FONT>
No impact on the display of the HTML document.
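
Nested empty elements, as in the example above, need more than one pass: the inner element must be removed before the outer one is seen to be empty. A minimal Python sketch of that idea follows. It assumes that "no displayable contents" means nothing or only whitespace between the tags, which is a simplification, and the function name is made up.

```python
import re

# The inline elements listed in the article.
INLINE = ("FONT|EM|STRONG|SAMP|ACRONYM|CITE|CODE|DFN|KBD|TT|"
          "B|I|U|S|SUB|SUP|INS|DEL|VAR|SPAN")

# Matches an inline element whose content is empty or whitespace only.
EMPTY_INLINE = re.compile(
    r"<(" + INLINE + r")\b[^>]*>\s*</\1\s*>",
    re.IGNORECASE,
)

def strip_empty_inline(html: str) -> str:
    """Repeatedly remove empty inline elements until nothing changes."""
    previous = None
    while previous != html:
        previous = html
        html = EMPTY_INLINE.sub("", html)
    return html

sample = "Text<FONT COLOR=blue><B></B></FONT>"
print(strip_empty_inline(sample))
```

The first pass removes the empty B element, which leaves the FONT element empty; the second pass removes it as well.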