Open XML II: Editing Documents in the XML

The w: at the beginning of a tag identifies it as part of a Word document

Notice that every tag shown here begins with w:, a namespace prefix that identifies the tag as belonging to Word document markup. Tags for object types that are shared by more than one program, such as Office Art graphics, may use a different prefix, as may tags in document parts for other file types or in parts outside of the main document folder. For example, a: represents Office Art objects such as SmartArt™ graphics, cp: is used in the core document properties part, and p: is used in the main document folder of a PowerPoint file.
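To make the role of these identifiers concrete, the following Python sketch uses the standard library's xml.etree.ElementTree module to parse a small fragment and report each tag with its short identifier. The fragment in SAMPLE and the helper name prefix_of are hypothetical, made up for illustration. The namespace addresses are the ones commonly bound to these identifiers in Open XML files; in any real document, the binding is whatever the xmlns declarations in that part say.

```python
import xml.etree.ElementTree as ET

# Namespace addresses commonly bound to the identifiers discussed above.
# (Assumption: a real document part declares its own bindings via xmlns.)
NAMESPACES = {
    "w": "http://schemas.openxmlformats.org/wordprocessingml/2006/main",
    "a": "http://schemas.openxmlformats.org/drawingml/2006/main",
    "p": "http://schemas.openxmlformats.org/presentationml/2006/main",
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
}

# Hypothetical one-paragraph Word fragment, not taken from a real file.
SAMPLE = (
    '<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">'
    "<w:body><w:p><w:r><w:t>Hello</w:t></w:r></w:p></w:body>"
    "</w:document>"
)

def prefix_of(tag, namespaces=NAMESPACES):
    """Turn ElementTree's '{address}name' form back into 'prefix:name'."""
    if not tag.startswith("{"):
        return tag  # tag has no namespace
    uri, local = tag[1:].split("}", 1)
    for prefix, known_uri in namespaces.items():
        if known_uri == uri:
            return f"{prefix}:{local}"
    return tag  # unknown namespace: leave the tag unchanged

root = ET.fromstring(SAMPLE)
tags = [prefix_of(el.tag) for el in root.iter()]
print(tags)  # ['w:document', 'w:body', 'w:p', 'w:r', 'w:t']
```

The parser stores the full namespace address rather than the short identifier, which is why the sketch has to map it back. The identifier itself is only shorthand within the file.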

Let’s take a closer look at the paragraph markup that was shown on the preceding page.

  • The tag <w:pPr> translates to Word paragraph properties. This refers to formatting applied to the entire paragraph (such as the paragraph style Heading 1 that is indicated here).
  • The tag <w:r> refers to a run of content in a Word document. A run is a contiguous portion of content that shares the same properties. In this case, the run includes the text of the entire short paragraph. (Notice that the tag <w:t>, which means Word text, surrounds the paragraph text.)

If some words in the paragraph had different formatting, they would be separated into discrete runs.
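The structure described above can be explored with a short Python sketch. The paragraph in PARAGRAPH is a hypothetical example, not the markup from this article: it applies the Heading 1 style and makes the second word bold, so the text is split into two runs. The function name describe_paragraph is also made up for illustration.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}

# Hypothetical paragraph: Heading 1 style, with the second run in bold.
PARAGRAPH = f"""
<w:p xmlns:w="{W}">
  <w:pPr><w:pStyle w:val="Heading1"/></w:pPr>
  <w:r><w:t xml:space="preserve">Editing </w:t></w:r>
  <w:r><w:rPr><w:b/></w:rPr><w:t>documents</w:t></w:r>
</w:p>
"""

def describe_paragraph(xml_text):
    """Return (style name, [(run text, is_bold), ...]) for one w:p element."""
    p = ET.fromstring(xml_text.strip())
    # Paragraph properties apply to the entire paragraph.
    style = p.find("w:pPr/w:pStyle", NS)
    style_name = style.get(f"{{{W}}}val") if style is not None else None
    runs = []
    for r in p.findall("w:r", NS):
        # Each run holds text that shares the same formatting.
        text = "".join(t.text or "" for t in r.findall("w:t", NS))
        # Simplification: treats any w:b element as bold and ignores
        # a w:val attribute that might switch bold off.
        bold = r.find("w:rPr/w:b", NS) is not None
        runs.append((text, bold))
    return style_name, runs

style_name, runs = describe_paragraph(PARAGRAPH)
print(style_name)  # Heading1
print(runs)        # [('Editing ', False), ('documents', True)]
```

Because only the word "documents" is bold, it sits in its own run with its own run properties, while the paragraph style is stated once in the paragraph properties.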

Note     When a tag is not paired (that is, when it does not have a start tag and an end tag between which other content is nested), it’s known as a standalone tag, also called an empty-element or self-closing tag. This is the case with the style tag for Heading 1 shown here. Notice that a slash character appears at the end of the standalone tag, just before the closing angle bracket.
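As a quick check of what a standalone tag means to an XML parser, the Python sketch below parses the same style element written both ways, once as a standalone tag and once as a paired start and end tag with nothing between them. The element shown is a plausible form of a Heading 1 style tag, used here as an assumption. The helper name summarize is hypothetical.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# The same element written as a standalone tag and as an empty paired tag.
standalone = ET.fromstring(f'<w:pStyle xmlns:w="{W}" w:val="Heading1"/>')
paired = ET.fromstring(f'<w:pStyle xmlns:w="{W}" w:val="Heading1"></w:pStyle>')

def summarize(el):
    """Collect what the parser sees: name, attributes, text, child count."""
    return (el.tag, dict(el.attrib), el.text or "", len(el))

same = summarize(standalone) == summarize(paired)
print(same)  # True
```

The parser treats the two forms identically. The standalone form is simply the shorter way to write an element that has no nested content.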

