Extensible Markup Language (XML) is a markup language defined by the XML Working Group of the World Wide Web Consortium (W3C), a consortium of commercial and educational institutions that oversees research and promotes standards in all areas related to the World Wide Web. XML is similar to Hypertext Markup Language (HTML), the standard markup language for documents on the World Wide Web, in that it is a tag-based language designed for delivering information on the Web. HTML tags indicate how Web browsers should display page elements such as text and graphics and how to respond to user actions. XML differs from HTML in that its tags are not predefined. Instead, the W3C XML recommendation specifies a set of rules that you follow to create your own meaningful set of tags.
You can create your own tags to use within an XML document by following a few simple rules:
An XML document can contain only one root element. The root element of an XML document is a single element that contains all of the content that is considered to be part of the document itself. The root element is the first element to appear after the document's prolog section. The root element is also known as the document element.
All XML elements must contain end tags. While end tags are optional with certain HTML document elements, all elements in an XML document must have an end tag.
Element start and end tag names must be identical. XML is case-sensitive, so the name of an end tag must exactly match the name of its accompanying start tag.
XML elements cannot overlap. If the start tag for an element appears within another element, it must end within the same containing element.
All attribute values must use quotation marks. Attribute values must be enclosed in either single or double quotation marks.
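You cannot use the characters < and & literally within the text of an XML document. These are special characters that have a specific meaning for XML parsers. The > character is allowed in text except as part of the sequence ]]>, but it is usually escaped as well. If you need these characters in the text of your XML document, use the predefined entity references &lt;, &gt;, and &amp; or the equivalent character references.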
Following these rules ensures that your XML document is well-formed, which means that it meets the minimal criteria for a conforming XML document by adhering to the XML syntax set forth in the W3C recommendation. A well-formed XML document is additionally considered valid if it conforms to a specific set of constraints, usually defined in an XML Schema, on the structure and type of data that the document can contain.
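The rules above can be checked mechanically, because a conforming parser rejects any document that breaks them. The following is a minimal sketch, assuming Python and its standard-library xml.etree.ElementTree parser; the helper name is_well_formed and the sample snippets are hypothetical and are not part of the W3C recommendation.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def is_well_formed(document: str) -> bool:
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False


# Hypothetical snippets: one well-formed document, then one violation per rule.
samples = {
    "well-formed": '<Employees><Name EmployeeNumber="10101">Patricia Doyle</Name></Employees>',
    "two root elements": "<Employee/><Employee/>",
    "missing end tag": "<Employees><Name>Patricia Doyle</Employees>",
    "case mismatch": "<Name>Patricia Doyle</name>",
    "overlapping elements": "<a><b></a></b>",
    "unquoted attribute": "<Name EmployeeNumber=10101>Patricia Doyle</Name>",
    "unescaped ampersand": "<Company>Doyle & Sons</Company>",
}

results = {label: is_well_formed(text) for label, text in samples.items()}
for label, ok in results.items():
    print(f"{label}: {ok}")

# escape() replaces &, < and > with the predefined entity references.
escaped = escape("Doyle & Sons <HQ>")
print(escaped)
```

Only the first sample is accepted. The escaped string can be placed inside an element without breaking well-formedness.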
Structure of XML documents
XML documents consist of two primary parts: a prolog and a root element. XML documents can also contain comments.
The prolog is the first section of an XML document. It contains the XML declaration, which states that the document is an XML document; processing instructions, which the XML parser passes through to any application that uses the document and which tell that application how to handle the document; and schema declarations, which identify the XML Schemas that should be used to verify that the document is valid. An XML Schema is a formal specification, written in XML, that defines the structure of an XML document, including element names and rich data types, which elements can appear in combination, and which attributes are available for each element. The following is an example of a prolog in an XML document:
<?xml version="1.0" encoding="UTF-8"?>
The root element is the main section of an XML document. It contains the document's data, along with the information that describes the structure of the data. The following is an example of the root element section in an XML document:
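As a minimal sketch of a document that has a prolog followed by a single root element, the snippet below assumes Python's standard-library xml.etree.ElementTree parser. The Employees, Employee, and Name element names are chosen for illustration only.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the prolog (XML declaration) followed by one root element.
document = (
    b'<?xml version="1.0" encoding="UTF-8"?>'
    b"<Employees>"
    b"<Employee><Name>Patricia Doyle</Name></Employee>"
    b"</Employees>"
)

# Parsing returns the root element; everything else is nested inside it.
root = ET.fromstring(document)
root_tag = root.tag

# iter() yields the root first, then its descendants in document order.
descendant_tags = [element.tag for element in root.iter()][1:]
print(root_tag, descendant_tags)
```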
Information in the root element is stored in two types of XML constructs: elements and attributes. All the elements and attributes used in an XML document are nested within the root element.
Elements
Elements are the primary building blocks of an XML document. They represent both the structure of the XML document and the data that it contains. An element consists of a start tag, content, and an end tag; because XML is case-sensitive, the names in the start and end tags must match exactly. The following is an example of a simple Employee element that describes the name of an employee. The Employee element is nested within a root element named Employees:
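A minimal sketch of such markup, read back with Python's standard-library xml.etree.ElementTree parser, might look like the following. The exact layout of the Employee element is an assumption based on the description above.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: an Employee element nested within the Employees root element.
document = """
<Employees>
    <Employee>Patricia Doyle</Employee>
</Employees>
"""

root = ET.fromstring(document)

# Look up the nested element by its exact, case-sensitive name.
employee = root.find("Employee")
employee_name = employee.text

# A lookup with different capitalization finds nothing.
wrong_case = root.find("employee")
print(employee_name, wrong_case)
```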
Elements can contain text, other elements, character references, or character data sections. Elements that have no content are called empty elements. The start and end tags of an empty element can be combined into one tag, as shown in the following example:
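To illustrate that the combined tag and the separate start and end tags describe the same empty element, the sketch below parses both spellings with Python's standard-library xml.etree.ElementTree parser. The MiddleName element is a hypothetical name used only for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical element names; both spellings describe an element with no content.
combined = ET.fromstring("<Employee><MiddleName/></Employee>")
separate = ET.fromstring("<Employee><MiddleName></MiddleName></Employee>")

combined_middle = combined.find("MiddleName")
separate_middle = separate.find("MiddleName")

# Neither element has text or child elements.
print(combined_middle.text, len(combined_middle))
print(separate_middle.text, len(separate_middle))

# When writing, ElementTree emits the combined form for empty elements by default.
serialized_combined = ET.tostring(combined, encoding="unicode")
serialized_separate = ET.tostring(separate, encoding="unicode")
print(serialized_combined)
```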
Attributes
Attributes are XML constructs that associate a name-value pair with a particular element. They contain information about the element's content that is not necessarily intended to be displayed, but is instead used to describe some property of the element. An attribute is placed within the element's start tag; its value is enclosed in either single or double quotation marks and is separated from the attribute name by an equal sign. The following is an example of an EmployeeNumber attribute that is associated with a Name element:
<Name EmployeeNumber="10101">Patricia Doyle</Name>
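As a minimal sketch of how an application might read this attribute, the snippet below parses the Name element with Python's standard-library xml.etree.ElementTree parser. The single-quoted variant is added here only to show that both quotation styles yield the same value.

```python
import xml.etree.ElementTree as ET

# The Name element written with double and with single quotation marks.
double_quoted = ET.fromstring('<Name EmployeeNumber="10101">Patricia Doyle</Name>')
single_quoted = ET.fromstring("<Name EmployeeNumber='10101'>Patricia Doyle</Name>")

# Attribute values are always returned as strings.
employee_number = double_quoted.get("EmployeeNumber")

# attrib holds every name-value pair on the element.
all_attributes = dict(double_quoted.attrib)

# The element content is separate from its attributes.
employee_name = double_quoted.text
print(employee_number, all_attributes, employee_name)
```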
XML documents can also contain comments. Comments are not processed by the XML parser but are used to provide meaningful documentation in the XML source of the document. Comments begin with <!-- and end with -->. The text between these characters is ignored by the XML parser. The following is an example of a comment in an XML document:
<!-- This XML document contains employee information. -->
<Name EmployeeNumber="10101">Patricia Doyle</Name>
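The sketch below illustrates how a parser skips the comment, assuming Python's standard-library xml.etree.ElementTree, whose default parser discards comments. Other parsers may optionally expose comments to applications.

```python
import xml.etree.ElementTree as ET

# The comment example from above, followed by the Name element as the root.
document = (
    "<!-- This XML document contains employee information. -->\n"
    '<Name EmployeeNumber="10101">Patricia Doyle</Name>'
)

root = ET.fromstring(document)
root_tag = root.tag

# The parsed tree holds the element and its data, but not the comment.
serialized = ET.tostring(root, encoding="unicode")
print(root_tag)
print(serialized)
```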