Oracle® XML DB Developer's Guide
10g Release 2 (10.2)

Part Number B14259-02

B XPath and Namespace Primer

This appendix introduces the W3C XPath Recommendation, the W3C Namespaces in XML Recommendation, and the XML Information Set (infoset).

This appendix contains these topics:

Overview of the W3C XML Path Language (XPath) 1.0 Recommendation

XML Path Language (XPath) is a language for addressing parts of an XML document, designed to be used by both XSLT and XPointer. It can be used as a searching or query language as well as in hypertext linking. Parts of this brief XPath primer are extracted from the W3C XPath Recommendation.

XPath also facilitates the manipulation of string, number, and Boolean values.

XPath uses a compact syntax that is not XML syntax to facilitate the use of XPath expressions in URIs and XML attribute values. XPath operates on the abstract, logical structure of an XML document, rather than its surface syntax. It gets its name from its use of a path notation as in URLs for navigating through the hierarchical structure of an XML document.

In addition to its use for addressing, XPath is also designed so that it has a natural subset that can be used for matching, that is, testing whether or not a node matches a pattern. This use of XPath is described in the W3C XSLT Recommendation.

Note:

In this release, Oracle XML DB supports a subset of the XPath 1.0 Recommendation. It does not support XPath expressions that return Boolean, number, or string values. However, Oracle XML DB does support these XPath types within predicates.

XPath Models an XML Document as a Tree of Nodes

XPath models an XML document as a tree of nodes. There are different types of nodes, including element nodes, attribute nodes, and text nodes. XPath defines a way to compute a string-value for each type of node. Some types of nodes also have names. XPath fully supports XML Namespaces. Thus, the name of a node is modeled as a pair consisting of a local part and a possibly null namespace URI; this is called an expanded-name. The data model is described in detail in "XPath 1.0 Data Model". A summary of XML Namespaces is provided in "Overview of the W3C Namespaces in XML Recommendation".

XPath Expression

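The primary syntactic construct in XPath is the expression. An expression matches the production Expr. An expression is evaluated to yield an object, which has one of the following four basic types:

  • Node-set (an unordered collection of nodes without duplicates)

  • Boolean (true or false)

  • Number (a floating-point number)

  • String (a sequence of UCS characters)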

Evaluating Expressions with Respect to a Context

Expression evaluation occurs with respect to a context. XSLT and XPointer specify how the context is determined for XPath expressions used in XSLT and XPointer respectively. The context consists of the following:

  • Node, the context node

  • Pair of nonzero positive integers, context position and context size. Context position is always less than or equal to the context size.

  • Set of variable bindings. These consist of a mapping from variable names to variable values. The value of a variable is an object, which can be of any of the types possible for the value of an expression, and can also be of additional types not specified here.

  • Function library. This consists of a mapping from function names to functions. Each function takes zero or more arguments and returns a single result. See the XPath Recommendation for the definition of the core function library, which all XPath implementations must support. For a function in the core function library, arguments and result are of the four basic types. The core functions fall into four groups:

    • Node-set functions

    • String functions

    • Boolean functions

    • Number functions

    Both XSLT and XPointer extend XPath by defining additional functions; some of these functions operate on the four basic types; others operate on additional data types defined by XSLT and XPointer.

  • Set of namespace declarations in scope for the expression. These consist of a mapping from prefixes to namespace URIs.
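
The last item, the prefix-to-URI mapping, can be illustrated with a small sketch. It uses the xml.etree.ElementTree module from the Python standard library, which implements only a limited XPath subset and is unrelated to Oracle XML DB; the sample document and its namespace URI are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; the namespace URI is invented for this sketch.
doc = ET.fromstring(
    '<order xmlns="http://example.com/po">'
    "<item>pen</item><item>ink</item>"
    "</order>"
)

# The expression context supplies its own prefix-to-URI mapping. The prefix
# "po" never occurs in the document; only the URI has to match.
ns = {"po": "http://example.com/po"}
items = [e.text for e in doc.findall("po:item", ns)]

# Without a prefix, the name "item" has a null namespace URI in the
# expression, so it matches nothing in this document.
unqualified = doc.findall("item")

print(items, unqualified)
```

The prefix used in the expression is resolved through the context of the expression, not through the declarations in the document.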

Evaluating Subexpressions

The variable bindings, function library, and namespace declarations used to evaluate a subexpression are always the same as those used to evaluate the containing expression.

The context node, context position, and context size used to evaluate a subexpression are sometimes different from those used to evaluate the containing expression. Several kinds of expressions change the context node; only predicates change the context position and context size. When the evaluation of a kind of expression is described, it will always be explicitly stated if the context node, context position, and context size change for the evaluation of subexpressions; if nothing is said about the context node, context position, and context size, then they remain unchanged for the evaluation of subexpressions of that kind of expression.

XPath Expressions Often Occur in XML Attributes

The grammar specified here applies to the attribute value after XML 1.0 normalization. So, for example, if the grammar uses the character less-than (<), then this must not appear in the XML source as a less-than character, but must be quoted according to XML 1.0 rules by, for example, entering it as &lt;.

Within expressions, literal strings are delimited by single or double quotation marks, which are also used to delimit XML attributes. To avoid a quotation mark in an expression being interpreted by the XML processor as terminating the attribute value:

  • The quotation mark can be entered as a character reference (&quot; or &apos;)

  • The expression can use single quotation marks if the XML attribute is delimited with double quotation marks or vice-versa
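
The two quoting rules above can be checked with a short sketch. It assumes Python's standard xml.etree.ElementTree parser and a hypothetical element named if with a test attribute; neither comes from this guide.

```python
import xml.etree.ElementTree as ET

# A hypothetical element whose "test" attribute carries an XPath expression.
# The less-than sign is written as &lt; and the string literal uses single
# quotation marks because the attribute is delimited with double ones.
source = """<if test="position() &lt; 3 and @type='warning'"/>"""

element = ET.fromstring(source)

# After XML parsing, the attribute value is the expression the XPath
# grammar applies to.
xpath_expr = element.get("test")
print(xpath_expr)
```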

Location Paths

One important kind of expression is a location path. A location path is the route to be taken. The route can consist of directions and several steps, each step being separated by a /.

A location path selects a set of nodes relative to the context node. The result of evaluating an expression that is a location path is the node-set containing the nodes selected by the location path.

Location paths can recursively contain expressions used to filter sets of nodes. A location path matches the production LocationPath.

Expressions are parsed by first dividing the character string to be parsed into tokens and then parsing the resulting sequence of tokens. Whitespace can be freely used between tokens.

Although location paths are not the most general grammatical construct in the XPath language (a LocationPath is a special case of an Expr), they are the most important construct.

Location Path Syntax Abbreviations

Every location path can be expressed using a straightforward but rather verbose syntax. There are also a number of syntactic abbreviations that allow common cases to be expressed concisely. The following sections give examples of both the unabbreviated and the abbreviated syntax.

Location Path Examples Using Unabbreviated Syntax

Table B-1 lists examples of location paths using the unabbreviated syntax.

Table B-1 XPath: Location Path Examples Using Unabbreviated Syntax

Unabbreviated Location Path

Description

child::para

Selects the para element children of the context node

child::*

Selects all element children of the context node

child::text()

Selects all text node children of the context node

child::node()

Selects all children of the context node, whatever their node type

attribute::name

Selects the name attribute of the context node

attribute::*

Selects all attributes of the context node

descendant::para

Selects the para element descendants of the context node

ancestor::div

Selects all div ancestors of the context node

ancestor-or-self::div

Selects the div ancestors of the context node and, if the context node is a div element, the context node as well

descendant-or-self::para

Selects the para element descendants of the context node and, if the context node is a para element, the context node as well

self::para

Selects the context node if it is a para element; otherwise, selects nothing

child::chapter/descendant::para

Selects the para element descendants of the chapter element children of the context node

child::*/child::para

Selects all para grandchildren of the context node

/

Selects the document root, which is always the parent of the document element

/descendant::para

Selects all para elements in the same document as the context node

/descendant::olist/child::item

Selects all item elements that have an olist parent and are in the same document as the context node

child::para[position()=1]

Selects the first para child of the context node

child::para[position()=last()]

Selects the last para child of the context node

child::para[position()=last()-1]

Selects the penultimate para child of the context node

child::para[position()>1]

Selects all para children of the context node other than its first para child

following-sibling::chapter[position()=1]

Selects the next chapter sibling of the context node

preceding-sibling::chapter[position()=1]

Selects the previous chapter sibling of the context node

/descendant::figure[position()=42]

Selects the forty-second figure element in the document

/child::doc/child::chapter[position()=5]/child::section[position()=2]

Selects the second section of the fifth chapter of the doc document element

child::para[attribute::type="warning"]

Selects all para children of the context node that have a type attribute with value warning

child::para[attribute::type='warning'][position()=5]

Selects the fifth para child of the context node that has a type attribute with value warning

child::para[position()=5][attribute::type="warning"]

Selects the fifth para child of the context node, if that child has a type attribute with value warning

child::chapter[child::title='Introduction']

Selects the chapter children of the context node that have one or more title children with string-value Introduction

child::chapter[child::title]

Selects the chapter children of the context node that have one or more title children

child::*[self::chapter or self::appendix]

Selects the chapter and appendix children of the context node

child::*[self::chapter or self::appendix][position()=last()]

Selects the last chapter or appendix child of the context node


Location Path Examples Using Abbreviated Syntax

Table B-2 lists examples of location paths using abbreviated syntax.

Table B-2 XPath: Location Path Examples Using Abbreviated Syntax

Abbreviated Location Path

Description

para

Selects the para element children of the context node

*

Selects all element children of the context node

text()

Selects all text node children of the context node

@name

Selects the name attribute of the context node

@*

Selects all attributes of the context node

para[1]

Selects the first para child of the context node

para[last()]

Selects the last para child of the context node

*/para

Selects all para grandchildren of the context node

/doc/chapter[5]/section[2]

Selects the second section of the fifth chapter of document element doc

chapter//para

Selects the para element descendants of the chapter element children of the context node

//para

Selects all para descendants of the document root, and thus selects all para elements in the same document as the context node

//olist/item

Selects all item elements in the same document as the context node that have an olist parent

.

Selects the context node

.//para

Selects the para element descendants of the context node

..

Selects the parent of the context node

../@lang

Selects the lang attribute of the parent of the context node

para[@type="warning"]

Selects all para children of the context node that have a type attribute with value warning

para[@type="warning"][5]

Selects the fifth para child of the context node that has a type attribute with value warning

para[5][@type="warning"]

Selects the fifth para child of the context node, if that child has a type attribute with value warning

chapter[title="Introduction"]

Selects the chapter children of the context node that have one or more title children with string-value Introduction

chapter[title]

Selects the chapter children of the context node that have one or more title children

employee[@secretary and @assistant]

Selects all employee children of the context node that have both a secretary attribute and an assistant attribute


The most important abbreviation is that child:: can be omitted from a location step. In effect, child is the default axis. For example, a location path div/para is short for child::div/child::para.

Attribute Abbreviation @

There is also an abbreviation for attributes: attribute:: can be abbreviated to an at-sign (@).

For example, a location path para[@type="warning"] is short for child::para[attribute::type="warning"] and so selects para children with a type attribute with value equal to warning.

Path Abbreviation //

Two slashes (//) are short for /descendant-or-self::node()/. For example, //para is short for /descendant-or-self::node()/child::para and so selects any para element in the document (even a para element that is the document element is selected by //para, because the document element node is a child of the root node).

div//para is short for div/descendant-or-self::node()/child::para and so will select all para descendants of div children.

Note:

Location path //para[1] does not mean the same as the location path /descendant::para[1]. The latter selects the first descendant para element; the former selects all descendant para elements that are the first para children of their parents.
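
The difference described in this note can be made concrete with a small sketch. It assumes Python's standard xml.etree.ElementTree module and an invented document. ElementTree has no descendant axis, so the first element reached in a document-order walk stands in for /descendant::para[1].

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for this sketch.
doc = ET.fromstring(
    "<doc>"
    "<div><para>a</para><para>b</para></div>"
    "<div><para>c</para></div>"
    "</doc>"
)

# .//para[1]: every para that is the first para child of its own parent.
first_per_parent = [p.text for p in doc.findall(".//para[1]")]

# Stand-in for /descendant::para[1]: the first para met when walking the
# tree in document order.
first_descendant = next(doc.iter("para")).text

print(first_per_parent, first_descendant)
```

The first expression returns one para for each parent, while the second returns a single node.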

Location Step Abbreviation .

A location step of a period (.) is short for self::node(). This is particularly useful in conjunction with //. For example, the location path .//para is short for:

self::node()/descendant-or-self::node()/child::para

and so will select all para descendant elements of the context node.

Location Step Abbreviation ..

Similarly, a location step of two periods (..) is short for parent::node(). For example, ../title is short for:

parent::node()/child::title

and so will select the title children of the parent of the context node.
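
As a rough illustration of the period and double-period steps, the following sketch again assumes Python's standard xml.etree.ElementTree module and a made-up document. ElementTree cannot step above the element on which the search starts, so the paths begin at the doc element.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for this sketch.
doc = ET.fromstring(
    "<doc>"
    "<chapter><title>One</title><para>p1</para><para>p2</para></chapter>"
    "<chapter><title>Two</title></chapter>"
    "</doc>"
)

# .//para/.. selects the parents of all para descendants of the context
# node; the chapter that holds both para elements is returned once.
parents = [e.find("title").text for e in doc.findall(".//para/..")]

# .//para/../title selects the title children of those parents.
titles = [t.text for t in doc.findall(".//para/../title")]

print(parents, titles)
```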

Abbreviation Summary

AbbreviatedAbsoluteLocationPath ::= '//' RelativeLocationPath

AbbreviatedRelativeLocationPath ::= RelativeLocationPath '//' Step

AbbreviatedStep ::= '.' | '..'

AbbreviatedAxisSpecifier ::= '@'?

Relative and Absolute Location Paths

There are two kinds of location path:

  • Relative location paths. A relative location path consists of a sequence of one or more location steps separated by /. The steps in a relative location path are composed together from left to right. Each step in turn selects a set of nodes relative to a context node. An initial sequence of steps is composed together with a following step as follows. The initial sequence of steps selects a set of nodes relative to a context node. Each node in that set is used as a context node for the following step. The sets of nodes identified by that step are unioned together. The set of nodes identified by the composition of the steps is this union.

    For example, child::div/child::para selects the para element children of the div element children of the context node, or, in other words, the para element grandchildren that have div parents.

  • Absolute location paths. An absolute location path consists of / optionally followed by a relative location path. A / by itself selects the root node of the document containing the context node. If it is followed by a relative location path, then the location path selects the set of nodes that would be selected by the relative location path relative to the root node of the document containing the context node.
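
The composition rule for relative location paths can be sketched in a few lines. The compose function below is a hypothetical helper written for this illustration. It handles only child-axis name tests and relies on Python's standard xml.etree.ElementTree module to evaluate each single step.

```python
import xml.etree.ElementTree as ET

def compose(context_node, steps):
    """Apply child-axis steps left to right, unioning results at each step."""
    nodes = [context_node]
    for step in steps:
        selected = []
        for node in nodes:                  # each node becomes a context node
            for hit in node.findall(step):  # one step relative to that node
                if not any(hit is s for s in selected):  # union: no duplicates
                    selected.append(hit)
        nodes = selected
    return nodes

# Hypothetical sample document, invented for this sketch.
doc = ET.fromstring(
    "<doc><div><para>a</para></div><para>b</para><div><para>c</para></div></doc>"
)

composed = [e.text for e in compose(doc, ["div", "para"])]
direct = [e.text for e in doc.findall("div/para")]

print(composed, direct)
```

Both results contain only the para grandchildren that have div parents; the para child of doc is not selected.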

Location Path Syntax Summary

A location path provides a means to search for target nodes. Each step of a location path has the following general form, where each expr is a predicate:

axisname :: nodetest expr1 expr2 ...

LocationPath             ::=    RelativeLocationPath
                              | AbsoluteLocationPath
AbsoluteLocationPath     ::=    '/' RelativeLocationPath?
                              | AbbreviatedAbsoluteLocationPath
RelativeLocationPath     ::=    Step
                              | RelativeLocationPath '/' Step
                              | AbbreviatedRelativeLocationPath

XPath 1.0 Data Model

XPath operates on an XML document as a tree. This section describes how XPath models an XML document as a tree. XML documents operated on by XPath must conform to the XML Namespaces Recommendation.

Nodes

The tree contains nodes. There are seven types of node: root, element, text, attribute, namespace, processing instruction, and comment.

Root Nodes

The root node is the root of the tree. It does not occur except as the root of the tree. The element node for the document element is a child of the root node. The root node also has as children processing instruction and comment nodes for processing instructions and comments that occur in the prolog and after the end of the document element. The string-value of the root node is the concatenation of the string-values of all text node descendants of the root node in document order. The root node does not have an expanded-name.

Element Nodes

There is an element node for every element in the document. An element node has an expanded-name computed by expanding the QName of the element specified in the tag in accordance with the XML Namespaces Recommendation. The namespace URI of the element expanded-name will be null if the QName has no prefix and there is no applicable default namespace.

Note:

In the notation of Appendix A.3 of http://www.w3.org/TR/REC-xml-names/, the local part of the expanded-name corresponds to the type attribute of the ExpEType element; the namespace URI of the expanded-name corresponds to the ns attribute of the ExpEType element, and is null if the ns attribute of the ExpEType element is omitted.

The children of an element node are the element nodes, comment nodes, processing instruction nodes and text nodes for its content. Entity references to both internal and external entities are expanded. Character references are resolved. The string-value of an element node is the concatenation of the string-values of all text node descendants of the element node in document order.
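
The string-value rule for element nodes can be sketched as follows. The string_value function is a hypothetical helper. It assumes Python's standard xml.etree.ElementTree module, whose itertext method yields descendant text in document order.

```python
import xml.etree.ElementTree as ET

def string_value(element):
    """Concatenate the text of all descendants in document order."""
    return "".join(element.itertext())

# Hypothetical sample element; the entity reference &amp; is expanded by the
# parser before the string-value is computed.
para = ET.fromstring("<para>Hello <b>big</b> world &amp; more</para>")

value = string_value(para)
print(value)
```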

Unique IDs. An element node may have a unique identifier (ID). This is the value of the attribute that is declared in the Document Type Definition (DTD) as type ID. No two elements in a document may have the same unique ID. If an XML processor reports two elements in a document as having the same unique ID (which is possible only if the document is invalid), then the second element in document order must be treated as not having a unique ID.

Note:

If a document does not have a DTD, then no element in the document will have a unique ID.

Text Nodes

Character data is grouped into text nodes. As much character data as possible is grouped into each text node: a text node never has an immediately following or preceding sibling that is a text node. The string-value of a text node is the character data. A text node always has at least one character of data. Each character within a CDATA section is treated as character data. Thus, <![CDATA[<]]> in the source document will be treated the same as &lt;. Both will result in a single < character in a text node in the tree. Thus, a CDATA section is treated as if the <![CDATA[ and ]]> were removed and every occurrence of < and & were replaced by &lt; and &amp; respectively.
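
The equivalence of a CDATA section and an escaped character can be checked with a small sketch, assuming Python's standard xml.etree.ElementTree parser. ElementTree stores character data in the text attribute rather than in separate text nodes, but the resulting characters are the same.

```python
import xml.etree.ElementTree as ET

from_cdata = ET.fromstring("<a><![CDATA[<]]></a>").text
from_entity = ET.fromstring("<a>&lt;</a>").text

# Character data on either side of a CDATA section merges with it into one
# run of text.
merged = ET.fromstring("<a>1 <![CDATA[<]]> 2</a>").text

print(from_cdata, from_entity, merged)
```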

Note:

When a text node that contains a < character is written out as XML, the < character must be escaped, for example by using &lt; or by including it in a CDATA section. Characters inside comments, processing instructions and attribute values do not produce text nodes. Line endings in external entities are normalized to #xA as specified in the XML Recommendation. A text node does not have an expanded name.

Attribute Nodes

Each element node has an associated set of attribute nodes; the element is the parent of each of these attribute nodes; however, an attribute node is not a child of its parent element.

Note:

This is different from the Document Object Model (DOM), which does not treat the element bearing an attribute as the parent of the attribute.

Elements never share attribute nodes: if one element node is not the same node as another element node, then none of the attribute nodes of the one element node will be the same node as the attribute nodes of another element node.

Note:

The = operator tests whether two nodes have the same value, not whether they are the same node. Thus attributes of two different elements may compare as equal using =, even though they are not the same node.

A defaulted attribute is treated the same as a specified attribute. If an attribute was declared for the element type in the DTD, but the default was declared as #IMPLIED, and the attribute was not specified on the element, then the element attribute set does not contain a node for the attribute.

Some attributes, such as xml:lang and xml:space, have the semantics that they apply to all elements that are descendants of the element bearing the attribute, unless overridden with an instance of the same attribute on another descendant element. However, this does not affect where attribute nodes appear in the tree: an element has attribute nodes only for attributes that were explicitly specified in the start-tag or empty-element tag of that element or that were explicitly declared in the DTD with a default value.

An attribute node has an expanded-name and a string-value. The expanded-name is computed by expanding the QName specified in the tag in the XML document in accordance with the XML Namespaces Recommendation. The namespace URI of the attribute name will be null if the QName of the attribute does not have a prefix.

Note:

In the notation of Appendix A.3 of XML Namespaces Recommendation, the local part of the expanded-name corresponds to the name attribute of the ExpAName element; the namespace URI of the expanded-name corresponds to the ns attribute of the ExpAName element, and is null if the ns attribute of the ExpAName element is omitted.

An attribute node has a string-value. The string-value is the normalized value as specified by the XML Recommendation. An attribute whose normalized value is a zero-length string is not treated specially: it results in an attribute node whose string-value is a zero-length string.

Note:

It is possible for default attributes to be declared in an external DTD or an external parameter entity. The XML Recommendation does not require an XML processor to read an external DTD or an external parameter entity unless it is validating. A style sheet or other facility that assumes that the XPath tree contains default attribute values declared in an external DTD or parameter entity may not work with some XML processors that do not validate.

There are no attribute nodes corresponding to attributes that declare namespaces.

Namespace Nodes

Each element has an associated set of namespace nodes, one for each distinct namespace prefix that is in scope for the element (including the xml prefix, which is implicitly declared by the XML Namespaces Recommendation) and one for the default namespace if one is in scope for the element. The element is the parent of each of these namespace nodes; however, a namespace node is not a child of its parent element.

Elements never share namespace nodes: if one element node is not the same node as another element node, then none of the namespace nodes of the one element node will be the same node as the namespace nodes of another element node. This means that an element will have a namespace node:

  • For every attribute on the element whose name starts with xmlns:

  • For every attribute on an ancestor element whose name starts with xmlns:, unless the element itself or a nearer ancestor redeclares the prefix

  • For an xmlns attribute, if the element or some ancestor has an xmlns attribute, and the value of the xmlns attribute for the nearest such element is nonempty

    Note:

    An attribute xmlns="" undeclares the default namespace.

A namespace node has an expanded-name: the local part is the namespace prefix (this is empty if the namespace node is for the default namespace); the namespace URI is always NULL.

The string-value of a namespace node is the namespace URI that is being bound to the namespace prefix; if it is relative, then it must be resolved just like a namespace URI in an expanded-name.

Processing Instruction Nodes

There is a processing instruction node for every processing instruction, except for any processing instruction that occurs within the document type declaration. A processing instruction has an expanded-name: the local part is the processing instruction target; the namespace URI is NULL. The string-value of a processing instruction node is the part of the processing instruction following the target and any whitespace. It does not include the terminating ?>.

Note:

The XML declaration is not a processing instruction. Therefore, there is no processing instruction node corresponding to the XML declaration.

Comment Nodes

There is a comment node for every comment, except for any comment that occurs within the document type declaration. The string-value of a comment node is the content of the comment, not including the opening <!-- or the closing -->. A comment node does not have an expanded-name.
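
The rules for processing instruction and comment nodes can be observed with a short sketch. It assumes the xml.dom.minidom module from the Python standard library. This is a DOM implementation, not an XPath data model, so it serves only as an approximation; the document and the render target are invented.

```python
from xml.dom import minidom

# Hypothetical sample document, invented for this sketch.
source = (
    '<?xml version="1.0"?>'
    "<!-- prolog comment -->"
    "<doc><?render mode='fast'?></doc>"
)
dom = minidom.parseString(source)

# Top-level children: the comment and the document element. The XML
# declaration does not show up as a node.
top_level = [node.nodeName for node in dom.childNodes]

comment = dom.childNodes[0]
pi = dom.documentElement.firstChild

print(top_level, repr(comment.data), pi.target, pi.data)
```

The data of the comment excludes the <!-- and --> delimiters, and the data of the processing instruction excludes the target and the terminating ?>.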

For every type of node, there is a way of determining a string-value for a node of that type. For some types of node, the string-value is part of the node; for other types of node, the string-value is computed from the string-value of descendant nodes.

Note:

For element nodes and root nodes, the string-value of a node is not the same as the string returned by the DOM nodeValue method.

Expanded-Name

Some types of node also have an expanded-name, which is a pair consisting of:

  • A local part. This is a string.

  • A namespace URI. The namespace URI is either null or a string. If specified in the XML document it can be a URI reference as defined in RFC2396; this means it can have a fragment identifier and be relative. A relative URI should be resolved into an absolute URI during namespace processing: the namespace URIs of expanded-names of nodes in the data model should be absolute.

Two expanded names are equal if they have the same local part, and both have a null namespace URI or both have namespace URIs that are equal.
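
Expanded-name equality can be illustrated with a minimal sketch. It assumes Python's standard xml.etree.ElementTree module, which stores element names in the form {uri}local; the expanded_name helper and the namespace URI are hypothetical.

```python
import xml.etree.ElementTree as ET

def expanded_name(tag):
    """Split a '{uri}local' tag into (namespace URI or None, local part)."""
    if tag.startswith("{"):
        uri, local = tag[1:].split("}", 1)
        return uri, local
    return None, tag

# Same namespace URI bound to two different prefixes, plus an unqualified name.
a = ET.fromstring('<a:x xmlns:a="http://example.com/ns"/>')
b = ET.fromstring('<b:x xmlns:b="http://example.com/ns"/>')
c = ET.fromstring("<x/>")

name_a = expanded_name(a.tag)
name_b = expanded_name(b.tag)
name_c = expanded_name(c.tag)

print(name_a == name_b, name_a == name_c)
```

The prefixes a and b play no part in the comparison; only the local part and the namespace URI do.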

Document Order

There is an ordering, document order, defined on all the nodes in the document corresponding to the order in which the first character of the XML representation of each node occurs in the XML representation of the document after expansion of general entities. Thus, the root node will be the first node.

Element nodes occur before their children. Thus, document order orders element nodes in order of the occurrence of their start-tag in the XML (after expansion of entities). The attribute nodes and namespace nodes of an element occur before the children of the element. The namespace nodes are defined to occur before the attribute nodes.

The relative order of namespace nodes is implementation-dependent.

The relative order of attribute nodes is implementation-dependent.

Reverse document order is the reverse of document order.

Root nodes and element nodes have an ordered list of child nodes. Nodes never share children: if one node is not the same node as another node, then none of the children of the one node will be the same node as any of the children of another node.

Every node other than the root node has exactly one parent, which is either an element node or the root node. A root node or an element node is the parent of each of its child nodes. The descendants of a node are the children of the node and the descendants of the children of the node.
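
Document order and the single-parent rule can be sketched for element nodes as follows. The sketch assumes Python's standard xml.etree.ElementTree module, which models elements only and has no separate root, attribute, or namespace nodes, so it covers just part of the ordering described above.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for this sketch.
doc = ET.fromstring("<a><b><c/></b><d/><e><f/></e></a>")

# iter() walks elements in the order of their start-tags, so each element
# comes before its children.
order = [e.tag for e in doc.iter()]

# Every element other than the outermost one has exactly one parent.
parent_of = {child: parent for parent in doc.iter() for child in parent}

print(order, len(parent_of))
```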

Overview of the W3C Namespaces in XML Recommendation

Software modules must recognize tags and attributes which they are designed to process, even in the face of collisions occurring when markup intended for some other software package uses the same element type or attribute name.

Document constructs should have universal names, whose scope extends beyond their containing document. The W3C Namespaces in XML Recommendation describes the mechanism, XML namespaces, which accomplishes this.

What Is a Namespace?

An XML namespace is a collection of names, identified by a URI reference [RFC2396], which are used in XML documents as element types and attribute names. XML namespaces differ from the namespaces conventionally used in computing disciplines in that the XML version has internal structure and is not, mathematically speaking, a set. These issues are discussed in the W3C Namespace Recommendation, appendix, "A. The Internal Structure of XML Namespaces".

URI References

URI references which identify namespaces are considered identical when they are exactly the same character-for-character. Note that URI references which are not identical in this sense may in fact be functionally equivalent. Examples include URI references which differ only in case, or which are in external entities which have different effective base URIs.
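
The character-for-character comparison can be seen in a small sketch, assuming Python's standard xml.etree.ElementTree parser and two invented namespace URIs that differ only in case.

```python
import xml.etree.ElementTree as ET

lower = ET.fromstring('<p:x xmlns:p="http://example.com/ns"/>')
upper = ET.fromstring('<p:x xmlns:p="http://EXAMPLE.com/ns"/>')

# The two URIs may be functionally equivalent, but they are not identical
# strings, so the element names belong to different namespaces.
same_namespace = lower.tag == upper.tag

print(lower.tag, upper.tag, same_namespace)
```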

Names from XML namespaces may appear as qualified names, which contain a single colon, separating the name into a namespace prefix and a local part.

The prefix, which is mapped to a URI reference, selects a namespace. The combination of the universally managed URI namespace and the namespace of the document produces identifiers that are universally unique. Mechanisms are provided for prefix scoping and defaulting.

URI references can contain characters not allowed in names, so cannot be used directly as namespace prefixes. Therefore, the namespace prefix serves as a proxy for a URI reference. An attribute-based syntax described in the following section is used to declare the association of the namespace prefix with a URI reference; software which supports this namespace proposal must recognize and act on these declarations and prefixes.

Notation and Usage

Many of the nonterminals in the productions in this specification are defined not here but in the W3C XML Recommendation. When nonterminals defined here have the same names as nonterminals defined in the W3C XML Recommendation, the productions here in all cases match a subset of the strings matched by the corresponding ones there.

In productions of this document, the NSC is a Namespace Constraint, one of the rules that documents conforming to this specification must follow.

All Internet domain names used in examples, with the exception of w3.org, are selected at random and should not be taken as having any import.

Declaring Namespaces

A namespace is declared using a family of reserved attributes. Such an attribute name must either be xmlns or have xmlns: as a prefix. These attributes, like any other XML attributes, can be provided directly or by default.

Attribute Names for Namespace Declaration

[1] NSAttName ::=    PrefixedAttName
                     | DefaultAttName
[2] PrefixedAttName ::= 'xmlns:' NCName [NSC: Leading "XML" ]
[3] DefaultAttName ::= 'xmlns'
[4] NCName ::= (Letter | '_') (NCNameChar)*   /* An XML Name, minus the ":" */
[5] NCNameChar ::= Letter | Digit | '.' | '-' | '_' | CombiningChar | Extender

The attribute value, a URI reference, is the namespace name identifying the namespace. The namespace name, to serve its intended purpose, should have the characteristics of uniqueness and persistence. It is not a goal that it be directly usable for retrieval of a schema (if any exists). An example of a syntax that is designed with these goals in mind is that for Uniform Resource Names [RFC2141]. However, it should be noted that ordinary URLs can be managed in such a way as to achieve these same goals.

When the Attribute Name Matches the PrefixedAttName

If the attribute name matches PrefixedAttName, then the NCName gives the namespace prefix, used to associate element and attribute names with the namespace name in the attribute value in the scope of the element to which the declaration is attached. In such declarations, the namespace name may not be empty.

When the Attribute Name Matches the DefaultAttName

If the attribute name matches DefaultAttName, then the namespace name in the attribute value is that of the default namespace in the scope of the element to which the declaration is attached. In such a default declaration, the attribute value may be empty. Default namespaces and overriding of declarations are discussed in section "Applying Namespaces to Elements and Attributes" of the W3C Namespace Recommendation.

The following example namespace declaration associates the namespace prefix edi with the namespace name http://ecommerce.org/schema:

<x xmlns:edi='http://ecommerce.org/schema'>
  <!-- the "edi" prefix is bound to http://ecommerce.org/schema
       for the "x" element and contents -->
</x>

Namespace Constraint: Prefixes Beginning X-M-L

Prefixes beginning with the three-letter sequence x, m, l, in any case combination, are reserved for use by XML and XML-related specifications.
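
The following Python sketch is an illustration only and is not part of the W3C Recommendation. It approximates production [4] with an ASCII-only regular expression (the real production also admits non-ASCII Letter, CombiningChar, and Extender characters) and applies the reserved-prefix constraint. The helper names are hypothetical.

```python
import re

# ASCII-only approximation of production [4]; the full production also
# admits non-ASCII Letter, CombiningChar, and Extender characters.
NCNAME_ASCII = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*\Z")


def is_ascii_ncname(name):
    """Return True if name matches the ASCII subset of NCName (no colon)."""
    return NCNAME_ASCII.match(name) is not None


def is_reserved_prefix(prefix):
    """Return True if prefix begins with x, m, l in any case combination."""
    return prefix[:3].lower() == "xml"


def may_declare_prefix(prefix):
    """Hypothetical helper: True if a document author may declare this prefix."""
    return is_ascii_ncname(prefix) and not is_reserved_prefix(prefix)
```

Under these assumptions, edi is an acceptable prefix, whereas XmlFoo (reserved) and a:b (contains a colon) are not.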

Qualified Names

In XML documents conforming to the W3C Namespace Recommendation, some names (constructs corresponding to the nonterminal Name) may be given as qualified names, defined as follows:

Qualified Name Syntax

[6] QName     ::= (Prefix ':')? LocalPart
[7] Prefix    ::= NCName
[8] LocalPart ::= NCName

What is the Prefix?

The Prefix provides the namespace prefix part of the qualified name, and must be associated with a namespace URI reference in a namespace declaration.

The LocalPart provides the local part of the qualified name. Note that the prefix functions only as a placeholder for a namespace name. Applications should use the namespace name, not the prefix, in constructing names whose scope extends beyond the containing document.
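
As a minimal sketch of why the namespace name, not the prefix, identifies a name, the following Python fragment splits a QName according to production [6] and resolves the prefix against a table of in-scope declarations. The functions split_qname and expand and the prefix table are hypothetical, and handling of the default namespace is deliberately omitted.

```python
def split_qname(qname):
    """Split a QName into (prefix, local_part); prefix is None if absent."""
    prefix, sep, local = qname.partition(":")
    if not sep:
        return None, qname
    return prefix, local


def expand(qname, prefix_map):
    """Return (namespace_name, local_part) for a QName.

    prefix_map is a hypothetical dict of in-scope prefix -> namespace name.
    Unprefixed names are returned with namespace None; default-namespace
    handling is omitted from this sketch.
    """
    prefix, local = split_qname(qname)
    if prefix is None:
        return None, local
    return prefix_map[prefix], local


# Two documents may choose different prefixes for the same namespace name.
first = expand("edi:price", {"edi": "http://ecommerce.org/schema"})
second = expand("e:price", {"e": "http://ecommerce.org/schema"})
```

Both calls produce the same pair, because the two prefixes are placeholders for the same namespace name.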

Using Qualified Names

In XML documents conforming to the W3C Namespace Recommendation, element types are given as qualified names, as follows:

Element Types

[9]  STag         ::= '<' QName (S Attribute)* S? '>'   [NSC: Prefix Declared]
[10] ETag         ::= '</' QName S? '>'                 [NSC: Prefix Declared]
[11] EmptyElemTag ::= '<' QName (S Attribute)* S? '/>'  [NSC: Prefix Declared]

The following is an example of a qualified name serving as an element type:

<x xmlns:edi='http://ecommerce.org/schema'>
  <!-- namespace of 'price' element is http://ecommerce.org/schema -->
  <edi:price units='Euro'>32.18</edi:price>
</x>

Attributes are either namespace declarations or their names are given as qualified names:

Attribute

[12] Attribute ::= NSAttName Eq AttValue
                   | QName Eq AttValue   [NSC: Prefix Declared]

The following is an example of a qualified name serving as an attribute name:

<x xmlns:edi='http://ecommerce.org/schema'>
   <!-- namespace of 'taxClass' attribute is http://ecommerce.org/schema -->
   <lineItem edi:taxClass="exempt">Baby food</lineItem>
</x>
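
The two preceding examples can be checked with a namespace-aware parser. The following sketch uses the Python standard-library module xml.etree.ElementTree, which is an assumption of this illustration and is unrelated to Oracle XML DB. That module reports an expanded name in the form {namespace name}local part.

```python
import xml.etree.ElementTree as ET

EDI = "http://ecommerce.org/schema"

element_doc = ET.fromstring(
    "<x xmlns:edi='http://ecommerce.org/schema'>"
    "<edi:price units='Euro'>32.18</edi:price>"
    "</x>"
)
attribute_doc = ET.fromstring(
    "<x xmlns:edi='http://ecommerce.org/schema'>"
    '<lineItem edi:taxClass="exempt">Baby food</lineItem>'
    "</x>"
)

price = element_doc[0]
line_item = attribute_doc[0]

# ElementTree writes an expanded name as "{namespace name}local part".
print(price.tag)
print(line_item.tag)
print(line_item.attrib)

# A lookup may use any prefix the caller chooses; only the namespace name matters.
found = element_doc.find("e:price", {"e": EDI})
```

The price element is reported in the http://ecommerce.org/schema namespace, while lineItem and the unprefixed units attribute are in no namespace. Only the taxClass attribute carries the namespace name.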

Namespace Constraint: Prefix Declared

The namespace prefix, unless it is xml or xmlns, must have been declared in a namespace declaration attribute, either in the start-tag of the element where the prefix is used or in an ancestor element (that is, an element in whose content the prefixed markup occurs).

The prefix xml is by definition bound to the namespace name http://www.w3.org/XML/1998/namespace.

The prefix xmlns is used only for namespace bindings and is not itself bound to any namespace name.
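
The following sketch shows how one namespace-aware parser enforces this constraint. It assumes the Python standard-library module xml.etree.ElementTree; other processors may report the error differently. The helper parses is hypothetical.

```python
import xml.etree.ElementTree as ET


def parses(text):
    """Return True if text is accepted by the namespace-aware parser."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# The prefix edi is used but never declared: the constraint is violated.
undeclared = "<x><edi:price>32.18</edi:price></x>"

# The prefix is declared on an ancestor element: the constraint is satisfied.
declared = (
    "<x xmlns:edi='http://ecommerce.org/schema'>"
    "<edi:price>32.18</edi:price></x>"
)

# The prefix xml needs no declaration; it is bound by definition.
xml_lang = ET.fromstring("<x xml:lang='en'/>")
print(parses(undeclared), parses(declared), xml_lang.attrib)
```

The attribute xml:lang is reported under the namespace name http://www.w3.org/XML/1998/namespace, even though the document contains no declaration for the xml prefix.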

This constraint may lead to operational difficulties in the case where the namespace declaration attribute is provided, not directly in the XML document entity, but through a default attribute declared in an external entity. Such declarations may not be read by software which is based on an XML processor that does not validate.

Many XML applications, presumably including namespace-sensitive ones, fail to require validating processors. For correct operation with such applications, namespace declarations must be provided either directly or through default attributes declared in the internal subset of the DTD.

Element names and attribute types are also given as qualified names when they appear in declarations in the DTD:

Qualified Names in Declarations

[13] doctypedecl ::= '<!DOCTYPE' S QName (S ExternalID)? S? ('[' (markupdecl
                     | PEReference | S)* ']' S?)? '>'
[14] elementdecl ::= '<!ELEMENT' S QName S contentspec S? '>'
[15] cp          ::= (QName | choice | seq) ('?' | '*' | '+')?
[16] Mixed       ::= '(' S? '#PCDATA' (S? '|' S? QName)* S? ')*'
                     | '(' S? '#PCDATA' S? ')'
[17] AttlistDecl ::= '<!ATTLIST' S QName AttDef* S? '>'
[18] AttDef      ::= S (QName | NSAttName) S AttType S DefaultDecl

Applying Namespaces to Elements and Attributes

This section describes how to apply namespaces to elements and attributes.

Namespace Scoping

The namespace declaration is considered to apply to the element where it is specified and to all elements within the content of that element, unless overridden by another namespace declaration with the same NSAttName part:

<?xml version="1.0"?>
  <!-- all elements here are explicitly in the HTML namespace -->
  <html:html xmlns:html='http://www.w3.org/TR/REC-html40'>
    <html:head><html:title>Frobnostication</html:title></html:head>
    <html:body><html:p>Moved to 
      <html:a href='http://frob.com'>here.</html:a></html:p></html:body>
</html:html>

Multiple namespace prefixes can be declared as attributes of a single element, as shown in this example:

<?xml version="1.0"?>
  <!-- both namespace prefixes are available throughout -->
  <bk:book xmlns:bk='urn:loc.gov:books'
           xmlns:isbn='urn:ISBN:0-395-36341-6'>
      <bk:title>Cheaper by the Dozen</bk:title>
      <isbn:number>1568491379</isbn:number>
  </bk:book>

Namespace Defaulting

A default namespace is considered to apply to the element where it is declared (if that element has no namespace prefix), and to all elements with no prefix within the content of that element. If the URI reference in a default namespace declaration is empty, then unprefixed elements in the scope of the declaration are not considered to be in any namespace. Note that default namespaces do not apply directly to attributes.

<?xml version="1.0"?>
  <!-- elements are in the HTML namespace, in this case by default -->
  <html xmlns='http://www.w3.org/TR/REC-html40'>
    <head><title>Frobnostication</title></head>
    <body><p>Moved to 
      <a href='http://frob.com'>here</a>.</p></body>
  </html>

<?xml version="1.0"?>
  <!-- unprefixed element types are from "books" -->
  <book xmlns='urn:loc.gov:books'
        xmlns:isbn='urn:ISBN:0-395-36341-6'>
      <title>Cheaper by the Dozen</title>
      <isbn:number>1568491379</isbn:number>
  </book>

A larger example of namespace scoping:

<?xml version="1.0"?>
  <!-- initially, the default namespace is "books" -->
  <book xmlns='urn:loc.gov:books'
        xmlns:isbn='urn:ISBN:0-395-36341-6'>
      <title>Cheaper by the Dozen</title>
      <isbn:number>1568491379</isbn:number>
      <notes>
        <!-- make HTML the default namespace for some commentary -->
        <p xmlns='urn:w3-org-ns:HTML'>
            This is a <i>funny</i> book!
        </p>
      </notes>
  </book>
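
The scoping in the larger example can be made visible by listing the namespace name that a parser assigns to each element. This sketch assumes the Python standard-library module xml.etree.ElementTree; the helper namespace_of is hypothetical.

```python
import xml.etree.ElementTree as ET

doc = """<?xml version="1.0"?>
<book xmlns='urn:loc.gov:books'
      xmlns:isbn='urn:ISBN:0-395-36341-6'>
  <title>Cheaper by the Dozen</title>
  <isbn:number>1568491379</isbn:number>
  <notes>
    <p xmlns='urn:w3-org-ns:HTML'>
      This is a <i>funny</i> book!
    </p>
  </notes>
</book>"""


def namespace_of(element):
    """Return the namespace name in an ElementTree tag, or None if absent."""
    tag = element.tag
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else None


root = ET.fromstring(doc)

# Map each element's local name to the namespace name it resolved to.
by_local_name = {
    el.tag.rsplit("}", 1)[-1]: namespace_of(el) for el in root.iter()
}
for local_name, namespace in by_local_name.items():
    print(local_name, namespace)
```

The elements book, title, and notes resolve to urn:loc.gov:books. The p element and its child i resolve to urn:w3-org-ns:HTML, because the inner default declaration overrides the outer one within its scope.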

The default namespace can be set to the empty string. Within the scope of the declaration, this has the same effect as there being no default namespace.

<?xml version="1.0"?>
  <Beers>
    <!-- the default namespace is now that of HTML -->
    <table xmlns='http://www.w3.org/TR/REC-html40'>
     <th><td>Name</td><td>Origin</td><td>Description</td></th>
     <tr> 
       <!-- no default namespace inside table cells -->
       <td><brandName xmlns="">Huntsman</brandName></td>
       <td><origin xmlns="">Bath, UK</origin></td>
       <td>
         <details xmlns=""><class>Bitter</class><hop>Fuggles</hop>
           <pro>Wonderful hop, light alcohol, good summer beer</pro>
           <con>Fragile; excessive variance pub to pub</con>
           </details>
          </td>
        </tr>
      </table>
    </Beers>
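
A reduced version of the preceding example, parsed with the Python standard-library module xml.etree.ElementTree (an assumption of this sketch), shows the effect of the empty declaration.

```python
import xml.etree.ElementTree as ET

doc = """<Beers>
  <table xmlns='http://www.w3.org/TR/REC-html40'>
    <tr>
      <td><brandName xmlns="">Huntsman</brandName></td>
    </tr>
  </table>
</Beers>"""

root = ET.fromstring(doc)

# Tags in a namespace appear as "{namespace name}local part";
# tags in no namespace appear as the bare local part.
tags = [el.tag for el in root.iter()]
for tag in tags:
    print(tag)
```

The table, tr, and td elements carry the HTML namespace name. Beers, which is outside the scope of the default declaration, and brandName, which cancels it with xmlns="", are in no namespace.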

Uniqueness of Attributes

In XML documents conforming to this specification, no tag may contain two attributes which:

  • Have identical names, or

  • Have qualified names with the same local part and with prefixes which have been bound to namespace names that are identical.

For example, neither of the bad tags in the following is permitted:

<!-- http://www.w3.org is bound to n1 and n2 -->
  <x xmlns:n1="http://www.w3.org" 
     xmlns:n2="http://www.w3.org" >
    <bad a="1"     a="2" />
    <bad n1:a="1"  n2:a="2" />
  </x>

However, each of the following is legal, the second because the default namespace does not apply to attribute names:

<!-- http://www.w3.org is bound to n1 and is the default -->
  <x xmlns:n1="http://www.w3.org" 
     xmlns="http://www.w3.org" >
    <good a="1"     b="2" />
    <good a="1"     n1:a="2" />
  </x>
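
The following sketch checks both rules with the Python standard-library module xml.etree.ElementTree, which is an assumption of this illustration; the exact error text depends on the underlying parser, so the sketch only records whether an error occurred. The helper error_for is hypothetical.

```python
import xml.etree.ElementTree as ET

NS = 'xmlns:n1="http://www.w3.org" xmlns:n2="http://www.w3.org"'


def error_for(tag):
    """Parse tag inside an <x> element that binds n1 and n2.

    Returns the parser's error text, or None if the tag is accepted.
    """
    try:
        ET.fromstring(f"<x {NS}>{tag}</x>")
        return None
    except ET.ParseError as exc:
        return str(exc)


# Identical attribute names.
bad_same_name = error_for('<bad a="1" a="2" />')

# Different prefixes bound to the same namespace name, same local part.
bad_same_expanded = error_for('<bad n1:a="1" n2:a="2" />')

# Legal: the default namespace does not apply to the unprefixed attribute a.
good = ET.fromstring(
    '<x xmlns:n1="http://www.w3.org" xmlns="http://www.w3.org">'
    '<good a="1" n1:a="2" /></x>'
)[0]
print(bad_same_name, bad_same_expanded, good.attrib)
```

In the legal case the two attributes have distinct expanded names: a is in no namespace, and n1:a is in the http://www.w3.org namespace.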

Conformance of XML Documents

In XML documents which conform to the W3C Namespace Recommendation, element types and attribute names must match the production for QName and must satisfy the Namespace Constraints.

An XML document conforms to this specification if all other tokens in the document which are required, for XML conformance, to match the XML production for Name, match the production of this specification for NCName.

The effect of conformance is that in such a document:

  • All element types and attribute names contain either zero or one colon.

  • No entity names, PI targets, or notation names contain any colons.

Strictly speaking, attribute values declared to be of types ID, IDREF(S), ENTITY(IES), and NOTATION are also Names, and thus should be colon-free.

However, the declared type of attribute values is only available to processors which read markup declarations, for example validating processors. Thus, unless the use of a validating processor has been specified, there can be no assurance that the contents of attribute values have been checked for conformance to this specification.

The following W3C Namespace Recommendation Appendixes are not included in this primer:

  • A. The Internal Structure of XML Namespaces (Non-Normative)

  • A.1 The Insufficiency of the Traditional Namespace

  • A.2 XML Namespace Partitions

  • A.3 Expanded Element Types and Attribute Names

  • A.4 Unique Expanded Attribute Names

Overview of the W3C XML Information Set

The W3C XML Information Set specification defines an abstract data set called the XML Information Set (Infoset). It provides a consistent set of definitions for use in other specifications that must refer to the information in a well-formed XML document.

The primary criterion for inclusion of an information item or property has been that of expected usefulness in future specifications. It does not constitute a minimum set of information that must be returned by an XML processor.

An XML document has an information set if it is well-formed and satisfies the namespace constraints described in the following section.

There is no requirement for an XML document to be valid in order to have an information set.

Information sets may be created by methods (not described in this specification) other than parsing an XML document. See "Synthetic Infosets".

The information set of an XML document consists of a number of information items; the information set for any well-formed XML document will contain at least a document information item and several others. An information item is an abstract description of some part of an XML document: each information item has a set of associated named properties. In this specification, the property names are shown in square brackets, [thus]. The types of information item are listed in section 2 of the W3C XML Information Set specification.

The XML Information Set does not require or favor a specific interface or class of interfaces. This specification presents the information set as a modified tree for the sake of clarity and simplicity, but there is no requirement that the XML Information Set be made available through a tree structure; other types of interfaces, including (but not limited to) event-based and query-based interfaces, are also capable of providing information conforming to the XML Information Set.

The terms "information set" and "information item" are similar in meaning to the generic terms "tree" and "node", as they are used in computing. However, the former terms are used in this specification to reduce possible confusion with other specific data models. Information items do not map one-to-one with the nodes of the DOM or the "tree" and "nodes" of the XPath data model.

In this specification, the words "must", "should", and "may" assume the meanings specified in [RFC2119], except that the words do not appear in uppercase.

Namespaces and the W3C XML Information Set

XML 1.0 documents that do not conform to the W3C Namespace Recommendation, though technically well-formed, are not considered to have meaningful information sets. That is, this specification does not define an information set for documents that have element or attribute names containing colons that are used in other ways than as prescribed by the W3C Namespace Recommendation.

Also, the XML Infoset specification does not define an information set for documents which use relative URI references in namespace declarations. This is in accordance with the decision of the W3C XML Plenary Interest Group described in Relative Namespace URI References in the W3C Namespace Recommendation.

The value of a [namespace name] property is the normalized value of the corresponding namespace attribute; no additional URI escaping is applied to it by the processor.

Entities

An information set describes its XML document with entity references already expanded, that is, represented by the information items corresponding to their replacement text. However, there are various circumstances in which a processor may not perform this expansion. An entity may not be declared, or may not be retrievable. A processor that does not validate may choose not to read all declarations, and even if it does, may not expand all external entities. In these cases an unexpanded entity reference information item is used to represent the entity reference.

End-of-Line Handling

The values of all properties in the Infoset take account of the end-of-line normalization described in the XML Recommendation, 2.11 "End-of-Line Handling".

Base URIs

Several information items have a [base URI] or [declaration base URI] property. These are computed according to XML Base. Note that retrieval of a resource may involve redirection at the parser level (for example, in an entity resolver) or at a lower level; in this case the base URI is the final URI used to retrieve the resource after all redirection.

The values of these properties do not reflect any URI escaping that may be required for retrieval of the resource, but they may include escaped characters if these were specified in the document, or returned by a server in the case of redirection.

In some cases (such as a document read from a string or a pipe) the rules in XML Base may result in a base URI being application dependent. In these cases this specification does not define the value of the [base URI] or [declaration base URI] property.

When resolving relative URIs, the [base URI] property should be used in preference to the values of xml:base attributes; they may be inconsistent in the case of Synthetic Infosets.
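
As a rough sketch of how XML Base yields a base URI for each element of a parsed document, the following Python fragment resolves each xml:base attribute against the base URI of the parent element. It is a simplification under stated assumptions: the example.org URIs, the element names, and the function base_uris are hypothetical; the document is assumed to have been retrieved from the given URI without redirection; and external entities are not considered.

```python
from urllib.parse import urljoin
import xml.etree.ElementTree as ET

# Expanded name of the xml:base attribute as reported by ElementTree.
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"


def base_uris(element, inherited):
    """Yield (element, base URI) pairs for element and its descendants.

    An xml:base value is resolved against the parent's base URI; an element
    without xml:base inherits the parent's base URI unchanged.
    """
    base = urljoin(inherited, element.get(XML_BASE, ""))
    yield element, base
    for child in element:
        yield from base_uris(child, base)


doc = ET.fromstring(
    "<doc xml:base='http://example.org/today/'>"
    "<para xml:base='hotpicks/'><link/></para>"
    "<note/></doc>"
)

# Hypothetical retrieval URI of the document entity.
result = {
    el.tag: base
    for el, base in base_uris(doc, "http://example.org/wherever.xml")
}
for tag, base in result.items():
    print(tag, base)
```

In this sketch the link element inherits the base URI of para, and note inherits the base URI of doc.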

Unknown and No Value

Some properties may sometimes have the value unknown or no value, and it is said that a property value is unknown or that a property has no value respectively. These values are distinct from each other and from all other values. In particular they are distinct from the empty string, the empty set, and the empty list, each of which simply has no members. This specification does not use the term null because in some communities it has particular connotations which may not match those intended here.
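
One way an application might represent these two values is with dedicated sentinel objects. The following is a minimal sketch only; the class name Special and the function describe are hypothetical, and the specification does not prescribe any representation.

```python
from enum import Enum


class Special(Enum):
    """Hypothetical sentinels for the two special property values."""

    UNKNOWN = "unknown"
    NO_VALUE = "no value"


def describe(value):
    """Return a short description of a property value."""
    if value is Special.UNKNOWN:
        return "value is unknown"
    if value is Special.NO_VALUE:
        return "property has no value"
    return f"ordinary value: {value!r}"


print(describe(Special.UNKNOWN))
print(describe(Special.NO_VALUE))
print(describe(""))
```

Because the sentinels are distinct objects, they cannot be confused with each other, with the empty string, or with an empty collection.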

Synthetic Infosets

This specification describes the information set resulting from parsing an XML document. Information sets may be constructed by other means, for example by use of an application program interface (API) such as the DOM or by transforming an existing information set.

An information set corresponding to a real document will necessarily be consistent in various ways; for example, the [in-scope namespaces] property of an element will be consistent with the [namespace attributes] properties of the element and its ancestors. This may not be true of an information set constructed by other means; in such a case there will be no XML document corresponding to the information set, and serializing it will require resolution of the inconsistencies (for example, by producing namespace declarations that correspond to the namespaces in scope).
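
The following sketch illustrates the last point with the Python standard-library module xml.etree.ElementTree, which is an assumption of this illustration and not an infoset implementation. A tree is built through the API with namespace names but without any namespace declarations, and the serializer produces the declarations needed to make a well-formed document.

```python
import xml.etree.ElementTree as ET

# Build elements directly from expanded names; no xmlns attribute is supplied.
book = ET.Element("{urn:loc.gov:books}book")
title = ET.SubElement(book, "{urn:loc.gov:books}title")
title.text = "Cheaper by the Dozen"

# The serializer invents a prefix and emits a matching namespace declaration.
serialized = ET.tostring(book, encoding="unicode")
print(serialized)

# Parsing the result recovers the same expanded names.
reparsed = ET.fromstring(serialized)
```

The prefix chosen by the serializer is an implementation detail; what is preserved is the pairing of each local part with its namespace name.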