Sunday, January 8, 2012

Canonical XML

The canonical form of an XML document is a physical representation of the document produced by the method described in this specification. The changes made to the input document are summarized in the following list:
  • The document is encoded in UTF-8
  • Line breaks are normalized to #xA on input, before parsing
  • Attribute values are normalized, as if by a validating processor
  • Character and parsed entity references are replaced
  • CDATA sections are replaced with their character content
  • The XML declaration and document type declaration (DTD) are removed
  • Empty elements are converted to start-end tag pairs
  • Whitespace outside of the document element and within start and end tags is normalized
  • All whitespace in character content is retained (excluding characters removed during line feed normalization)
  • Attribute value delimiters are set to quotation marks (double quotes)
  • Special characters in attribute values and character content are replaced by character references
  • Superfluous namespace declarations are removed from each element
  • Default attributes are added to each element
  • Lexicographic order is imposed on the namespace declarations and attributes of each element
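
Several of the changes above can be seen on a small document. The following is a minimal sketch, assuming Python 3.8 or later, whose standard library provides xml.etree.ElementTree.canonicalize. Note that this function implements Canonical XML 2.0, a later version than the 2001 recommendation quoted here, so its output is not guaranteed to match the 1.0 method in every case. The sample document is made up for illustration.

```python
from xml.etree.ElementTree import canonicalize

# Hypothetical input: XML declaration, single-quoted and unordered
# attributes, an empty-element tag, and a CDATA section.
source = (
    '<?xml version="1.0"?>'
    "<doc b='2' a='1'><empty/><![CDATA[x < y]]></doc>"
)

# With no output file given, canonicalize() returns the result as a string.
canonical = canonicalize(source)
print(canonical)
# <doc a="1" b="2"><empty></empty>x &lt; y</doc>
```

In the result the XML declaration is gone, the attributes are sorted and delimited by double quotes, the empty element has become a start-end tag pair, and the CDATA section has been replaced by its character content with the special character escaped.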

The term canonical XML refers to XML that is in canonical form. The XML canonicalization method is the algorithm defined by this specification that generates the canonical form of a given XML document or document subset. The term XML canonicalization refers to the process of applying the XML canonicalization method to an XML document or document subset.

source: http://www.w3.org/TR/2001/REC-xml-c14n-20010315
