XML, well-formed XML and valid XML

XML, the eXtensible Markup Language, is a language with a simple formal syntax. It is well suited to creating and processing documents with programs, and at the same time easy for humans to read and write.

XML is extensible: it does not fix the markup used in documents. The developer is free to create markup that fits the needs of a particular domain, limited only by the syntax rules of the language.

DTD

A DTD (Document Type Definition) is a set of rules that declares which elements and attributes a document may contain and how they relate to each other: which elements may be nested in which, in what order, and how many times.

For example, the DTD for HTML says that the DIV tag must be inside the BODY tag and can appear multiple times, that TITLE must be in HEAD and appear only once, and that SCRIPT can appear in either HEAD or BODY any number of times.

A DTD is attached to a document through a declaration that starts with <!DOCTYPE ... >. The rules are either written directly inside that declaration or kept in a separate file that the declaration references.
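To make the idea concrete, here is a minimal sketch of a document that carries its DTD inline. The "page" vocabulary is hypothetical and invented for this illustration (it only mimics the HEAD/TITLE/BODY/DIV rules mentioned above). Python's standard xml.dom.minidom is used just to show that the parser recognizes the DOCTYPE declaration; it does not validate the document against the rules.

```python
from xml.dom import minidom

# Hypothetical "page" vocabulary, written only for this illustration.
# (head, body) = exactly these two children, in this order
# (title)      = exactly one title
# (div*)       = zero or more div elements
PAGE = """<?xml version="1.0"?>
<!DOCTYPE page [
  <!ELEMENT page  (head, body)>
  <!ELEMENT head  (title)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT body  (div*)>
  <!ELEMENT div   (#PCDATA)>
]>
<page>
  <head><title>Example</title></head>
  <body><div>first</div><div>second</div></body>
</page>"""

doc = minidom.parseString(PAGE)

# The DOCTYPE declaration names the root element the rules apply to.
doctype_name = doc.doctype.name
div_count = len(doc.getElementsByTagName("div"))

print(doctype_name, div_count)
```

If the rules lived in a separate file, the declaration would reference that file instead of listing the rules in square brackets.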

Difference between well-formed XML and valid XML

Depending on the level of compliance, the document can be “well-formed” or “valid”.

The main features of well-formed XML follow from the formal description of the standard:

  • The document has exactly one root element, which contains all the others. That is, <document>...</document><appendix>...</appendix> is not an XML document.
  • All open tags must be closed. HTML, for example, allows many tags (<p>, <body>, <li>, <td> and many others) not to be closed. You can't do that in XML.
  • Empty elements (like <br>) have a special notation that distinguishes them from opening tags: <br/>. Writing the pair in full, <br></br>, is also allowed.
  • Tag names are case sensitive. If you open the <SiteDescription> tag, it must be closed with exactly the same name; </sitedescription> is not allowed.
  • Tags must be properly nested. This is not allowed: <em><b>...</em></b>.
  • All attribute values must be enclosed in quotes, either double (") or single (').
  • The characters < and & must be escaped as &lt; and &amp; wherever they appear in text or attribute values (CDATA sections and comments are exceptions). The character > can be escaped as &gt; and has to be when it follows ]] in text. Inside an attribute value, the quotation mark that delimits the value must also be escaped: &quot; for a double quote, &apos; for a single quote.
  • All characters in the document must correspond to the declared encoding.
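Several of the rules above can be tried out directly. The following is a small sketch, assuming Python's standard xml.etree.ElementTree parser; the helper name is_well_formed and the sample strings are made up for this illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Each sample targets one of the rules from the list above.
samples = [
    "<document>a</document><appendix>b</appendix>",  # two root elements
    "<list><li>one<li>two</list>",                   # unclosed tags
    "<p>line<br/>break</p>",                         # empty-element notation
    "<p>line<br></br>break</p>",                     # same element, written in full
    "<SiteDescription></sitedescription>",           # case mismatch
    "<p><em><b>x</em></b></p>",                      # broken nesting
    "<a href=index.html>x</a>",                      # unquoted attribute value
    '<a href="index.html">x</a>',                    # quoted attribute value
    "<t>Tom & Jerry</t>",                            # unescaped ampersand
    "<t>Tom &amp; Jerry</t>",                        # escaped ampersand
]

results = {sample: is_well_formed(sample) for sample in samples}

for sample, ok in results.items():
    print("well-formed" if ok else "rejected   ", sample)
```

The parser stops at the first violation it meets, so a rejected document tells you only about one problem at a time.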

A document is valid if it is well-formed and, in addition, follows the rules of a particular XML vocabulary, i.e. complies with its DTD.

Well-formed XML is syntactically correct (a parser can parse it). Valid XML is syntactically correct and also structurally correct: it follows the rules of a predefined vocabulary and grammar (the DTD).
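The gap between the two levels shows up with a non-validating parser. In the sketch below, the hypothetical "note" DTD requires a to element followed by a body element, but the document contains only to. Python's standard xml.etree.ElementTree does not validate against a DTD, so it accepts the document anyway. The final comparison is a hand-written stand-in for one DTD rule, not real DTD validation.

```python
import xml.etree.ElementTree as ET

# Hypothetical "note" vocabulary: a note must contain <to>, then <body>.
NOTE = """<!DOCTYPE note [
  <!ELEMENT note (to, body)>
  <!ELEMENT to   (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
]>
<note><to>Reader</to></note>"""

# No ParseError is raised: the document is well-formed.
root = ET.fromstring(NOTE)
child_tags = [child.tag for child in root]

# Assumption for illustration: we re-state the (to, body) rule by hand,
# because the standard library parser does not check the DTD for us.
REQUIRED_CHILDREN = ["to", "body"]
follows_rule = child_tags == REQUIRED_CHILDREN

print("children:", child_tags)
print("follows the (to, body) rule:", follows_rule)
```

A validating parser would report the missing body element here; checking a full DTD requires a library that supports validation.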

Namespace in XML

An XML namespace is a collection of names, identified by a URI, that are used in XML documents as element types and attribute names. XML namespaces differ from the "namespaces" commonly used in computer science in that the XML version has internal structure and, mathematically speaking, is not a set.

A namespace is declared with the xmlns attribute, whose value is the namespace URI. Written as xmlns="...", it sets the default namespace for the element and its descendants. Written as xmlns:prefix="...", it binds a prefix, and every element or attribute name carrying that prefix belongs to that namespace.

Within a namespace, each name must be unique: the combination of namespace URI and local name identifies exactly one element type or attribute.

In general, an XML namespace does not require its vocabulary to be formally defined: the URI is only an identifier, and nothing needs to exist at that address.

An XML document can contain element and attribute names from multiple XML vocabularies. Each vocabulary has its own namespace, which resolves the problem of ambiguous element and attribute names.
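As an illustration, the sketch below mixes two hypothetical vocabularies that both define a title element: one for books (a book title) and one for people (an honorific). The URIs under example.com are placeholders invented for this example. Python's standard xml.etree.ElementTree is used to show how the namespace keeps the two names apart.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URIs; they are identifiers only.
BOOKS = "http://example.com/books"
PEOPLE = "http://example.com/people"

DOC = f"""<library xmlns="{BOOKS}" xmlns:p="{PEOPLE}">
  <title>XML Basics</title>
  <p:author>
    <p:title>Dr.</p:title>
    <p:name>A. Writer</p:name>
  </p:author>
</library>"""

root = ET.fromstring(DOC)

# ElementTree expands every name to {namespace-uri}local-name.
root_tag = root.tag

# The prefixes used for searching are local to this mapping; they do not
# have to match the prefixes used inside the document.
ns = {"b": BOOKS, "p": PEOPLE}
book_title = root.find("b:title", ns).text
person_title = root.find("p:author/p:title", ns).text

print(root_tag)
print(book_title, "/", person_title)
```

Both elements are written as title in the document, but they are different names because they belong to different namespaces.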

