Markup Languages for Complex Documents


  • David Dubin

For all the developments in XML since 1998, one thing that has not changed is the understanding of XML documents as serializations of tree structures conforming to the constraints expressed in the document's schema. Notwithstanding XML's many strengths, there are problem areas that invite further research on some of the fundamental assumptions of XML and the document models associated with it. It is a challenge to represent in XML anything that does not lend itself to representation by context-free or constituent structure grammars, such as overlapping or fragmented elements, or multiple co-existing complete or partial alternative structures or orderings. For the purposes of our work, we call such structures complex structures, and we call documents containing them complex documents. The MLCD (Markup Languages for Complex Documents) project aims to integrate alternative approaches by developing an alternative notation, a data structure, and a constraint language that are, as far as possible, compatible with XML-based markup and retain its strengths, yet solve the problems of representing and processing complex structures. MLCD started in 2001 and is expected to complete its work in 2007. The project is a collaboration among researchers based at several different institutions.
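To make the notion of overlapping elements concrete, the following minimal Python sketch treats markup as named character ranges over a text and reports the pairs that cannot both be elements of a single tree. It is an illustration only, not MLCD's notation or data structure; the `Span` type, the function names, and the sample text with its verse-line and sentence boundaries are all hypothetical.

```python
from itertools import combinations
from typing import List, NamedTuple, Tuple


class Span(NamedTuple):
    """A hypothetical annotation: a named range of characters in a text."""
    name: str
    start: int  # inclusive character offset
    end: int    # exclusive character offset


def properly_overlap(a: Span, b: Span) -> bool:
    """True when a and b share text but neither contains the other."""
    shares_text = a.start < b.end and b.start < a.end
    a_in_b = b.start <= a.start and a.end <= b.end
    b_in_a = a.start <= b.start and b.end <= a.end
    return shares_text and not (a_in_b or b_in_a)


def tree_conflicts(spans: List[Span]) -> List[Tuple[str, str]]:
    """Pairs of spans that cannot both be elements of one XML tree."""
    return [(a.name, b.name)
            for a, b in combinations(spans, 2)
            if properly_overlap(a, b)]


# Invented sample: two verse lines and two sentences over the same text.
text = "I saw the sun set. The sky turned red."
spans = [
    Span("line1", 0, 22),       # "I saw the sun set. The"
    Span("line2", 23, 38),      # "sky turned red."
    Span("sentence1", 0, 18),   # "I saw the sun set."
    Span("sentence2", 19, 38),  # "The sky turned red."
]

conflicts = tree_conflicts(spans)
print(conflicts)
```

In this sample the first verse line and the second sentence share the word "The" without either containing the other, so no single element hierarchy can hold both. Structures of this kind are what the text above calls complex structures.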


Funding Agencies