
Schema, query: Is it database or XML?

Developers familiar with XML will recognize terms such as schema and query.

For many developers, terms such as schema and query evoke thoughts of databases and DBMS technology. Recently, however, Extensible Markup Language (XML) developers have added query algebra and schema to their lexicon. XML is maturing and evolving from a W3C specification for a markup language into a family of specifications and technologies. The W3C has chartered working groups focused on creating specifications for HTML's replacement (XHTML), XML messaging (XML Protocol), hyperlinking (XLink), and other technologies. The W3C has also made other moves that make XML more approachable for database developers, including the publication of specifications for schemas and a query language.

As a descendant of Standard Generalized Markup Language (SGML), XML supports the use of a Document Type Definition (DTD) to describe a document's structure. DTDs, however, do not provide capabilities such as type checking and declarative constraints that database developers have come to expect. XML developers stepped into the breach with several alternatives to DTDs, and the W3C eventually chartered the Schema Working Group, which produced a two-part specification for schemas. Likewise, the original XML specification did not define an abstract data model or query language, so developers once again took the initiative. XQL, XML-QL, XPath and other solutions were developed to let developers search XML documents.

Besides the ongoing debates about schemas and queries, developers also have different beliefs about which database paradigm is best suited to storing and retrieving XML. Advocates of Object Database Management Systems (ODBMS) essentially believe the defining characteristics of a DBMS should be its ability to persist application objects and its support for encapsulation, inheritance, and polymorphism. The DBMS operates with instances of a class of object, and can access persistent instances by using a unique ID. Simply put, a developer using a Book class and an ODBMS expects the object database design to mirror the application's Book class.
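The object database idea can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: ToyObjectStore is a hypothetical in-memory stand-in rather than a real ODBMS product, and the Book and ReferenceBook fields are invented. It shows only the behavior described above: instances are stored as objects, retrieved by a unique ID, and keep their class (and therefore inheritance and polymorphism) when fetched.

```python
import copy
import uuid


class Book:
    """Application class; the stored form mirrors these fields directly."""

    def __init__(self, title, author):
        self.title = title
        self.author = author

    def describe(self):
        return f"{self.title} by {self.author}"


class ReferenceBook(Book):
    """Subclass used to show that inheritance and polymorphism survive storage."""

    def __init__(self, title, author, edition):
        super().__init__(title, author)
        self.edition = edition

    def describe(self):
        return f"{super().describe()} (edition {self.edition})"


class ToyObjectStore:
    """Hypothetical stand-in for an ODBMS: object copies keyed by a unique ID."""

    def __init__(self):
        self._objects = {}

    def persist(self, obj):
        oid = uuid.uuid4().hex  # unique object identifier
        self._objects[oid] = copy.deepcopy(obj)  # stored copy, independent of the caller's
        return oid

    def fetch(self, oid):
        return copy.deepcopy(self._objects[oid])


store = ToyObjectStore()
book_id = store.persist(Book("Database Basics", "A. Author"))
ref_id = store.persist(ReferenceBook("XML Handbook", "B. Author", 2))
```

Fetching by ID returns an object of the original class, so store.fetch(ref_id).describe() runs the ReferenceBook version of the method with no mapping layer in between.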

SQL database technology evolved from the relational model research of E.F. Codd at IBM. Codd's relational model defines relations (a type of set), data integrity rules, and well-defined operations known as relational algebra. These operations include restriction, projection, union, difference, assignment, and Cartesian product. An SQL user often manipulates a database as rows and columns, although SQL object-relational databases also provide an object view of data. Therefore, an SQL user might define a Book type or decompose a Book into constituent elements, using different tables for information such as Author, Publisher, and Book. The XML definition of a Book might be specified by a DTD using different elements or attributes of a book document, or be specified as an XML Schema complex type.
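The relational operations named above can be sketched with ordinary Python sets, treating a relation as a set of tuples. The Book and Author attributes and the sample rows are assumptions invented for this illustration, and the helper names (restrict, project, cartesian_product) are hypothetical, not part of any library.

```python
from collections import namedtuple
from itertools import product

# Two relations; each is a set of tuples, so duplicate rows cannot occur.
Book = namedtuple("Book", ["title", "author_id"])
Author = namedtuple("Author", ["author_id", "name"])

books = {Book("XML Basics", 1), Book("SQL Basics", 2), Book("Advanced XML", 1)}
authors = {Author(1, "Smith"), Author(2, "Jones")}


def restrict(relation, predicate):
    """Restriction: keep only the rows that satisfy the predicate."""
    return {row for row in relation if predicate(row)}


def project(relation, *attrs):
    """Projection: keep only the named attributes of every row."""
    return {tuple(getattr(row, a) for a in attrs) for row in relation}


def cartesian_product(left, right):
    """Cartesian product: every pairing of a left row with a right row."""
    return set(product(left, right))


by_smith = restrict(books, lambda b: b.author_id == 1)
titles = project(books, "title")
others = books - by_smith          # difference
everything = by_smith | others     # union
pairs = cartesian_product(books, authors)

# A join is a restriction applied to a Cartesian product, followed by a projection.
book_authors = {(b.title, a.name) for b, a in pairs if b.author_id == a.author_id}
```

Splitting Book and Author into separate relations and recombining them with a join mirrors the decomposition into tables described above.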

The Document Object Model (DOM) is a W3C specification for a programming interface that enables developers to manipulate documents as tree-like objects and traverse through documents by accessing nodes of the tree. Various projects have used the DOM as the core technology for building a persistent store for XML. The defining model for this approach to database operations is persistence of application objects -- in this case DOM objects.
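As a rough illustration of tree-style access, the sketch below uses xml.dom.minidom, the small DOM implementation in Python's standard library. The book document, including its placeholder ISBN, is invented for the example, and walk is a hypothetical helper. It covers traversal only; a persistent DOM store would additionally keep these node objects in a database.

```python
from xml.dom import minidom

# Invented sample document; the ISBN is a placeholder, not a real identifier.
BOOK_XML = """<book isbn="0-00-000000-0">
  <title>Example Title</title>
  <author>Example Author</author>
  <publisher>Example Publisher</publisher>
</book>"""


def walk(node, depth=0):
    """Yield (depth, element name) for every element, in document order."""
    if node.nodeType == node.ELEMENT_NODE:
        yield depth, node.tagName
    for child in node.childNodes:
        yield from walk(child, depth + 1)


document = minidom.parseString(BOOK_XML)
root = document.documentElement

# Traverse the tree node by node, skipping whitespace text nodes.
outline = list(walk(root))

# Reach a specific node and read the text node beneath it.
title_text = root.getElementsByTagName("title")[0].firstChild.data
isbn = root.getAttribute("isbn")
```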

Despite DOM access and persistent DOM implementations, some XML developers wanted other solutions for accessing XML data. The XPath Working Group defined an approach that includes path expressions and functions. XML Query (XQuery) and XPath use the same expression language, but the W3C Query Working Group took querying a step further. It took a formal approach, defining a data model and a formal query algebra as the basis for XQuery. The XML Query Algebra uses a simple type system and supports query optimization. It is statically typed, which supports compile-time type checking. It includes operations such as projection, iteration, selection, and join, and it supports recursive types and recursive operations.
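To give a feel for path expressions and for the selection, iteration, and join operations mentioned above, here is a small sketch using xml.etree.ElementTree from Python's standard library. Note the assumptions: ElementTree implements only a limited subset of XPath and is not an XQuery engine, so the join is written out by hand in Python, and the catalog document is invented for the example.

```python
import xml.etree.ElementTree as ET

# Invented sample data: books refer to people through an id/ref pair.
CATALOG_XML = """<catalog>
  <book year="1999"><title>First Title</title><author ref="a1"/></book>
  <book year="2001"><title>Second Title</title><author ref="a2"/></book>
  <person id="a1"><name>First Author</name></person>
  <person id="a2"><name>Second Author</name></person>
</catalog>"""

catalog = ET.fromstring(CATALOG_XML)

# Path expression: the title of every book element (a projection).
all_titles = [t.text for t in catalog.findall("./book/title")]

# Path expression with a predicate: selection on an attribute value.
titles_2001 = [t.text for t in catalog.findall("./book[@year='2001']/title")]

# Iteration plus join: match each book's author reference to a person id.
names_by_id = {p.get("id"): p.findtext("name") for p in catalog.findall("person")}
book_authors = [
    (b.findtext("title"), names_by_id[b.find("author").get("ref")])
    for b in catalog.iter("book")
]
```

In a query language built on an algebra, the join would be expressed declaratively and the engine would be free to optimize it; here the lookup dictionary plays that role manually.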

The query algebra and the new XQuery language will probably be widely supported in DBMSs and other software products, much as SQL was after it became an international standard. Database developers can take comfort from knowing the W3C XML Query Working Group included industry veterans who had developed SQL, XQL, Quilt, XML-QL, and other query languages, a group that clearly understands data integrity, query optimization, and other requirements.

For more information about XQuery, you can attend XML DevCon 2001 in New York City. Paul Cotton (XML Query Chair) is presenting "Querying XML Documents" on April 9, Jonathan Robie of the Query WG is doing an XQuery tutorial on April 8, and the Query Working Group will conduct a workshop on April 10.

You can also use the URLs below if you are interested in reading W3C specifications.

About the Author

Ken North is a consultant, software developer, author and speaker. He teaches Expert Series Seminars, is the XML Channel Editor for Dr. Dobb's web site, and has written for a wide variety of publications, including Web Techniques and Internet Computing. Ken is also the author of Database Magic with Ken North (Prentice Hall).

For More Information
