Problems in XML data management

(Research Seminar, October 10th, 2002)

Mary Fernandez
AT&T Labs, Research

Abstract:

XML is a flexible data format that has rapidly become the lingua franca for data exchange among inter-enterprise applications on the Internet.  Hundreds of application- and industry-specific XML dialects already exist.  Bioinformatics data, financial products, legal documents, medical transcripts, and electronic-commerce transactions are some examples of the diverse kinds of data exchanged in XML.  Because XML has been adopted so rapidly, however, data-management tools for it are still sparse and immature.

In this talk, we will consider three problems in XML data management: accessing XML data via programmatic and query interfaces; publishing legacy (non-XML) data in XML; and storing XML data in legacy storage systems.  We will briefly survey commercial and research solutions to each of these problems, then focus on the problem of publishing relational data in XML.  We will describe our own solution: SilkRoute, a general, selective, and efficient architecture for viewing and querying relational data in XML.  Lastly, we will identify some of the interesting research problems in XML data management.
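
To make the publishing problem concrete, here is a minimal sketch of exposing relational data as a nested XML view. It is an illustration only and does not reflect SilkRoute's actual architecture or interface: the `customer` and `orders` tables, their columns, and the `publish_as_xml` function are all hypothetical, and the approach shown (one sorted join, grouped into elements on the client) is just one simple strategy among many.

```python
import sqlite3
import xml.etree.ElementTree as ET


def publish_as_xml(conn):
    """Build an XML view of customers with their orders nested inside.

    Hypothetical schema: customer(id, name), orders(id, customer_id, total).
    """
    root = ET.Element("customers")
    # A single outer join, sorted so that each customer's rows are contiguous.
    rows = conn.execute(
        "SELECT c.id, c.name, o.id, o.total "
        "FROM customer c LEFT JOIN orders o ON o.customer_id = c.id "
        "ORDER BY c.id, o.id"
    )
    current_id, current_elem = None, None
    for cust_id, name, order_id, total in rows:
        if cust_id != current_id:
            # First row for this customer: open a new <customer> element.
            current_elem = ET.SubElement(root, "customer", id=str(cust_id))
            ET.SubElement(current_elem, "name").text = name
            current_id = cust_id
        if order_id is not None:
            # Customers without orders yield one row with NULL order columns.
            order = ET.SubElement(current_elem, "order", id=str(order_id))
            order.text = str(total)
    return root


# Small in-memory database used only for this example.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total INTEGER);
    INSERT INTO customer VALUES (1, 'Ada'), (2, 'Bob');
    INSERT INTO orders VALUES (10, 1, 30), (11, 1, 12);
    """
)
xml_text = ET.tostring(publish_as_xml(conn), encoding="unicode")
print(xml_text)
```

Even in this toy setting there is a design choice between issuing one large join and issuing one query per nesting level; choosing between such plans is part of what makes efficient publishing nontrivial.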

This talk is based on a full-day tutorial given at WWW 2002. SilkRoute is joint work with Atsuyuki Morishima (Univ. of Tsukuba), Yana Kadiyska and Dan Suciu (Univ. of Washington), and Wang-Chiew Tan (U.C. Santa Cruz).