XStreamCast: Broadcasting and Query Processing of Streamed XML Data

Award Number: IIS-0307460 (01/01/04 - 12/31/06)

Leonidas Fegaras, PI
David Levine, co-PI
Department of Computer Science and Engineering
The University of Texas at Arlington
416 Yates Street, 301 Nedderman Hall
P.O. Box 19015
Arlington, TX 76019
Phone: (817) 272-3629,  Fax: (817) 272-3784
E-mail: {fegaras,levine}@cse.uta.edu
URL: http://lambda.uta.edu/

Keywords:

Streamed XML, XQuery, Query Optimization, Query Processing

Project Summary:

The XStreamCast project addresses the efficient processing of XQueries over continuous streams in which a server broadcasts XML data to multiple clients concurrently and a client may tune-in to multiple streams at the same time. The goal of this project is to develop a framework that improves query throughput and response time on all clients, taking full advantage of their limited resources. Under this framework, the unit of transmission in an XML stream is an XML fragment, which corresponds to one or a few XML elements from the transmitted document. A server may choose to disseminate XML fragments from multiple documents in the same stream, can repeat some fragments when they are critical or in high demand, can replace them by sending delta changes, and can delete invalid ones. The client architecture is based on an optimization framework for XQuery that utilizes many efficient evaluation algorithms for processing XML data streams under memory and processing power constraints. The end goal is the construction of a complete prototype system based on the theoretical framework, which is expected to drastically improve performance when compared with other stream processing systems. The algorithms and the software resulting from this project will have a broader impact on a wide range of applications, especially in electronic commerce, since they will improve the way services are provided to clients. They will also reduce network traffic between servers and clients and will give the ability to small service providers and businesses to serve a larger number of clients using less powerful server computers and lower cost networking.

Publications and Products:

There is nothing to report yet, since this is a new award, but there are some related papers:

Project Impact:

When completed, this project will make the following contributions:
This research work will have a broader impact on a wide range of applications, especially in electronic commerce, since it will improve the way services are provided to clients.  It will also reduce network traffic between servers and clients and will give the ability to small service providers and businesses to serve a larger number of clients (such as PDAs) using less powerful server computers and lower cost networking.

Goals, Objectives, and Targeted Activities:

We intend to demonstrate the effectiveness of our proposed work through the construction of a complete querying system for continuous XML streams. As the main application domain for our system, we chose network traffic management, in which network routers multicast packet traces in XML form while clients evaluate continuous XQueries to monitor the traffic. The quality of our push-based dissemination system will be evaluated based on the capacity of our client engines to process continuous streams of XML data without overrunning their buffers. The evaluation will be carried out for multiple servers broadcasting XML data and for various client buffer sizes. As a measurement of the effectiveness of our algebraic optimizations on XQuery, we will evaluate the performance gain of each optimization using real network packet traces. After the first prototypes become available, we will fine-tune the implementation and evaluate the performance of the entire prototype system. Finally, we will evaluate the effectiveness of our system by comparing it against the NiagaraCQ system.

Area Background:

Our goals are similar to those of the Niagara Continuous Query Processing project, NiagaraCQ [7]. A recent work by this group proposed the use of punctuations for handling blocking operators [13]. There are already many proposals for algebras on semi-structured and XML data, including an algebra based on structural recursion [4], YATL [6], SAL [5], x-algebra [9], and TAX [11]. Recently, histograms have been used to provide approximate answers to aggregate queries [3]. This research is focused on generating optimal histograms to provide approximate answers to aggregate queries. Another approach is to aggregate inside a landmark or a sliding window [10]. The former is aggregation between a landmark point and the current point, while the latter is aggregation over a sliding window with a fixed width. The randomized technique in [8] uses histograms to improve accuracy (called synopses or sketches). Generating histograms for streaming data is also being pursued with relative rigor [12].

Area References:

  1. B. Babcock, S. Babu, M. Datar, R. Motwani, and J. Widom. Models and Issues in Data Stream Systems. In PODS 2002.
  2. S. Babu and J. Widom. Continuous Queries Over Data Streams. SIGMOD Record, 30(3):109-120, September 2001.
  3. S. Acharya, P. B. Gibbons, V. Poosala, and S. Ramaswamy. Join Synopses for Approximate Query Answering. In SIGMOD 1999.
  4. P. Buneman, S. Davidson, G. Hillebrand, and D. Suciu. A Query Language and Optimization Techniques for Unstructured Data. In SIGMOD 1996.
  5. C. Beeri and Y. Tzaban. SAL: An Algebra for Semistructured Data and XML. In WebDB 1999, pp 37-42.
  6. V. Christophides, S. Cluet, and J. Simeon. On Wrapping Query Languages and Efficient XML Integration. In SIGMOD 2000.
  7. J. Chen, D. DeWitt, F. Tian, and Y. Wang. NiagaraCQ: A Scalable Continuous Query System for Internet Databases. In SIGMOD 2000.
  8. A. Dobra, M. N. Garofalakis, J. Gehrke, and R. Rastogi. Processing Complex Aggregate Queries over Data Streams. In SIGMOD 2002.
  9. M. Fernandez, J. Simeon, and P. Wadler. An Algebra for XML Query. In FST TCS 2000.
  10. J. Gehrke, F. Korn, and D. Srivastava. On Computing Correlated Aggregates Over Continual Data Streams. In SIGMOD 2001, pp 13-24.
  11. H. V. Jagadish, Laks V. S. Lakshmanan, Divesh Srivastava, and Keith Thompson. TAX: A Tree Algebra for XML. In DBPL 2001, pp 149-164.
  12. G. Manku and R. Motwani. Approximate Frequency Counts over Data Streams. In VLDB 2002.
  13. P. Tucker, D. Maier, T. Sheard, and L. Fegaras. Online Analysis and Querying of Continous Data Streams. IEEE Transactions on Knowledge and Data Engineering, May-June 2002.

Potential Related Projects:

Project Web site URL:

http://lambda.uta.edu/XStreamCast/

Online software and resources:

There is nothing to report yet, since this is a new award.