XStreamCast:
Broadcasting and Query Processing of Streamed XML Data
Award Number: IIS-0307460
(01/01/04 - 12/31/06)
Leonidas Fegaras, PI
David Levine, co-PI
Department of Computer Science and Engineering
The University of Texas at Arlington
416 Yates Street, 301 Nedderman Hall
P.O. Box 19015
Arlington, TX 76019
Phone: (817) 272-3629, Fax: (817) 272-3784
E-mail: {fegaras,levine}@cse.uta.edu
URL: http://lambda.uta.edu/
Keywords:
Streamed XML, XQuery, Query Optimization, Query Processing
Project Summary:
The XStreamCast project addresses the efficient
processing of XQueries over continuous streams in which a server broadcasts
XML data to multiple clients concurrently and a client may tune-in to multiple
streams at the same time. The goal of this project is to develop a framework
that improves query throughput and response time on all clients, taking full
advantage of their limited resources. Under this framework, the unit of transmission
in an XML stream is an XML fragment, which corresponds to one or a few XML
elements from the transmitted document. A server may choose to disseminate
XML fragments from multiple documents in the same stream, can repeat some
fragments when they are critical or in high demand, can replace them by sending
delta changes, and can delete invalid ones. The client architecture is based
on an optimization framework for XQuery that utilizes many efficient evaluation
algorithms for processing XML data streams under memory and processing power
constraints. The end goal is the construction of a complete prototype system
based on the theoretical framework, which is expected to drastically improve
performance when compared with other stream processing systems. The algorithms
and the software resulting from this project will have a broader impact on
a wide range of applications, especially in electronic commerce, since they
will improve the way services are provided to clients. They will also reduce
network traffic between servers and clients and will give the ability to small
service providers and businesses to serve a larger number of clients using
less powerful server computers and lower cost networking.
Publications and Products:
There is nothing to report yet, since this is a new award, but there
are some related papers:
- Sujoe Bose, Leonidas Fegaras, David Levine,
Vamsi Chaluvadi. A Query Algebra for Fragmented XML
Stream Data.. 9th International Workshop on Data Base Programming Languages
(DBPL) Potsdam, Germany, September 6 - 8, 2003. (ps.gz
ps pdf)
- Leonidas Fegaras, David Levine, Sujoe Bose,
and Vamsi Chaluvadi. Query Processing of Streamed XML
Data. Query Processing of Streamed XML Data. 11th International Conference
on Information and Knowledge Management (CIKM'2002). McLean, VA, November
2002. (ps.gz ps pdf)
Project Impact:
When completed, this project will make the following contributions:
- The development of a formal foundation for
broadcasting and continuous query processing of XML data streams. The focus
of this framework is high-performance query processing for a wide range of
network bandwidth, memory capacities, and processing power, measured by
query throughput and response time.
- The design of a server architecture that
is able to break XML data into manageable fragments and to broadcast them
using a scheduling algorithm that repeats data based on client needs and
limitations. The server architecture is also able to maintain, disseminate,
and broadcast condensed summaries of the broadcast data to help clients take
full advantage of their limited resources.
- The development of many efficient evaluation
algorithms for processing XML data streams under memory and processing power
constraints.
- The development of a comprehensive approach
to optimizing XQueries that utilizes our main-memory evaluation algorithms.
- The construction of a complete prototype
system based on the theoretical framework, which is expected to drastically
improve performance when compared with other stream processing systems.
This research work will have a broader impact on a wide range of
applications, especially in electronic commerce, since it will improve the
way services are provided to clients. It will also reduce network traffic
between servers and clients and will give the ability to small service providers
and businesses to serve a larger number of clients (such as PDAs) using less
powerful server computers and lower cost networking.
Goals, Objectives, and Targeted Activities:
We intend to demonstrate the effectiveness
of our proposed work through the construction of a complete querying system
for continuous XML streams. As the main application domain for our system,
we chose network traffic management,
in which network routers multicast packet traces in XML form while clients
evaluate continuous XQueries to monitor the traffic. The quality of our
push-based dissemination system will be evaluated based on the capacity of
our client engines to process continuous streams of XML data without overrunning
their buffers. The evaluation will be carried out for multiple servers broadcasting
XML data and for various client buffer sizes. As a measurement of the effectiveness
of our algebraic optimizations on XQuery, we will evaluate the performance
gain of each optimization using real network packet traces. After the first
prototypes become available, we will fine-tune the implementation and evaluate
the performance of the entire prototype system. Finally, we will evaluate
the effectiveness of our system by comparing it against the NiagaraCQ system.
Area Background:
Our goals are similar to those of the Niagara Continuous Query Processing
project, NiagaraCQ [7]. A recent work by this group proposed the use of punctuations
for handling blocking operators [13]. There are already many proposals for
algebras on semi-structured and XML data, including an algebra based on structural
recursion [4], YATL [6], SAL [5], x-algebra [9], and TAX [11]. Recently,
histograms have been used to provide approximate answers to aggregate queries
[3]. This research is focused on generating optimal histograms to provide
approximate answers to aggregate queries. Another approach is to aggregate
inside a landmark or a sliding window [10]. The former is aggregation between
a landmark point and the current point, while the latter is aggregation over
a sliding window with a fixed width. The randomized technique in [8] uses
histograms to improve accuracy (called synopses or sketches). Generating
histograms for streaming data is also being pursued with relative rigor [12].
Area References:
- B. Babcock, S. Babu, M. Datar, R. Motwani, and J. Widom. Models
and Issues in Data Stream Systems. In PODS 2002.
- S. Babu and J. Widom. Continuous Queries
Over Data Streams. SIGMOD Record, 30(3):109-120, September 2001.
- S. Acharya, P. B. Gibbons, V. Poosala, and
S. Ramaswamy. Join Synopses for Approximate Query Answering. In SIGMOD 1999.
- P. Buneman, S. Davidson, G. Hillebrand, and
D. Suciu. A Query Language and Optimization Techniques for Unstructured Data.
In SIGMOD 1996.
- C. Beeri and Y. Tzaban. SAL: An Algebra for
Semistructured Data and XML. In WebDB 1999, pp 37-42.
- V. Christophides, S. Cluet, and J. Simeon.
On Wrapping Query Languages and Efficient XML Integration. In SIGMOD 2000.
- J. Chen, D. DeWitt, F. Tian, and Y. Wang.
NiagaraCQ: A Scalable Continuous Query System for Internet Databases. In SIGMOD
2000.
- A. Dobra, M. N. Garofalakis, J. Gehrke, and
R. Rastogi. Processing Complex Aggregate Queries over Data Streams. In SIGMOD
2002.
- M. Fernandez, J. Simeon, and P. Wadler. An
Algebra for XML Query. In FST TCS 2000.
- J. Gehrke, F. Korn, and D. Srivastava. On
Computing Correlated Aggregates Over Continual Data Streams. In SIGMOD 2001,
pp 13-24.
- H. V. Jagadish, Laks V. S. Lakshmanan, Divesh
Srivastava, and Keith Thompson. TAX: A Tree Algebra for XML. In DBPL 2001,
pp 149-164.
- G. Manku and R. Motwani. Approximate Frequency
Counts over Data Streams. In VLDB 2002.
- P. Tucker, D. Maier, T. Sheard, and L. Fegaras.
Online Analysis and Querying of Continous Data Streams. IEEE Transactions
on Knowledge and Data Engineering, May-June 2002.
Potential Related Projects:
- IIS-208852: Timber: A Native XML Database Management
System.
- IIS-100681: XML Query Optimization.
- IIS-140493: Containment, Equivalence, and Related
Problems for XPath Expressions.
- IIS-86002: ITR: A Petabyte in Your
Pocket.
- IIS-118173: Management and Processing of Data
Streams.
Project Web site URL:
http://lambda.uta.edu/XStreamCast/
Online software and resources:
There is nothing to report yet, since this is a new award.