This project addresses the efficient processing of XQueries over continuous streams, in which a server broadcasts XML data to multiple clients concurrently and a client may tune in to multiple streams at the same time. The goal of this project is to develop a framework that improves query throughput and response time on all clients while making full use of their limited resources. Under this framework, the unit of transmission in an XML stream is an XML fragment, which corresponds to one or a few XML elements from the transmitted document. A server may disseminate XML fragments from multiple documents in the same stream, repeat fragments that are critical or in high demand, replace fragments that have changed by sending delta changes, introduce new fragments, and delete invalid ones.
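The fragment operations described above (introduce, repeat, replace, delete) can be illustrated with a minimal client-side sketch. Everything named here is an assumption made for illustration: the `Fragment` fields, the `FragmentStore` class, and the use of a version number to recognize repeats are hypothetical and are not taken from the project. The sketch also replaces a changed fragment wholesale instead of applying a delta.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Fragment:
    """Hypothetical unit of transmission: one or a few XML elements."""
    doc_id: str              # document the fragment belongs to
    frag_id: int             # identifier of the fragment within that document
    version: int             # assumed to increase whenever the server changes the fragment
    payload: Optional[str]   # XML text, or None to signal that the fragment is deleted


class FragmentStore:
    """Client-side cache that reacts to each fragment arriving on a stream."""

    def __init__(self) -> None:
        self._frags: Dict[Tuple[str, int], Fragment] = {}

    def receive(self, frag: Fragment) -> str:
        key = (frag.doc_id, frag.frag_id)
        current = self._frags.get(key)
        if frag.payload is None:
            # Deletion of a fragment that is no longer valid.
            if current is None:
                return "ignored"
            del self._frags[key]
            return "deleted"
        if current is None:
            # First time this fragment is seen.
            self._frags[key] = frag
            return "inserted"
        if frag.version <= current.version:
            # The server repeated a fragment the client already holds.
            return "repeat"
        # Simplification: store the new version whole instead of applying a delta.
        self._frags[key] = frag
        return "replaced"

    def get(self, doc_id: str, frag_id: int) -> Optional[str]:
        frag = self._frags.get((doc_id, frag_id))
        return None if frag is None else frag.payload
```

Because fragments are keyed by document and fragment identifier, one store can serve a stream that interleaves fragments from several documents, and a repeated fragment costs only a dictionary lookup.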

One difficult problem in processing queries against continuous data streams is the presence of blocking operations, which must process the entire stream before generating the first result, and of unbounded stateful operations, which must cache all stream data in memory. The main-memory evaluation algorithms used in this project make effective use of condensed summaries of the data to produce results earlier and flush buffers faster than conventional methods. These data summaries are broadcast by servers to clients along with the data. In this framework, XQueries over continuous XML streams are translated into a novel XML algebra, then optimized and mapped into evaluation plans based on the improved main-memory algorithms. The goal of optimization is to improve query throughput and response time under the limited resources of clients. The most important optimization technique addressed by this framework is query unnesting. Nested queries appear more often in XQuery than in relational query languages, because XQuery allows complex expressions at any point in a query. Without query unnesting, a nested query requires multiple passes through the stream of the inner query, which is unacceptable because it does not meet the performance objectives of stream processing.
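The cost of a nested query, and what unnesting buys, can be shown with a small sketch. It is an illustration only and does not reproduce the project's XML algebra or its unnesting rules: the query (count the orders of each customer), the function names, and the `ReplayableStream` helper that counts passes are all hypothetical. A live broadcast stream cannot normally be replayed at all, which is the reason the nested form is unacceptable.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, List


class ReplayableStream:
    """Hypothetical stand-in for the inner stream that counts how often it is scanned."""

    def __init__(self, items: Iterable[str]) -> None:
        self._items: List[str] = list(items)
        self.passes = 0

    def __iter__(self) -> Iterator[str]:
        self.passes += 1
        return iter(self._items)


def nested_counts(customers: Iterable[str], orders: Iterable[str]) -> Dict[str, int]:
    """Nested form: the inner query is re-evaluated for every outer item."""
    result: Dict[str, int] = {}
    for cust in customers:
        # Each iteration scans the whole inner stream again.
        result[cust] = sum(1 for owner in orders if owner == cust)
    return result


def unnested_counts(customers: Iterable[str], orders: Iterable[str]) -> Dict[str, int]:
    """Unnested form: group the inner stream once, then join with the outer items."""
    per_customer = Counter(orders)  # single pass over the inner stream
    # Customers with no orders still appear, with a count of zero.
    return {cust: per_customer.get(cust, 0) for cust in customers}
```

With three customers, the nested form scans the order stream three times, while the unnested form scans it once and keeps only one counter per distinct customer name. Both return the same counts, including zero for a customer with no orders.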

When completed, this project will make the following contributions:

This research will have a broad impact on a wide range of applications, especially in electronic commerce, since it will improve the way services are provided to clients. It will also reduce network traffic between servers and clients, and it will enable small service providers and businesses to serve a larger number of client devices (such as PDAs) using less powerful server computers and lower-cost networking.