VOODOO

VOODOO stands for "a Visual Object-Oriented Database language for ODMG OQL". It is a visual query formulation tool for OQL. Our design goal was to develop a visual language that can capture most OQL queries and at the same time is uniform and very simple to learn and use. Since query nesting is an important feature of any OODB query language, we designed our language in such a way that the composition of complex queries from simple ones is natural and intuitive.

A VOODOO query has a tree form that reflects the structure of the database schema, where every class or type reference in the schema is expanded, potentially leading to an infinite tree. The root of the tree is the persistent root and each tree node represents a class or a structure in the database schema. The tree nodes, which are called templates, are expanded on demand, so that only the minimal information is displayed. A query is formulated by mapping templates into values in some domain using a number of buttons and menus inside the template layout. The value of a template depends on the values of its children templates in the query tree as well as on the settings of its buttons and menus. The structure of the database, which is reflected in the query tree, determines the methods for accessing the database to retrieve data (i.e. it specifies which data paths to follow and which collections to traverse) and it is specified from the root to the tree leaves. The actual data manipulation and the formulation of the query result is specified from the tree leaves to the root by mapping each template to some value, which in turn is passed to the parent of the template to form the value of the parent template. The template mappings as well as the composition of the template values are formed in a pure functional way. Since new value domains are formed in each template, type information about these mapped values is very important to the user. There is sufficient type information displayed by the system that guides every stage of the query construction. In addition, VOODOO uses type information to reject invalid queries, so that queries constructed in our system are always type correct.


The VOODOO interface is invoked by executing:

./voodoo
First, a popup window is displayed to specify the location of the schema used in the VOODOO queries:

The host name can be either an IP number or a name, such as functor.uta.edu. Here the server is the same as the client (the local machine). Each server may export many schemas, but each schema must have its unique port number. If the connection succeeds, the main VOODOO interface is displayed. The following image shows a snapshot of the VOODOO interface after the complete drawing of a query:

There are two main windows in the interface: the VOODOO query formulation canvas and the OQL query window. After a user completes the drawing of a VOODOO query on the canvas, the OQL textual form of the query can be constructed and displayed on the upper half of the OQL query window by pushing the button 'generate OQL'. The query can then be sent to a remote interpreter and evaluated by pushing the button 'evaluate'. The results of the evaluation are displayed on the lower half of the OQL query window. The button 'clear' clears the OQL query window. The menu item 'trace' indicates that all optimization phases will be displayed along with the results. The menu 'level=2' indicates how many levels deep the printer will print the nested collections and objects. Collections at a deeper level are printed as C(#n), where C is the collection name and n is the number of elements, while objects at a deeper level will contain the key values only.


Examples of Queries

In this section we present eight queries to show the basic features of our visual query language. For our examples, we use the School ODL schema, which describes a University database.

Our query formulation interface consists of a drawing canvas that contains a number of rectangles, called templates, connected by lines. For example, the following query

finds the name of the department whose head is Smith. This query can be expressed in OQL as follows:

select name: d.name
from d in Departments
where d.head.name = "Smith"
Templates are organized in a tree form. The root of the tree is the persistent root, which is an aggregation (an ODL struct) of all persistent objects in the database, including all the class extents. When VOODOO starts, the only template displayed on the canvas is the persistent root template. The large buttons inside some templates represent template attributes. They create new templates when they are pushed. For example, the persistent root template contains four attributes: Persons, Instructors, Departments, and Courses, which represent the class extents of our University database.

Initially, when its button has not been pushed, the type of an attribute is the type defined in the database schema. This is called the original attribute type of the button, to be distinguished from the derived attribute type, which depends on the types of the children templates of this button, as we will see next. For example, the original type of the Department attribute in the Persistent Root template is set<Department>. When a button is pushed, it creates a template that reflects the structure of its original attribute type. For example, if we push the Departments button we create the Department template, which is displayed next to the Persistent Root template and is connected to the Departments button by a line. This new template contains the attributes of the class Department. When the head button in the Department template is pushed, an Instructor template is created (since the original type of the head is Instructor). Finally, when the name button in the Instructor template is pushed, another template for the string type is constructed (since the original type of the Instructor name is string). This template is similar to the other templates but has no attributes.

A template, which is generated by pushing a button, specifies a function that maps the original attribute type of the button into some domain. The derived attribute type of the button is equal to this domain type, which may be different from the original attribute type. The buttons and menus inside a template are used in specifying this mapped value of the template. The P check-button at the left of an attribute button indicates whether this attribute should be considered in the mapped value (when P is on) or not (when P is off). (The G button is for grouping-by and will be explained later.) The mapped value of the template is an OQL struct that contains all the P-checked attributes in the template. If this struct contains one element only, the struct is dropped and the mapped value becomes the element itself.

At the top left side of the Department template there is a menu, called an accumulator, which indicates how to accumulate the selected attributes to form the mapped value. For each collection type in the original attribute type, there is one accumulator next to the template name that indicates how to handle this collection. The Instructor and string templates in our example do not have accumulators because their original types were not collections. The Department template though has one accumulator since the original type was set<Department>. The Department accumulator indicates that we form a bag out of the selected values. Thus the mapped value of the Department template has the type bag<string>, which also indicates that the derived attribute type of the button Departments in Persistent Root is also bag<string>, which in turn is the type of the query result.

The cond entry window in a template indicates a condition to be satisfied by the mapped value of this template. The syntax of the condition is similar to that of the Microsoft Access Query-by-Example interface. For example, if the mapped value is an integer, then one valid condition is: >100 and <200, which indicates that the integer value must be between 100 and 200. In our example, the string template indicates that the string must be equal to "Smith". If no attribute is selected and the cond is not empty, then the template is mapped into a boolean, which is the result of the condition. If the condition is empty too, the derived type is equal to the original type. For example, the string template is mapped into a boolean, which indicates that the Instructor name has now a boolean attribute type. Since the Instructor name is P-checked, the Instructor template is mapped into a boolean too. If the derived type of an attribute is a boolean, then this boolean is used as a condition to be satisfied by the template. This means that, since the Department head attribute is of type boolean, it is used as a condition, namely in d.head.name="Smith" for every department d.

To summarize, it took us five button clicks and the typing of the word "Smith" to construct our first query. In fact most OQL queries can be constructed using a small number of button clicks and the unavoidable typing of some constant values.

Type information is very important when constructing queries in our framework. To simplify this process, our interface displays all the necessary type information and, more importantly, it restricts any type inconsistent query construction. For example, if the mouse cursor is positioned on top of a button, a small label is displayed on the canvas with the derived type of the attribute associated with the button. A similar label is displayed when the mouse cursor is positioned on top of the template header. This label contains the derived type of the button associated with the template. Guiding labels with usage help are also displayed when the mouse cursor is on top of labels and buttons. There is also support for tree editing functionality, which allows us to display queries nicely. For example, if we click and drag the template from its header using the left mouse button, we can move the template into a new position on the canvas, dragging its connected lines along with it. If we do the same with the middle mouse button, we drag the template along with all its descendent templates. If we click a template header using the right mouse button, then this template, all its descendent templates, and all their connecting lines are removed from the canvas.

A more challenging example of a query construction is the following query

It finds the names and addresses of all instructors in the CSE department who earn more than 10K. This query can be expressed in OQL as follows:

select name: e.name, address: e.address
from d in Departments,
     e in d.instructors
where d.name = "CSE"
and e.salary > 10000
The new construction in this query is the accumulator "each" in the Instructor template, which stands for each single collection element. It basically captures the nested relational algebra operator, unnest. The derived type of the Instructor template is struct( name: string, address: struct( street: string, zipcode: string ) ), which is also the derived type of the instructors attribute in Department. Thus, the result type of the query is:
bag<struct( name: string, address: struct( street: string, zipcode: string ) )>

There are many other types of accumulators. For instance, the following query

uses the accumulators "all" and "some" to express universal and existential quantifications. This query can be expressed in OQL as follows:

select e.name
from e in Instructors
where for all  c in e.teaches:
          exists d in c.has_prerequisites: d.name = "CSE5331"
The selected attribute of the rightmost Course template has a boolean derived type, and this template uses "some" to accumulate the boolean values. When function "some" is applied on a set of booleans, it returns true if at least one boolean is true. That is, if there is at least one course with name CSE5331, then the mapped value of this template is true. Similarly, the only attribute in the left Course template has a boolean derived type and the "all" accumulator indicates that if all these booleans are true then the mapped value of this template is true too. Other kinds of accumulators include aggregations, such as summation, product, maximum, minimum, and average, and lists specified as an order-by accumulation (described next). For example, if we change the accumulator of Instructor to "sum" (summation) in the previous figure and we P-check the salary instead of the name, then the mapped value of this template becomes an integer, namely the sum of the salaries of all instructors, provided that all the courses these instructors teach have CSE5331 as a prerequisite.

If we select "list" as an accumulator, then a new button S is displayed next to every attribute in the template (which disappears when the accumulator changes to something else). If an S button is pushed, then the corresponding attribute is chosen for sorting. For example, the following query

sorts departments by the number of full professors. It corresponds to the OQL query:

select name: d.name, instructors: count(select  * from e in d.instructors
                                        where e.rank = "Full Professor")
from d in Departments
order by count(select  * from e in d.instructors
               where e.rank = "Full Professor")
Here, the instructors attribute of Department is mapped to the number of the instructors in the Department (since the accumulator is "count").

The group-by construct is syntactic sugar in OQL; it can be easily (and better) expressed using nested queries. The same is true for our query language. Even though the group-by construct is non-functional and often confusing, we decided to supported it directly in our framework in order to be compliant with legacy query languages. We will present later an alternative way of performing a group-by. For example, the following query

groups all instructors by their department name and by the number of CSE courses they teach. It corresponds to the OQL query:

select salary: max(e.salary), dept: e.dept.name
from e in Instructors
group by dept: e.dept.name,
      teaches: count(select  *
                     from c in e.teaches
                     where c.offered_by.name = "CSE")
The G buttons appear only on templates that correspond to collection original types (i.e. templates that have at least one accumulator). They indicate the attributes to group by. When at least one G button is pushed, then the template goes to a group-by mode. In that mode, the original attribute type of each button that has not been G-checked becomes a set of the original type. The original types of the G-checked buttons remain the same. In our query, the original type of dept remains Department, for teaches remains set<Course>, while for ssn changes to set<long>, for degrees changes to set<set<string>>, etc. Each non G-checked button corresponds to a group of values of the corresponding attribute constructed when we group by the G-checked attributes. There is also a new attribute added to the template, called partition, of type set<Instructor>, which contains all the groups. It corresponds to the partition variable of the OQL group-by construct and is used for more complex aggregations after the group by.

Arbitrary join predicates between template attributes can be constructed using join lines on the canvas. The following query

for example, finds all courses taught by the heads of the offering departments, and has the following OQL form:

select c.name
from c in Courses
where c.taught_by = c.offered_by.head
The join line in the above figure is constructed by clicking one of the participating buttons (Course.taught_by or Department.head) and dragging it and release it on top of the other button. Joins are permitted if at least one path to the closest common ancestor from the two participating attributes does not include a template with an accumulator. The join line is red in the beginning and becomes green only when the participating attributes have the same derived type and satisfy the restriction mentioned above. Our system does not permit invalid joins. The equal sign in the middle of the join line is a menu that allows the selection of a comparison relation.

The last query displayed below

is an alternative way of performing a group-by using nested queries. It represents the OQL query:

select dept: e.dept.name,
       Instructors: max(select  s.salary
                        from s in Instructors
                        where e.dept.name = s.dept.name)
from e in Instructors
which is equivalent to:
select dept: e.dept.name, Instructors: max(e.salary)
from e in Instructors
group by e.dept.name
Because of the join restriction mentioned earlier, we copied the Instructors attribute from the Persistent Root template into the Instructor template. This was done by clicking the middle mouse button on the Instructors button of Persistent Root and dragging it and release it on top of the Instructor template. We can copy any attribute in a template into any template on the canvas as long as the destination template is a descendent (or it is the template itself) of the source template. This reflects the variable scoping of OQL where variable names in outer queries are visible in inner queries but are not visible in sibling queries. Note that when an attribute is copied, its template is not copied. Note also that if we click the middle mouse over an attribute and release it immediately, this attribute is copied inside the same template. That way we can map a template attribute in multiple ways by copying it inside the template and by attaching different children templates to it. We can copy the same attribute multiple times; the system renames each duplicate attribute name, say ssn, by appending a number of quotes, e.g. ssn''.

Limitations

Even though our framework can capture most OQL queries, there are some serious limitations that we hope to address in a future research. The most important limitations are listed below:

In addition, our visual queries can only capture OQL queries whose from clauses have domains that are path expressions, not complex queries. We do not consider this as a serious limitation because earlier work shown that most queries with this type of nesting can be normalized into queries over path expressions.


Last modified: 1/17/00 by Leonidas Fegaras