lambda-DB: User's Guide

This guide assumes a basic knowledge of ODL and OQL of the ODMG standard. Readers not familiar with the ODMG standard should read a database textbook first, such as Chapter 12 in Elmasri and Navathe: Fundamentals of Database Systems, third edition, Addison-Wesley, 1999.

ODL Declarations

lambda-DB recognizes most ODMG ODL declarations. It does not support nested modules, arrays, dictionaries, or exceptions. Also, it does not allow persistent sets or bags of structs, although it does allow lists of any type, persistent sets/bags of primitive types and references, and transient sets/bags/lists of any type. The ODL syntax in yacc form is given here. For example, the schema school.odl defines a School database, and the schema xml.odl captures XML data. ODL is extended with an import username:modulename; declaration to import another ODL module, such as:

import fegaras:XML;
Imported modules are inlined in the current module. When a schema is compiled, lambda-DB stores the specifications into the system catalog and creates extents and indexes for classes (one index per key).

Embedded OQL in C++

Data manipulation in lambda-DB is done using OQL embedded in C++. Embedded OQL queries are translated and optimized into C++ code at compile time. There is also an OQL interpreter that evaluates OQL queries at run time (described in the section The OQL Interpreter). The embedded OQL is seamlessly integrated with C++ programs: C++ variables can be used in queries, and query results are passed back to C++ programs. This integration avoids the impedance mismatch problem found in most database languages.

The lambda-DB OQL compiler is actually a C++ preprocessor that parses a file with extension .oql (C++ with embedded OQL statements), say f.oql, and generates two C++ files: a C++ source file containing the translated program, and f.tmp.h. The Makefile of the School database shows how to compile OQL files.

All queries are type checked, optimized, and compiled into C++ code during preprocessing. Therefore, all type errors are caught at compile time. The disadvantage of this approach is that, since optimization is done at compile time, the execution strategies may become suboptimal at run time due to updates or, worse, may even become invalid if indexes are dropped. The advantage is that no optimization is done at run time, so compiled queries execute very fast. For the OQL file f.oql, the file f.tmp.h contains the C++ functions that evaluate the queries, along with comments that explain the various stages of optimization. The other generated source file contains the calls to the functions in f.tmp.h as well as the C++ code of f.oql.

The special character % indicates the beginning of OQL code in C++: everything between the characters % and ; is OQL code. The only exception is the %for each statement, in which the OQL code ends at the keyword do (details are given below). Since % is a special token in oql, the C++ modulo operator % cannot be used in .oql files; the function mod is provided in its place.

The BNF of the syntax recognized by the C++ preprocessor oql is given here. The file test.oql contains many examples of embedded OQL queries. At the beginning of an .oql file the following two lines must be included:

#include <odmg_main.h>
%module module-name;
where module-name is the name of the module being imported in this file (currently, only one module can be imported). The first line includes the basic runtime library necessary to evaluate OQL queries while the second line generates include statements specific to this particular module. The main program should contain the following statement to initialize lambda-DB:
Before the main program exits, the following statement must be executed:
to clean up temporary files and indexes. All data manipulation should be performed inside a transaction: the statement %begin; starts a transaction, while %commit; and %abort; commit and abort it, respectively.

The most important syntax construct in lambda-DB is iteration:

%for each v in query do C++code;
where v is a variable name, query is any OQL query that returns a collection, and C++code is any C++ code (which may itself contain other OQL statements). The statement evaluates the query and, without materializing the resulting collection in memory, binds v to each element of the collection in turn and executes the C++ code. For example, the query:
%for each x in select ssn: e.ssn, dept:, course:
                 from e in Instructors,
                      c in e.teaches
                where = "Smith"
            do cout << x.ssn << " " << x.dept << " " << x.course << endl;
prints the ssn, department name, and all course names of the instructor Smith. Of course one can also write the previous query as follows:
%for each e in Instructors do
     %for each c in e.teaches do
          if (strcmp(e->name,"Smith")==0)
             cout << e->ssn ...
but this form is not recommended: the optimizer cannot optimize hand-written loops, which leads to poor performance. As a rule of thumb, do as much as possible in OQL, leaving only printing and updating (described later) to C++.
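To see why the OQL form is preferable, the following rough C++ sketch contrasts the hand-written loop with the kind of plan the optimizer is free to choose for the select-from-where query. It is plain C++, not code generated by oql: the Instructor and Course structs, the in-memory index on name (a std::multimap), and all function names are hypothetical, and whether such an index exists depends on the schema.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical in-memory stand-ins for School objects (not lambda-DB classes).
struct Course { std::string name; };
struct Instructor {
    int ssn;
    std::string name;
    std::string dept;
    std::vector<const Course*> teaches;
};

using NameIndex = std::multimap<std::string, const Instructor*>;

// Hypothetical index on Instructor.name, built from an in-memory extent.
NameIndex build_name_index(const std::vector<Instructor>& extent) {
    NameIndex index;
    for (const Instructor& e : extent) index.emplace(e.name, &e);
    return index;
}

// The hand-written form: visits every instructor and every course it teaches,
// re-testing the name once per (instructor, course) pair.
template <typename Body>
void naive_loop(const std::vector<Instructor>& extent, const std::string& who, Body body) {
    for (const Instructor& e : extent)
        for (const Course* c : e.teaches)
            if (e.name == who) body(e, *c);
}

// A plan an optimizer could pick: probe the name index first, then stream
// over teaches only for the matching instructors. Nothing is materialized.
template <typename Body>
void indexed_plan(const NameIndex& by_name, const std::string& who, Body body) {
    auto range = by_name.equal_range(who);
    for (auto it = range.first; it != range.second; ++it)
        for (const Course* c : it->second->teaches) body(*it->second, *c);
}
```

Both functions deliver the same (instructor, course) pairs to the loop body; the indexed plan simply touches far fewer objects when few instructors match.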

Printing OQL results can be easily accomplished using the following syntax:

%print query [ max_depth ];
where max_depth indicates how many levels deep the printer will print nested collections and objects. Collections at a deeper level are printed as C(#n), where C is the collection name and n is the number of elements. Objects at a deeper level are printed with their key values only. If max_depth is omitted, a depth of 2 is assumed. For example:
%for each v in select e.ssn, count(e.teaches)
                 from e in Instructors
                where = "D1"
                  and count(e.teaches) > 0
                order by count(e.teaches) desc
            do %print v;
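The depth cut-off of %print can be illustrated with a small recursive printer. This is a sketch under stated assumptions, not lambda-DB's printer: the Value tree, the name print_to_depth, and the exact layout (braces around collection elements, parentheses around object attributes) are hypothetical; only the rule that deeper collections print as C(#n) and deeper objects as their key follows the description above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical value tree standing in for an OQL query result.
struct Value {
    enum Kind { Atom, Collection, Object } kind;
    std::string label;         // atom text, collection name ("set", "bag", ...), or object key
    std::vector<Value> parts;  // collection elements, or an object's attribute values
};

// Prints v, expanding at most max_depth levels of nested collections/objects.
// Beyond that depth a collection prints as C(#n) and an object as its key only.
std::string print_to_depth(const Value& v, int max_depth) {
    if (v.kind == Value::Atom) return v.label;
    if (max_depth <= 0) {
        if (v.kind == Value::Collection)
            return v.label + "(#" + std::to_string(v.parts.size()) + ")";
        return v.label;  // object: key only
    }
    std::string out = v.label + (v.kind == Value::Collection ? "{" : "(");
    for (std::size_t i = 0; i < v.parts.size(); ++i) {
        if (i > 0) out += ", ";
        out += print_to_depth(v.parts[i], max_depth - 1);
    }
    out += (v.kind == Value::Collection ? "}" : ")");
    return out;
}
```

For a bag containing one set {1, 2}, depth 2 prints bag{set{1, 2}}, depth 1 prints bag{set(#2)}, and depth 0 prints bag(#1).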

Another useful language construct is:

%v := query;
which assigns the value of the query to the new variable v. This form is mostly used when query returns a non-collection type, such as an integer, an object, or a structure. For example, the query:
%n := count(select * from Instructors where age>30);
counts all instructors older than 30. When the query returns a collection, such as
%s := Instructors;
%for each v in s do cout << v->name << endl;
this collection is materialized in memory. On the other hand, %for each v in Instructors ... is processed in a stream-like fashion without materializing Instructors in memory. In addition, there are no statistics available and no indexes for the variable s, so the optimizer may produce a suboptimal plan.
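The difference between the two forms can be pictured with a cursor. In this plain C++ sketch, which is not lambda-DB's storage interface, a hypothetical Cursor hands out one element at a time (an in-memory vector stands in for the stored extent): the streaming loop keeps only the current element, while materialization copies the whole collection into memory before any element is processed.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical cursor over a stored extent: yields one element per call.
class Cursor {
public:
    explicit Cursor(const std::vector<std::string>& stored) : stored_(stored) {}
    bool next(std::string& out) {
        if (pos_ == stored_.size()) return false;
        out = stored_[pos_++];
        return true;
    }
private:
    const std::vector<std::string>& stored_;
    std::size_t pos_ = 0;
};

// Like %for each v in Instructors do ...: only the current element is held.
void for_each_streamed(Cursor cursor, const std::function<void(const std::string&)>& body) {
    std::string v;
    while (cursor.next(v)) body(v);
}

// Like %s := Instructors;: the entire collection is copied into memory first,
// and the resulting in-memory copy has no statistics or indexes of its own.
std::vector<std::string> materialize(Cursor cursor) {
    std::vector<std::string> all;
    std::string v;
    while (cursor.next(v)) all.push_back(v);
    return all;
}
```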

A few words about variable scoping: oql follows the scoping rules of C++. A variable declared in oql is also recognized by C++, but not vice versa. To use a variable v in a query, v must be either declared in the imported module (e.g., as a class extent) or declared in an oql statement, such as %for each v in ... or %v := .... For example,

%s := "Smith";
%e := element(select * from Instructors where name=s);
An alternative is to declare v explicitly using %v: ODL_type;, where ODL_type is a simple ODL type (such as class name). For example, %e: Instructor; will cause e to be recognized by both C++ and OQL as an instructor.

Object creation is done using a class constructor, as described in the ODMG standard. For example,

Department( name: "cse", dno: 10 )
will create a new transient department. If you want to create a persistent department, use:
persistent Department( name: "cse", dno: 10 )
This creates a new department, inserts it into the department extent (Departments), and updates all the indexes involved. No explicit insertion into a class extent is provided or needed. Persistence by reachability is not supported, because the indexes are not updated for an object that was created as transient. If a persistent object refers to a transient object, the results may be unpredictable (the reference may become a dangling pointer when the persistent object is later retrieved from storage). Using explicit SHORE SDL commands to create, delete, or update objects and class extents is not recommended, because such commands may skip updating the indexes and leave the database in an inconsistent state. To delete an object computed by the OQL query query from the extent class-extent, use:
%class-extent -= query;
This destroys the object, removes the deleted object from its class extent, and updates the indexes. The object is not removed from relationships and object references though, so it may cause dangling pointers.
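The bookkeeping described above can be sketched as a class extent that owns its objects and keeps one index on a key. Everything here is hypothetical plain C++ (it is not SHORE or lambda-DB code), and dno is assumed to be the key of Department. The sketch shows why a transient object never appears in the index, and why deletion leaves outside references dangling.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <utility>

struct Department { std::string name; int dno; };

class Extent {
public:
    // Like persistent Department(...): create, add to the extent, update the index.
    Department* create_persistent(std::string name, int dno) {
        auto obj = std::make_unique<Department>(Department{std::move(name), dno});
        Department* p = obj.get();
        objects_.emplace(p, std::move(obj));
        by_dno_[dno] = p;  // a real key index would reject a duplicate key
        return p;
    }
    // Like %Departments -= query;: drop the object from the index and the extent,
    // then destroy it. Pointers held elsewhere (relationships, attributes) are NOT fixed.
    void remove(Department* p) {
        by_dno_.erase(p->dno);
        objects_.erase(p);
    }
    Department* lookup(int dno) const {
        auto it = by_dno_.find(dno);
        return it == by_dno_.end() ? nullptr : it->second;
    }
    std::size_t size() const { return objects_.size(); }
private:
    std::map<Department*, std::unique_ptr<Department>> objects_;  // the extent
    std::map<int, Department*> by_dno_;                           // the key index
};
```

After remove, any other pointer to the deleted department still holds the old address, which is the dangling-pointer risk described above; a Department built directly on the stack is never seen by lookup at all.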

To update object attributes, the following syntax is supported:

query1.attribute := query2;
query1.attribute += query2;
query1.attribute -= query2;
where query1 and query2 are OQL queries and attribute is a class or structure attribute. The += command inserts a value or object into a collection, while the -= command removes from a collection all values or objects equal to a given value or object. For example:
%element(select * from Instructors where ssn=10).salary := 10000;
%for each v in Instructors do %v.salary := 0.2*v.salary;
%for each v in select e from e in Instructors where e.ssn=10
            do %v.degrees += "PhD";
%for each v in select * from Instructors where ssn=10
            do %v.teaches -= element(select * from Courses where code="C10");
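The collection updates can be pictured on a collection-valued attribute such as degrees, modelled here as a std::vector (a bag or list). The helper names insert_value and remove_value are hypothetical; the point is that -= removes every element equal to the value, not just the first one (for a set-valued attribute there is at most one such element anyway).

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// attr += x : insert the value x into the collection.
template <typename T>
void insert_value(std::vector<T>& attr, const T& x) { attr.push_back(x); }

// attr -= x : remove ALL elements equal to x (erase-remove idiom).
template <typename T>
void remove_value(std::vector<T>& attr, const T& x) {
    attr.erase(std::remove(attr.begin(), attr.end(), x), attr.end());
}
```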

OQL views can be defined using the syntax:

%define Id ( Id {, Id } ) as query;
as described in the ODMG standard. See examples/school/test.oql for some examples. All views are macro-expanded at compile time. Queries can also use methods or functions, but their bodies are not opened during query optimization: the optimizer assumes that the cost of a method or function call is zero, which may not be true if the body contains another OQL query. Functions, which are allowed to be recursive, are defined using the special syntax:
%define function_name ( [ Id : ODL_type {, Id : ODL_type } ] ) : ODL_type as query;
while methods (whose signature has been specified in an ODL class) can be given in C++ or using the special syntax:
%define class_name::method_name ( [ Id : ODL_type {, Id : ODL_type } ] ) : ODL_type as query;
To implement a transitive closure in lambda-DB, either a method or a function must be used. For example:
%define Course::all_prereqs () : set<Course> as
   this.has_prerequisites
      union (select distinct a
               from p in this.has_prerequisites,
                    a in p.all_prereqs());
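For comparison, the closure that all_prereqs is meant to compute (every direct and indirect prerequisite) can be written as a recursive C++ function over a hypothetical in-memory Course struct. Unlike a plain recursive method, this sketch skips courses already collected, which avoids repeated work on shared prerequisites and also terminates if the prerequisite graph happens to contain a cycle.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical in-memory course with its direct prerequisites.
struct Course {
    std::string code;
    std::vector<const Course*> has_prerequisites;
};

// Adds every direct and indirect prerequisite of c to out.
void all_prereqs(const Course& c, std::set<const Course*>& out) {
    for (const Course* p : c.has_prerequisites)
        if (out.insert(p).second)  // recurse only into courses not seen before
            all_prereqs(*p, out);
}
```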

Another useful syntactic construct is %collect statistics; which collects statistics from all extents and indexes in the imported module and updates the system catalog so that these statistics can be used by the optimizer. This is a very expensive operation and should be used only after a large number of updates to the database. The command %print statistics; prints various usage information (such as number of objects fetched from disk to memory) since the last time this command was invoked (or since the beginning of execution if this was the first time). The statement:

%create index index-name on class-name (attribute-names);
creates a new index with name index-name over the class class-name using the index key(s) listed in the attribute-names. If the extent of the class class-name is not empty, the execution of this statement will populate the index with the proper values at run-time.
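When the extent is not empty, populating the new index amounts to one scan of the extent. The sketch below is hypothetical plain C++ (the Instructor struct, the NameDeptKey type, and build_index are illustrative, not lambda-DB internals) for an index on (name, dept). It uses a std::multimap so that several objects may share the same key values; whether lambda-DB treats such indexes as unique is not stated here.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <tuple>
#include <vector>

struct Instructor { int ssn; std::string name; std::string dept; };

// Composite index key: one component per attribute listed in attribute-names.
using NameDeptKey = std::tuple<std::string, std::string>;

// Scans the existing extent once and inserts every object under its key.
std::multimap<NameDeptKey, const Instructor*>
build_index(const std::vector<Instructor>& extent) {
    std::multimap<NameDeptKey, const Instructor*> index;
    for (const Instructor& e : extent)
        index.emplace(NameDeptKey{e.name, e.dept}, &e);
    return index;
}
```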

Extensions to OQL

The OQL Interpreter

In addition to the OQL compiler, lambda-DB provides an OQL interpreter. The interpreter can evaluate any OQL query that does not construct persistent objects. It cannot evaluate other DML commands (updates, inserts, etc.). The interpreter can be used through the following class:

class plan {
public:
  plan ( const char* query, short trace_level );
  void* evaluate ();
  void print_type ();
  void print_value ( void* value, short max_depth );
};
The plan constructor takes an OQL query expressed as a string, compiles and optimizes it, and translates it into intermediate code. If trace_level is greater than zero, it prints tracing information during query optimization. The evaluate method evaluates the intermediate code associated with a plan; it returns NULL if there is an evaluation error. The print_type method prints the type of the query. The print_value method prints a value, using the type of the query result stored in the plan to guide the printing. The parameter max_depth indicates how many levels deep the printer will print nested collections and objects. Collections at a deeper level are printed as C(#n), where C is the collection name and n is the number of elements. Objects at a deeper level are printed with their key values only.

The following is an example of a query interpretation during run-time:

plan p = plan("select from e in Instructors, c in e.teaches where = 'Smith'",1);
The OQL interpreter has a high run-time overhead: queries are optimized at run time, and interpreting the intermediate code adds more than 5% overhead. It should be used only for remote access or for visual interfaces that require ad-hoc queries.
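A typical use of the plan interface, sketched under assumptions: it needs the lambda-DB runtime and will not compile on its own, it assumes lambda-DB has been initialized as for compiled OQL, and, like the other examples in this guide, it assumes using namespace std. The query is the count query from the %v := example above.

```cpp
// Compile, optimize, and translate the query; 0 disables optimizer tracing.
plan q("count(select * from Instructors where age>30)", 0);
q.print_type();               // prints the type of the query result
void* result = q.evaluate();  // NULL signals an evaluation error
if (result == NULL)
    cerr << "query evaluation failed" << endl;
else
    q.print_value(result, 2); // expand at most 2 levels of nesting
```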

Last modified: 3/28/01 by Leonidas Fegaras