This project must be done individually. No copying is permitted. The purpose of this project is to build an XML search engine but for local XML documents only.
You will do this project on your own PC/laptop. First download project9.tar or project9.zip and extract the files. It contains the jar file je.jar, which is the Berkeley DB Java Edition. This system is a storage manager, not a relational database. It only supports B+-trees and transactions.
Look at the method SearchDictionary.dump() in Dictionary.java to see how you can access data from the indexes. You need to read the Java doc about the classes Iterator and Map. The classes ElementHit and TermHit are like the inverted indexes E-index and T-index in the class slides. You don't need to read the Berkeley DB documentation, but if you are interested, it is available here.
You are provided with a program project9/Dictionary.java that builds two inverted indexes using B+-trees (on Berkeley DB), one for text terms (words) and one for XML elements. You compile them using javac -cp je.jar *.java For example, if you want to index a number of XML files, you use:
java -cp .:je.jar DocumentIndexer file1.xml file2.xml ...(for Windows, you need to use semicolon instead of colon, that is -cp .;je.jar). Note that, the inverted indexes are persistent, so you would only need to index each file once (if you index a file twice, the file and all its terms will appear twice under a different document number). To remove the database and start with empty indexes, remove all the files from the directory project9/database/. To print the content of the database, use:
java -cp .:je.jar Dump
You need to modify the file project9/MyCode.java to run one simple XPath query against the indexed XML document cs.xml:
//gradstudent[name/lastname = lname]where lname is a lastname (a word) that you provide as an argument to the main program. Your program finds and prints all the hits (graduatestudents) that contain the word lname at the location //gradstudent/name/lastname. (You print the hits in the same way they are printed in dump().) For example, if you run:
java -cp .:je.jar Query Basneyit should print in the output:
You should not use any data structure, such as vectors, stacks, etc. Instead, you should write nested loops that combine the results of the indexes using containment conditions.
Use the form below to submit the file MyCode.java:
Last modified: 04/22/08 by Leonidas Fegaras