CSE5335: Web Data Management and XML

Class: TuTh 5:30-6:50pm
Instructor: Leonidas Fegaras
Office: ERB 653 (Engineering Research Bldg)
Office hours: Tuesday and Thursday 3:30-5:30pm

XML has become an important standard for data representation and information exchange among cooperating Internet applications. This course provides an in-depth study of web data management, with an emphasis on XML standards and technologies. It covers the state of the art in designing and building web applications and services, focusing on the issues and challenges that arise in managing and processing XML data.

Prerequisites: CSE 3330/CSE 5330 (Database Systems I) or equivalent. Students are expected to have a working knowledge of Java, SQL, and basic HTML. Students without adequate preparation are at substantial risk of failing this course.

Reading material:
There is no required textbook but you are expected to read many online tutorials and references (links will be given out in class).

Optional Reading:
Although not required, you may find the following books useful for additional background and explanation (listed in order of relevance to this course):

Final grades will be assigned according to the following scale:
     A: score >= 90, B: 80 <= score < 90, C: score < 80
Sometimes, I use lower cutoff points, depending on the overall performance of the class.

Both exams are open notes (all notes must be securely bound in one notebook). The final exam is comprehensive: it covers the material from the first lecture up to and including the last lecture. Once exam grades are posted, you will have 10 business days to dispute your grade and have your exam re-evaluated. No re-evaluation requests will be considered after the 10-day period. No makeup exams will be given unless there is a justifiable reason (such as illness or a death in the family). If you miss an exam and can prove that your reason is justifiable, you should arrange with the instructor to take the makeup exam within a week of the regular exam time. In any other case, you will receive a zero for the missed exam.

Programming Assignments:
There will be ten small weekly programming assignments, each to be done individually. Details will be given out in class. Late assignments will be penalized 20% per day. No further extensions will be allowed. No excuses, no exceptions.

Most projects will be done in Java (using JDK 6), but some will be done in JavaScript, PHP, and XQuery. The software used for the projects is open-source, free, platform-independent, and well-suited for Java. You can do most of the projects on your own PC/laptop under any platform (Linux, Mac OS X, MS Windows, etc.). Directions on how to download the required software will be given out in class.

Note that, although we will briefly talk about it, we will not use Microsoft ASP.NET (Visual Studio, C#, etc.), since this framework is platform-dependent (for IIS only).

All work in this class must be done individually. No copying is permitted. Cheating includes giving assistance to or receiving assistance from other students or other individuals, copying material from the web, etc. I strictly adhere to the University of Texas at Arlington rules and guidelines for handling academic dishonesty. Please refer to the pamphlet "CHEATING: Definitions and Consequences" for additional information. You are required to sign and return the statement about academic dishonesty. Anyone caught cheating, plagiarizing, or colluding on a programming assignment or an exam will automatically receive a failing grade (F) for the entire course.

How to do Well in this Course:
Students who get the most out of this course will be the ones who put in the most effort. If you want to do well, attend all the lectures, read the assigned reading material, and start early on your programming assignments. If you are having difficulty, the instructor and the GTA will be more than happy to help you. Outside of regular office hours, the best way to communicate with the instructor or the GTA is by email. If you can't make it to the scheduled office hours but really need help, contact one of us for an appointment.

Special Accommodations:
If you require an accommodation based on disability, I would like to meet with you in the privacy of my office, during the first week of the semester, to make sure you are appropriately accommodated.

Web Page:
Please visit this web page often; it will contain the reading assignments, project descriptions, class notes, etc.
Other related web pages:

Tentative Schedule:
  1. Introduction and motivation
    1. XML basics
    2. XPath
  2. Web application development
    1. Dynamic web pages, the HTTP protocol, RESTful web services
    2. HTML forms
    3. Client-side programming (JavaScript)
    4. XHTML and CSS stylesheets
    5. The document object model (DOM) and dynamic HTML
    6. Asynchronous server requests (AJAX), XMLHttpRequest
    7. Web mashups in JavaScript
    8. Server-side programming: PHP scripts, cookies and sessions
    9. Servlets, JavaServer Pages (JSP)
    10. Database connectivity, JDBC
  3. Cloud computing
    1. Distributed file systems (HDFS, Cassandra)
    2. The Map-Reduce framework (Hadoop, Hive, Pig)
    3. Amazon Web Services and Elastic Compute Cloud (EC2)
  4. XML standards
    1. DTD
    2. XML Schema
    3. XPath
    4. XML programming (DOM, SAX, StAX)
    5. XSLT
    6. XQuery
    7. Java/XML data binding (JAXB)
  5. XML data modeling
  6. Native XML storage management
    1. Indexing techniques
    2. Xindice and Berkeley DB XML
  7. Relational databases and XML
    1. XML shredding
    2. XML publishing
    3. XML on commercial databases (Oracle XML DB, SQL Server SQLXML)
  8. XML data management
    1. Query processing
    2. Query optimization
    3. Updates
    4. View maintenance
    5. Integrity constraints
    6. Compression
  9. XML search engines
    1. Information retrieval
    2. Web search engines
    3. XML ranking
  10. Web services
    1. RESTful vs SOAP-based web services
    2. Standards: SOAP, WSDL, UDDI
    3. Axis and JAX-WS
  11. Special topics
    1. Data integration
    2. Web mashups (Yahoo Pipes)
    3. Metadata management with RDF
    4. Semantic Web

Last modified: 01/09/11 by Leonidas Fegaras