CSE5317/4305 Project


You should do this project alone. The course project is to construct a compiler for a small programming language, called PCAT. It will involve: lexical analysis, parsing, semantic analysis (type-checking), and code generation for a MIPS architecture. The project is to be completed in seven stages spaced throughout the term.

Survival Tips

Start working on programming assignments as soon as they are handed out. Do not wait till the day before the deadline. You will see that assignments take much more time when you work on them under pressure than when you are more relaxed. Remember that there is a severe penalty for late submissions. Design carefully before you code. Writing a well-designed piece of code is always easier than starting with some code that "almost works" and adding patches to make it "really work".


A major part of this project is to implement a full compiler for a subset of Pascal, called PCAT, designed by Andrew Tolmach at Portland State University. The paper that describes the language can be retrieved from http://lambda.uta.edu/cse5317/pcat04.pdf.

Platform and Tools

You can do your project either on your own PC or on omega at UTA. In either case, you have to use JDK 5 or 6 (Note: JDK 7 may not work).

If you do the project on omega, make sure that when you run 'java -version' on omega, you get 1.6.0_20.

To install the project on your own Linux/MacOS/Windows PC, you do:

You can learn more about Java from: Thinking in Java (by Bruce Eckel), Java Tutorial, API Specification.

To make coding easier in Java, you will use the Gen package to build abstract syntax trees and intermediate representation trees. You will also use a MIPS code simulator, called SPIM, to run the assembly code generated by your compiler.

Program Grading

Programs will be graded according to their correctness, style, and readability. Programs should behave as specified in the assignment handouts. Bad data should be handled gracefully; your program should never have run-time errors like dereferencing a null pointer or using an out-of-bounds index. Special cases should be handled correctly. Unnecessarily inefficient algorithms or constructs should be avoided; however, efficiency should never be pursued at the expense of clarity or simplicity. Programs should be well documented, modular, and flexible, i.e. easy to modify. Indentation should reflect program structure. Use meaningful identifiers. Avoid global variables and side effects as much as possible. You should never use side effects during the semantic actions of a parser. The grader should be able to understand the program without undue strain. I will provide some test programs, but these programs will not test your compiler exhaustively. It is your responsibility to test every statement in your program by some piece of test data. Thorough testing is essential to establish the reliability of your code. Don't even think about adding fancy features until the required work is completely debugged. A correctly working simple program is worth much more (both in this class and in actual practice) than a fancy program with bugs.


Project assignments must be done individually. No copying is permitted. Cheating involves giving assistance to or receiving assistance from other students or from other individuals, copying material from the web, etc. You are required to use the Gen package for tree construction and pattern matching. It will be taken as cheating if you use your own data structures or interface (since this would mean that you have copied the code from elsewhere). The punishment for cheating is a zero in the assignment and will be subject to the university's academic dishonesty policy. If you have any questions regarding an assignment, see the instructor or teaching assistant.


The projects and their due days are listed below. The due time of each project is the midnight of the indicated due day. You will hand-in your project source code electronically. You may hand-in your source files as many times as you want; only the last one will be taken into account. Details of what do you need to hand-in (and how) can be found by clicking on the appropriate project name. Projects will be marked 20 points off per day. So, there is no point submitting a project more than 4 days late! No further extensions will be allowed. This penalty cannot be waived, unless there was a case of illness or other substantial impediment beyond your control, with proof in documents from the school.

If you mess up a project phase, you can still do the next project phases by removing the appropriate file name from your pcat directory. That way, the missing classes will be copied from the Solution jar, rather than compiled from your source file. For example, if you messed up Project #4, then in Project #5 you can remove the file pcat/TypeCheck.gen, which will force the java compiler to get the typecheck classes from Solution.jar. Note that you can always go back and update your old projects, which is preferable from using the solution, because you will have a better control over your own programs. You can run the solution PCAT compiler over a test PCAT file, say tests/tsort.pcat, using the command:

solution 7 tsort
inside your project directory. This runs project #7 over tests/tsort.pcat using the solution jar. The goal of this project is to build a compiler for PCAT that behaves the same as the solution PCAT compiler.

Project Phases

  1. Project #1 (lexical analysis): Worth 6% of your project grade. You will implement the PCAT scanner using the JLex scanner generator. Study the JLex manual.

  2. Project #2 (parsing): Worth 14% of your project grade. You will use the CUP parser generator to implement the PCAT parser. Study the CUP manual.

  3. Project #3 (abstract syntax): Worth 14% of your project grade. You will define an abstract syntax tree (AST) type for PCAT and you will add semantic actions to the PCAT parser to generate ASTs.

  4. Project #4 (type-checking): Worth 18% of your project grade. You will implement the symbol tables and the type checking program for PCAT.

  5. Project #5 (simple IRs): Worth 18% of your project grade. You will add code to your parser to generate intermediate code (IR trees) for a subset of your ASTs.

  6. Project #6 (rest of IRs): Worth 16% of your project grade. You will extend your parser to generate IRs for the rest of your ASTs.

  7. Project #7 (instruction selection): Worth 14% of your project grade. This is the final stage in which you are asked to make your PCAT compiler generate MIPS code and run it using SPIM.

Last modified: 01/16/12 by Leonidas Fegaras