The purpose of this project is to develop a simple program for matrix multiplication using Apache Pig.
This project must be done individually. No copying is permitted. Note: We will use a system for detecting software plagiarism, called Moss, which is an automatic system for determining the similarity of programs. That is, your program will be compared with the programs of the other students in class as well as with the programs submitted in previous years. This program will find similarities even if you rename variables, move code, change code structure, etc.
Note that if you use a search engine to find similar programs on the web, we will find these programs too. Don't do it: you will get caught and you will get an F in the course (this is cheating). Don't look for code for your project on the web or from other students (current or past). Do your project alone, using only the help given in this project description and from your instructor and GTA.
As in the previous projects, you will develop your program on SDSC Comet. You may use Pig on Comet in local mode interactively, but you need to set up your PATH first:
module load hadoop
export PATH=$PATH:/oasis/projects/nsf/uot143/fegaras/pig-0.16.0/bin
Then, to evaluate Pig Latin commands interactively, do:
pig -x local
Log in to Comet, then download and untar project4:
wget http://lambda.uta.edu/cse6331/project4.tgz
tar xfz project4.tgz
chmod -R g-wrx,o-wrx project4
Go to project4/examples and look at the join.pig example. You can run it in standalone mode using:
sbatch join.local.run
or using
pig -x local join.pig
Optionally, you can run it in distributed mode using:
You are asked to re-implement Project #1 (matrix multiplication) using Apache Pig. An empty multiply.pig is provided, as well as scripts to run this code on Comet. The input matrices are the same as in Project #1. There are two small sparse matrices, 4*3 and 3*3, in the files M-matrix-small.txt and N-matrix-small.txt for testing in local mode. Their product must be the 4*3 matrix in result-matrix-small.txt. There are also two moderate-sized matrices, 200*100 and 100*300, in the files M-matrix-large.txt and N-matrix-large.txt for testing in distributed mode. Note: you can access the input matrices in Pig (which are passed as parameters) as $M and $N and the output as $O.
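As a reminder of the computation itself, sparse matrix multiplication can be expressed as a join on the shared index followed by a grouped sum. The Python sketch below only illustrates that idea and is not the Pig solution. It assumes each matrix is a list of (row, column, value) triples; the function name and the sample values are hypothetical and are not the project data.

```python
from collections import defaultdict


def multiply_sparse(m, n):
    """Multiply two sparse matrices given as lists of (row, col, value) triples."""
    # Index N by its row so each M element can find its join partners.
    n_by_row = defaultdict(list)
    for k, j, v in n:
        n_by_row[k].append((j, v))

    # Join M(i, k) with N(k, j) on k, then sum the products per (i, j).
    sums = defaultdict(float)
    for i, k, mv in m:
        for j, nv in n_by_row[k]:
            sums[(i, j)] += mv * nv

    return sorted((i, j, v) for (i, j), v in sums.items())


if __name__ == "__main__":
    # Hypothetical 2*2 inputs, chosen only for illustration.
    M = [(0, 0, 1.0), (0, 1, 2.0), (1, 1, 3.0)]
    N = [(0, 0, 4.0), (1, 0, 5.0), (1, 1, 6.0)]
    for triple in multiply_sparse(M, N):
        print(triple)
```

Elements that are absent from either input are treated as zero, so they never contribute to a sum.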
To run it in local mode over the two small matrices use:
sbatch multiply.local.run
The result matrix in the directory output must be similar to result-matrix-small.txt. After you make sure that your program runs correctly in local mode, run it in distributed mode using:
sbatch multiply.distr.run
This will multiply the moderate-sized matrices and will write the result in the directory output-distr.
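One way to check the local-mode result against result-matrix-small.txt is to compare the two files as sets of matrix cells rather than line by line, since the order of output lines may differ. The Python sketch below rests on assumptions: each line holds a row, a column, and a value separated by commas or whitespace, and "similar" means the same cells with values equal within a small tolerance. The function names and sample lines are hypothetical.

```python
import math
import re


def parse_triples(lines):
    """Parse 'row,col,value' lines (comma- or whitespace-separated) into a dict."""
    cells = {}
    for line in lines:
        line = line.strip().strip("()")
        if not line:
            continue  # skip blank lines
        i, j, v = re.split(r"[,\s]+", line)
        cells[(int(i), int(j))] = float(v)
    return cells


def same_matrix(a, b, tol=1e-6):
    """True if both dicts hold the same cells with values equal within tol."""
    return a.keys() == b.keys() and all(
        math.isclose(a[k], b[k], rel_tol=tol, abs_tol=tol) for k in a
    )


if __name__ == "__main__":
    # Hypothetical lines; in practice, read them with open(...) from
    # output/part-r-00000 and result-matrix-small.txt.
    produced = ["1,0,15.0", "0,0,14.0"]
    expected = ["0,0,14.0", "1,0,15.0"]
    print(same_matrix(parse_triples(produced), parse_triples(expected)))
```

Because file objects iterate over lines, an open file can be passed directly to parse_triples.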
You can learn more about Pig at:
You need to submit the following files only:
project4/multiply.pig
project4/multiply.local.out
project4/output/part-r-00000
project4/multiply.distr.out
project4/output-distr/part-r-00000
Last modified: 10/26/2016 by Leonidas Fegaras