Programming Assignment 1
A Simple Map-Reduce Program

Due on Tuesday October 3 before midnight


Description

The purpose of this project is to develop a simple Map-Reduce program on Hadoop to multiply two sparse matrices.

This project must be done individually. No copying is permitted. Note: We will use a system for detecting software plagiarism, called Moss, which is an automatic system for determining the similarity of programs. That is, your program will be compared with the programs of the other students in class as well as with the programs submitted in previous years. This program will find similarities even if you rename variables, move code, change code structure, etc.

Note that, if you use a Search Engine to find similar programs on the web, we will find these programs too. So don't do it because you will get caught and you will get an F in the course (this is cheating). Don't look for code to use for your project on the web or from other students (current or past). Just do your project alone using the help given in this project description and from your instructor and GTA only.

Platform

You will develop your program on SDSC Comet. Optionally, you may use Eclipse to help you develop your program, but you should test your programs on Comet before you submit them.

How to login on Comet

Your XSEDE username is you MavID (or whatever you used when you registered). Your Comet username though is based on your first and last name, so it is not your MavID. But your XSEDE and Comet passwords are the same (this is the password you specified when you registered). The first thing to do is to login on XSEDE. For example, from a Linux or a Mac PC, you may login using:

ssh xyz1234@login.xsede.org
where xyz1234 is your XSEDE username (your MavID). On Windows, you must install a secure shell client (see User Guides - Unix). Then, from XSEDE, you can login to Comet:
gsissh comet
You can see your username at the command prompt. Now that you know your username, next time you may login on Comet directly:
ssh username@comet.sdsc.xsede.org
where username is your Comet username. Please email the GTA if you need further help.

Edit the file .bashrc (note: it starts with a dot) using a text editor, such as nano .bashrc, and add the following 2 lines at the end (cut-and-paste):

alias run='srun --pty -A uot143 --partition=shared --nodes=1 --ntasks-per-node=1 --mem=5G -t 00:05:00 --wait=0 --export=ALL'
export project=/oasis/projects/nsf/uot143/fegaras
logout and login again to apply the changes.

Setting up your Project

Login into Comet and download and untar project1:

wget http://lambda.uta.edu/cse6331/project1.tgz
tar xfz project1.tgz
chmod -R g-wrx,o-wrx project1
Go to project1/examples and look at the two Map-Reduce examples Simple.java and Join.java. You can compile Simple.java using:
run simple.build
and you can run it in standalone mode using:
sbatch simple.local.run
The file simple.local.out will contain the trace log of the Map-Reduce evaluation while the directory output will contain the results (file output/part-r-00000). Optionally, you can run this program in distributed mode using:
sbatch simple.distr.run
Please note that running in distributed mode will waste at least 10 of your SUs. So better do this only for your own code (for matrix multiplication).

Project Description

In this project, you are asked to implement matrix multiplication in Map-Reduce. This is described in Section 2.3.9 (page 38) in Map-Reduce and the New Software Stack. You should use two Map-Reduce jobs in the same Java file project1/Multiply.java. Do not use the method in Section 2.3.10. An empty project1/Multiply.java is provided, as well as scripts to build and run this code on Comet. You should modify Multiply.java only. There are two small sparce matrices 4*3 and 3*3 in the files M-matrix-small.txt and N-matrix-small.txt for testing in standalone mode. Their matrix multiplication must return the 4*3 matrix in result-matrix-small.txt. Then there are 2 moderate-sized matrices 200*100 and 100*300 in the files M-matrix-large.txt and M-matrix-large.txt for testing in distributed mode.

You can compile Multiply.java using:

run multiply.build
and you can run it in standalone mode over the two small matrices using:
sbatch multiply.local.run
The result matrix in the directory output must be similar to result-matrix-small.txt. You should modify and run your programs in standalone mode until you get the correct result. After you make sure that your program runs correctly in standalone mode, you run it in distributed mode using:
sbatch multiply.distr.run
This will multiply the moderate-sized matrices and will write the result in the directory output-distr. Note that running in distributed mode will use up at least 10 of your SUs. So do this once or twice only, after you make sure that your program works correctly in standalone mode.

Optional: Use Eclipse

If you have a prior experience with Eclipse, you may want to develop your program on Eclipse on a Mac or a Linux PC, test it in standalone mode, and then test it and run it on Comet. Using Eclipse is optional; you shouldn't do this if you haven't used Eclipse before. Note that you cannot run Hadoop on Windows. The following instructions have been tested on Linux and Mac. First, download and untar hadoop-2.6.0.tar.gz on your PC. If you haven't installed Eclipse on your PC, download the latest Eclipse IDE for Java Developers from https://eclipse.org/downloads/. On Eclipse, go to Preferences→Java→Build Path→User Libraries, hit "New..." and name it hadoop2.6, select hadoop2.6 and hit "Add External JARs...". Select all the jar files from the following subdirectories from inside the directory you untared, hadoop-2.6.0/share/hadoop:
common common/lib hdfs hdfs/lib yarn yarn/lib mapreduce mapreduce/lib
(You can select multiple jars using shift-left-mouse.) Then, do New→Project→Java Project. Give the name Simple. In Java Settings, choose Libraries→Add Library→User Library and select hadoop2.6. Inside the Simple project, click on src and create a new Java Class with name Simple and package edu.uta.cse6331. Cut and paste the Simple.java code inside. Save it. Click on Simple.java→Run As→Run Configurations and go to Arguments. Add 2 arguments: simple.txt and output (separate lines). Now you can run it in standalone mode by hitting Run. You can do the same for your project.

Pseudo-Code

To help you, I am giving you the pseudo code:

class Elem extends Writable {
  short tag;  // 0 for M, 1 for N
  int index;  // one of the indexes (the other is used as a key)
  double value;
  ...
}

class Pair extends WritableComparable<Pair> {
  int i;
  int j;
  ...
}
First Map-Reduce job:
map(key,line) =             // mapper for matrix M
  split line into 3 values: i, j, and v
  emit(j,new Elem(0,i,v))

map(key,line) =             // mapper for matrix N
  split line into 3 values: i, j, and v
  emit(i,new Elem(1,j,v))

reduce(index,values) =
  A = all v in values with v.tag==0
  B = all v in values with v.tag==1
  for a in A
     for b in B
         emit(new Pair(a.index,b.index),a.value*b.value)
Second Map-Reduce job:
map(key,value) =  // do nothing
  emit(key,value)

reduce(pair,values) =  // do the summation
  m = 0
  for v in values
    m = m+v
  emit(pair,pair.i+","+pair.j+","+m)

Documentation

What to Submit

You need to submit the following files only:

project1/Multiply.java
project1/multiply.local.out
project1/output/part-r-00000
project1/multiply.distr.out
We do not accept email or hardcopy submissions. You may submit multiple files, if you like, as long as they have different names. These files are automatically uploaded directly into your personal class account for this particular project, so you don't have to include your name or student ID or project number in the file name. You may submit your files as many times as you like, but only the most recently submitted files will be retained and evaluated (newly submitted files replace the old files under the same file name). After you submit the files, please double-check that your submitted files are correct by clicking on the Status link. If you cannot login or have a problem submitting the project using this form, ask the GTA for help.

Submit Programming Assignment #1:

Last modified: 09/19/2017 by Leonidas Fegaras