Programming Assignment 1
A Simple Map-Reduce Program

Due on Thursday September 29 before midnight


The purpose of this project is to develop a simple Map-Reduce program on Hadoop to multiply two sparse matrices.

This project must be done individually. No copying is permitted. Note: We will use a system for detecting software plagiarism, called Moss, which is an automatic system for determining the similarity of programs. That is, your program will be compared with the programs of the other students in class as well as with the programs submitted in previous years. This program will find similarities even if you rename variables, move code, change code structure, etc.

Note that, if you use a Search Engine to find similar programs on the web, we will find these programs too. So don't do it because you will get caught and you will get an F in the course (this is cheating). Don't look for code to use for your project on the web or from other students (current or past). Just do your project alone using the help given in this project description and from your instructor and GTA only.


You will develop your program on SDSC Comet. Optionally, you may use Eclipse to help you develop your program, but you should test your programs on Comet before you submit them.

Setting up your Project

Login into Comet and download and untar project1:

tar xfz project1.tgz
chmod -R g-wrx,o-wrx project1
Go to project1/examples and look at the two Map-Reduce examples and You can compile using:
and you can run it in standalone mode using:
File simple.local.out will contain the trace log of the Map-Reduce evaluation while the directory output will contain the results (file output/part-r-00000). Optionally, you can run this program in distributed mode using:
Please note that running in distributed mode will waste at least 10 of your SUs. So better do this only for your own code (for matrix multiplication).

Project Description

In this project, you are asked to implement matrix multiplication in Map-Reduce. This is described in Section 2.3.9 (page 38) in Map-Reduce and the New Software Stack. You should use two Map-Reduce jobs in the same Java file project1/ Do not use the method in Section 2.3.10. An empty project1/ is provided, as well as scripts to build and run this code on Comet. You should modify only. There are two small sparce matrices 4*3 and 3*3 in the files M-matrix-small.txt and N-matrix-small.txt for testing in standalone mode. Their matrix multiplication must return the 4*3 matrix in result-matrix-small.txt. Then there are 2 moderate-sized matrices 200*100 and 100*300 in the files M-matrix-large.txt and M-matrix-large.txt for testing in distributed mode.

You can compile using:

and you can run it in standalone mode over the two small matrices using:
The result matrix in the directory output must be similar to result-matrix-small.txt. You should modify and run your programs in standalone mode until you get the correct result. After you make sure that your program runs correctly in standalone mode, you run it in distributed mode using:
This will multiply the moderate-sized matrices and will write the result in the directory output-distr. Note that running in distributed mode will use up at least 10 of your SUs. So do this once or twice only, after you make sure that your program works correctly in standalone mode.

Detailed directions and hints:

Optional: Use Eclipse

If you have a prior experience with Eclipse, you may want to develop your program on Eclipse on a Mac or a Linux PC, test it in standalone mode, and then test it and run it on Comet. Using Eclipse is optional; you shouldn't do this if you haven't used Eclipse before. Note that you cannot run Hadoop on Windows. The following instructions have been tested on Linux and Mac. First, download and untar hadoop-2.6.0.tar.gz on your PC. If you haven't installed Eclipse on your PC, download Eclipse 4.5 (Mars) IDE for Java Developers from On Eclipse, go to Preferences→Java→Build Path→User Libraries, hit "New..." and name it hadoop2.6, select hadoop2.6 and hit "Add External JARs...". Select all the jar files from the following subdirectories from inside the directory you untared, hadoop-2.6.0/share/hadoop:
common common/lib hdfs hdfs/lib yarn yarn/lib mapreduce mapreduce/lib
(You can select multiple jars using shift-left-mouse.) Then, do New→Project→Java Project. Give the name Simple. In Java Settings, choose Libraries→Add Library→User Library and select hadoop2.6. Inside the Simple project, click on src and create a new Java Class with name Simple and package edu.uta.cse6331. Cut and paste the code inside. Save it. Click on→Run As→Run Configurations and go to Arguments. Add 2 arguments: simple.txt and output (separate lines). Now you can run it in standalone mode by hitting Run. You can do the same for your project.


To help you, I am giving you the pseudo code:

class Elem extends Writable {
  short tag;  // 0 for M, 1 for N
  int index;  // one of the indexes (the other is used as a key)
  double value;

class Pair extends WritableComparable<Pair> {
  int i;
  int j;
First Map-Reduce job:
map(key,line) =             // mapper for matrix M
  split line into 3 values: i, j, and v
  emit(j,new Elem(0,i,v))

map(key,line) =             // mapper for matrix N
  split line into 3 values: i, j, and v
  emit(i,new Elem(1,j,v))

reduce(index,values) =
  A = all v in values with v.tag==0
  B = all v in values with v.tag==1
  for a in A
     for b in B
         emit(new Pair(a.index,b.index),a.value*b.value)
Second Map-Reduce job:
map(key,value) =  // do nothing

reduce(pair,values) =  // do the summation
  m = 0
  for v in values
    m = m+v


What to Submit

You need to submit the following files only:

We do not accept email or hardcopy submissions. You may submit multiple files, if you like, as long as they have different names. These files are automatically uploaded directly into your personal class account for this particular project, so you don't have to include your name or student ID or project number in the file name. You may submit your files as many times as you like, but only the most recently submitted files will be retained and evaluated (newly submitted files replace the old files under the same file name). After you submit the files, please double-check that your submitted files are correct by clicking on the Status link. If you cannot login or have a problem submitting the project using this form, ask the GTA for help.

Submit Programming Assignment #1:

Last modified: 09/17/2016 by Leonidas Fegaras