The purpose of this project is to develop a program to solve the single source shortest distance problem using one of the Big Data platforms we have used in the previous projects: Map-Reduce, plain Spark, Pig, or Hive.
This project must be done individually. No copying is permitted. Note: We will use a system for detecting software plagiarism, called Moss, which is an automatic system for determining the similarity of programs. That is, your program will be compared with the programs of the other students in class as well as with the programs submitted in previous years. This program will find similarities even if you rename variables, move code, change code structure, etc.
Note that if you use a search engine to find similar programs on the web, we will find these programs too. So don't do it: you will get caught and you will get an F in the course (this is cheating). Don't look for code to use for your project on the web or from other students (current or past). Do your project alone, using only the help given in this project description and by your instructor and GTA.
As in the previous projects, you will develop your program on SDSC Comet.
Log in to Comet, then download and untar project6:
wget http://lambda.uta.edu/cse6331/project6.tgz
tar xfz project6.tgz
chmod -R g-wrx,o-wrx project6
Write a program to solve the single source shortest distance problem: for each node in a directed graph, calculate the shortest distance from the node with id=0 to this node. You may implement this project using any of the Big Data platforms that we have used in the previous projects: Map-Reduce, plain Spark, Pig, or Hive. The input directed graph is a dataset of edges, where an edge from node i to node j is represented in the input text file as:
i,d,j
where d is the distance from node i to node j. (The numbers i, j, and d are long integers.) Let distance[i] be the shortest distance from the node with id=0 to the node with id=i. The pseudo-code to calculate distance[i] is as follows:
distance[0] = 0
for each node i <> 0:
    distance[i] = Long.MAX_VALUE
repeat 4 times:
    for each edge (i,d,j):
        if distance[j] > distance[i]+d
            distance[j] = distance[i]+d

Your code that calculates the new distances from the old must be repeated 4 times only.
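To make the pseudo-code concrete, here is a minimal sequential sketch in Python. It is only a reference for checking results by hand, not a Map-Reduce, Spark, Pig, or Hive solution. The names `LONG_MAX`, `parse_edge`, and `shortest_distances`, and the four sample edges, are hypothetical and are not part of the project files. The sketch assumes that each pass reads only the distances produced by the previous pass, as a distributed job naturally would; the in-place pseudo-code above can propagate a distance farther within a single pass, depending on the order of the edges. It also skips nodes that have not been reached yet, because in Java adding d to Long.MAX_VALUE would overflow.

```python
LONG_MAX = 2**63 - 1  # stands in for Java's Long.MAX_VALUE (assumption)


def parse_edge(line):
    """Parse one 'i,d,j' input line into (source, distance, target)."""
    i, d, j = (int(field) for field in line.strip().split(","))
    return i, d, j


def shortest_distances(edges, rounds=4, source=0):
    """Run a fixed number of relaxation passes over all edges.

    Each pass builds new distances from the previous pass's distances only.
    """
    nodes = {source}
    for i, _, j in edges:
        nodes.update((i, j))

    # distance[source] = 0, every other node starts as "not reached yet"
    distance = {n: LONG_MAX for n in nodes}
    distance[source] = 0

    for _ in range(rounds):
        new_distance = dict(distance)
        for i, d, j in edges:
            if distance[i] == LONG_MAX:
                continue  # node i not reached yet; avoid MAX + d
            candidate = distance[i] + d
            if candidate < new_distance[j]:
                new_distance[j] = candidate
        distance = new_distance
    return distance


if __name__ == "__main__":
    # Hypothetical sample edges, not taken from small-graph.txt.
    sample = ["0,5,1", "1,2,2", "0,10,2", "2,1,3"]
    edges = [parse_edge(line) for line in sample]
    for node, dist in sorted(shortest_distances(edges).items()):
        print(node, dist)
```

Because the number of passes is fixed at 4, a node whose shortest path from node 0 uses more than 4 edges may keep a larger distance than its true shortest distance, or may stay at the maximum value.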
Two graphs are provided: small-graph.txt and large-graph.txt. You should test your program in local mode on small-graph.txt until you get the correct result. After you make sure that your program runs correctly in local mode, run it in distributed mode on large-graph.txt. Use the scripts from the previous projects.
You need to submit your source, output, and log files using the following form:
Last modified: 11/07/2016 by Leonidas Fegaras