Matrix Multiplication using Hive

The purpose of this project is to develop a simple program for matrix multiplication using Apache Hive.

As in the previous projects, you will develop your program on SDSC Comet.

Log in to Comet, then download and untar project5:

wget http://lambda.uta.edu/cse6331/project5.tgz
tar xfz project5.tgz
chmod -R g-wrx,o-wrx project5

You may use Hive on Comet in local mode interactively, but you first need to set up your PATH (you need to do this every time you log in to Comet):

source ~/project5/setup

You also need to create an empty metastore database first (this must be done only once):

cd
schematool -dbType derby -initSchema

Then, to evaluate Hive commands interactively, run:

hive

Go to project5/example and look at the join.hql example. You can run it in local mode (after you set up your PATH) using:

hive -f join.hql

You are asked to re-implement Project #1 (matrix multiplication) using Apache Hive.
This time, you need to store the result of the multiplication into a Hive
table and then write a Hive query that counts the number of matrix elements and the
average matrix value of the multiplication result.
An empty `multiply.hql` is provided,
as well as a script to run this code on Comet.
The input matrices are the same as in Project #1.
There are two small sparse matrices, 4×3 and 3×3, in the files
M-matrix-small.txt and N-matrix-small.txt for testing in local mode.
For these matrices, your program should print the following COUNT and AVG:

12 15.5

Then there are two moderate-sized matrices, 200×100 and 100×300, in the files M-matrix-large.txt and N-matrix-large.txt for testing in distributed mode. For these matrices, your program should print the following COUNT and AVG:

59889 -0.06041565604668677

Note: the input matrices are passed to Hive as parameters, and you can access them as ${hiveconf:M} and ${hiveconf:N}.
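
In relational terms, the multiplication is a join of M and N on M's column index equal to N's row index. It is followed by a GROUP BY on (row of M, column of N) that sums the products. COUNT and AVG are then computed over the resulting table. The Python sketch below models only these semantics, not Hive itself. It assumes each matrix is a list of (row, column, value) triples. The names `multiply` and `count_and_avg` are hypothetical, and the 2×2 matrices are made up rather than taken from the project's input files.

```python
from collections import defaultdict

# Assumption for illustration: a sparse matrix is a list of
# (row, column, value) triples; absent entries are simply not listed.
Triple = tuple[int, int, float]


def multiply(m: list[Triple], n: list[Triple]) -> dict[tuple[int, int], float]:
    """Join M and N on M.column == N.row, then group by (M.row, N.column)
    and sum the products, mirroring a JOIN ... GROUP BY query."""
    # Index N by its row index so the join becomes a dictionary lookup.
    n_by_row: dict[int, list[tuple[int, float]]] = defaultdict(list)
    for i, j, v in n:
        n_by_row[i].append((j, v))

    # Accumulate sum(M[i,k] * N[k,j]) for every (i, j) that has a matching k.
    result: dict[tuple[int, int], float] = defaultdict(float)
    for i, k, mv in m:
        for j, nv in n_by_row.get(k, []):
            result[(i, j)] += mv * nv
    return dict(result)


def count_and_avg(result: dict[tuple[int, int], float]) -> tuple[int, float]:
    """COUNT(*) and AVG(value) over the product table."""
    values = list(result.values())
    return len(values), sum(values) / len(values)


if __name__ == "__main__":
    # Made-up 2x2 matrices, not the project's input files.
    m = [(0, 0, 1.0), (0, 1, 2.0), (1, 0, 3.0), (1, 1, 4.0)]
    n = [(0, 0, 5.0), (0, 1, 6.0), (1, 0, 7.0), (1, 1, 8.0)]
    print(count_and_avg(multiply(m, n)))  # prints (4, 33.5)
```

Only entries present in the input take part in the join, so a product cell with no contributing pair never appears in the result. This is presumably why the COUNT for the large matrices (59889) is below 200×300 = 60000.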

To run it in local mode over the two small matrices do:

hive -f multiply.hql --hiveconf M=M-matrix-small.txt --hiveconf N=N-matrix-small.txt

To dump the output to the file multiply.local.out, do:

hive -f multiply.hql --hiveconf M=M-matrix-small.txt --hiveconf N=N-matrix-small.txt &>multiply.local.out

After you have made sure that your program runs correctly in local mode, run it in distributed mode using:

sbatch multiply.distr.run

This will multiply the moderate-sized matrices.
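
Before submitting, it can help to check the COUNT and AVG recorded in multiply.local.out and multiply.distr.out against the expected values. The helper below is a hypothetical sketch, not part of the project files. It assumes the result row appears on its own line as two tab-separated fields, which is how the Hive CLI separates result columns by default. It compares AVG with a relative tolerance, because a sum evaluated in a different order (as can happen in distributed mode) may differ in the last few digits.

```python
import math
import sys
from typing import Optional


def find_count_avg(text: str) -> Optional[tuple[int, float]]:
    """Return (COUNT, AVG) from the last line made of exactly two
    tab-separated numeric fields, or None if no such line exists.
    The tab-separated layout is an assumption about the Hive CLI output."""
    found = None
    for line in text.splitlines():
        fields = line.strip().split("\t")
        if len(fields) != 2:
            continue  # log lines and other output are skipped
        try:
            found = (int(fields[0]), float(fields[1]))
        except ValueError:
            continue  # two fields, but not an integer followed by a number
    return found


def matches(text: str, expected_count: int, expected_avg: float,
            rel_tol: float = 1e-9) -> bool:
    """True if COUNT equals expected_count exactly and AVG is within
    rel_tol (relative tolerance) of expected_avg."""
    found = find_count_avg(text)
    if found is None:
        return False
    count, avg = found
    return count == expected_count and math.isclose(avg, expected_avg, rel_tol=rel_tol)


if __name__ == "__main__":
    # Usage: python check_output.py OUTPUT_FILE EXPECTED_COUNT EXPECTED_AVG
    out_path, want_count, want_avg = sys.argv[1], int(sys.argv[2]), float(sys.argv[3])
    with open(out_path) as f:
        ok = matches(f.read(), want_count, want_avg)
    print("OK" if ok else "MISMATCH")
    sys.exit(0 if ok else 1)
```

For example, `python check_output.py multiply.distr.out 59889 -0.06041565604668677` checks the distributed run. The script name check_output.py is arbitrary.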

You need to submit the following files only:

project5/multiply.hql
project5/multiply.local.out
project5/multiply.distr.out

