Write mapper and reducer functions in MATLAB®.
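A mapper receives one chunk of the data and adds intermediate key-value pairs to a store; a reducer folds all intermediate values for a key into final key-value pairs. The following is a minimal sketch using the standard mapreduce signatures; the function names and the table variable X are placeholders:

```matlab
% maxMapper.m -- sketch of a mapper (stored in its own file).
% Each call receives one chunk of the datastore as a table and adds
% an intermediate key-value pair to the store.
function maxMapper(data, info, intermKVStore)
    add(intermKVStore, 'MaxX', max(data.X, [], 'omitnan'));
end

% maxReducer.m -- sketch of a reducer (stored in its own file).
% Iterates over all intermediate values for a key and emits one
% final key-value pair.
function maxReducer(intermKey, intermValIter, outKVStore)
    maxVal = -Inf;
    while hasnext(intermValIter)
        maxVal = max(maxVal, getnext(intermValIter));
    end
    add(outKVStore, intermKey, maxVal);
end
```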
Create a MAT-file containing a datastore that describes the structure of the data and the names of the variables to analyze. You can create the datastore from a test data set that is representative of the actual data set.
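For example, assuming a representative text file sample.csv with a numeric variable X (both names are hypothetical), the datastore can be created and saved like this:

```matlab
% Build a datastore from representative test data; the datastore
% records the structure of the data and the variables of interest.
ds = tabularTextDatastore('sample.csv');
ds.SelectedVariableNames = {'X'};

% Save the datastore to a MAT-file for packaging into the archive.
save('datastore.mat', 'ds');
```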
Create a text file that contains Hadoop® settings, such as the names of the mapper and reducer functions and the type of data being analyzed. This file is created automatically if you use the Hadoop Compiler app.
Use the Hadoop Compiler app or the mcc command to package the components into a deployable archive. Both options generate a deployable archive (.ctf file) that can be incorporated into a Hadoop mapreduce job.
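As a sketch, an mcc invocation for this packaging step has roughly the following shape. The hadoop target of the -W option and all file names shown are illustrative assumptions; consult the mcc reference page for the exact syntax in your release:

```
mcc -H -W 'hadoop:myArchive,CONFIG:settings.txt' maxMapper.m maxReducer.m -a datastore.mat
```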
Incorporate the deployable archive into a Hadoop mapreduce job using the hadoop command and the syntax shown below.
Execution Signature
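Reading the components A through H in order (see the key below), a job submission has the following general shape. Angle-bracket tokens are placeholders; the runtime-location property name (shown here as mw.mcrroot) and the fixed relative path to the JAR are assumptions to verify against your installation:

```
hadoop \
    jar <MATLAB_RUNTIME_LOCATION>/<fixedRelativePath>/mwmapreduce.jar \
    MWMapReduceDriver \
    -D mw.mcrroot=<MATLAB_RUNTIME_LOCATION> \
    <deployableArchiveName>.ctf \
    <hdfsInputLocation> \
    <hdfsOutputLocation>
```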
Key
| Letter | Description |
| --- | --- |
| A | Hadoop command. |
| B | JAR option. |
| C | The standard name of the JAR file. All applications have the same JAR: mwmapreduce.jar. The path to the JAR is also fixed relative to the MATLAB Runtime location. |
| D | The standard name of the driver. All applications have the same driver name: MWMapReduceDriver. |
| E | A generic option specifying the MATLAB Runtime location as a key-value pair. |
| F | Deployable archive (.ctf file) generated by the Hadoop Compiler app or mcc, passed as a payload argument to the job. |
| G | Location of the input files on HDFS™. |
| H | Location on HDFS where the output can be written. |
To simplify the inclusion of the deployable archive (.ctf file) into a Hadoop mapreduce job, both the Hadoop Compiler app and the mcc command generate a shell script alongside the deployable archive. The shell script has the following naming convention:
run_<deployableArchiveName>.sh
To run the deployable archive using the shell script, use the following syntax:
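The exact argument list is release-dependent and is documented in the generated script itself; as a sketch, a typical call passes the MATLAB Runtime location followed by the HDFS input and output locations:

```
./run_<deployableArchiveName>.sh <MATLAB_RUNTIME_LOCATION> <hdfsInputLocation> <hdfsOutputLocation>
```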