This example shows how to modify a MATLAB® example that calculates the maximum arrival delay in airline data and how to create a standalone application from it.
You create the standalone application, a MATLAB program that runs against Hadoop®, using the mcc command. A mapreducer object defines the execution environment for Hadoop.
This example uses the example file MaxMapReduceExample.m and the airline dataset airlinesmall.csv, both available in the toolbox/matlab/demos folder.
Move your example code to a new working folder for deployment, and add that folder to the MATLAB path so that MATLAB Compiler™ can access the files.
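A minimal sketch of this step. The folder name hadoopDeploy is a hypothetical choice; any location you can write to works.

```matlab
% Hypothetical working folder for deployment, created under the current folder.
workDir = fullfile(pwd, 'hadoopDeploy');
if ~exist(workDir, 'dir')
    mkdir(workDir);
end
% Put the folder on the MATLAB path so MATLAB Compiler can find the files,
% then make it the current folder for the remaining steps.
addpath(workDir);
cd(workDir);
```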
Note:
Standalone application that runs against Hadoop using |
Set environment variables and cluster properties for your Hadoop configuration. These properties are necessary for submitting jobs to your Hadoop cluster.
Set the environment variable HADOOP_HOME to point to your Hadoop installation folder, and modify the system path to include $HADOOP_HOME/bin.
setenv('HADOOP_HOME','/share/hadoop/a1.2.1')
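The path change can also be made from MATLAB. This sketch assumes the HADOOP_HOME value used above; note that setenv affects only the current MATLAB process and the commands it starts through system or !, not other shells.

```matlab
% HADOOP_HOME as set in the previous step.
setenv('HADOOP_HOME', '/share/hadoop/a1.2.1');
% Append $HADOOP_HOME/bin so that commands run from this session find hadoop.
hadoopBin = fullfile(getenv('HADOOP_HOME'), 'bin');
setenv('PATH', [getenv('PATH') pathsep hadoopBin]);
```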
Install the MATLAB Runtime in a folder that is accessible by every worker node in the Hadoop cluster. The following steps use /hd-shared/hadoop-2.2.0/MCR/v84.
Download the MATLAB Runtime from the website at http://www.mathworks.com/products/compiler/mcr.
Copy airlinesmall.csv into the Hadoop Distributed File System (HDFS™) folder /datasets/airlinemod.
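One way to do the copy from MATLAB is to call the hadoop fs shell through system. This is a sketch, assuming the hadoop command is on the path (see the previous step) and that your Hadoop release accepts the 2.x-style -mkdir -p option; on older releases the option may differ.

```matlab
% Local copy of the sample data, in the folder named in this example.
localCsv = fullfile(matlabroot, 'toolbox', 'matlab', 'demos', 'airlinesmall.csv');
hdfsDir  = '/datasets/airlinemod';

% Assumes Hadoop 2.x-style "hadoop fs" options.
cmds = { ['hadoop fs -mkdir -p ' hdfsDir], ...
         ['hadoop fs -put ' localCsv ' ' hdfsDir] };
for k = 1:numel(cmds)
    [status, out] = system(cmds{k});
    if status ~= 0
        % Report the failing command and its output instead of stopping.
        warning('hadoopDeploy:fsCommandFailed', 'Command failed: %s\n%s', cmds{k}, out);
    end
end
```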
Copy the map function maxArrivalDelayMapper.m from the toolbox/matlab/demos folder to the working folder.
function maxArrivalDelayMapper(data, info, intermKVStore)
% Compute the maximum arrival delay in this chunk of data
partMax = max(data.ArrDelay);
% Emit one partial maximum per chunk under a single key
add(intermKVStore,'PartialMaxArrivalDelay',partMax);
For more information, see Write a Map Function.
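Emitting only one partial maximum per chunk is enough because the maximum of the chunk maxima equals the maximum of all the data. The sketch below checks this on hypothetical delay values split into three chunks; NaN stands in for an 'NA' entry, which max ignores by default.

```matlab
% Hypothetical arrival delays, split as a datastore might deliver them
% to separate calls of the map function.
chunks = {[5; -3; 12], [NaN; 40; 7], [-10; 2]};

% Map step: each chunk contributes only its own maximum.
partialMax = cellfun(@max, chunks);

% Reduce step: the maximum of the partial maxima.
overallMax = max(partialMax);

disp(overallMax)   % displays 40
```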
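Copy the reduce function maxArrivalDelayReducer.m from the toolbox/matlab/demos folder to the working folder.
function maxArrivalDelayReducer(intermKey, intermValIter, outKVStore)
% Start below any possible delay so the first value always replaces it
maxVal = -inf;
% Combine all partial maxima emitted by the mappers
while hasnext(intermValIter)
    maxVal = max(getnext(intermValIter), maxVal);
end
% Store the overall maximum arrival delay
add(outKVStore,'MaxArrivalDelay',maxVal);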
For more information, see Write a Reduce Function.
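Before deploying, you can check the map and reduce functions locally. This sketch assumes both files are on the path and uses the local copy of airlinesmall.csv; mapreducer(0) runs mapreduce in the MATLAB client without Hadoop and stays the current mapreducer until you change it.

```matlab
% Read the local copy of the airline data, keeping only the variable the mapper uses.
dsLocal = datastore('airlinesmall.csv', 'TreatAsMissing', 'NA');
dsLocal.SelectedVariableNames = {'ArrDelay'};

% Run serially in this MATLAB session.
mapreducer(0);
localResult = mapreduce(dsLocal, @maxArrivalDelayMapper, @maxArrivalDelayReducer, ...
    'OutputFolder', tempdir);
readall(localResult)
```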
Create a datastore that points to the airline data in the Hadoop Distributed File System (HDFS).
ds = datastore(...
    'hdfs://hadoop01/datasets/airlinemod/airlinesmall.csv',...
    'TreatAsMissing','NA')
% The mapper reads data.ArrDelay, so ArrDelay must be a selected variable
ds.SelectedVariableNames = {'Year','Month',...
    'DayofMonth','UniqueCarrier','ArrDelay'};
If the files are located in HDFS, then the datastore should point to HDFS. For more information, see Read from HDFS.
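A quick way to confirm that the HDFS location is reachable is to preview a few rows. This sketch assumes the same HDFS URL as above and also checks that the variable read by the mapper exists in the file.

```matlab
% Assumes this MATLAB session can reach the HDFS namenode hadoop01.
ds = datastore('hdfs://hadoop01/datasets/airlinemod/airlinesmall.csv', ...
    'TreatAsMissing', 'NA');
% The mapper reads data.ArrDelay, so fail early if it is missing.
assert(ismember('ArrDelay', ds.VariableNames), 'ArrDelay not found in the data.');
% Read only the first few rows.
preview(ds)
```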
Create a mapreducer object to set the properties of Hadoop in deployed mode. The mapreducer passes information about the execution environment to standalone applications that run against Hadoop. The mapreducer must point to the location of the MATLAB Runtime that is accessible from all the Hadoop worker nodes.
mr = mapreducer(matlab.mapreduce.DeployHadoopMapReducer('MCRRoot',...
    '/hd-shared/hadoop-2.2.0/MCR/v84'))
For more information, see matlab.mapreduce.DeployHadoopMapReducer.
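If you want a single script to run both interactively and as a compiled application, one option (not part of the original example) is to select the mapreducer with isdeployed, which returns true only inside a deployed application. A sketch, assuming the MATLAB Runtime path used above:

```matlab
% Hadoop mapreducer when compiled and deployed, local client session otherwise.
if isdeployed
    mr = mapreducer(matlab.mapreduce.DeployHadoopMapReducer('MCRRoot', ...
        '/hd-shared/hadoop-2.2.0/MCR/v84'));
else
    mr = mapreducer(0);
end
```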
The new application, maxmapreduceapp.m, consists of a datastore, a mapreducer object that specifies the deployed environment, a mapreduce command, and a command to view the results of mapreduce:
% Airline data in HDFS; ArrDelay is required by the mapper
ds = datastore(...
    'hdfs://hadoop01/datasets/airlinemod/airlinesmall.csv',...
    'TreatAsMissing','NA')
ds.SelectedVariableNames = {'Year','Month','DayofMonth',...
    'UniqueCarrier','ArrDelay'};
% Execution environment for the deployed application
mr = mapreducer(matlab.mapreduce.DeployHadoopMapReducer('MCRRoot',...
    '/hd-shared/hadoop-2.2.0/MCR/v84'))
% Run the map and reduce functions on Hadoop and write binary output to HDFS
result = mapreduce(ds,@maxArrivalDelayMapper,@maxArrivalDelayReducer,...
    mr,'OutputType','Binary', ...
    'OutputFolder','hdfs://hadoop01/user/username/myresults');
% Read all key-value pairs from the output
maxMapreduceappResult = readall(result)
Use the mcc command with the -m flag to create a standalone application. The -m flag creates a standard executable that can be run from a command line. However, the mcc command cannot package the results in an installer.
mcc -m maxmapreduceapp.m
For more information, see mcc.
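mcc also accepts function syntax from the MATLAB prompt. This sketch assumes MATLAB Compiler is installed and the three files are in the current folder; the -a options are an optional safeguard that add the mapper and reducer to the archive explicitly, in case dependency analysis does not detect the function handles.

```matlab
% Compile the application and explicitly include the map and reduce functions.
mcc('-m', 'maxmapreduceapp.m', ...
    '-a', 'maxArrivalDelayMapper.m', ...
    '-a', 'maxArrivalDelayReducer.m');
```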
MATLAB Compiler creates the executable maxmapreduceapp, the shell script run_maxmapreduceapp.sh, and the log file mccExcludedFiles.log.
Run the standalone application from the MATLAB command prompt using the following command:
!./maxmapreduceapp
    Key             Value
    ____________    _____________
    'AA'            [92X1 double]
    'AS'            [92X1 double]
    'CO'            [92X1 double]
    'DL'            [92X1 double]
    'EA'            [92X1 double]
Results display in MATLAB.
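Outside MATLAB, the generated run_<name>.sh script (run_maxmapreduceapp.sh for maxmapreduceapp.m) is the usual way to start the executable, because it sets up the MATLAB Runtime library paths first; its first argument is the MATLAB Runtime root. A sketch, assuming the script is in the current folder and the runtime path used in this example:

```matlab
% MATLAB Runtime location used earlier in this example.
mcrRoot = '/hd-shared/hadoop-2.2.0/MCR/v84';
% The wrapper script configures the runtime environment, then starts the executable.
[status, out] = system(['./run_maxmapreduceapp.sh ' mcrRoot]);
disp(out)
```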
Other examples of map and reduce functions are available in the toolbox/matlab/demos folder. You can use these examples to prototype similar standalone applications that run against Hadoop. For more information, see Build Effective Algorithms with MapReduce.
See Also: datastore | KeyValueDatastore | matlab.mapreduce.DeployHadoopMapReducer | mcc | TabularTextDatastore