This example shows how to modify a MATLAB® example
that calculates the maximum airline arrival delay and how to create a standalone application from it.
You compile the application with the mcc command, and the resulting
MATLAB program runs against Hadoop®. A mapreducer object defines
the execution environment for Hadoop.
This example uses the MaxMapReduceExample.m example
file and the airline dataset, airlinesmall.csv,
both available in the toolbox/matlab/demos folder.
Move your example code to a new working folder for deployment.
Placing the new working folder on the MATLAB path ensures that
MATLAB Compiler™ can access the files.
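The move to a working folder can be sketched in MATLAB as follows. This is only an illustration: the folder name maxmapreduce_deploy is hypothetical, and the source location is the toolbox/matlab/demos folder named above, which may differ in your release.

```matlab
% Hypothetical working folder; any writable location works.
workDir = fullfile(tempdir,'maxmapreduce_deploy');
if exist(workDir,'dir') ~= 7
    mkdir(workDir);
end

% Folder that the text names as the location of the example files.
demoDir = fullfile(matlabroot,'toolbox','matlab','demos');
copyfile(fullfile(demoDir,'MaxMapReduceExample.m'), workDir);

% Work from the new folder and make sure it is on the MATLAB path.
cd(workDir);
addpath(workDir);
```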
Note:
Standalone application that runs against Hadoop using |
Set environment variables and cluster properties for your Hadoop configuration. These properties are necessary for submitting jobs to your Hadoop cluster.
Set the HADOOP_HOME environment variable to
point to your Hadoop installation folder, and modify the system path to
include $HADOOP_HOME/bin.
setenv('HADOOP_HOME','/share/hadoop/a1.2.1')
Install the MATLAB Runtime in a folder that is
accessible by every worker node in the Hadoop cluster. The following
steps use /hd-shared/MCR/v84.
Download the MATLAB Runtime from the website at http://www.mathworks.com/products/compiler/mcr.
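The environment setup described above can be checked from MATLAB with a short sketch like the following. It assumes that HADOOP_HOME has already been set and uses the MATLAB Runtime folder named in the text. The variable names hadoopBin and mcrRoot are illustrative only.

```matlab
% Sketch only: assumes HADOOP_HOME was set in the earlier step.
hadoopBin   = fullfile(getenv('HADOOP_HOME'),'bin');
currentPath = getenv('PATH');

% Append $HADOOP_HOME/bin to the system path if it is not there yet.
if isempty(strfind(currentPath, hadoopBin))
    setenv('PATH', [currentPath pathsep hadoopBin]);
end

% Folder name taken from the text above; adjust it to your installation.
mcrRoot = '/hd-shared/MCR/v84';
if exist(mcrRoot,'dir') ~= 7
    warning('MATLAB Runtime folder %s is not visible from this machine.', mcrRoot);
end
```

The check only confirms that the folder is visible from the machine running MATLAB. It does not verify that every worker node can reach it.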
Copy airlinesmall.csv into the Hadoop Distributed
File System (HDFS™) folder /datasets/airlinemod.
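One way to perform the copy without leaving MATLAB is to call the Hadoop command-line client through system. The sketch below assumes that the hadoop command is on the system path and that it accepts Hadoop 2.x options (the -p flag of -mkdir is not available in Hadoop 1.x). The variable names are illustrative.

```matlab
% Sketch only: assumes the hadoop client is on the system path.
localFile  = fullfile(matlabroot,'toolbox','matlab','demos','airlinesmall.csv');
hdfsFolder = '/datasets/airlinemod';

% Build the two commands: create the HDFS folder, then upload the file.
mkdirCmd = sprintf('hadoop fs -mkdir -p %s', hdfsFolder);
putCmd   = sprintf('hadoop fs -put "%s" %s', localFile, hdfsFolder);

cmds = {mkdirCmd, putCmd};
for k = 1:numel(cmds)
    status = system(cmds{k});
    if status ~= 0
        % Stop at the first failure so the upload is not attempted blindly.
        warning('Command failed with status %d: %s', status, cmds{k});
        break
    end
end
```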
Copy the map function maxArrivalDelayMapper.m from the toolbox/matlab/demos folder
to the working folder.
function maxArrivalDelayMapper(data, info, intermKVStore)
    % Find the largest arrival delay in this chunk of data.
    partMax = max(data.ArrDelay);
    add(intermKVStore,'PartialMaxArrivalDelay',partMax);
For more information, see Write a Map Function.
Copy the reduce function maxArrivalDelayReducer.m from the toolbox/matlab/demos folder
to the working folder.
function maxArrivalDelayReducer(intermKey, intermValIter, outKVStore)
    % Combine the partial maxima from all chunks into one overall maximum.
    maxVal = -inf;
    while hasnext(intermValIter)
        maxVal = max(getnext(intermValIter), maxVal);
    end
    add(outKVStore,'MaxArrivalDelay',maxVal);
For more information, see Write a Reduce Function.
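Before deploying, the map and reduce functions can be exercised locally. The following sketch assumes that airlinesmall.csv and both function files are on the MATLAB path. The names localDs, localResult, and localOutput are illustrative and are not part of the deployed application.

```matlab
% Sketch only: run the same map and reduce functions in the local
% MATLAB session before compiling them for Hadoop.
mapreducer(0);   % execute mapreduce serially in the local session

localDs = datastore('airlinesmall.csv','TreatAsMissing','NA');
localDs.SelectedVariableNames = 'ArrDelay';   % the only variable the mapper reads

localResult = mapreduce(localDs,@maxArrivalDelayMapper,@maxArrivalDelayReducer);
localOutput = readall(localResult);   % table with Key and Value variables
disp(localOutput)
```

A local run like this catches errors in the map and reduce functions early, when they are still easy to debug in MATLAB.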
Create a datastore that points to
the airline data in the Hadoop Distributed
File System (HDFS).
ds = datastore(...
    'hdfs://hadoop01/datasets/airlinemod/airlinesmall.csv',...
    'TreatAsMissing','NA')
ds.SelectedVariableNames = {'Year','Month',...
    'DayofMonth','UniqueCarrier','ArrDelay'};  % the map function reads ArrDelay
If the files are located in HDFS,
then the datastore should point to HDFS. For more information, see Read from HDFS.
Create a mapreducer object to set
the properties of Hadoop in
deployed mode. The mapreducer passes information
about the execution environment to standalone applications that run
against Hadoop. The mapreducer must
point to the location of the MATLAB Runtime that is accessible
from all the Hadoop worker
nodes.
mr = mapreducer(matlab.mapreduce.DeployHadoopMapReducer('MCRRoot',...
    '/hd-shared/hadoop-2.2.0/MCR/v84'))
For more information, see matlab.mapreduce.DeployHadoopMapReducer.
The new application maxmapreduceapp.m consists
of a datastore, a mapreducer object
that specifies the deployed environment variables, a mapreduce command,
and a command to view the results of mapreduce:
ds = datastore(...
'hdfs://hadoop01/datasets/airlinemod/airlinesmall.csv',...
'TreatAsMissing','NA')
ds.SelectedVariableNames = {'Year','Month','DayofMonth',...
'UniqueCarrier','ArrDelay'};  % the map function reads ArrDelay
mr = mapreducer(matlab.mapreduce.DeployHadoopMapReducer('MCRRoot',...
'/hd-shared/hadoop-2.2.0/MCR/v84'))
result = mapreduce(ds,@maxArrivalDelayMapper,@maxArrivalDelayReducer,...
mr,'OutputType','Binary', ...
'OutputFolder','hdfs://hadoop01/user/username/myresults');
maxMapreduceappResult = readall(result)
Use the mcc command with the -m flag
to create a standalone application. The -m flag
creates a standard executable that can be run from a command line.
However, the mcc command cannot package the results
in an installer.
mcc -m maxmapreduceapp.m
For more information, see mcc.
MATLAB Compiler creates the standalone executable maxmapreduceapp,
the shell script run_maxmapreduceapp.sh, and the log
file mccExcludedFiles.log.
Run the standalone application from the MATLAB command prompt using the following command:
!./maxmapreduceapp
Results display in MATLAB.
    Key         Value
    ____    _____________
    'AA'    [92X1 double]
    'AS'    [92X1 double]
    'CO'    [92X1 double]
    'DL'    [92X1 double]
    'EA'    [92X1 double]
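As an alternative to calling the executable directly, the application can be started through the shell script that mcc generates, which takes the MATLAB Runtime folder as its first argument. The sketch below assumes that the script is named run_maxmapreduceapp.sh (after the compiled file) and that the MATLAB Runtime is installed in the folder named earlier in this example. Both are assumptions to adjust for your setup.

```matlab
% Sketch only: launch the compiled application through the generated
% shell script. The script name and Runtime folder are assumptions.
mcrRoot   = '/hd-shared/MCR/v84';
runScript = fullfile(pwd,'run_maxmapreduceapp.sh');

% The generated script expects the MATLAB Runtime folder as argument 1.
runCmd = sprintf('"%s" %s', runScript, mcrRoot);
[status, cmdout] = system(runCmd);
if status ~= 0
    warning('Application exited with status %d.', status);
end
disp(cmdout)
```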
Other examples of map and reduce functions
are available in the toolbox/matlab/demos folder. You
can use other examples to prototype similar standalone applications
that run against Hadoop. For more information, see Build Effective Algorithms with MapReduce.
datastore | KeyValueDatastore | matlab.mapreduce.DeployHadoopMapReducer | mcc | TabularTextDatastore