This example shows how to use the mcc command to create a deployable
archive that calculates the maximum arrival delay in an airline dataset. The
archive that you create contains all the MATLAB® content associated
with the component. The mcc command also creates a shell
script to run the deployable archive against Hadoop®. You can use the shell script to
customize the execution of the deployable archive within your particular Hadoop environment.
This example uses the MaxMapReduceExample.m example file and the airline dataset,
airlinesmall.csv, both available in the toolbox/matlab/demos folder.
Move your example code to a new working folder for deployment. Placing the
new working folder on the MATLAB path ensures that the files are accessible
to MATLAB Compiler™.
Note: A deployable archive that runs against Hadoop, created using the Hadoop Compiler app, is supported only on Linux®.
Set environment variables and cluster properties for your Hadoop configuration. These properties are necessary for submitting jobs to your Hadoop cluster.
Set the environment variable HADOOP_HOME to point to your Hadoop installation
folder. Modify the system path to include $HADOOP_HOME/bin.
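In a Bash-style shell on Linux, these two settings might look like the following sketch. The installation path /usr/local/hadoop is an assumption used only for illustration; substitute the folder where Hadoop is installed on your system.

```shell
# Assumed install location; replace with your actual Hadoop folder.
export HADOOP_HOME="/usr/local/hadoop"

# Prepend the Hadoop binaries to the search path so that
# the hadoop command resolves in this shell session.
export PATH="$HADOOP_HOME/bin:$PATH"
```

To keep the settings across sessions, the same two lines could be added to a shell startup file such as ~/.bashrc.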
Install the MATLAB Runtime in a folder that is accessible
by every worker node in the Hadoop cluster. This example
uses /hd-shared/MCR/v84.
Download the MATLAB Runtime from http://www.mathworks.com/products/compiler/mcr.
Copy airlinesmall.csv into the Hadoop Distributed
File System (HDFS™) folder /datasets/airlinemod.
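One way to stage the file is with the hadoop fs commands, as in this sketch. It assumes that the hadoop command is on the system path and that airlinesmall.csv is in the current folder; the function name stage_dataset is hypothetical.

```shell
# Hypothetical helper: create the HDFS folder if needed, then upload one file.
stage_dataset() {
  src="$1"    # local file, for example airlinesmall.csv
  dest="$2"   # HDFS folder, for example /datasets/airlinemod
  hadoop fs -mkdir -p "$dest" &&
    hadoop fs -copyFromLocal "$src" "$dest"
}

# Usage (requires a reachable Hadoop cluster):
#   stage_dataset airlinesmall.csv /datasets/airlinemod
#   hadoop fs -ls /datasets/airlinemod
```

The upload runs only if the folder creation succeeds, so a permissions problem on the target folder shows up before any data is transferred.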
Copy the map function maxArrivalDelayMapper.m from the toolbox/matlab/demos
folder to the working folder.
function maxArrivalDelayMapper (data, info, intermKVStore)
  % Find the largest arrival delay in this chunk of data.
  partMax = max(data.ArrDelay);
  % Store the partial maximum under a single intermediate key.
  add(intermKVStore,'PartialMaxArrivalDelay',partMax);
For more information, see Write a Map Function.
Copy the reduce function maxArrivalDelayReducer.m from the toolbox/matlab/demos
folder to the working folder.
function maxArrivalDelayReducer(intermKey, intermValIter, outKVStore)
  % Start below any possible delay value.
  maxVal = -inf;
  % Combine the partial maxima produced by the map function.
  while hasnext(intermValIter)
    maxVal = max(getnext(intermValIter), maxVal);
  end
  add(outKVStore,'MaxArrivalDelay',maxVal);
For more information, see Write a Reduce Function.
Create a datastore object for airlinesmall.csv, as in MaxMapReduceExample.m, and
save the datastore to a .mat file.
ds = datastore('airlinesmall.csv','TreatAsMissing','NA',...
    'SelectedVariableNames','ArrDelay','ReadSize',1000);
save('airlinesmall.mat','ds')
For more information, see Getting Started with Datastore.
Create a Hadoop settings file, MWHadoopSetting.txt, that specifies the input type
tabulartext, the output type binary, the map function, the reduce function, and
the previously created datastore.
mw.ds.in.type = tabulartext
mw.ds.in.format = airlinesmall.mat
mw.ds.out.type = binary
mw.mapper = maxArrivalDelayMapper
mw.reducer = maxArrivalDelayReducer
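From a shell, the settings file could be written with a here-document, as in the sketch below. The file name MWHadoopSetting.txt is the one passed to the CONFIG option of mcc in this example; the one-setting-per-line layout is an assumption about the expected file format.

```shell
# Write the Hadoop settings file into the current working folder.
# The quoted 'EOF' delimiter keeps the shell from expanding the contents.
cat > MWHadoopSetting.txt <<'EOF'
mw.ds.in.type = tabulartext
mw.ds.in.format = airlinesmall.mat
mw.ds.out.type = binary
mw.mapper = maxArrivalDelayMapper
mw.reducer = maxArrivalDelayReducer
EOF
```

Run the command from the working folder so that mcc can find the settings file next to the map and reduce functions.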
Use the mcc command with the -H and -W flags to create a deployable archive
for Hadoop. In the -W option, hadoop:airlinesmall sets the archive name to
airlinesmall, and CONFIG:MWHadoopSetting.txt identifies the Hadoop settings file.
The -a flag adds the saved datastore file, airlinesmall.mat, to the archive.
The mcc command cannot package the results in an
installer. The command must be entered as a single line.
mcc -H -W 'hadoop:airlinesmall,CONFIG:MWHadoopSetting.txt' maxArrivalDelayMapper.m maxArrivalDelayReducer.m -a airlinesmall.mat
For more information, see mcc.
MATLAB Compiler creates a shell script, run_airlinesmall.sh,
a deployable archive, airlinesmall.ctf, and a log
file, mccExcludedFiles.log.
Deploy the archive as a Hadoop
job by pointing the job to the CSV files of the airline dataset in HDFS. The
arguments in the command are the MATLAB Runtime root folder (MCRRoot), Hadoop properties
defined using the -D flag, the data files, and the new
results folder. The command must be entered as a single line.
!./run_airlinesmall.sh /hd-shared/MCR/v84 -D mw.mcrroot=/hd-shared/MCR/v84 "/datasets/airlinemod/*.csv" myresults
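Because the command is long, it can help to keep its pieces in variables, as in this sketch. The variable names and the build_run_command function are hypothetical; the values are the ones used in this example, with the input pattern pointing at the HDFS folder that holds airlinesmall.csv.

```shell
MCR_ROOT="/hd-shared/MCR/v84"              # MATLAB Runtime folder on the cluster
INPUT_FILES="/datasets/airlinemod/*.csv"   # HDFS input pattern
RESULTS_DIR="myresults"                    # new HDFS results folder

# Print the full single-line command so that it can be reviewed before running.
# The input pattern is wrapped in double quotes in the printed command so that
# the local shell does not expand the wildcard.
build_run_command() {
  printf './run_airlinesmall.sh %s -D mw.mcrroot=%s "%s" %s\n' \
    "$MCR_ROOT" "$MCR_ROOT" "$INPUT_FILES" "$RESULTS_DIR"
}

build_run_command
```

The printed line can be pasted into a shell in the folder that contains the generated script, or prefixed with ! to run it from the MATLAB command prompt.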
Read the results from HDFS into MATLAB and display them.
ds = datastore('hdfs://hadoop01/user/username/myresults/part*',...
    'Type','keyvalue');
airlinesmallResult = readall(ds)

           Key              Value
    __________________    ________
    'MaxArrivalDelay'     [1014]
Other examples of map and reduce functions
are available in the toolbox/matlab/demos folder. You
can use these examples to prototype similar deployable archives that
run against Hadoop. For
more information, see Build Effective Algorithms with MapReduce.
See also: datastore | deploytool | KeyValueDatastore | mcc | TabularTextDatastore