Supported Platform: Linux® only.
This example shows you how to create a standalone MATLAB® MapReduce application using the mcc command and
run it against a Hadoop® cluster.
Goal: Calculate the maximum arrival delay of an airline from the given dataset.
| Dataset: | airlinesmall.csv |
| Description: | Airline departure and arrival information from 1987-2008. |
| Location: | /usr/local/MATLAB/R2017b/toolbox/matlab/demos |
Start this example by creating a new work folder that is visible to the MATLAB search path.
Before starting MATLAB, at a terminal, set the environment variable HADOOP_PREFIX to point to the Hadoop installation folder. For example:
| Shell | Command |
|---|---|
| csh / tcsh | % setenv HADOOP_PREFIX /usr/lib/hadoop |
| bash | $ export HADOOP_PREFIX=/usr/lib/hadoop |
This example uses /usr/lib/hadoop as the directory where Hadoop is installed. Your Hadoop installation directory may be different.
If you forget to set the HADOOP_PREFIX environment variable before starting MATLAB, set it with the MATLAB function setenv at the MATLAB command prompt as soon as you start MATLAB. For example:
setenv('HADOOP_PREFIX','/usr/lib/hadoop')
Install the MATLAB Runtime in a folder that is accessible by every worker node in the Hadoop cluster. This example uses /usr/local/MATLAB/MATLAB_Runtime/v## as the location of the MATLAB Runtime folder.
If you don’t have the MATLAB Runtime, you can download it from the website at: http://www.mathworks.com/products/compiler/mcr.
Replace all references to the MATLAB Runtime version v## in this example with the MATLAB Runtime version number corresponding to your MATLAB release. For example, MATLAB R2017b has MATLAB Runtime version number v92. For the MATLAB Runtime version numbers corresponding to other MATLAB releases, see the version list in the MathWorks documentation.
Copy the map function maxArrivalDelayMapper.m from the /usr/local/MATLAB/R2017b/toolbox/matlab/demos folder to the work folder.
For more information, see Write a Map Function (MATLAB).
Copy the reduce function maxArrivalDelayReducer.m from the matlabroot/toolbox/matlab/demos folder to the work folder.
For more information, see Write a Reduce Function (MATLAB).
Create the directory /user/<username>/datasets on HDFS™ and copy the file airlinesmall.csv to that directory. Here <username> refers to your user name in HDFS.
$ ./hadoop fs -copyFromLocal airlinesmall.csv hdfs://host:54310/user/<username>/datasets
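The copy command above expects the target directory to exist already. The following is a minimal sketch of the whole staging step, not part of the shipped example. It assumes the hadoop launcher is at $HADOOP_PREFIX/bin/hadoop and that your Hadoop version supports the -p option of -mkdir (Hadoop 2 and later). The function name stage_dataset and the HADOOP_CMD override are hypothetical, introduced only for illustration.

```shell
# Create the HDFS dataset directory and upload a local file into it.
# HADOOP_CMD is a hypothetical override (not a Hadoop setting): set it to
# "echo" to preview the commands without contacting a cluster.
stage_dataset() {
    local_file=$1      # for example: airlinesmall.csv
    hdfs_dir=$2        # for example: hdfs://host:54310/user/<username>/datasets
    hadoop_cmd=${HADOOP_CMD:-"$HADOOP_PREFIX/bin/hadoop"}

    # -p creates missing parent directories and does not fail
    # if the directory already exists.
    "$hadoop_cmd" fs -mkdir -p "$hdfs_dir" || return 1

    # Copy the local file into the HDFS directory.
    "$hadoop_cmd" fs -copyFromLocal "$local_file" "$hdfs_dir" || return 1

    # List the directory so you can confirm the upload.
    "$hadoop_cmd" fs -ls "$hdfs_dir"
}
```

To preview the three commands without running them, call the function with HADOOP_CMD set to echo, for example: HADOOP_CMD=echo stage_dataset airlinesmall.csv hdfs://host:54310/user/<username>/datasets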
Start MATLAB and verify that the HADOOP_PREFIX environment variable has been set. At the command prompt, type:
>> getenv('HADOOP_PREFIX')
If ans is empty, review the Prerequisites section above to see how you can set the HADOOP_PREFIX environment variable.
Create a new MATLAB script with the name depMapRedStandAlone.m. You will add the code from the following steps to this script file.
Create a datastore that points to the airline data in Hadoop Distributed File System (HDFS).
ds = datastore('hdfs:///user/<username>/datasets/airlinesmall.csv',...
    'TreatAsMissing','NA',...
    'SelectedVariableNames',{'UniqueCarrier','ArrDelay'});
For more information, see Read Remote Data (MATLAB).
Configure the application for deployment against Hadoop with default settings.
config = matlab.mapreduce.DeployHadoopMapReducer;
The class matlab.mapreduce.DeployHadoopMapReducer
can be used to configure a standalone application based on the
Hadoop environment where it is going to be deployed.
For example, if you want to specify the location of the MATLAB Runtime on each of the worker nodes on the cluster, include a line of code similar to this:
config = matlab.mapreduce.DeployHadoopMapReducer('MCRRoot','/opt/MATLAB/MATLAB_Runtime/v##');
This line assumes that the MATLAB Runtime is installed in a non-default location such as /opt/MATLAB/MATLAB_Runtime on the worker nodes.
For information on specifying additional cluster-specific properties, see matlab.mapreduce.DeployHadoopMapReducer.
Specifying a MATLAB Runtime location as part of the class
matlab.mapreduce.DeployHadoopMapReducer will
override any MATLAB Runtime location specified during the execution of the
standalone application.
Define the execution environment using the mapreducer.
mr = mapreducer(config);
Apply the mapreduce function.
result = mapreduce(...
    ds,...
    @maxArrivalDelayMapper,@maxArrivalDelayReducer,...
    mr,...
    'OutputType','Binary', ...
    'OutputFolder','hdfs:///user/<username>/results/myresults');
An HDFS directory such as .../myresults can
be written to only once. If you plan on running your standalone
application multiple times against the Hadoop cluster, make sure you delete the
.../myresults directory on HDFS prior to each execution. Another option is to change
the name of the .../myresults directory in the
MATLAB code and recompile the application.
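The cleanup between runs can be scripted. The following is a minimal sketch, assuming the hadoop launcher is at $HADOOP_PREFIX/bin/hadoop and a Hadoop version that accepts -rm -r (older releases use -rmr instead). The function name clear_results_dir and the HADOOP_CMD override are hypothetical, introduced only for illustration. Because the command deletes data recursively, check the path before running it against a real cluster.

```shell
# Remove an HDFS results directory, if it exists, before re-running the
# standalone application.
# HADOOP_CMD is a hypothetical override (not a Hadoop setting): set it to
# "echo" to preview the commands without contacting a cluster.
clear_results_dir() {
    results_dir=${1:-}   # for example: hdfs:///user/<username>/results/myresults
    hadoop_cmd=${HADOOP_CMD:-"$HADOOP_PREFIX/bin/hadoop"}

    # Refuse to run without an explicit target path.
    if [ -z "$results_dir" ]; then
        echo "usage: clear_results_dir <hdfs-directory>" >&2
        return 1
    fi

    # "fs -test -d" exits with status 0 only when the path is an existing
    # directory, so nothing is deleted on the first run.
    if "$hadoop_cmd" fs -test -d "$results_dir"; then
        "$hadoop_cmd" fs -rm -r "$results_dir"
    fi
}
```

Call it with the same path that the script passes as 'OutputFolder', for example: clear_results_dir hdfs:///user/<username>/results/myresults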
Read the result from the resulting datastore.
myAppResult = readall(result)
Use the mcc command with the -m
flag to create a standalone application.
mcc -m depMapRedStandAlone.m
The -m flag creates a standard executable that can
be run from a command line. However, the mcc command
cannot package the results in an installer.
Run the standalone application from a Linux shell using the following command:
$ ./run_depMapRedStandAlone.sh /usr/local/MATLAB/MATLAB_Runtime/v##
/usr/local/MATLAB/MATLAB_Runtime/v## is an argument indicating the location of the MATLAB Runtime.
Prior to executing the above command, verify that the HADOOP_PREFIX environment variable is set in the terminal by typing:
$ echo $HADOOP_PREFIX
If the output is empty, review the Prerequisites section above to see how you can set the HADOOP_PREFIX environment variable. Your application will fail to execute if the HADOOP_PREFIX environment variable is not set.
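The two checks above can be combined with the launch into one small wrapper. The following is a minimal sketch under stated assumptions: the function name run_standalone and the RUN_SCRIPT override are hypothetical, and the wrapper only checks that HADOOP_PREFIX is non-empty and that the MATLAB Runtime folder exists before calling the run script generated by mcc.

```shell
# Launch the compiled application only after its prerequisites are met.
# RUN_SCRIPT is a hypothetical override; by default the wrapper calls the
# run script that mcc generated in the current folder.
run_standalone() {
    runtime_root=${1:-}   # for example: /usr/local/MATLAB/MATLAB_Runtime/v##
    run_script=${RUN_SCRIPT:-./run_depMapRedStandAlone.sh}

    # The application fails if HADOOP_PREFIX is not set.
    if [ -z "${HADOOP_PREFIX:-}" ]; then
        echo "HADOOP_PREFIX is not set; see the Prerequisites section." >&2
        return 1
    fi

    # The MATLAB Runtime location is a required argument of the run script.
    if [ -z "$runtime_root" ] || [ ! -d "$runtime_root" ]; then
        echo "MATLAB Runtime folder not found: $runtime_root" >&2
        return 1
    fi

    "$run_script" "$runtime_root"
}
```

Run it from the folder that contains the compiled application, for example: run_standalone /usr/local/MATLAB/MATLAB_Runtime/v##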
You will see the following output:
myAppResult =
           Key            Value
    _________________    ______
    'MaxArrivalDelay'    [1014]
Other examples of map and reduce functions are available in the toolbox/matlab/demos folder. You can use these examples to prototype similar standalone applications that run against Hadoop. For more information, see Build Effective Algorithms with MapReduce (MATLAB).
Complete code for the standalone application
depMapRedStandAlone can be found here:
KeyValueDatastore | TabularTextDatastore | datastore | matlab.mapreduce.DeployHadoopMapReducer | mcc