This topic describes the requirements to allow jobs to run on an existing Hadoop® cluster.
The requirements are:
MATLAB® Distributed Computing Server™ must be installed or available on the cluster nodes. See Install Products and Choose Cluster Configuration.
If the cluster uses Kerberos authentication that requires the Java Cryptography Extension, you must download and install the Oracle version of this extension in each MATLAB Distributed Computing Server installation. You must also perform this step for the MATLAB client installation. To install the extension, place the Java Cryptography Extension jar files in the folder ${MATLABROOT}/sys/jre/${ARCH}/jre/lib/security.
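As a minimal sketch of that step, the following shell commands build the destination folder from the installation root and architecture and show where the policy jars go. The MATLABROOT and ARCH values here are hypothetical placeholders; substitute the values for your installation (for example, glnxa64 on 64-bit Linux).

```shell
# Hypothetical install location and architecture -- substitute your own.
MATLABROOT=/usr/local/MATLAB/R2017b
ARCH=glnxa64

# Destination for the Java Cryptography Extension policy jars.
DEST="${MATLABROOT}/sys/jre/${ARCH}/jre/lib/security"
echo "$DEST"

# Copy the jar files from the extracted JCE download into place, e.g.:
# cp local_policy.jar US_export_policy.jar "$DEST"
```

Repeat this for every MATLAB Distributed Computing Server installation on the cluster and for the MATLAB client installation.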
You must have a Hadoop installation on the MATLAB client machine that can submit normal (non-MATLAB) jobs to the cluster.
The cluster must identify its user home directory as a valid location that the nodes can access. Choose a local filesystem path, typically a local folder such as /tmp/hduserhome or /home/${USER}. For Hadoop version 2.X, set the yarn.nodemanager.user-home-dir property.
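As a sketch, a corresponding entry in the cluster's yarn-site.xml might look like the following. The /tmp/hduserhome value is one of the example paths above; use whichever local path is valid on your nodes.

```xml
<!-- yarn-site.xml: set the user home directory for task containers -->
<property>
  <name>yarn.nodemanager.user-home-dir</name>
  <value>/tmp/hduserhome</value>
</property>
```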
There is one Hadoop property that must not be "final." (If a property is "final," it is locked to a fixed predefined value, and jobs cannot alter it.) The software must append a value to this property so that task processes can correctly run MATLAB; the property is passed as part of the job metadata given to Hadoop during job submission. The property is mapred.child.env, which controls environment variables for the job's task processes.
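For illustration, a mapred-site.xml entry that keeps this property overridable looks like the following sketch; the key point is the absence of a final element set to true.

```xml
<!-- mapred-site.xml: mapred.child.env must remain overridable so the
     job submission can append MATLAB-related environment variables -->
<property>
  <name>mapred.child.env</name>
  <value></value>
  <!-- do not add <final>true</final> here -->
</property>
```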
You must provide the necessary information to the parallel.cluster.Hadoop object in the MATLAB client session. For examples, see Run mapreduce on a Hadoop Cluster (Parallel Computing Toolbox) and Use Tall Arrays on a Spark Enabled Hadoop Cluster (Parallel Computing Toolbox).
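A minimal sketch of that client-side setup is shown below; the Hadoop install folder is a hypothetical path, and the linked examples cover the full workflows.

```matlab
% Sketch: create the cluster object, assuming a hypothetical Hadoop
% install location on the cluster nodes.
cluster = parallel.cluster.Hadoop( ...
    'HadoopInstallFolder', '/usr/local/hadoop');  % hypothetical path

% Direct subsequent mapreduce operations to the Hadoop cluster.
mr = mapreducer(cluster);
```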
For Hortonworks, add the following to the beginning of the static class path of MATLAB and MATLAB Distributed Computing Server:
$HADOOP_PREFIX/lib/commons-codec-1.9.jar
For more information, see the documentation for Static Path (MATLAB).
For Cloudera, add the following to the beginning of the static class path of MATLAB and MATLAB Distributed Computing Server:
$HADOOP_PREFIX/jars/commons-codec-1.9.jar
For more information, see the documentation for Static Path (MATLAB).
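One way to make this addition, sketched under the assumption that you manage the static path through a javaclasspath.txt file in the folder returned by prefdir, is to list the jar after the before token so it precedes MATLAB's built-in path entries. Note that $HADOOP_PREFIX is written expanded here with a hypothetical value, since the file takes literal paths; see the Static Path documentation for details.

```
<before>
/usr/hdp/current/hadoop-client/lib/commons-codec-1.9.jar
```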
MATLAB MapReduce is supported on Hadoop 2.x clusters. Support for Hadoop 1.x clusters has been removed; see the table below.
MATLAB Tall Array is supported on Spark® enabled Hadoop 2.x clusters.
You can use tall arrays on Spark enabled Hadoop clusters with clients of any supported architecture and with clusters running Linux or Mac; cross-platform combinations of client and cluster are supported.
Functionality | Result | Use Instead | Compatibility Considerations
---|---|---|---
Support for running MATLAB MapReduce on Hadoop 1.x clusters has been removed. | Errors | Use clusters that have Hadoop 2.x or higher installed to run MATLAB MapReduce. | Migrate MATLAB MapReduce code that runs on Hadoop 1.x to Hadoop 2.x.