Configure a Hadoop Cluster

This topic describes the requirements to allow jobs to run on an existing Hadoop® cluster.

The requirements are:

  1. MATLAB® Distributed Computing Server™ must be installed or available on the cluster nodes. See Install Products and Choose Cluster Configuration.

  2. If the cluster uses Kerberos authentication that requires the Java Cryptography Extension, you must download and install the Oracle version of this extension on each MATLAB Distributed Computing Server installation and on the MATLAB client installation. To install the extension, place the Java Cryptography Extension jar files in the folder ${MATLABROOT}/sys/jre/${ARCH}/jre/lib/security.
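
    For example, on a Linux installation you might copy the policy files from the downloaded extension as follows. This is a sketch only: the zip and folder names (jce_policy-8.zip, UnlimitedJCEPolicyJDK8) match the Java 8 download and may differ for your Java version, and ${ARCH} is your platform folder, such as glnxa64.

      unzip jce_policy-8.zip
      cp UnlimitedJCEPolicyJDK8/local_policy.jar \
         UnlimitedJCEPolicyJDK8/US_export_policy.jar \
         ${MATLABROOT}/sys/jre/${ARCH}/jre/lib/security/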

  3. You must have a Hadoop installation on the MATLAB client machine that can submit normal (non-MATLAB) jobs to the cluster.
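
    One way to verify this is to run one of the example jobs shipped with Hadoop from the client machine. A sketch, assuming the standard Hadoop 2.x layout under $HADOOP_PREFIX:

      hadoop jar ${HADOOP_PREFIX}/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar pi 2 10

    If the job completes and prints an estimate of pi, the client can submit normal jobs.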

  4. The cluster must identify its user home directory as a valid location that the nodes can access. Choose a local file system path, typically a local folder such as /tmp/hduserhome or /home/${USER}. For Hadoop version 2.x, set the yarn.nodemanager.user-home-dir property.
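
    For example, in yarn-site.xml on the cluster nodes (a sketch, assuming /tmp/hduserhome as the chosen folder):

      <property>
        <name>yarn.nodemanager.user-home-dir</name>
        <value>/tmp/hduserhome</value>
      </property>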

  5. One Hadoop property, mapred.child.env, must not be “final.” (If properties are “final”, they are locked to a fixed predefined value, and jobs cannot alter them.)

    This property controls the environment variables for the job’s task processes. The software appends a value to it so that task processes can correctly run MATLAB; the appended value is passed as part of the job metadata given to Hadoop during job submission.
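
    If mapred-site.xml declares this property, the declaration must not contain a final element. A sketch of an acceptable declaration (the LD_LIBRARY_PATH value is only an illustration):

      <property>
        <name>mapred.child.env</name>
        <value>LD_LIBRARY_PATH=/usr/local/lib</value>
        <!-- No <final>true</final> here: jobs must be able to append to this value. -->
      </property>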

  6. You must provide the necessary cluster information to the parallel.cluster.Hadoop object in the MATLAB client session. For example, see Run mapreduce on a Hadoop Cluster (Parallel Computing Toolbox) and Use Tall Arrays on a Spark Enabled Hadoop Cluster (Parallel Computing Toolbox).
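
    A minimal sketch of such a session, assuming a Hadoop installation at /usr/local/hadoop; the host name and the @myMapper and @myReducer function handles are placeholders:

      % Point the client at the Hadoop installation, then create the cluster object.
      setenv('HADOOP_HOME', '/usr/local/hadoop');
      cluster = parallel.cluster.Hadoop;
      cluster.ClusterMatlabRoot = '/opt/matlab';   % MATLAB root on the cluster nodes

      % Run a mapreduce job against data stored in HDFS.
      mr = mapreducer(cluster);
      ds = datastore('hdfs://myhadoophost/data/input.csv');
      result = mapreduce(ds, @myMapper, @myReducer, mr);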

  7. For Hortonworks, add the following to the beginning of the static class path of MATLAB and MATLAB Distributed Computing Server:

    $HADOOP_PREFIX/lib/commons-codec-1.9.jar

    For more information, see the documentation for Static Path (MATLAB).

  8. For Cloudera, add the following to the beginning of the static class path of MATLAB and MATLAB Distributed Computing Server:

    $HADOOP_PREFIX/jars/commons-codec-1.9.jar

    For more information, see the documentation for Static Path (MATLAB).
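
For both distributions, one way to put the jar file at the beginning of the static class path is to list it, preceded by the <before> token, in the javaclasspath.txt file that MATLAB reads at startup. A sketch for the Hortonworks case; MATLAB does not expand environment variables in this file, so replace the path with the expanded value of $HADOOP_PREFIX on your system (the location shown is only a typical one):

      <before>
      /usr/hdp/current/hadoop-client/lib/commons-codec-1.9.jar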

Hadoop Version Support

  • MATLAB MapReduce is supported on Hadoop 2.x clusters. Support for Hadoop 1.x clusters has been removed; see the table below.

  • MATLAB tall arrays are supported on Spark® enabled Hadoop 2.x clusters.

  • You can use tall arrays on Spark enabled Hadoop clusters from a client on any architecture; the cluster itself must run on a Linux or Mac architecture. Cross-platform combinations, such as a Windows client with a Linux cluster, are supported.

  Functionality: Support for running MATLAB MapReduce on Hadoop 1.x clusters has been removed.
  Result: Errors.
  Use Instead: Use clusters that have Hadoop 2.x or higher installed to run MATLAB MapReduce.
  Compatibility Considerations: Migrate MATLAB MapReduce code that runs on Hadoop 1.x to Hadoop 2.x.
