Configure a Hadoop Cluster

This topic describes the requirements for running jobs on an existing Hadoop® cluster.

The requirements are:

  1. MATLAB® Distributed Computing Server™ must be installed or available on the cluster nodes. See Install Products and Choose Cluster Configuration.

  2. You must have a Hadoop installation on the MATLAB client machine that can submit normal (non-MATLAB) jobs to the cluster.
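    To verify this requirement, you can submit one of the stock Hadoop example jobs from the client machine. The following check is only a sketch: the examples jar name and location are assumptions that vary by Hadoop version and distribution.

        % Run from MATLAB with the ! shell escape (or directly from the
        % operating system shell). The jar path below is an assumption;
        % adjust it to match your Hadoop installation.
        !hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar pi 2 10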

  3. The cluster must identify its user home directory as a valid location that the nodes can access. Set one of the following properties, depending on your Hadoop version, as illustrated in the fragment after this list:

    • mapreduce.admin.user.home.dir — for Hadoop version 1.X

    • yarn.nodemanager.user-home-dir — for Hadoop version 2.X
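    For example, on a Hadoop 2.X cluster you might set this property in yarn-site.xml on each node. The /home value below is an assumption; use whatever location is valid on your cluster.

        <property>
          <name>yarn.nodemanager.user-home-dir</name>
          <value>/home</value>
        </property>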

  4. Two Hadoop properties must not be marked "final." (When a property is "final," it is locked to a fixed, predefined value, and jobs cannot alter it.)

    The software needs to append values to these properties so that the task processes can run MATLAB correctly. The properties are passed to Hadoop as part of the job metadata during job submission. An illustrative configuration fragment follows the list below.

    The properties are:

    • mapred.child.env — Controls environment variables for the job's task processes.

    • mapred.child.java.opts — Controls Java® system properties for the task processes (needed only for Hadoop version 1.X clusters).
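    A property is marked "final" by a <final>true</final> element in a cluster configuration file such as mapred-site.xml. The following fragment is illustrative only; if either property appears like this on your cluster, remove the <final> element or set it to false so that job submissions can append to the value.

        <property>
          <name>mapred.child.env</name>
          <value>...</value>
          <final>true</final>  <!-- remove, or set to false -->
        </property>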

  5. You must provide the necessary cluster information to the parallel.cluster.Hadoop object in the MATLAB client session. For example, see Run mapreduce on a Hadoop Cluster.
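    The following client session is a minimal sketch; the Hadoop and MATLAB install folders are assumptions, and your cluster might require additional properties.

        % Point MATLAB at the client-side Hadoop installation
        % (the path is an assumption; use your own).
        setenv('HADOOP_HOME', '/usr/lib/hadoop');

        % Create the Hadoop cluster object. ClusterMatlabRoot gives the
        % location of MATLAB Distributed Computing Server on the cluster
        % nodes (again, the path is an assumption).
        cluster = parallel.cluster.Hadoop;
        cluster.ClusterMatlabRoot = '/usr/local/MATLAB';

        % Direct subsequent mapreduce calls to the Hadoop cluster.
        mr = mapreducer(cluster);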
