Define parallel execution environment for mapreduce and tall arrays
mapreducer
mapreducer(0)
mapreducer(poolobj)
mapreducer(hadoopCluster)
mapreducer(mr)
mr = mapreducer(___)
mr = mapreducer(___,'ObjectVisibility','Off')
mapreducer defines the execution environment
for mapreduce or tall arrays. Use the mapreducer function
to change the execution environment to use a different cluster or
to switch between serial and parallel development.
The default execution environment uses either the local MATLAB® session, or a parallel pool
if you have Parallel Computing Toolbox™. If you have Parallel Computing Toolbox installed,
then when you use the tall or mapreduce functions, MATLAB automatically starts a parallel
pool of workers, unless you have changed the default preferences.
By default, a parallel pool uses local workers, typically one worker
for each core in your machine. If you turn off the Automatically create a parallel pool option, then you must explicitly start a pool if you
want to use parallel resources. See Specify Your Parallel Preferences.
When working with tall arrays, use mapreducer to
set the execution environment prior to creating the tall array. Tall
arrays are bound to the current global execution environment when
they are constructed. If you subsequently change the global execution
environment, then the tall array is invalid, and you must recreate
it.
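As a minimal sketch of this ordering, the following sets the environment first and only then constructs the tall array. It assumes a serial environment and uses random in-memory data purely for illustration; the variable names t and m are not from the original page.

```matlab
% Choose the execution environment first (here: serial, in the client session).
mapreducer(0);

% Create the tall array after the environment is set, so it is bound to it.
t = tall(rand(1000,1));

% Evaluate the deferred calculation in that environment.
m = gather(min(t));
```

If you later call mapreducer to switch environments, recreate t before using it again.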
In MATLAB, you do
not need to specify configuration settings using mapreducer because mapreduce algorithms
and tall array calculations automatically run in the local MATLAB
session only. If you also have Parallel
Computing Toolbox, then
you can use the additional mapreducer configuration
options listed on this page for running in parallel. If you have MATLAB Compiler™,
then you can use separate mapreducer configuration
options for running in deployed environments.
See: mapreducer in the MATLAB documentation, or mapreducer in the MATLAB Compiler documentation.
mapreducer with no input arguments creates
a new mapreducer execution environment with all
the defaults and sets this to be the current mapreduce or
tall array execution environment. You can use gcmr to
get the current mapreducer configuration.
If you have default preferences (Automatically create a parallel pool is enabled), and you have not opened a parallel pool,
then mapreducer opens a pool using the default
cluster profile, sets gcmr to a mapreducer based
on this pool and returns this mapreducer.
If you have opened a parallel pool, then mapreducer sets gcmr to
a mapreducer based on the current pool and returns this mapreducer.
If you have disabled Automatically
create a parallel pool, and you have not
opened a parallel pool, then mapreducer sets gcmr to
a mapreducer based on the local MATLAB session,
and mapreducer returns this mapreducer.
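The three cases above can be checked interactively with a short sketch like the following. The class that is displayed depends on your preferences and on whether a pool is open, so no particular output is assumed here.

```matlab
% Create the default execution environment and make it the current one.
% With default preferences and Parallel Computing Toolbox, this can open a pool.
mr = mapreducer;

% Query the current global execution environment.
current = gcmr;

% Display which kind of MapReducer was selected.
disp(class(mr));
disp(class(current));
```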
mapreducer(0) specifies that mapreduce or tall array
calculations run in the MATLAB client session without using any parallel
resources.
mapreducer(poolobj) specifies
a parallel pool for parallel execution of mapreduce or
tall arrays. poolobj is a parallel.Pool object. The default
pool is the current pool that is returned or opened by gcp.
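A possible way to use this syntax is sketched below. It assumes Parallel Computing Toolbox is installed. The guard for an empty pool is an added precaution for the case where automatic pool creation is switched off; it is not part of the original page.

```matlab
% Get the current pool; gcp opens one if preferences allow automatic creation.
poolobj = gcp;

% If no pool exists and automatic creation is disabled, start one explicitly.
if isempty(poolobj)
    poolobj = parpool;
end

% Run mapreduce and tall array calculations on this pool.
mapreducer(poolobj);

% Tall arrays created from here on are bound to the pool-based environment.
t = tall(rand(1000,1));
s = gather(sum(t));
```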
mapreducer(hadoopCluster) specifies
a Hadoop® cluster for parallel execution of mapreduce or
tall arrays. hadoopCluster is a parallel.cluster.Hadoop object.
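For illustration only, a Hadoop-based environment might be set up along these lines. The install folder is a hypothetical placeholder, and the snippet assumes Parallel Computing Toolbox and a reachable Hadoop installation; your cluster may need additional properties.

```matlab
% Hypothetical install location; replace with the folder on your system.
hadoopCluster = parallel.cluster.Hadoop('HadoopInstallFolder','/path/to/hadoop');

% Use the Hadoop cluster as the execution environment for mapreduce and tall arrays.
mapreducer(hadoopCluster);
```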
mapreducer(mr) sets
the global execution environment for mapreduce or
tall arrays, using a previously created MapReducer object, mr,
if its ObjectVisibility property is 'On'.
mr = mapreducer(___) returns
a MapReducer object to specify the execution environment. You can
define several MapReducer objects, which enables you to swap execution
environments by passing one as an input argument to mapreduce or mapreducer.
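Swapping between stored environments could look like the following sketch. It assumes Parallel Computing Toolbox for the pool-based object; serialMr and poolMr are illustrative names.

```matlab
% Create two MapReducer objects. Each call also makes its result the current
% global environment, because ObjectVisibility defaults to 'On'.
serialMr = mapreducer(0);
poolMr = mapreducer(gcp);

% Switch back to the serial environment by passing the stored object.
mapreducer(serialMr);

% Switch to the pool-based environment again.
mapreducer(poolMr);
```

Tall arrays created under one environment become invalid after each switch, so recreate them after calling mapreducer.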
mr = mapreducer(___,'ObjectVisibility','Off') hides
the visibility of the MapReducer object, mr, using
any of the previous syntaxes. Use this syntax to create new MapReducer
objects without affecting the global execution environment of mapreduce.
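One way to use a hidden MapReducer is to pass it explicitly to a single mapreduce call, as in the sketch below. The data file, the variable x, and the functions sumMapper and sumReducer are hypothetical names made up for this example. It assumes a MATLAB release that supports local functions in scripts.

```matlab
% Write a small CSV file so the example is self-contained.
fname = [tempname '.csv'];
writetable(table((1:100)','VariableNames',{'x'}), fname);
ds = tabularTextDatastore(fname);

% Create a serial MapReducer without changing the global environment.
mr = mapreducer(0,'ObjectVisibility','Off');

% Pass the hidden MapReducer explicitly to this one mapreduce call.
outds = mapreduce(ds, @sumMapper, @sumReducer, mr);
result = readall(outds);
total = result.Value{1};

function sumMapper(data, ~, intermKVStore)
    % Emit the partial sum of column x for this chunk of data.
    add(intermKVStore, 'total', sum(data.x));
end

function sumReducer(~, intermValIter, outKVStore)
    % Combine the partial sums into a single total.
    s = 0;
    while hasnext(intermValIter)
        s = s + getnext(intermValIter);
    end
    add(outKVStore, 'total', s);
end
```

Because mr is hidden, later calls that rely on the global environment, such as gcmr, are unaffected by it.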
If you want to develop in serial and not use local workers or your specified cluster, enter:
mapreducer(0);
If you use mapreducer to change the execution environment
after creating a tall array, then the tall array is invalid and you
must recreate it. To use local workers or your specified cluster again,
enter:
mapreducer(gcp);
mapreducer with Automatically Create a Parallel Pool Switched Off
If you have turned off the Automatically create a parallel pool option, then you must explicitly start a pool if you want to use parallel resources. See Specify Your Parallel Preferences for details.
The following code shows how you can use mapreducer without
input arguments to set the execution environment to your local MATLAB session and then specify a local
parallel pool:
>> mapreducer
>> parpool('local',1);
Starting parallel pool (parpool) using the 'local' profile ...
connected to 1 workers.
>> gather(min(tall(rand(1000,1))))
Evaluating tall expression using the Local MATLAB Session:
Evaluation completed in 0 sec
ans =
   5.2238e-04
One of the benefits of developing your algorithms with tall
arrays is that you only need to write the code once. You can develop
your code locally, then use mapreducer to scale
up and take advantage of the capabilities offered by Parallel
Computing Toolbox, MATLAB
Distributed Computing Server™,
or MATLAB Compiler, without needing to rewrite your
algorithm.
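The write-once idea can be sketched as follows: the same function runs unchanged in serial and then on a pool, and only the mapreducer call differs. The function meanOfSquares is a hypothetical example, and the parallel step assumes Parallel Computing Toolbox.

```matlab
x = rand(1000,1);

% Develop and check the calculation in serial first.
mapreducer(0);
serialResult = gather(meanOfSquares(tall(x)));

% Scale up: switch the environment, recreate the tall array, reuse the code.
mapreducer(gcp);
parallelResult = gather(meanOfSquares(tall(x)));

function y = meanOfSquares(t)
    % Works for tall and in-memory inputs alike.
    y = mean(t.^2);
end
```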
gcmr | gcp | mapreduce | parallel.cluster.Hadoop | tall