If your data is currently in the memory of your local machine, you can use the distributed function to distribute an existing array from the client workspace to the workers of a parallel pool. Distributed arrays use the combined memory of multiple workers in a parallel pool to store the elements of an array. For alternative ways of partitioning data, see Distributing Arrays. You can use distributed arrays to scale up your big data computation. Consider distributed arrays when you have access to a cluster, as you can combine the memory of multiple machines in your cluster.
A distributed array is a single variable, split over multiple workers in your parallel pool. You can work with this variable as one single entity, without having to worry about its distributed nature. To explore the functionality available for distributed arrays in Parallel Computing Toolbox™, see Using MATLAB Functions on Distributed Arrays.
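As a minimal sketch of treating a distributed array as one entity, the snippet below applies ordinary MATLAB syntax to it and brings only a small result back to the client. The array size and the variable names (D, colSums, largest, isDist) are illustrative assumptions, not part of the original example set.

```matlab
parpool('local',2);              % Create pool, as in the examples on this page
D = rand(1000,'distributed');    % 1000-by-1000 array stored across the workers
colSums = sum(D,1);              % Ordinary syntax; the result is still distributed
isDist = isdistributed(colSums); % Check that the result lives on the workers
largest = gather(max(colSums));  % gather copies the scalar result to the client
delete(gcp)                      % Stop pool
```

Only the final scalar is transferred to the client; the full array stays on the workers throughout.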
When you create a distributed array, you cannot control the details of the distribution. On the other hand, codistributed arrays allow you to control all aspects of distribution, including dimensions and partitions. The following sections show how to create both distributed and codistributed arrays.
You can create a distributed array in different ways:
Use the distributed function to distribute an existing array from the client workspace to the workers of a parallel pool.
Construct a distributed array directly on the workers. You do not need to first create the array in the client, so client workspace memory requirements are reduced. The functions available include eye(___,'distributed'), rand(___,'distributed'), etc. For a full list, see the distributed object reference page.
Create a codistributed array inside an spmd statement (see Single Program Multiple Data (spmd)), then access it as a distributed array outside the spmd statement. This lets you use distribution schemes other than the default.
In this example, you create an array in the client workspace, then turn it into a distributed array:
parpool('local',4)   % Create pool
A = magic(4);        % Create magic 4-by-4 matrix
B = distributed(A);  % Distribute to the workers
B                    % View results in client.
whos                 % B is a distributed array here.
delete(gcp)          % Stop pool
You have created B as a distributed array, split over the workers in your parallel pool. This is shown in the figure.
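The second approach listed earlier, constructing the array directly on the workers, might look like the following sketch. The 6-by-6 size and the variable names (I, cls, total) are assumptions chosen for illustration; the array is never created in the client workspace.

```matlab
parpool('local',4);             % Create pool
I = eye(6,'distributed');       % 6-by-6 identity matrix built on the workers
cls = class(I);                 % Class name as seen from the client
total = gather(sum(sum(I)));    % Sum all elements, then copy the scalar to the client
delete(gcp)                     % Stop pool
```

Because the identity matrix has six ones on its diagonal, total should equal 6, which gives a quick sanity check without gathering the full array.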
Unlike distributed arrays, codistributed arrays allow you to control all aspects of distribution, including dimensions and partitions. You can create a codistributed array in different ways:
Partitioning a Larger Array: Start with a large array that is replicated on all workers, and partition it so that the pieces are distributed across the workers. This is most useful when you have sufficient memory to store the initial replicated array.
Building from Smaller Arrays: Start with smaller replicated arrays stored on each worker, and combine them so that each array becomes a segment of a larger codistributed array. This method reduces memory requirements as it lets you build a codistributed array from smaller pieces.
Using MATLAB Constructor Functions: Use any of the MATLAB® constructor functions like rand or zeros with a codistributor object argument. These functions offer a quick means of constructing a codistributed array of any size in just one step.
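The first two approaches can be sketched as follows, assuming a pool of two workers. The array sizes and the variable names (A, D1, L, D2, nLocalCols, and so on) are illustrative assumptions. The sketch uses labindex and numlabs to match the rest of this page; recent releases may prefer other names for these functions.

```matlab
parpool('local',2);                          % Create pool with two workers
spmd
    % Partitioning a larger array: A is replicated on every worker.
    A = magic(8);
    D1 = codistributed(A,codistributor1d(2)); % Split along dimension 2 (columns)
    nLocalCols = size(getLocalPart(D1),2);    % Columns stored on this worker

    % Building from smaller arrays: each worker supplies a 3-by-2 piece.
    L = labindex*ones(3,2);
    codist = codistributor1d(2,2*ones(1,numlabs),[3,2*numlabs]);
    D2 = codistributed.build(L,codist);       % Pieces become segments of D2
end
G1 = gather(D1);                             % Full 8-by-8 array on the client
G2 = gather(D2);                             % Full 3-by-4 array on the client
cols = [nLocalCols{1},nLocalCols{2}];        % Read per-worker values from the Composite
delete(gcp)                                  % Stop pool
```

In the second approach no worker ever holds the full array, which is why it suits cases where the replicated original would not fit in memory.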
In this example, you create a codistributed array inside an spmd statement, using a nondefault distribution scheme. First, define a 1-D distribution along the third dimension, with 4 parts on worker 1, and 12 parts on worker 2. Then create a 3-by-3-by-16 array of zeros, and add the index of each worker to the part it stores.
parpool('local',2)   % Create pool
spmd
    codist = codistributor1d(3,[4,12]);
    Z = zeros(3,3,16,codist);
    Z = Z + labindex;
end
Z                    % View results in client.
whos                 % Z is a distributed array here.
delete(gcp)          % Stop pool
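To confirm that the partition matches the requested scheme, one option is to inspect the local part on each worker, as in this sketch. It repeats the setup above so that it runs on its own; localSize, s1, s2, and G are hypothetical names introduced here.

```matlab
parpool('local',2);                      % Create pool with two workers
spmd
    codist = codistributor1d(3,[4,12]);  % 4 pages on worker 1, 12 on worker 2
    Z = zeros(3,3,16,codist) + labindex; % Each worker adds its own index
    localSize = size(getLocalPart(Z));   % Size of the piece stored on this worker
end
s1 = localSize{1};                       % Local size on worker 1
s2 = localSize{2};                       % Local size on worker 2
G = gather(Z);                           % Full 3-by-3-by-16 array on the client
delete(gcp)                              % Stop pool
```

Worker 1 holds a 3-by-3-by-4 piece filled with ones, and worker 2 holds a 3-by-3-by-12 piece filled with twos, so the first 4 pages of G are all 1 and the remaining 12 pages are all 2.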
For more details on codistributed arrays, see Working with Codistributed Arrays.