Create distributed array from data in client workspace
D = distributed(X)
D = distributed(X) creates a distributed array from X. X can be an array stored in the MATLAB client workspace or a datastore. D is a distributed array stored in parts on the workers of the open parallel pool.
Constructing a distributed array from local data this way is appropriate only if the MATLAB client can store the entirety of X in its memory. To construct large distributed arrays, use one of the constructor methods such as ones(___,'distributed'), zeros(___,'distributed'), etc.
If the input argument is already a distributed array, the result is the same as the input.
Use gather to retrieve the distributed array elements from the pool back to an array in the MATLAB workspace.
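The round trip described above can be sketched as follows. This is a minimal illustration, not part of the original page: the variable names (X, D, Dagain, Y) are chosen here for the example, and it assumes Parallel Computing Toolbox is installed. The comparisons are done on the client after gather, so the sketch also runs when no pool can be started.

```matlab
% Illustrative names only; assumes Parallel Computing Toolbox is available.
X = magic(4);                 % small array in the client workspace
D = distributed(X);           % distribute X across the pool workers
Dagain = distributed(D);      % input is already distributed: same result as D

Y = gather(D);                % bring the elements back to the client

% Compare on the client side after gathering.
sameAfterRedistribute = isequal(gather(Dagain), X);
sameAfterRoundTrip    = isequal(Y, X);

fprintf('Class of D: %s\n', class(D));
fprintf('Class of Y: %s\n', class(Y));
```

Both comparison flags should be true, because distributing and gathering do not change the element values.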
Create a small array and distribute it:
Nsmall = 50;
D1 = distributed(magic(Nsmall));
Create a large distributed array directly, using a build method:
Nlarge = 1000;
D2 = rand(Nlarge,'distributed');
Retrieve elements of a distributed array, and note where the arrays are located by their Class:
D3 = gather(D2);
whos

  Name           Size         Bytes  Class

  D1            50x50           733  distributed
  D2          1000x1000         733  distributed
  D3          1000x1000     8000000  double
  Nlarge         1x1              8  double
  Nsmall         1x1              8  double
This example shows how to create and load distributed arrays using datastore. First, create a datastore using an example data set. This data set is too small to show equal partitioning of the data over the workers. To simulate a large data set, artificially increase the size of the datastore using repmat:
files = repmat({'airlinesmall.csv'}, 10, 1);
ds = tabularTextDatastore(files);
Select the example variables:
ds.SelectedVariableNames = {'DepTime','DepDelay'};
ds.TreatAsMissing = 'NA';
Create a distributed table by reading the datastore in parallel. Partition the datastore with one partition per worker. Each worker then reads all data from the corresponding partition. The files must be in a shared location accessible from the workers.
dt = distributed(ds);
Starting parallel pool (parpool) using the 'local' profile ... connected to 4 workers.
Finally, display summary information about the distributed table:
summary(dt)
Variables:

    DepTime: 1,235,230×1 double
        Values:

            min          1
            max       2505
            NaNs    23,510

    DepDelay: 1,235,230×1 double
        Values:

            min      -1036
            max       1438
            NaNs    23,510
A distributed array is created on the workers of the existing parallel pool. If no pool exists, distributed will start a new parallel pool, unless the automatic starting of pools is disabled in your parallel preferences. If there is no parallel pool and distributed cannot start one, the result is the full array in the client workspace.
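One way to see which of these cases applies is to inspect the pool and the result, as in the sketch below. It is a hedged illustration rather than a prescribed workflow: the variable names are invented for the example, and it relies on gcp('nocreate'), which returns an empty value instead of starting a pool, and on isdistributed to test the result.

```matlab
% Illustrative sketch; variable names are assumptions for this example.
pool = gcp('nocreate');       % query the current pool without starting one
if isempty(pool)
    disp('No parallel pool is open yet.');
else
    fprintf('Existing pool has %d workers.\n', pool.NumWorkers);
end

% May start a pool if none exists and automatic pool creation is enabled.
D = distributed(magic(8));

onWorkers = isdistributed(D); % false if the full array stayed on the client
if onWorkers
    disp('D is stored in parts on the pool workers.');
else
    disp('D is a full array in the client workspace.');
end
```

Checking isdistributed after the call lets code that depends on worker-side storage detect the client-side fallback explicitly.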