With a parfor-loop, it might be faster to have each MATLAB® worker create its own arrays, or portions of them, in parallel. The alternative is to create a large array in the client before the loop and send it to each worker separately. When each worker creates its own copy of these arrays inside the loop, the data never has to be transferred from the client to the workers. Because all the workers can create their copies at the same time, the creation work also runs in parallel. This might run counter to your usual practice of doing as much variable initialization as possible before a for-loop, so that you do not needlessly repeat it inside the loop.
Whether to create arrays before the parfor-loop or inside it depends on several factors:

- the size of the arrays
- the time needed to create them
- whether the workers need all or only part of the arrays
- the number of loop iterations that each worker performs

Other factors can matter too. Even when a for-loop can be converted directly to a parfor-loop, optimizing the resulting code might involve other issues.
Another option is to use the parallel.pool.Constant function to establish variables on the pool workers before the loop. These variables remain on the workers after the loop finishes and stay available to multiple parfor-loops.
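As a minimal sketch, the following code builds a matrix on each worker with parallel.pool.Constant and reuses it in two parfor-loops. It assumes that Parallel Computing Toolbox is installed and that a parallel pool is open or can start automatically. The variable names and the matrix size are illustrative.

```matlab
n = 200;

% Passing a function handle (rather than the matrix itself) makes each
% worker evaluate magic(n) locally, so the n-by-n matrix is not sent
% from the client.
C = parallel.pool.Constant(@() magic(n));

% First parfor-loop: each worker reads its own copy through C.Value.
rowSums = zeros(1, n);
parfor i = 1:n
    rowSums(i) = sum(C.Value(i,:));
end

% Second parfor-loop: the copies already on the workers are reused.
colSums = zeros(1, n);
parfor j = 1:n
    colSums(j) = sum(C.Value(:,j));
end
```

Because magic(200) is a magic square, every entry of rowSums and colSums equals n*(n^2+1)/2, which makes the result easy to check.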
If a variable is initialized before a parfor-loop and then used inside it, the variable has to be passed to each MATLAB worker that evaluates loop iterations. Only the variables used inside the loop are passed from the client workspace. However, if all occurrences of the variable are indexed by the loop variable, each worker receives only the part of the array it needs.
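The following minimal sketch makes the distinction concrete. The variable names M, w, and A are illustrative and do not come from the example later on this page. MATLAB calls a variable indexed by the loop variable in this way a sliced variable. A variable that every iteration uses whole is a broadcast variable, and each worker receives a complete copy of it.

```matlab
n = 200;
M = magic(n);          % indexed only as M(i,:) in the loop -> sliced input
w = linspace(0, 1, n); % used whole in every iteration -> broadcast variable
A = zeros(1, n);       % indexed as A(i) in the loop -> sliced output

parfor i = 1:n
    % Each worker receives only the rows of M for its own iterations,
    % but every worker receives a full copy of w.
    A(i) = sum(M(i,:) .* w);
end
```

Keeping large inputs sliced and small inputs broadcast limits how much data the client sends to each worker.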
Running your code on local workers lets you test your application without using cluster resources. However, local workers have some limitations. Because data is not transferred over a network, transfer behavior on local workers might not indicate how transfers typically behave over a network.
With local workers, all the MATLAB worker sessions run on the same machine, so a parfor-loop might not reduce execution time at all. This can depend on many factors, including how many processors and cores your machine has. You might experiment to see whether it is faster to create the arrays before the loop or to have each worker create its own arrays inside the loop.
Try the following examples on a local parallel pool, and compare the execution time of each loop. First, open a local parallel pool:
parpool('local')
Then enter the following examples. (If you are viewing this documentation in the MATLAB help browser, highlight each segment of code below, right-click, and select Evaluate Selection in the context menu to execute the block in MATLAB. That way the time measurement will not include the time required to paste or type.)
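Arrays created before the loop:

    tic;
    n = 200;
    M = magic(n);   % created once on the client, then sent to the workers
    R = rand(n);
    parfor i = 1:n
        A(i) = sum(M(i,:).*R(n+1-i,:));
    end
    toc

Arrays created inside the loop:

    tic;
    n = 200;
    parfor i = 1:n
        M = magic(n);   % created on a worker in every iteration
        R = rand(n);
        A(i) = sum(M(i,:).*R(n+1-i,:));
    end
    toc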
When running on a remote cluster, you might see different behavior. Workers can create their arrays simultaneously and avoid transferring them over the network. Therefore, code that is optimized for local workers might not be optimized for cluster workers, and vice versa.