parfor-Loops and for-Loops

You cannot use a parfor-loop inside another parfor-loop.
As an example, the following nesting of parfor-loops is not allowed:

    parfor i = 1:10
        parfor j = 1:5
            ...
        end
    end
You cannot nest a parfor-loop directly within another parfor-loop. A parfor-loop can call a function that contains a parfor-loop, but you do not get any additional parallelism.
parfor inside another parfor-loop: You cannot nest parfor-loops because parallelization can be performed at only one level. Therefore, choose which loop to run in parallel, and convert the other loop to a for-loop.
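A minimal sketch of that choice follows. The sizes n and m and the element formula 10*i + j are illustrative assumptions. Only the outer loop is a parfor-loop. The inner loop stays an ordinary for-loop whose range is set by the broadcast variable m, and X is a sliced output indexed by the parfor variable i and the nested loop variable j.

```matlab
% Sketch: parallelize only the outer loop; the inner loop stays a for-loop.
% n, m, and the element formula are illustrative assumptions.
n = 4;
m = 3;
X = zeros(n, m);
parfor i = 1:n
    for j = 1:m              % range uses the broadcast variable m
        X(i, j) = 10*i + j;  % X is a sliced output indexed by (i, j)
    end
end
disp(X)
```

Without a parallel pool, or without Parallel Computing Toolbox, the parfor-loop runs serially and gives the same X.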
Consider the following performance issues when dealing with nested loops:

- Parallel processing incurs overhead. Generally, you should run the outer loop in parallel, because the overhead then occurs only once. If you run the inner loop in parallel, then each of the multiple parfor executions incurs an overhead. See Convert Nested for-Loops to parfor for an example of how to measure parallel overhead.
- Make sure that the number of iterations exceeds the number of workers. Otherwise, you do not use all available workers.
- Try to balance the parfor-loop iteration times. parfor tries to compensate for some load imbalance.

Always run the outermost loop in parallel, because doing so reduces parallel overhead.
You can also use a function that contains a parfor-loop and call it from inside another parfor-loop. Parallelization occurs only at the outer level. In the following example, the function MyFun.m is called inside the outer parfor-loop. The inner parfor-loop in MyFun.m runs sequentially, not in parallel.
    parfor i = 1:10
        MyFun(i)
    end

    function MyFun(i)
        parfor j = 1:5
            ...
        end
    end
Nested parfor-loops generally give you no computational benefit.
A typical use of nested loops is to step through an array using one loop variable to index one dimension, and a nested-loop variable to index another dimension. The basic form is:
    X = zeros(n,m);
    for a = 1:n
        for b = 1:m
            X(a,b) = fun(a,b);
        end
    end
The following code shows a simple example. Use tic and toc to measure the computing time needed.
    A = 100;
    tic
    for i = 1:100
        for j = 1:100
            a(i,j) = max(abs(eig(rand(A))));
        end
    end
    toc
Elapsed time is 49.376732 seconds.
You can parallelize either of the nested loops, but you cannot run both in parallel. The reason is that the workers in a parallel pool cannot start or access further parallel pools.
If the loop counted by i is converted to a parfor-loop, then each worker in the pool executes the nested loops using the j loop counter. The j-loops themselves cannot run as a parfor-loop on each worker.
Because parallel processing incurs overhead, you must choose carefully whether to convert the inner or the outer for-loop to a parfor-loop. The following example shows how to measure the parallel overhead.
First convert only the outer for-loop to a parfor-loop. Use tic and toc to measure the computing time needed. Use ticBytes and tocBytes to measure how much data is transferred to and from the workers in the parallel pool.
Run the new code, and run it again. The first run is slower than subsequent runs, because the parallel pool takes some time to start and make the code available to the workers.
    A = 100;
    tic
    ticBytes(gcp);
    parfor i = 1:100
        for j = 1:100
            a(i,j) = max(abs(eig(rand(A))));
        end
    end
    tocBytes(gcp)
    toc
                 BytesSentToWorkers    BytesReceivedFromWorkers
                 __________________    ________________________

        1                32984                   24512
        2                33784                   25312
        3                33784                   25312
        4                34584                   26112
        Total       1.3514e+05              1.0125e+05

    Elapsed time is 14.130674 seconds.
Next convert only the inner loop to a parfor-loop. Measure the time needed and data transferred as in the previous case.
    A = 100;
    tic
    ticBytes(gcp);
    for i = 1:100
        parfor j = 1:100
            a(i,j) = max(abs(eig(rand(A))));
        end
    end
    tocBytes(gcp)
    toc
                 BytesSentToWorkers    BytesReceivedFromWorkers
                 __________________    ________________________

        1               1.3496e+06               5.487e+05
        2               1.3496e+06              5.4858e+05
        3               1.3677e+06              5.6034e+05
        4               1.3476e+06              5.4717e+05
        Total           5.4144e+06              2.2048e+06

    Elapsed time is 48.631737 seconds.
If you convert the inner loop to a parfor-loop, both the time and the amount of data transferred are much greater than with the parallel outer loop (about 5.4e+06 bytes sent to the workers instead of about 1.4e+05). In this case, the elapsed time (48.6 seconds) is almost the same as in the nested for-loop example (49.4 seconds). The speedup is smaller than when running the outer loop in parallel, because there is more data transfer and thus more parallel overhead. Therefore, if you execute the inner loop in parallel, you get no computational benefit compared to running the serial for-loop.
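To repeat the comparison in a single run, a sketch like the following times both variants with tic and toc. The problem sizes N and A are assumptions chosen to keep the run short, so the absolute timings will differ from the figures above. As noted earlier, the first parfor-loop may also include pool startup time, so run the script twice before comparing.

```matlab
% Sketch: time the outer-parfor and inner-parfor variants side by side.
% N and A are illustrative assumptions, kept small so the script runs quickly.
N = 20;   % iterations of each loop
A = 50;   % size of the random matrix generated in each iteration

t = tic;
aOuter = zeros(N);
parfor i = 1:N                 % one parfor-loop in total
    for j = 1:N
        aOuter(i, j) = max(abs(eig(rand(A))));
    end
end
tOuter = toc(t);

t = tic;
aInner = zeros(N);
for i = 1:N
    parfor j = 1:N             % a new parfor-loop for every value of i
        aInner(i, j) = max(abs(eig(rand(A))));
    end
end
tInner = toc(t);

fprintf('outer parfor: %.3f s, inner parfor: %.3f s\n', tOuter, tInner);
```

The inner variant starts N separate parfor-loops, so its overhead grows with N, while the outer variant pays it once.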
If you want to reduce parallel overhead and speed up your computation, run the outer loop in parallel.
If you convert the inner loop instead, then each iteration of the outer loop initiates a separate parfor-loop. That is, the inner loop conversion creates 100 parfor-loops. Each of the multiple parfor executions incurs overhead.
If you run the outer loop in parallel instead, this overhead occurs only once, which both reduces parallel overhead and speeds up your code.
If you want to convert a nested for-loop to a parfor-loop, you must ensure that your loop variables are properly classified; see Troubleshoot Variables in parfor-Loops.
For proper variable classification, you must define the range of a for-loop nested in a parfor-loop by constant numbers or variables. In the following example, the code on the left does not work because you define the upper limit of the for-loop by a function call. The code on the right provides a workaround by first defining a broadcast or constant variable outside the parfor-loop:
Invalid | Valid |
---|---|
A = zeros(100, 200); parfor i = 1:size(A, 1) for j = 1:size(A, 2) A(i, j) = i + j; end end | A = zeros(100, 200); n = size(A, 2); parfor i = 1:size(A,1) for j = 1:n A(i, j) = i + j; end end |
The index variable for the nested for-loop must never be explicitly assigned other than in its for statement.
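As a sketch of this rule, put any value derived from the nested index in a separate temporary variable instead of reassigning the index itself. The array size, the formula i*(j+1), and the variable name jNext are illustrative assumptions. The commented-out line shows the kind of assignment the rule forbids.

```matlab
% Sketch: leave the nested for-loop index j unmodified; store any value
% derived from it in a separate temporary variable (jNext, a hypothetical name).
A = zeros(3, 5);
parfor i = 1:3
    for j = 1:5
        % j = j + 1;        % not allowed: reassigns the nested for-loop index
        jNext = j + 1;      % allowed: the derived value lives in a temporary
        A(i, j) = i * jNext;
    end
end
disp(A)
```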
When using the nested for-loop variable for indexing the sliced array, you must use the variable in plain form, not as part of an expression. For example, the following code on the left does not work, but the code on the right does:
Invalid | Valid |
---|---|
A = zeros(4, 11); parfor i = 1:4 for j = 1:10 A(i, j + 1) = i + j; end end | A = zeros(4, 11); parfor i = 1:4 for j = 2:11 A(i, j) = i + j - 1; end end |
If you use a nested for-loop to index into a sliced array, you cannot use that array elsewhere in the parfor-loop. In the following example, the code on the left does not work because A is sliced and indexed inside the nested for-loop and is also used in disp(A(i, 1)). The code on the right works because v is assigned to A outside of the nested loop:
Invalid | Valid |
---|---|
A = zeros(4, 10); parfor i = 1:4 for j = 1:10 A(i, j) = i + j; end disp(A(i, 1)) end | A = zeros(4, 10); parfor i = 1:4 v = zeros(1, 10); for j = 1:10 v(j) = i + j; end disp(v(1)) A(i, :) = v; end |
Suppose that you use multiple for-loops (not nested inside each other) inside a parfor-loop to index into a single sliced array. In this case, the for-loops must loop over the same range of values. A sliced output variable can be used in only one nested for-loop. In the following example, the code on the left does not work because j and k loop over different values. The code on the right works because it uses a single for-loop with an if statement to index different portions of the sliced array A:
Invalid | Valid |
---|---|
A = zeros(4, 10); parfor i = 1:4 for j = 1:5 A(i, j) = i + j; end for k = 6:10 A(i, k) = pi; end end | A = zeros(4, 10); parfor i = 1:4 for j = 1:10 if j < 6 A(i, j) = i + j; else A(i, j) = pi; end end end |
The body of a parfor-loop cannot make reference to a nested function; see Nested Functions (MATLAB). However, it can call a nested function through a function handle. Try the following example. Note that A(idx) = nfcn(idx) in the parfor-loop does not work. You must use feval to invoke the fcn handle in the parfor-loop body:
    function A = pfeg
        function out = nfcn(in)
            out = 1 + in;
        end

        fcn = @nfcn;

        parfor idx = 1:10
            A(idx) = feval(fcn, idx);
        end
    end
    >> pfeg
    Starting parallel pool (parpool) using the 'local' profile ... connected to 4 workers.

    ans =

         2     3     4     5     6     7     8     9    10    11
If you use function handles that refer to nested functions inside a parfor-loop, then the values of externally scoped variables are not synchronized among the workers. For more information on handles, see Copying Objects (MATLAB).
spmd Statements

The body of a parfor-loop cannot contain an spmd statement, and an spmd statement cannot contain a parfor-loop.
The body of a parfor-loop cannot contain break or return statements. Consider parfeval or parfevalOnAll instead.
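One way to restructure a loop that would otherwise break on its first match is sketched below. This is a swapped-in pattern, not the parfeval approach mentioned above: every iteration runs to completion, and find locates the first match after the loop. The data in values and the threshold 0.9 are illustrative assumptions.

```matlab
% Sketch: replace a "break on first match" search with a parfor-loop that
% evaluates every iteration, then locate the first match afterwards.
% values and the 0.9 threshold are illustrative assumptions.
values = [0.2 0.5 0.95 0.1 0.97];
n = numel(values);
isMatch = false(1, n);
parfor k = 1:n
    isMatch(k) = values(k) > 0.9;   % no break: every iteration runs
end
firstMatch = find(isMatch, 1);      % index of the first match, or [] if none
disp(firstMatch)
```

Because nothing stops early, this pattern pays for all iterations even after a match is found. When iterations are expensive, the parfeval approach suggested above lets you stop outstanding work once you have a result.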
You can call P-code script files from within a parfor-loop, but a P-code script cannot contain a parfor-loop. However, if a script introduces a variable, you cannot call that script from within a parfor-loop or spmd statement, because the script would cause a transparency violation. For more details, see Ensure Transparency in parfor-Loops.
See also: break | feval | parfeval | parfevalOnAll | parfor | return