Nested parfor-Loops and for-Loops

Parallellizing Nested Loops

You cannot use a parfor-loop inside another parfor-loop. As an example, the following nesting of parfor-loops is not allowed:

   parfor i = 1:10
       parfor j = 1:5
           ...
       end
   end

Tip

You cannot nest parfor directly within another parfor-loop. A parfor-loop can call a function that contains a parfor-loop, but you do not get any additional parallelism.

Code Analyzer in the MATLAB® Editor flags the use of parfor inside another parfor-loop:

You cannot nest parfor-loops because parallellization can be performed at only one level. Therefore, choose which loop to run in parallel, and convert the other loop to a for-loop.

Consider the following performance issues when dealing with nested loops:

  • Parallel processing incurs overhead. Generally, you should run the outer loop in parallel, because overhead only occurs once. If you run the inner loop in parallel, then each of the multiple parfor executions incurs an overhead. See Convert Nested for-Loops to parfor for an example how to measure parallel overhead.

  • Make sure that the number of iterations exceeds the number of workers. Otherwise, you do not use all available workers.

  • Try to balance the parfor-loop iteration times. parfor tries to compensate for some load imbalance.

Tip

Always run the outermost loop in parallel, because you reduce parallel overhead.

You can also use a function that uses parfor and embed it in a parfor-loop. Parallellization occurs only at the outer level. In the following example, call a function MyFun.m inside the outer parfor-loop. The inner parfor-loop embedded in MyFun.m runs sequentially, not in parallel.

parfor i = 1:10
    MyFun(i)
end

function MyFun(i)
parfor j = 1:5
    ...
end
end

Tip

Nested parfor-loops generally give you no computational benefit.

Convert Nested for-Loops to parfor

A typical use of nested loops is to step through an array using a one-loop variable to index one dimension, and a nested-loop variable to index another dimension. The basic form is:

X = zeros(n,m);
for a = 1:n
    for b = 1:m
        X(a,b) = fun(a,b)
    end
end

The following code shows a simple example. Use tic and toc to measure the computing time needed.

A = 100;
tic
for i = 1:100
    for j = 1:100
        a(i,j) = max(abs(eig(rand(A))));
    end
end
toc
Elapsed time is 49.376732 seconds.

You can parallelize either of the nested loops, but you cannot run both in parallel. The reason is that the workers in a parallel pool cannot start or access further parallel pools.

If the loop counted by i is converted to a parfor-loop, then each worker in the pool executes the nested loops using the j loop counter. The j loops themselves cannot run as a parfor on each worker.

Because parallel processing incurs overhead, you must choose carefully whether you want to convert either the inner or the outer for-loop to a parfor-loop. The following example shows how to measure the parallel overhead.

First convert only the outer for-loop to a parfor-loop. Use tic and toc to measure the computing time needed. Use ticBytes and tocBytes to measure how much data is transferred to and from the workers in the parallel pool.

Run the new code, and run it again. The first run is slower than subsequent runs, because the parallel pool takes some time to start and make the code available to the workers.

A = 100;
tic
ticBytes(gcp);
parfor i = 1:100
    for j = 1:100
        a(i,j) = max(abs(eig(rand(A))));
    end
end
tocBytes(gcp)
toc
             BytesSentToWorkers    BytesReceivedFromWorkers
             __________________    ________________________

    1             32984                 24512              
    2             33784                 25312              
    3             33784                 25312              
    4             34584                 26112              
    Total    1.3514e+05            1.0125e+05              

Elapsed time is 14.130674 seconds.

Next convert only the inner loop to a parfor-loop. Measure the time needed and data transferred as in the previous case.

A = 100;
tic
ticBytes(gcp);
for i = 1:100
    parfor j = 1:100
        a(i,j) = max(abs(eig(rand(A))));
    end
end
tocBytes(gcp)
toc
             BytesSentToWorkers    BytesReceivedFromWorkers
             __________________    ________________________

    1        1.3496e+06             5.487e+05              
    2        1.3496e+06            5.4858e+05              
    3        1.3677e+06            5.6034e+05              
    4        1.3476e+06            5.4717e+05              
    Total    5.4144e+06            2.2048e+06              

Elapsed time is 48.631737 seconds.

If you convert the inner loop to a parfor-loop, both the time and amount of data transferred are much greater than in the parallel outer loop. In this case, the elapsed time is almost the same as in the nested for-loop example. The speedup is smaller than running the outer loop in parallel, because you have more data transfer and thus more parallel overhead. Therefore if you execute the inner loop in parallel, you get no computational benefit compared to running the serial for-loop.

If you want to reduce parallel overhead and speed up your computation, run the outer loop in parallel.

If you convert the inner loop instead, then each iteration of the outer loop initiates a separate parfor-loop. That is, the inner loop conversion creates 100 parfor-loops. Each of the multiple parfor executions incurs overhead. If you want to reduce parallel overhead, you should run the outer loop in parallel instead, because overhead only occurs once.

Tip

If you want to speed up your code, always run the outer loop in parallel, because you reduce parallel overhead.

Nested Loops: Requirements and Limitations

If you want to convert a nested for-loop to a parfor-loop, you must ensure that your loop variables are properly classified, see Troubleshoot Variables in parfor-Loops. For proper variable classification, you must define the range of a for-loop nested in a parfor-loop by constant numbers or variables. In the following example, the code on the left does not work because you define the upper limit of the for-loop by a function call. The code on the right provides a workaround by first defining a broadcast or constant variable outside the parfor-loop:

InvalidValid
A = zeros(100, 200);
parfor i = 1:size(A, 1)
   for j = 1:size(A, 2)
      A(i, j) = i + j;
   end
end
A = zeros(100, 200);
n = size(A, 2);
parfor i = 1:size(A,1)
   for j = 1:n
      A(i, j) = i + j;
   end
end

The index variable for the nested for-loop must never be explicitly assigned other than in its for statement. When using the nested for-loop variable for indexing the sliced array, you must use the variable in plain form, not as part of an expression. For example, the following code on the left does not work, but the code on the right does:

InvalidValid
A = zeros(4, 11);
parfor i = 1:4
   for j = 1:10
      A(i, j + 1) = i + j;
   end
end
A = zeros(4, 11);
parfor i = 1:4
   for j = 2:11
      A(i, j) = i + j - 1;
   end
end

If you use a nested for-loop to index into a sliced array, you cannot use that array elsewhere in the parfor-loop. In the following example, the code on the left does not work because A is sliced and indexed inside the nested for-loop. The code on the right works because v is assigned to A outside of the nested loop:

InvalidValid
A = zeros(4, 10);
parfor i = 1:4
    for j = 1:10
        A(i, j) = i + j;
    end
    disp(A(i, 1))
end
A = zeros(4, 10);
parfor i = 1:4
    v = zeros(1, 10);
    for j = 1:10
        v(j) = i + j;
    end
    disp(v(1))
    A(i, :) = v;
end

Suppose that you use multiple for-loops (not nested inside each other) inside a parfor-loop, to index into a single sliced array. In this case, the for-loops must loop over the same range of values. A sliced output variable can be used in only one nested for-loop. In the following example, the code on the left does not work because j and k loop over different values. The code on the right works to index different portions of the sliced array A:

InvalidValid
A = zeros(4, 10);
parfor i = 1:4
   for j = 1:5
      A(i, j) = i + j;
   end
   for k = 6:10
      A(i, k) = pi;
   end
end
A = zeros(4, 10);
parfor i = 1:4
   for j = 1:10
      if j < 6
         A(i, j) = i + j;
      else
         A(i, j) = pi;
      end
   end
end

Nested Functions

The body of a parfor-loop cannot make reference to a nested function, see Nested Functions (MATLAB). However, it can call a nested function by a function handle. Try the following example. Note that A(idx) = nfcn(idx) in the parfor-loop does not work. You must use feval to invoke the fcn handle in the parfor-loop body:

function A = pfeg
    function out = nfcn(in)
        out = 1 + in;
    end

fcn = @nfcn;

parfor idx = 1:10
    A(idx) = feval(fcn, idx); 
end

end
>> pfeg
Starting parallel pool (parpool) using the 'local' profile ... connected to 4 workers.

ans =

     2     3     4     5     6     7     8     9    10    11

Tip

If you use function handles that refer to nested functions inside a parfor-loop, then the values of externally scoped variables are not synchronized among the workers. For more information on handles, see Copying Objects (MATLAB).

Nested spmd Statements

The body of a parfor-loop cannot contain an spmd statement, and an spmd statement cannot contain a parfor-loop.

Break and Return Statements

The body of a parfor-loop cannot contain break or return statements. Consider parfeval or parfevalOnAll instead.

P-Code Scripts

You can call P-code script files from within a parfor-loop, but P-code script cannot contain a parfor-loop.

However, if a script introduces a variable, you cannot call this script from within a parfor-loop or spmd statement. The reason is that this script would cause a transparency violation. For more details, see Ensure Transparency in parfor-Loops.

See Also

| | | | |

Related Topics

Was this topic helpful?