Dividing MATLAB® Computations into Tasks

The Parallel Computing Toolbox™ enables us to execute our MATLAB® programs on a cluster of computers. In this example, we look at how to divide a large collection of MATLAB operations into smaller work units, called tasks, which the workers in our cluster can execute. We do this programmatically using the pctdemo_helper_split_scalar and pctdemo_helper_split_vector functions.

Obtaining the Profile

Like every other example in the Parallel Computing Toolbox, this example needs to know which cluster to use. We use the cluster identified by the default profile. See Cluster Profiles in the documentation for how to create new profiles and how to change the default profile.

profileName = parallel.defaultClusterProfile();

Starting with Sequential Code

One of the important advantages of the Parallel Computing Toolbox is that it builds very well on top of existing sequential code. It is actually beneficial to focus on sequential MATLAB code during the algorithm development, debugging, and performance evaluation stages, because we then benefit from the rapid prototyping and interactive editing, debugging, and execution capabilities that MATLAB offers. During the development of the sequential code, we should separate the computations from the pre- and post-processing, and make the core of the computations as simple and independent from the rest of the code as possible. Once our code is somewhat stable, it is time to look at distributing the computations. If we do a good job of creating modular sequential code for a coarse-grained application, it should be rather easy to distribute those computations.

Analyzing the Sequential Problem

The Parallel Computing Toolbox supports the execution of coarse-grained applications, that is, independent, simultaneous executions of a single program using multiple input arguments. We now show what coarse-grained computations often look like in MATLAB code and explain how to distribute those kinds of computations. We focus on two common scenarios, arising when the original, sequential MATLAB code consists of either

  • Invoking a single function several times, using different values for the input parameter. Computations of this nature are sometimes referred to as parameter sweeps, and the code often looks similar to the following MATLAB code:

       for i = 1:n
         y(i) = f(x(i));
       end
  • Invoking a single stochastic function several times. Suppose that the calculations of g(x) involve random numbers, and the function thus returns a different value every time it is invoked (even though the input parameter x remains the same). Such computations are sometimes referred to as Monte Carlo simulations, and the code often looks similar to the following MATLAB code:

       for i = 1:n
         y(i) = g(x);
       end

It is quite possible that the parameter sweeps and simulations appear in a slightly different form in our sequential MATLAB code. For example, if the function f is vectorized, the parameter sweep may simply appear as

       y = f(x);

and the Monte Carlo simulation may appear as

       y = g(x, n);

Example: Dividing a Simulation into Tasks

We use a very small example in what follows, using rand as our function of interest. Imagine that we have a cluster with four workers, and we want to divide the function call rand(1, 10) between them. This is by far the simplest to do with parfor, because it divides the computations between the workers without our having to decide how best to do that.

We can expand the function call rand(1, 10) into the corresponding for loop:

       for i = 1:10
         y(i) = rand();
       end

The parallelization using parfor simply consists of replacing the for with a parfor. If the parallel pool is open on the four workers, this executes on the workers:

       parfor i = 1:10
         y(i) = rand();
       end

Alternatively, we can use createJob and createTask to divide the execution of rand(1, 10) between the four workers. We use four tasks, and have them generate random vectors of length 3, 3, 2, and 2. We have created a function called pctdemo_helper_split_scalar that helps divide the generation of the 10 random numbers between the 4 tasks:

numRand = 10; % We want this many random numbers.
numTasks = 4; % We want to split into this many tasks.
clust = parcluster(profileName);
job = createJob(clust);
[numPerTask, numTasks] = pctdemo_helper_split_scalar(numRand, numTasks);

Notice how pctdemo_helper_split_scalar splits the work of generating 10 random numbers between the numTasks tasks. The elements of numPerTask are all positive, the vector length is numTasks, and its sum equals numRand:

disp(numPerTask)
     3
     3
     2
     2
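
The source of the helper is not shown here, so the following is only a minimal sketch of one way to obtain a split with the properties just described. It is not the actual implementation of pctdemo_helper_split_scalar. The variable names are hypothetical, and the rule of giving the remainder to the first tasks is an assumption chosen to match the 3, 3, 2, 2 division above.

```matlab
numTotal = 10;                           % total number of random numbers wanted
numParts = 4;                            % requested number of tasks
numParts = min(numParts, numTotal);      % never create a task with nothing to do
base  = floor(numTotal / numParts);      % every task gets at least this many
extra = mod(numTotal, numParts);         % this many tasks get one more
sketchPerTask = repmat(base, numParts, 1);
sketchPerTask(1:extra) = base + 1;       % assumed: the first tasks absorb the remainder
disp(sketchPerTask')                     % displays 3 3 2 2
```

Capping the number of parts at the total keeps every element positive, which is the property that lets each task request a nonempty matrix.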

We can now write a for-loop that creates all the tasks in the job. Task i is to create a matrix of the size 1-by-numPerTask(i). When all the tasks have been created, we submit the job, wait for it to finish, and then retrieve the results.

for i = 1:numTasks
   createTask(job, @rand, 1, {1, numPerTask(i)});
end
submit(job);
wait(job);
y = fetchOutputs(job);
cat(2, y{:})   % Concatenate all the cells in y into one row vector.
delete(job);
ans =

  Columns 1 through 7

    0.3246    0.6618    0.6349    0.2646    0.0968    0.5052    0.8847

  Columns 8 through 10

    0.9993    0.8939    0.2502
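
To check the concatenation step without a cluster, we can emulate the four tasks locally. This is only a rough stand-in for illustration: arrayfun runs everything in the client session, and the variable names are hypothetical. It does reproduce the shape of the data, a cell array with one row per task that cat(2, ...) joins into a single row.

```matlab
counts = [3; 3; 2; 2];                   % per-task sizes from the split above
% Local stand-in for the four tasks (assumption: no cluster is involved here).
yLocal = arrayfun(@(n) rand(1, n), counts, 'UniformOutput', false);
yRow = cat(2, yLocal{:});                % horizontal concatenation
disp(size(yLocal))                       % displays 4 1
disp(size(yRow))                         % displays 1 10
```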

Example: Dividing a Parameter Sweep into Tasks

For this example, we use the sin function as a very simple function of interest. We let x be a vector of length 10:

x = 0.1:0.1:1;

and now we want to distribute the calculations of sin(x) on a cluster with 4 workers. As before, this is easiest to achieve with parfor:

      parfor i = 1:length(x)
          y(i) = sin(x(i));
      end

If we decide to achieve this using jobs and tasks, we first need to determine how to divide the computations among the tasks. We have the 4 workers evaluate sin(x(1:3)), sin(x(4:6)), sin(x(7:8)), and sin(x(9:10)) simultaneously. Because this kind of a division of a parameter sweep into separate tasks occurs frequently in our examples, we have created a function that does exactly that:

numTasks = 4;
[xSplit, numTasks] = pctdemo_helper_split_vector(x, numTasks);
celldisp(xSplit);
 
xSplit{1} =
 
    0.1000    0.2000    0.3000

 
 
xSplit{2} =
 
    0.4000    0.5000    0.6000

 
 
xSplit{3} =
 
    0.7000    0.8000

 
 
xSplit{4} =
 
    0.9000    1.0000

 

and it is now relatively easy to use createJob and createTask to perform the computations:

job = createJob(clust);
for i = 1:numTasks
   xThis = xSplit{i};
   createTask(job, @sin, 1, {xThis});
end
submit(job);
wait(job);
y = fetchOutputs(job);
delete(job);
cat(2, y{:})  % Concatenate all the cells in y into one row vector.
ans =

  Columns 1 through 7

    0.0998    0.1987    0.2955    0.3894    0.4794    0.5646    0.6442

  Columns 8 through 10

    0.7174    0.7833    0.8415
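
Before involving a cluster, it can be useful to confirm locally that splitting and recombining leaves the result unchanged. The following is a minimal sketch under stated assumptions: it uses mat2cell with hand-written chunk sizes in place of pctdemo_helper_split_vector, and cellfun in place of the tasks, so it illustrates the idea rather than the helper's actual behavior.

```matlab
x = 0.1:0.1:1;
counts = [3 3 2 2];                      % assumed chunk sizes, matching the split shown above
xChunks = mat2cell(x, 1, counts);        % 1-by-4 cell array of row vectors
% Local stand-in for the tasks: evaluate sin on each chunk in the client session.
yChunks = cellfun(@sin, xChunks, 'UniformOutput', false);
yLocal = cat(2, yChunks{:});             % recombine into one row vector
maxDiff = max(abs(yLocal - sin(x)));     % should be zero up to rounding
```

If maxDiff is not negligible, the split or the recombination order is wrong, and that is much easier to diagnose here than after submitting a job.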

More on Parameter Sweeps

The example involving the sin function was particularly simple, because the sin function is vectorized. We look at how to deal with nonvectorized functions in the Writing Task Functions example.
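
As a rough illustration of the issue, and not a summary of the Writing Task Functions example, one possible approach is to wrap a scalar-only function so that a task can accept a whole chunk of inputs. The names scalarFcn and chunkFcn are hypothetical, and the function chosen is just a convenient one that does not accept vector input.

```matlab
% Hypothetical scalar-only function: the largest root of s^2 - t = 0, i.e. sqrt(t).
scalarFcn = @(t) max(roots([1 0 -t]));
% Task function sketch: apply scalarFcn to each element of the chunk it receives.
chunkFcn = @(xs) arrayfun(scalarFcn, xs);
chunk = [0.1 0.2 0.3];
yChunk = chunkFcn(chunk);                % runs locally; same shape as chunk
% On a cluster, the same handle could be passed to createTask, for example:
%   createTask(job, chunkFcn, 1, {chunk});
```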

Dividing MATLAB Operations into Tasks: Best Practices

When using jobs and tasks, we have to decide how to divide our computations into appropriately sized tasks, paying attention to the following:

  • The number of function calls we want to make

  • The time it takes to execute each function call

  • The number of workers that we want to utilize in our cluster

We want at least as many tasks as there are workers so that we can possibly use all of them simultaneously, and this encourages us to break our work into small units. On the other hand, there is an overhead associated with each task, and that encourages us to minimize the number of tasks. Consequently, we arrive at the following:

  • If we only need to invoke our function a few times, and it takes only one or two seconds to evaluate it, we are better off not using the Parallel Computing Toolbox. Instead, we should simply perform our computations using MATLAB running on our local machine.

  • If we can evaluate our function very quickly, but we have to calculate many function values, we should let a task consist of calculating a number of function values. This way, we can potentially use many of our workers simultaneously, yet the task and job overhead is negligible relative to the running time. Note that we may have to write a new task function to do this; see the Writing Task Functions example. The rule of thumb is: The quicker we can evaluate the function, the more important it is to combine several function evaluations into a single task.

  • If it takes a long time to invoke our function, but we only need to calculate a few function values, it seems sensible to let one task consist of calculating one function value. This way, the startup cost of the job is negligible, and we can have several workers in our cluster work simultaneously on the tasks in our job.

  • If it takes a long time to invoke our function, and we need to calculate many function values, we can choose either of the two approaches we have presented: let a task consist of invoking our function once or several times.

There is a drawback to having many tasks in a single job: Due to network overhead, it may take a long time to create a job with a large number of tasks, and during that time the cluster may be idle. It is therefore advisable to split the MATLAB operations into as many tasks as needed, but to limit the number of tasks in a job to a reasonable number, say never more than a few hundred tasks in a job.
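
The trade-offs above can be turned into a small back-of-the-envelope calculation. The following sketch is only one possible heuristic, not a toolbox recommendation: all of the numbers are hypothetical, the target task duration is an assumption, and the cap of 200 is merely one reading of "a few hundred".

```matlab
numCalls   = 100000;     % hypothetical number of function evaluations
secPerCall = 0.002;      % hypothetical measured time per evaluation, in seconds
numWorkers = 4;          % workers we want to keep busy
targetSecPerTask = 10;   % assumed: long enough to dwarf the per-task overhead
maxTasksPerJob   = 200;  % assumed cap, one reading of "a few hundred"

totalSec = numCalls * secPerCall;                  % estimated sequential running time
numTasks = ceil(totalSec / targetSecPerTask);      % tasks of roughly the target length
numTasks = max(numTasks, numWorkers);              % at least one task per worker
numTasks = min(numTasks, numCalls);                % never more tasks than evaluations
callsPerTask = ceil(numCalls / numTasks);          % evaluations grouped into each task
numJobs = ceil(numTasks / maxTasksPerJob);         % split across jobs if the cap is exceeded
tasksPerJob = ceil(numTasks / numJobs);
fprintf('%d tasks of about %d calls each, in %d job(s)\n', numTasks, callsPerTask, numJobs);
```

In practice, secPerCall would come from timing the sequential code, and the target duration would be tuned to the overhead observed on the cluster at hand.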
