The Parallel Computing Toolbox™ enables us to execute our MATLAB® programs on a cluster of computers. In this example, we look at how to divide a large collection of MATLAB operations into smaller work units, called tasks, which the workers in our cluster can execute. We do this programmatically using the pctdemo_helper_split_scalar
and pctdemo_helper_split_vector
functions.
Prerequisites:
For further reading, see:
Like every other example in the Parallel Computing Toolbox, this example needs to know what cluster to use. We use the cluster identified by the default profile. See Cluster Profiles in the documentation for how to create new profile and how to change the default profile.
profileName = parallel.defaultClusterProfile();
One of the important advantages of the Parallel Computing Toolbox is that it builds very well on top of existing sequential code. It is actually beneficial to focus on sequential MATLAB code during the algorithm development, debugging and performance evaluation stages, because we then benefit from the rapid prototyping and interactive editing, debugging, and execution capabilities that MATLAB offers. During the development of the sequential code, we should separate the computations from the pre- and the post-processing, and make the core of the computations as simple and independent from the rest of the code as possible. Once our code is somewhat stable, it is time to look at distributing the computations. If we do a good job of creating modular sequential code for a coarse grained application, it should be rather easy to distribute those computations.
The Parallel Computing Toolbox supports the execution of coarse grained applications, that is, independent, simultaneous executions of a single program using multiple input arguments. We now try to show examples of what coarse grained computations often look like in MATLAB code and explain how to distribute those kinds of computations. We focus on two common scenarios, arising when the original, sequential MATLAB code consists of either
Invoking a single function several times, using different values for the input parameter. Computations of this nature area sometimes referred to as parameter sweeps, and the code often looks similar to the following MATLAB code:
for i = 1:n y(i) = f(x(i)); end
Invoking a single stochastic function several times. Suppose that the calculations of g(x)
involve random numbers, and the function thus returns a different value every time it is invoked (even though the input parameter x
remains the same). Such computations are sometimes referred to as Monte Carlo simulations, and the code often looks similar to the following MATLAB code:
for i = 1:n y(i) = g(x); end
It is quite possible that the parameter sweeps and simulations appear in a slightly different form in our sequential MATLAB code. For example, if the function f
is vectorized, the parameter sweep may simply appear as
y = f(x);
and the Monte Carlo simulation may appear as
y = g(x, n);
We use a very small example in what follows, using rand
as our function of interest. Imagine that we have a cluster with four workers, and we want to divide the function call rand(1, 10)
between them. This is by far simplest to do with parfor
because it divides the computations between the workers without our having to make any decisions about how to best do that.
We can expand the function call rand(1, 10)
into the corresponding for
loop:
for i = 1:10 y(i) = rand() end
The parallelization using parfor
simply consists of replacing the for
with a parfor
. If the parallel pool is open on the four workers, this executes on the workers:
parfor i = 1:10 y(i) = rand() end
Alternatively, we can use createJob
and createTask
to divide the execution of rand(1, 10)
between the four workers. We use four tasks, and have them generate random vectors of length 3, 3, 2, and 2. We have created a function called pctdemo_helper_split_scalar
that helps divide the generation of the 10 random numbers between the 4 tasks:
numRand = 10; % We want this many random numbers. numTasks = 4; % We want to split into this many tasks. clust = parcluster(profileName); job = createJob(clust); [numPerTask, numTasks] = pctdemo_helper_split_scalar(numRand, numTasks);
Notice how pctdemo_helper_split_scalar
splits the work of generating 10 random numbers between the numTasks
tasks. The elements of numPerTask
are all positive, the vector length is numTasks
, and its sum equals numRand
:
disp(numPerTask)
3 3 2 2
We can now write a for-loop that creates all the tasks in the job. Task i
is to create a matrix of the size 1-by-numPerTask(i). When all the tasks have been created, we submit the job, wait for it to finish, and then retrieve the results.
for i = 1:numTasks createTask(job, @rand, 1, {1, numPerTask(i)}); end submit(job); wait(job); y = fetchOutputs(job); cat(2, y{:}) % Concatenate all the cells in y into one column vector. delete(job);
ans = Columns 1 through 7 0.3246 0.6618 0.6349 0.2646 0.0968 0.5052 0.8847 Columns 8 through 10 0.9993 0.8939 0.2502
For the purposes of this example, let's use the sin
function as a very simple example. We let x
be a vector of length 10:
x = 0.1:0.1:1;
and now we want to distribute the calculations of sin(x)
on a cluster with 4 workers. As before, this is easiest to achieve with parfor
:
parfor i = 1:length(x) y(i) = sin(x(i)); end
If we decide to achieve this using jobs and tasks, we first need to determine how to divide the computations among the tasks. We have the 4 workers evaluate sin(x(1:3))
, sin(x(4:6))
, sin(x(7:8))
, and sin(x(9:10))
simultaneously. Because this kind of a division of a parameter sweep into separate tasks occurs frequently in our examples, we have created a function that does exactly that:
numTasks = 4; [xSplit, numTasks] = pctdemo_helper_split_vector(x, numTasks); celldisp(xSplit);
xSplit{1} = 0.1000 0.2000 0.3000 xSplit{2} = 0.4000 0.5000 0.6000 xSplit{3} = 0.7000 0.8000 xSplit{4} = 0.9000 1.0000
and it is now relatively easy to use createJob
and createTask
, to perform the computations:
job = createJob(clust); for i = 1:numTasks xThis = xSplit{i}; createTask(job, @sin, 1, {xThis}); end submit(job); wait(job); y = fetchOutputs(job); delete(job); cat(2, y{:}) % Concatenate all the cells in y into one column vector.
ans = Columns 1 through 7 0.0998 0.1987 0.2955 0.3894 0.4794 0.5646 0.6442 Columns 8 through 10 0.7174 0.7833 0.8415
The example involving the sin
function was particularly simple, because the sin
function is vectorized. We look at how to deal with nonvectorized functions in the Writing Task Functions example.
When using jobs and tasks, we have to decide how to divide our computations into appropriately sized tasks, paying attention to the following:
The number of function calls we want to make
The time it takes to execute each function call
The number of workers that we want to utilize in our cluster
We want at least as many tasks as there are workers so that we can possibly use all of them simultaneously, and this encourages us to break our work into small units. On the other hand, there is an overhead associated with each task, and that encourages us to minimize the number of tasks. Consequently, we arrive at the following:
If we only need to invoke our function a few times, and it takes only one or two seconds to evaluate it, we are better off not using the Parallel Computing Toolbox. Instead, we should simply perform our computations using MATLAB running on our local machine.
If we can evaluate our function very quickly, but we have to calculate many function values, we should let a task consist of calculating a number of function values. This way, we can potentially use many of our workers simultaneously, yet the task and job overhead is negligible relative to the running time. Note that we may have to write a new task function to do this, see the Writing Task Functions example. The rule of thumb is: The quicker we can evaluate the function, the more important it is to combine several function evaluations into a single task.
If it takes a long time to invoke our function, but we only need to calculate a few function values, it seems sensible to let one task consist of calculating one function value. This way, the startup cost of the job is negligible, and we can have several workers in our cluster work simultaneously on the tasks in our job.
If it takes a long time to invoke our function, and we need to calculate many function values, we can choose either of the two approaches we have presented: let a task consist of invoking our function once or several times.
There is a drawback to having many tasks in a single job: Due to network overhead, it may take a long time to create a job with a large number of tasks, and during that time the cluster may be idle. It is therefore advisable to split the MATLAB operations into as many tasks as needed, but to limit the number of tasks in a job to a reasonable number, say never more than a few hundred tasks in a job.