Run Single Programs on Multiple Data Sets

Introduction

The single program multiple data (spmd) language construct allows seamless interleaving of serial and parallel programming. The spmd statement lets you define a block of code to run simultaneously on multiple workers. Variables assigned inside the spmd statement on the workers can be accessed directly from the client, by reference, through Composite objects.

This chapter explains some of the characteristics of spmd statements and Composite objects.
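As a small illustration of Composite access, the sketch below has each worker assign its own value to x. Back on the client, x is a Composite, and brace indexing reads the value held by one worker. It assumes a parallel pool with at least two workers is open (or can be started automatically).

```matlab
spmd (2)
    x = labindex * 10;   % each worker assigns its own value to x
end
% On the client, x is a Composite; index it with braces to read
% the value held by a particular worker.
class(x)
x{1}    % value from worker 1
x{2}    % value from worker 2
```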

When to Use spmd

The “single program” aspect of spmd means that the identical code runs on multiple workers. You run one program in the MATLAB® client, and those parts of it labeled as spmd blocks run on the workers. When the spmd block is complete, your program continues running in the client.

The “multiple data” aspect means that even though the spmd statement runs identical code on all workers, each worker can operate on its own, unique data. This way, multiple workers can process multiple data sets.
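The sketch below shows both aspects together, assuming an open pool with at least two workers. The lines before and after the spmd block run on the client. Inside the block, both workers run the same statement, but labindex gives each one different data. The client variable A is copied to the workers automatically because the block references it.

```matlab
A = magic(4);            % runs on the client
spmd (2)
    % Same statement on both workers; labindex makes the data differ.
    B = A + labindex;
end
C = B{1} + B{2};         % back on the client: combine the per-worker results
```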

Typical applications appropriate for spmd are those that require simultaneous execution of a program on multiple data sets, when communication or synchronization is required between the workers. Some common cases are:

  • Programs that take a long time to execute — spmd lets several workers compute solutions simultaneously.

  • Programs operating on large data sets — spmd lets the data be distributed to multiple workers.
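A minimal sketch of splitting one long computation across workers, assuming an open pool: each worker sums its own strided share of 1:N, and the Parallel Computing Toolbox function gplus then adds the partial sums across all workers. gplus is a communicating step, so the workers synchronize there, and every worker receives the same total.

```matlab
N = 1e6;
spmd
    % Each worker takes every numlabs-th element, starting at labindex.
    idx = labindex:numlabs:N;
    partial = sum(idx);
    % Global sum across all workers; each worker gets the full total.
    total = gplus(partial);
end
total{1}    % should equal N*(N+1)/2
```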

Define an spmd Statement

The general form of an spmd statement is:

spmd
    <statements>
end

Note

If a parallel pool is not running, spmd creates a pool using your default cluster profile, if your parallel preferences are set accordingly.

The block of code represented by <statements> executes in parallel on all workers in the parallel pool. To limit execution to only a portion of these workers, specify exactly how many workers to run on:

spmd (n)
    <statements>
end

This statement requires that n workers run the spmd code. n must be less than or equal to the number of workers in the open parallel pool. If the pool is large enough, but n workers are not available, the statement waits until enough workers are available. If n is 0, the spmd statement uses no workers and runs locally on the client, the same as if no pool were currently running.

You can specify a range for the number of workers:

spmd (m,n)
    <statements>
end

In this case, the spmd statement requires a minimum of m workers, and it uses a maximum of n workers.
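When you give a range, numlabs inside the block reports how many workers were actually used. This sketch assumes an open pool with at least two workers. The exact value depends on the pool size and on how many workers are free.

```matlab
spmd (2, 4)
    % numlabs reports how many workers are actually running this block.
    nw = numlabs;
end
nw{1}    % somewhere between 2 and 4
```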

If it is important to control the number of workers that execute your spmd statement, set the exact number in the cluster profile or with the spmd statement, rather than using a range.

For example, create a random matrix on three workers:

spmd (3)
    R = rand(4,4);
end

Note

All subsequent examples in this chapter assume that a parallel pool is open and remains open between sequences of spmd statements.

Unlike a parfor-loop, the workers used for an spmd statement each have a unique value for labindex, an integer from 1 to the number of workers running the block. This lets you specify code to run on only certain workers, or to customize execution, usually for the purpose of accessing unique data.

For example, create different sized arrays depending on labindex:

spmd (3)
    if labindex == 1
        R = rand(9,9);
    else
        R = rand(4,4);
    end
end

Load unique data on each worker according to labindex, and use the same function on each worker to compute a result from the data:

spmd (3)
    % Worker k loads datafile_k.ascii
    labdata = load(['datafile_' num2str(labindex) '.ascii']);
    % MyFunction stands for your own function
    result = MyFunction(labdata);
end
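For a self-contained version of the previous example, the sketch below first writes three small hypothetical data files from the client, then has each worker load its own file. A simple sum stands in for MyFunction. It assumes a local pool whose workers share the client's current folder, so they can see the files.

```matlab
% Hypothetical setup: write one small ASCII file per worker.
for k = 1:3
    d = k * ones(2, 2);
    save(['datafile_' num2str(k) '.ascii'], 'd', '-ascii');
end

spmd (3)
    labdata = load(['datafile_' num2str(labindex) '.ascii']);
    result = sum(labdata(:));    % stand-in for MyFunction
end
result{2}    % worker 2 summed a 2-by-2 array of 2s
```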

The workers executing an spmd statement operate simultaneously and are aware of each other. As with a communicating job, you can directly control communications between the workers, transfer data between them, and use codistributed arrays among them.

For example, use a codistributed array in an spmd statement:

spmd (3)
    RR = rand(30, codistributor());
end

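Each worker has a 30-by-10 segment of the codistributed array RR, because the default codistributor distributes the 30-by-30 array by columns across the three workers. For more information about codistributed arrays, see Working with Codistributed Arrays.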
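The sketch below checks the segment size with getLocalPart and then shows direct communication between workers. It uses labSendReceive to pass each worker's labindex one step around a ring: worker 1 sends to worker 2, worker 2 sends to worker 3, and worker 3 sends back to worker 1. It assumes an open pool with at least three workers.

```matlab
spmd (3)
    RR = rand(30, codistributor());
    % Size of this worker's local segment of RR.
    localSize = size(getLocalPart(RR));

    % Ring exchange: send to the right-hand neighbour,
    % receive from the left-hand neighbour.
    dest   = mod(labindex, numlabs) + 1;
    source = mod(labindex - 2, numlabs) + 1;
    received = labSendReceive(dest, source, labindex);
end
localSize{1}    % expected [30 10]
received{1}     % sent by worker 3
```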

Display Output

When running an spmd statement on a parallel pool, all command-line output from the workers displays in the client Command Window. Because the workers are MATLAB sessions without displays, any graphical output (for example, figure windows) from the pool does not display at all.
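Because worker graphics are not displayed, a common pattern is to compute on the workers and plot on the client. The sketch below assumes an open pool with at least two workers. The fprintf text appears in the client Command Window, and the figure is created by the client.

```matlab
spmd (2)
    fprintf('Worker %d computed its data\n', labindex);  % shown in the client
    y = sin((1:100) / 10 * labindex);                    % computed on the workers
end
% Plot on the client, where figure windows display.
figure;
plot(y{1});
hold on;
plot(y{2});
hold off;
```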

MATLAB Path

All workers executing an spmd statement must have the same MATLAB search path as the client, so that they can execute any functions called in their common block of code. Therefore, whenever you use cd, addpath, or rmpath on the client, it also executes on all the workers, if possible. For more information, see the parpool reference page. When the workers are running on a different platform than the client, use the function pctRunOnAll to properly set the MATLAB path on all workers.
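A minimal sketch of pctRunOnAll, assuming an open pool. The folder name spmd_demo_code is hypothetical. Each process builds the folder path with its own tempdir and fullfile, which avoids sending a client-specific path string to workers on another platform. The spmd block then checks that each worker has the folder on its path.

```matlab
% Runs on the client and on every worker in the pool.
pctRunOnAll('d = fullfile(tempdir, ''spmd_demo_code''); if ~exist(d, ''dir''), mkdir(d); end; addpath(d);');

spmd
    onPath = contains(path, 'spmd_demo_code');   % true if the folder was added
end
onPath{1}
```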

Error Handling

When an error occurs on a worker during the execution of an spmd statement, the error is reported to the client. The client tries to interrupt execution on all workers, and throws an error to the user.

Errors and warnings produced on workers are annotated with the worker ID (labindex) and displayed in the client’s Command Window in the order in which they are received by the MATLAB client.

If lastwarn is used within the body of an spmd statement, its value at the end of the spmd statement is unspecified.
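Because the client rethrows a worker error, you can catch it on the client with an ordinary try/catch. In the sketch below, only worker 2 raises an error, using the hypothetical identifier demo:badData. It assumes an open pool with at least two workers. The exact text of the client-side message may differ from the worker's message, so the sketch only prints it.

```matlab
try
    spmd (2)
        if labindex == 2
            % Hypothetical failure on one worker only.
            error('demo:badData', 'Worker %d found bad data.', labindex);
        end
    end
    disp('Not reached.');
catch err
    % The error from the worker arrives here on the client.
    fprintf('spmd failed: %s\n', err.message);
end
```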

spmd Limitations

Nested Functions

Inside a function, the body of an spmd statement cannot make any direct reference to a nested function. However, it can call a nested function by means of a variable defined as a function handle to the nested function.

Because the spmd body executes on workers, variables that are updated by nested functions called inside an spmd statement do not get updated in the workspace of the outer function.
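The sketch below shows the handle-based workaround inside a hypothetical function, saved as scaleOnWorkers.m. The handle h is created outside the spmd body and then called on the workers. Any variables the nested function changed on a worker would not be updated in scaleOnWorkers's own workspace. The sketch assumes an open pool with at least two workers.

```matlab
function r = scaleOnWorkers()
    % Hypothetical example function, saved as scaleOnWorkers.m.
    factor = 2;
    h = @scaleBy;             % handle to the nested function, made outside spmd
    spmd (2)
        r = h(labindex);      % allowed: call through the handle
        % r = scaleBy(labindex);   % not allowed: direct reference
    end

    function y = scaleBy(x)
        y = factor * x;       % reads factor from the outer function
    end
end
```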

Anonymous Functions

The body of an spmd statement cannot define an anonymous function. However, it can reference an anonymous function by means of a function handle.
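For example, define the anonymous function on the client and use its handle inside the spmd body. This sketch assumes an open pool with at least two workers.

```matlab
f = @(x) x.^2 + 1;        % define the anonymous function on the client
spmd (2)
    val = f(labindex);    % allowed: use the existing handle on the workers
    % g = @(x) x + 1;     % not allowed: defining an anonymous function here
end
```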

Nested spmd Statements

The body of an spmd statement cannot directly contain another spmd. However, it can call a function that contains another spmd statement. The inner spmd statement does not run in parallel in another parallel pool, but runs serially in a single thread on the worker running its containing function.

Nested parfor-Loops

The body of a parfor-loop cannot contain an spmd statement, and an spmd statement cannot contain a parfor-loop.

Break and Return Statements

The body of an spmd statement cannot contain break or return statements.

Global and Persistent Variables

The body of an spmd statement cannot contain global or persistent variable declarations.
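One way to work within this limitation is to store the value in an ordinary client variable. Because the spmd body references it, the variable is copied to each worker. The sketch assumes an open pool with at least two workers. The variable name scale is illustrative only.

```matlab
% Instead of declaring "global scale" inside spmd, use a client variable.
scale = 3;
spmd (2)
    scaled = scale * labindex;   % scale is copied to each worker
end
```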
