Although you create only one task for a communicating job, the system copies this task for each worker that runs the job. For example, if a communicating job runs on four workers, the Tasks property of the job contains four task objects. The first task in the job’s Tasks property corresponds to the task run by the worker whose labindex is 1, and so on, so that the ID property of each task object and the labindex of the worker that ran that task have the same value. Therefore, the sequence of results returned by the fetchOutputs function corresponds to the value of labindex and to the order of tasks in the job’s Tasks property.
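The correspondence between task order, task IDs, and labindex can be observed with a small job. The sketch below assumes Parallel Computing Toolbox and a default cluster profile that can start four workers. The task function @() labindex is only an illustrative choice: each copy of the task returns the labindex of the worker that ran it, so the k-th entry of the fetchOutputs result should come from task k, run by the worker whose labindex is k.

```matlab
% Minimal sketch: assumes Parallel Computing Toolbox and that the default
% cluster profile (used by parcluster with no arguments) can run four workers.
c = parcluster();
job = createCommunicatingJob(c, 'Type', 'spmd');
job.NumWorkersRange = [4 4];          % run on exactly four workers

% Create one task; the system copies it for every worker in the job.
% Illustrative task function: return the labindex of the worker running it.
createTask(job, @() labindex, 1, {});

submit(job);
wait(job);

numTasks = numel(job.Tasks);          % one task object per worker
results = fetchOutputs(job);          % one row per task, in Tasks order
% results{k} was produced by task k, i.e. by the worker whose labindex is k.
```

Call delete(job) once you no longer need the job and its results.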
Because code running on one worker of a communicating job can block execution until corresponding code executes on another worker, communicating jobs can deadlock. Deadlock is most likely when you transfer data between workers or when you make code conditional on labindex in an if statement. The following examples illustrate common pitfalls.
Suppose you have a codistributed array D, and you want to use the gather function to assemble the entire array in the workspace of a single worker. A first attempt might restrict the call to worker 1:
if labindex == 1
    assembled = gather(D);   % fails: only worker 1 calls gather
end
This fails because the gather function requires communication among all the workers across which the array is distributed. When the if statement limits execution to a single worker, the other workers that must take part in the function never execute the statement. Instead, have every worker call gather and use its second argument to collect the data into the workspace of a single worker:
assembled = gather(D, 1);
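A complete version of the corrected gather pattern is sketched below, assuming Parallel Computing Toolbox and an open parallel pool. The spmd block and the test array magic(4) are illustrative choices, not part of the original example. Every worker executes gather(D, 1), so all workers take part in the communication, and only worker 1 ends up holding the assembled array.

```matlab
% Minimal sketch: assumes Parallel Computing Toolbox and an open parallel
% pool (for example, one started with parpool).
spmd
    % Illustrative data: a replicated 4-by-4 matrix distributed across workers.
    D = codistributed(magic(4));

    % Every worker calls gather, so no worker waits on one that never
    % reaches the call. Worker 1 receives the whole array; on the other
    % workers, assembled is empty.
    assembled = gather(D, 1);
end

% After spmd, assembled is a Composite; index it by labindex to read the
% value from one worker.
A = assembled{1};
```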
In another example, suppose you want to transfer data from every worker to the next worker on the right (defined as the next higher labindex, with the last worker wrapping around to worker 1). First, define for each worker which workers are on its left and right:
from_lab_left = mod(labindex - 2, numlabs) + 1;
to_lab_right = mod(labindex, numlabs) + 1;
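The neighbor formulas can be checked without starting any workers. The sketch below is plain MATLAB for a hypothetical four-worker job: the vector idx stands in for the labindex values 1 through 4 and n stands in for numlabs, so each column shows one worker's left and right neighbors, including the wrap-around at both ends of the ring.

```matlab
% Hypothetical stand-alone check: evaluate the neighbor formulas for every
% labindex value of a four-worker job at once.
n = 4;                                  % stands in for numlabs
idx = 1:n;                              % stands in for each worker's labindex

from_lab_left = mod(idx - 2, n) + 1;    % [4 1 2 3]: worker 1 wraps to worker 4
to_lab_right  = mod(idx, n) + 1;        % [2 3 4 1]: worker 4 wraps to worker 1

% Going left and then right returns every worker to itself.
roundTrip = to_lab_right(from_lab_left);
```

The mod-based form relies on MATLAB's mod returning a nonnegative result for a positive divisor, which is what makes mod(-1, 4) equal 3 for worker 1.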
Then try to pass data around the ring.
labSend(outdata, to_lab_right);
indata = labReceive(from_lab_left);
This code might fail because, depending on the size of the data being transferred, the labSend function can block execution on a worker until the corresponding receiving worker executes its labReceive function. In this case, all the workers attempt to send at the same time, and none of them is receiving while labSend has them blocked. In other words, none of the workers reach their labReceive statements because they are all blocked at the labSend statement.
To avoid this particular problem, you can use the labSendReceive function, which performs the send and the receive in a single call.
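The ring transfer rewritten with labSendReceive might look like the sketch below, assuming Parallel Computing Toolbox and an open parallel pool. The payload outdata = labindex is a hypothetical stand-in for real data. The destination comes first and the source second in the labSendReceive argument list, so each worker sends to its right neighbor and receives from its left neighbor without a separate labSend that could block before labReceive is reached.

```matlab
% Minimal sketch: assumes Parallel Computing Toolbox and an open parallel pool.
spmd
    outdata = labindex;                             % hypothetical payload
    from_lab_left = mod(labindex - 2, numlabs) + 1;
    to_lab_right  = mod(labindex, numlabs) + 1;

    % Send outdata to the right neighbor and receive indata from the left
    % neighbor in one call, replacing the separate labSend and labReceive.
    indata = labSendReceive(to_lab_right, from_lab_left, outdata);

    n = numlabs;                                    % kept for inspection
end
```

After the spmd block, indata{k} holds the payload of worker k's left neighbor, which in this sketch is that neighbor's labindex.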