Although you create only one task for a communicating job, the
system copies this task for each worker that runs the job. For example,
if a communicating job runs on four workers, the Tasks property
of the job contains four task objects. The first task in the job's Tasks property
corresponds to the task run by the worker whose labindex is 1, and
so on, so that the ID property for the task object
and labindex for the worker that ran that task
have the same value. Therefore, the sequence of results returned by
the fetchOutputs function
corresponds to the value of labindex and to the
order of tasks in the job's Tasks property.
Because code running in one worker for a communicating job can
block execution until some corresponding code executes on another
worker, the potential for deadlock exists in communicating jobs. This
is most likely to occur when transferring data between workers or
when making code dependent upon the labindex in
an if statement. Some examples illustrate common
pitfalls.
Suppose you have a codistributed array D,
and you want to use the gather function
to assemble the entire array in the workspace of a single worker.
if labindex == 1
assembled = gather(D);
endThe reason this fails is because the gather function
requires communication between all the workers across which the array
is distributed. When the if statement limits execution
to a single worker, the other workers required for execution of the
function are not executing the statement. As an alternative, you can
use gather itself to collect the data into the
workspace of a single worker: assembled = gather(D, 1).
In another example, suppose you want to transfer data from every
worker to the next worker on the right (defined as the next higher labindex).
First you define for each worker what the workers on the left and
right are.
from_lab_left = mod(labindex - 2, numlabs) + 1; to_lab_right = mod(labindex, numlabs) + 1;
Then try to pass data around the ring.
labSend (outdata, to_lab_right); indata = labReceive(from_lab_left);
The reason this code might fail is because, depending on the
size of the data being transferred, the labSend function
can block execution in a worker until the corresponding receiving
worker executes its labReceive function.
In this case, all the workers are attempting to send at the same time,
and none are attempting to receive while labSend has
them blocked. In other words, none of the workers get to their labReceive statements
because they are all blocked at the labSend statement.
To avoid this particular problem, you can use the labSendReceive function.