You can run a communicating job using any type of scheduler. This section illustrates how to program communicating jobs for supported schedulers (MJS, local scheduler, Microsoft® Windows HPC Server (including CCS), Platform LSF®, PBS Pro®, or TORQUE).
To use this supported interface for communicating jobs, the following conditions must apply:
You must have a shared file system between client and cluster machines
You must be able to submit jobs directly to the scheduler from the client machine
When using any third-party scheduler to run a communicating job, if any
of these conditions is not met, you must use the generic scheduler
interface. (Communicating jobs also include pmode, parpool, spmd, and parfor.)
See Program Communicating Jobs for a Generic Scheduler.
In this section, a simple example illustrates the basic principles
of programming a communicating job with a third-party scheduler. In
this example, the worker whose labindex value is 1 creates a magic
square whose number of rows and columns equals the number of workers
running the job (numlabs). In this case, four workers run a
communicating job with a 4-by-4 magic square. The first worker
broadcasts the matrix with labBroadcast to all the other workers, each
of which calculates the sum of one column of the matrix. All of these
column sums are combined with the gplus function to calculate the total
sum of the elements of the original magic square.
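The arithmetic behind this example can be checked without a cluster. The following is a minimal serial sketch, not the parallel code itself: the loop variable k stands in for labindex, n stands in for numlabs, and an ordinary sum stands in for gplus.

```matlab
% Serial emulation of the communicating job, assuming four workers.
n = 4;                          % stands in for numlabs
A = magic(n);                   % matrix that worker 1 would broadcast
column_sums = zeros(1, n);
for k = 1:n                     % k stands in for labindex
    column_sums(k) = sum(A(:, k));
end
total_sum = sum(column_sums);   % stands in for gplus across workers
disp(total_sum)                 % displays 136
```

Every column of a 4-by-4 magic square sums to 34, so the combined total is 136.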
The function for this example is shown below.
function total_sum = colsum
if labindex == 1
    % Send magic square to other workers
    A = labBroadcast(1,magic(numlabs))
else
    % Receive broadcast on other workers
    A = labBroadcast(1)
end
% Calculate sum of column identified by labindex for this worker
column_sum = sum(A(:,labindex))
% Calculate total sum by combining column sum from all workers
total_sum = gplus(column_sum)
This function is saved as the file colsum.m on the path of the
MATLAB® client. It will be sent to each worker by the job’s
AttachedFiles property.
While this example has one worker create the magic square and
broadcast it to the other workers, there are alternative methods of
getting data to the workers. Each worker could create the matrix for
itself. Alternatively, each worker could read its part of the data
from a file on disk, the data could be passed in as an argument to
the task function, or the data could be sent in a file contained in
the job’s AttachedFiles property. The solution to choose depends on
your network configuration and the nature of the data.
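As one illustration of these alternatives, the matrix can be passed in as an argument to the task function, which removes the broadcast step. This is a sketch only: the function name colsum_arg and its file are hypothetical, and it assumes the input matrix has at least as many columns as there are workers.

```matlab
% File colsum_arg.m (hypothetical name): the matrix arrives as a task input.
function total_sum = colsum_arg(A)
    % Each worker sums the column that matches its labindex.
    column_sum = sum(A(:, labindex));
    % Combine the column sums from all workers.
    total_sum = gplus(column_sum);
end
```

On the client, the matrix would then be supplied in the input-argument cell array when the task is created, for example createTask(cjob, @colsum_arg, 1, {magic(4)}), with colsum_arg.m listed in the job’s AttachedFiles property.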
As with independent jobs, you choose a profile and create a
cluster object in your MATLAB client by using the parcluster function.
There are slight differences in the profiles, depending on the
scheduler you use, but using profiles to define as many properties as
possible minimizes coding differences between the scheduler types.
You can create and configure the cluster object with this code:
c = parcluster('MyProfile')
where 'MyProfile' is the name of a cluster profile for the type of
scheduler you are using. Any required differences for various cluster
options are controlled in the profile. You can have one or more
separate profiles for each type of scheduler. For complete details,
see Discover Clusters and Use Cluster Profiles. Create or modify
profiles according to the instructions of your system administrator.
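Before creating the cluster object, it can help to confirm that the profile exists on the client. The following is a sketch under assumptions: 'MyProfile' is the placeholder name used in this section, and parallel.clusterProfiles is assumed to be available in your release (newer releases may offer a differently named function for listing profiles).

```matlab
% Fall back to the default profile if the named profile is not defined.
profileName = 'MyProfile';                 % placeholder profile name
available = parallel.clusterProfiles;      % names of the defined profiles
if any(strcmp(profileName, available))
    c = parcluster(profileName);
else
    c = parcluster();                      % uses the default cluster profile
end
```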
When your cluster object is defined, you create the job object
with the createCommunicatingJob function. The job Type property must
be set to 'SPMD' when you create the job.
cjob = createCommunicatingJob(c,'Type','SPMD');
The function file colsum.m (created in Code the Task Function) is on
the MATLAB client path, but it has to be made available to the workers.
One way to do this is with the job’s AttachedFiles property, which can
be set in the profile you used, or by:
cjob.AttachedFiles = {'colsum.m'}
Here you might also set other properties on the job, for example, the number of workers to use. Again, profiles might be useful in your particular situation, especially if most of your jobs require many of the same property settings. To run this example on four workers, you can establish this in the profile, or by the following client code:
cjob.NumWorkersRange = 4
You create the job’s one task with the usual createTask function. In
this example, the task returns only one argument from each worker, and
there are no input arguments to the colsum function.
t = createTask(cjob, @colsum, 1, {})
Use submit to run the job.
submit(cjob)
Make the MATLAB client wait for the job to finish before
collecting the results. The results consist of one value from each
worker. The gplus function in the task shares data between the
workers, so that each worker has the same result.
wait(cjob)
results = fetchOutputs(cjob)
results =
    [136]
    [136]
    [136]
    [136]
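In practice you might also want to handle a failed task and remove the finished job from the cluster's storage. The following is a minimal sketch, assuming cjob is the job object created and submitted earlier in this section. The try/catch handling is an illustration, not a required pattern.

```matlab
% Assumes cjob was created and submitted as shown earlier in this section.
wait(cjob);
try
    results = fetchOutputs(cjob);   % throws an error if any task failed
    total_sum = results{1};         % every worker returns the same value
catch err
    disp(err.message);              % report why the outputs are unavailable
end
delete(cjob);                       % remove the job and its data from storage
```

Because gplus leaves the same total on every worker, reading the first cell of the results is enough.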