This section discusses programming communicating jobs using the generic scheduler interface. This interface lets you execute jobs on your cluster with any scheduler you might have.
The principles of using the generic scheduler interface for communicating jobs are the same as those for independent jobs. The overview of the concepts and details of submit and decode functions for independent jobs are discussed fully in Program Independent Jobs for a Generic Scheduler.
Coding a communicating job for a generic scheduler involves the same procedure as coding an independent job. The basic steps follow.
Create an object representing your cluster with parcluster.
Set the appropriate properties on the cluster object if they are not defined in the profile. Because the scheduler itself is often common to many users and applications, it is probably best to use a profile for programming these properties. See Clusters and Cluster Profiles.
Among the properties required for a communicating job is CommunicatingSubmitFcn. You can write your own communicating submit and decode functions, or use those that come with the product for various schedulers and platforms; see the following section, Supplied Submit and Decode Functions.
Use createCommunicatingJob to create a communicating job object for your cluster.
Create a task, run the job, and retrieve the results as usual.
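The steps above can be sketched in MATLAB as follows. This is a minimal illustration, not a definitive recipe: the profile name MyGenericProfile is hypothetical, and the sketch assumes that the supplied communicatingSubmitFcn.m for your scheduler is on the MATLAB path and needs no extra arguments. The worker range and the use of labindex as the task function are arbitrary choices for the example.

```matlab
% Create the cluster object from a profile.
% 'MyGenericProfile' is a hypothetical name; use a profile defined on your system.
c = parcluster('MyGenericProfile');

% Set the submit function only if the profile does not already define it.
% Assumption: the supplied communicatingSubmitFcn.m is on the MATLAB path.
c.CommunicatingSubmitFcn = @communicatingSubmitFcn;

% Create the communicating job and choose how many workers it may use.
job = createCommunicatingJob(c, 'Type', 'SPMD');
job.NumWorkersRange = [2 4];

% A communicating job has one task, which is evaluated on every worker.
% Here each worker simply returns its own index.
createTask(job, @labindex, 1, {});

% Run the job and retrieve the results.
submit(job);
wait(job);
results = fetchOutputs(job);

% Remove the job data from the job storage location when finished.
delete(job);
```

Some cluster configurations require additional arguments to the submit function. In that case the property is typically set to a cell array whose first element is the function handle; check the README for your configuration.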
There are several submit and decode functions provided with the toolbox for your use with the generic scheduler interface. These files are in the folder
matlabroot/toolbox/distcomp/examples/integration
In this folder are subfolders for each of several types of scheduler.
Depending on your network and cluster configuration, you might need to modify these files before they will work in your situation. Ask your system administrator for help.
At the time of publication, there are folders for PBS (pbs) and Platform LSF® (lsf) schedulers, generic UNIX®-based scripts (ssh), and Sun™ Grid Engine (sge).
In addition, the pbs, lsf, and sge folders have subfolders called shared, nonshared, and remoteSubmission, which contain scripts for use in particular cluster configurations. Each of these subfolders contains a file called README, which provides instructions on where and how to use its scripts.
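To see which scheduler folders your installation provides, you can list the integration folder from the MATLAB prompt. The sketch below assumes the folder layout described above; the folders present can differ between releases, and pbs/shared is used only as an example configuration.

```matlab
% Build the path to the integration folder under the MATLAB root.
integrationDir = fullfile(matlabroot, 'toolbox', 'distcomp', ...
    'examples', 'integration');

% List the scheduler subfolders, skipping the '.' and '..' entries.
listing = dir(integrationDir);
isSub = [listing.isdir] & ~ismember({listing.name}, {'.', '..'});
disp({listing(isSub).name}');

% Display the README for one configuration, if it exists.
readmeFile = fullfile(integrationDir, 'pbs', 'shared', 'README');
if exist(readmeFile, 'file')
    type(readmeFile);
end
```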
For each scheduler type, the folder (or configuration subfolder) contains wrappers, submit functions, and other job management scripts for independent and communicating jobs. For example, the folder matlabroot/toolbox/distcomp/examples/integration/pbs/shared contains the following files for use with a PBS scheduler:
Filename | Description |
---|---|
independentSubmitFcn.m | Submit function for an independent job |
communicatingSubmitFcn.m | Submit function for a communicating job |
independentJobWrapper.sh | Script that is submitted to PBS to start workers that evaluate the tasks of an independent job |
communicatingJobWrapper.sh | Script that is submitted to PBS to start workers that evaluate the tasks of a communicating job |
deleteJobFcn.m | Script to delete a job from the scheduler |
extractJobId.m | Script to get the job's ID from the scheduler |
getJobStateFcn.m | Script to get the job's state from the scheduler |
getSubmitString.m | Script to get the submission string for the scheduler |
These files are all programmed to use the standard decode functions provided with the product, so they do not have specialized decode functions. For communicating jobs, the standard decode function provided with the product is parallel.cluster.generic.communicatingDecodeFcn.
You can view the required variables in this file by typing
edit parallel.cluster.generic.communicatingDecodeFcn
The folders for other scheduler types contain similar files.
As more files or solutions for more schedulers might become available at any time, visit the product page at http://www.mathworks.com/products/distriben/. This Web page provides links to updates, supported schedulers, requirements, and contact information in case you have any questions.