Program Communicating Jobs for a Generic Scheduler

Introduction

This section discusses programming communicating jobs using the generic scheduler interface. This interface lets you execute jobs on your cluster with any scheduler you might have.

The principles of using the generic scheduler interface for communicating jobs are the same as those for independent jobs. The concepts, and the details of submit and decode functions for independent jobs, are discussed fully in Program Independent Jobs for a Generic Scheduler.

The basic steps follow.

Code in the Client

Configure the Scheduler Object

Coding a communicating job for a generic scheduler involves the same procedure as coding an independent job.

  1. Create an object representing your cluster with parcluster.

  2. Set the appropriate properties on the cluster object if they are not defined in the profile. Because the scheduler itself is often common to many users and applications, it is probably best to set these properties in a profile. See Clusters and Cluster Profiles.

    One property required for a communicating job is CommunicatingSubmitFcn. You can write your own communicating submit and decode functions, or use those that come with the product for various schedulers and platforms; see the following section, Supplied Submit and Decode Functions.

  3. Use createCommunicatingJob to create a communicating job object for your cluster.

  4. Create a task, run the job, and retrieve the results as usual.
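The four client-side steps above can be sketched in MATLAB as follows. This is a minimal sketch, not a definitive recipe: it assumes a cluster profile named MyGenericProfile (a hypothetical name) that already defines the other generic-scheduler properties, and it assumes a function called communicatingSubmitFcn is on your MATLAB path. The job simply returns each worker's labindex.

```matlab
% Step 1: create a cluster object from a profile.
% 'MyGenericProfile' is a hypothetical profile name; use your own.
c = parcluster('MyGenericProfile');

% Step 2: set CommunicatingSubmitFcn only if the profile leaves it empty.
% communicatingSubmitFcn is assumed to be on the MATLAB path.
if isempty(c.CommunicatingSubmitFcn)
    c.CommunicatingSubmitFcn = @communicatingSubmitFcn;
end

% Step 3: create a communicating (SPMD) job that runs on exactly 4 workers.
j = createCommunicatingJob(c, 'Type', 'spmd');
j.NumWorkersRange = [4 4];

% Step 4: create one task; a communicating job runs it on every worker.
createTask(j, @labindex, 1, {});
submit(j);
wait(j);                     % block until the scheduler reports completion
results = fetchOutputs(j);   % cell array with one row per worker

delete(j);                   % remove the job data from job storage
```

With four workers, results should hold the values 1 through 4, one per worker.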

Supplied Submit and Decode Functions

The toolbox provides several submit and decode functions for use with the generic scheduler interface. These files are in the folder

matlabroot/toolbox/distcomp/examples/integration

This folder contains a subfolder for each of several scheduler types.

Depending on your network and cluster configuration, you might need to modify these files before they will work in your situation. Ask your system administrator for help.

At the time of publication, there are folders for PBS (pbs) and Platform LSF® (lsf) schedulers, generic UNIX®-based scripts (ssh), and Sun™ Grid Engine (sge). In addition, the pbs, lsf, and sge folders have subfolders called shared, nonshared, and remoteSubmission, which contain scripts for use in particular cluster configurations. Each of these subfolders contains a file called README, which provides instructions on where and how to use its scripts.

For each scheduler type, the folder (or configuration subfolder) contains wrappers, submit functions, and other job management scripts for independent and communicating jobs. For example, the folder matlabroot/toolbox/distcomp/examples/integration/pbs/shared contains the following files for use with a PBS scheduler:

Filename                    Description
independentSubmitFcn.m      Submit function for an independent job
communicatingSubmitFcn.m    Submit function for a communicating job
independentJobWrapper.sh    Script that is submitted to PBS to start workers that evaluate the tasks of an independent job
communicatingJobWrapper.sh  Script that is submitted to PBS to start workers that evaluate the tasks of a communicating job
deleteJobFcn.m              Script to delete a job from the scheduler
extractJobId.m              Script to get the job's ID from the scheduler
getJobStateFcn.m            Script to get the job's state from the scheduler
getSubmitString.m           Script to get the submission string for the scheduler
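One way to use the supplied PBS files for a shared file system is to put their folder on the MATLAB path and point the cluster object's function properties at them. This is a sketch under stated assumptions: the profile name MyGenericProfile is hypothetical, the folder path is the one given in this section (it can differ between releases), and extractJobId.m and getSubmitString.m are treated as helpers that the submit functions call internally, so they are not assigned to cluster properties here. That last point is an inference from their descriptions; the README in the subfolder is the authority.

```matlab
% Folder holding the supplied PBS scripts for a shared file system
% (path as given in this section; it may differ in your release).
pbsShared = fullfile(matlabroot, 'toolbox', 'distcomp', 'examples', ...
    'integration', 'pbs', 'shared');
addpath(pbsShared);   % make the supplied functions visible to the client

% 'MyGenericProfile' is a hypothetical generic-scheduler profile name.
c = parcluster('MyGenericProfile');

% Submit functions for both job types.
c.IndependentSubmitFcn   = @independentSubmitFcn;
c.CommunicatingSubmitFcn = @communicatingSubmitFcn;

% Job-management functions supplied in the same folder.
c.GetJobStateFcn = @getJobStateFcn;
c.DeleteJobFcn   = @deleteJobFcn;

% The "shared" scripts assume client and cluster share a file system.
c.HasSharedFilesystem = true;
```

Setting these properties in a profile rather than in code lets every user of the scheduler pick them up automatically.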

These files are all programmed to use the standard decode functions provided with the product, so they do not have specialized decode functions. For communicating jobs, the standard decode function provided with the product is parallel.cluster.generic.communicatingDecodeFcn. To see which variables this decode function requires, view the file by typing

edit parallel.cluster.generic.communicatingDecodeFcn
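To show how a submit function hands work to the standard decode function, here is a hypothetical, stripped-down communicatingSubmitFcn. It is a sketch only; the supplied communicatingSubmitFcn.m files are the reference. Several details are assumptions: the props field names (StorageConstructor, StorageLocation, JobLocation, MatlabExecutable, MatlabArguments, NumberOfTasks) are taken from the independent-job documentation, the MDCE_* variable names should be confirmed against the decode function opened with the edit command above, and the qsub options are Torque/PBS-style syntax that you would replace for your scheduler.

```matlab
function communicatingSubmitFcn(cluster, job, props)
%COMMUNICATINGSUBMITFCN Hypothetical minimal submit function for a
%   communicating job on a generic scheduler. Field and variable names
%   are assumptions; compare with the supplied communicatingSubmitFcn.m.

% Name the standard decode function that each worker runs at startup.
setenv('MDCE_DECODE_FUNCTION', ...
    'parallel.cluster.generic.communicatingDecodeFcn');

% Tell the decode function where to find the job data (assumed names).
setenv('MDCE_STORAGE_CONSTRUCTOR', props.StorageConstructor);
setenv('MDCE_STORAGE_LOCATION',    props.StorageLocation);
setenv('MDCE_JOB_LOCATION',        props.JobLocation);

% Values assumed to be read by the wrapper script to launch MATLAB.
setenv('MDCE_MATLAB_EXE',  props.MatlabExecutable);
setenv('MDCE_MATLAB_ARGS', props.MatlabArguments);

% Wrapper script assumed to sit next to this file (as in pbs/shared).
wrapper = fullfile(fileparts(mfilename('fullpath')), ...
    'communicatingJobWrapper.sh');

% Hypothetical Torque/PBS-style command: one node per task, and -V to
% export the variables set above to the job's environment.
cmd = sprintf('qsub -V -l nodes=%d "%s"', props.NumberOfTasks, wrapper);
[status, out] = system(cmd);
if status ~= 0
    error('Submission of job %d to %s failed: %s', ...
        job.ID, class(cluster), out);
end
end
```

Because setenv changes the environment of the MATLAB process, the command started by system inherits these variables, which is what lets the scheduler forward them to the workers.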

The folders for other scheduler types contain similar files.

Because files or solutions for more schedulers might become available at any time, visit the product page at http://www.mathworks.com/products/distriben/. This Web page provides links to updates, supported schedulers, requirements, and contact information in case you have any questions.
