This section details the steps of a typical programming session with Parallel Computing Toolbox™ software using a supported job scheduler on a cluster. Supported schedulers include the MATLAB job scheduler (MJS), Platform LSF® (Load Sharing Facility), Microsoft® Windows HPC Server (including CCS), PBS Pro®, and TORQUE.
This section assumes you have an MJS, LSF®, PBS Pro, TORQUE, or Windows HPC Server (including CCS and HPC Server 2008) scheduler installed and running on your network. For more information about LSF, see http://www.platform.com/Products/. For more information about Windows HPC Server, see http://www.microsoft.com/hpc.
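With all of these cluster types, the basic job programming sequence is the same: define and select a profile, find a cluster, create a job, create tasks, submit the job to the job queue, and retrieve the job's results.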
Note that the objects that the client session uses to interact with the MJS are only references to data that is actually contained in the MJS, not in the client session. After jobs and tasks are created, you can close your client session and restart it, and your job is still stored in the MJS. You can find existing jobs using the findJob function or the Jobs property of the MJS cluster object.
A cluster profile identifies the type of cluster to use and its specific properties. In a profile, you define how many workers a job can access, where the job data is stored, where MATLAB is accessed, and many other cluster properties. The exact properties are determined by the type of cluster.
The steps in this section all assume that the profile named MyProfile identifies the cluster you want to use, with all necessary property settings. With the proper use of a profile, the rest of the programming is the same, regardless of cluster type. After you define or import your profile, you can set it as the default profile in the Profile Manager GUI, or with the command:
parallel.defaultClusterProfile('MyProfile')
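To see which profiles are available and to remember the previous default before changing it, something like the following may be useful. This is only a sketch; it assumes a profile named MyProfile has already been defined or imported, and the variable names are illustrative.

```matlab
% List the names of all cluster profiles known to this MATLAB installation.
allProfiles = parallel.clusterProfiles;

% Make MyProfile the default (assumes it exists). The call returns the
% name of the profile that was the default before the change.
oldDefault = parallel.defaultClusterProfile('MyProfile');

% With no profile name, parcluster uses the default profile.
c = parcluster;
```

Keeping oldDefault around makes it easy to restore the earlier setting with another call to parallel.defaultClusterProfile.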
A few notes regarding different cluster types and their properties:
In a shared file system, all nodes require access to the folder specified in the cluster object's
Because Windows HPC Server requires a shared file system, all nodes require access to the folder specified in the cluster object's
In a shared file system, MATLAB® clients on many computers can access the same job data on the network. Properties of a particular job or task should be set from only one client computer at a time.
When you use an LSF scheduler in a nonshared file system, the scheduler might report that a job is in the finished state even though the LSF scheduler might not yet have completed transferring the job's files.
You use the parcluster function to identify a cluster and to create an object representing the cluster in your local MATLAB session.
To find a specific cluster, use the cluster profile to match the properties of the cluster you want to use. In this example, MyProfile is the name of the profile that defines the specific cluster.
c = parcluster('MyProfile');
MJS Cluster

    Properties

                      Name: my_mjs
                   Profile: MyProfile
                  Modified: false
                      Host: node345
                  Username: mylogin
                NumWorkers: 1
            NumBusyWorkers: 0
            NumIdleWorkers: 1
        JobStorageLocation: Database on node345
         ClusterMatlabRoot: C:\apps\matlab
           OperatingSystem: windows
          AllHostAddresses: 0:0:0:0
             SecurityLevel: 0 (No security)
    HasSecureCommunication: false

    Associated Jobs

            Number Pending: 0
             Number Queued: 0
            Number Running: 0
           Number Finished: 0
You create a job with the createJob function. Although this command executes in the client session, it actually creates the job on the cluster, c, and creates a job object, job1, in the client session.
job1 = createJob(c)
Job Properties:

                     ID: 1
                   Type: Independent
               Username: mylogin
                  State: pending
             SubmitTime:
              StartTime:
       Running Duration: 0 days 0h 0m 0s
        AutoAttachFiles: true
    Auto Attached Files: List files
          AttachedFiles: {}
        AdditionalPaths: {}

    Associated Tasks:

         Number Pending: 0
         Number Running: 0
        Number Finished: 0
      Task ID of Errors: []
Note that the job's State property is pending. This means the job has not been queued for running yet, so you can now add tasks to it.
The cluster's display now includes one pending job, as shown in this partial listing:
c
    Associated Jobs

            Number Pending: 1
             Number Queued: 0
            Number Running: 0
           Number Finished: 0
You can transfer files to the worker by using the AttachedFiles property of the job object. For details, see Share Code with the Workers.
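As an illustration of the AttachedFiles property, the sketch below sets it when a job is created. The file name myHelper.m is hypothetical and stands for a file on the client that the task function needs. The sketch creates a separate job so that it does not interfere with job1.

```matlab
c = parcluster('MyProfile');

% myHelper.m is a hypothetical file name used only for illustration;
% replace it with a file that actually exists on the client.
jobWithFiles = createJob(c, 'AttachedFiles', {'myHelper.m'});
```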
After you have created your job, you can create tasks for the job using the createTask function. Tasks define the functions to be evaluated by the workers during the running of the job. Often, the tasks of a job are all identical. In this example, each task will generate a 3-by-3 matrix of random numbers.
createTask(job1, @rand, 1, {3,3});
createTask(job1, @rand, 1, {3,3});
createTask(job1, @rand, 1, {3,3});
createTask(job1, @rand, 1, {3,3});
createTask(job1, @rand, 1, {3,3});
The Tasks property of job1 is now a 5-by-1 matrix of task objects.
job1.Tasks
         ID    State      FinishTime    Function    Error
        -----------------------------------------------------
    1     1    pending                  @rand
    2     2    pending                  @rand
    3     3    pending                  @rand
    4     4    pending                  @rand
    5     5    pending                  @rand
Alternatively, you can create the five tasks with one call to createTask by providing a cell array of five cell arrays defining the input arguments to each task.
T = createTask(job1, @rand, 1, {{3,3} {3,3} {3,3} {3,3} {3,3}});
In this case, T is a 5-by-1 matrix of task objects.
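When the number of tasks is large or known only at run time, the outer cell array can be built programmatically instead of typed out. The following is a minimal sketch using repmat; the names numTasks, taskArgs, and jobB are illustrative, and the sketch creates its own job so that no tasks are added to job1 twice.

```matlab
c = parcluster('MyProfile');
jobB = createJob(c);

numTasks = 5;

% One inner cell array {3,3} per task; each inner cell holds the input
% arguments for that task's call to rand.
taskArgs = repmat({{3,3}}, 1, numTasks);

% A single call creates all the tasks.
T = createTask(jobB, @rand, 1, taskArgs);
```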
To run your job and have its tasks evaluated, you submit the job to the job queue with the submit function.
submit(job1)
The job manager distributes the tasks of job1 to its registered workers for evaluation.
Each worker performs the following steps for task evaluation:
1. Receive AttachedFiles and AdditionalPaths from the job. Place files and modify the path accordingly.
2. Run the jobStartup function the first time evaluating a task for this job. You can specify this function in AttachedFiles or AdditionalPaths. When using an MJS, if the same worker evaluates subsequent tasks for this job, jobStartup does not run between tasks.
3. Run the taskStartup function. You can specify this function in AttachedFiles or AdditionalPaths. This runs before every task evaluation that the worker performs, so it could occur multiple times on a worker for each job.
4. If the worker is part of forming a new parallel pool, run the poolStartup function. (This occurs when executing parpool or when running other types of jobs that form and use a parallel pool, such as batch.)
5. Receive the task function and arguments for evaluation.
6. Evaluate the task function, placing the result in the task's OutputArguments property. Any error information goes in the task's Error property.
7. Run the taskFinish function.
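To make the taskStartup step more concrete, here is a minimal sketch of what such a file might contain. It assumes the file is saved as taskStartup.m and made available to the workers through AttachedFiles or AdditionalPaths; the message it prints is purely illustrative.

```matlab
function taskStartup(task)
% taskStartup  Runs on the worker before each task is evaluated.
% The worker passes in the task object that is about to be evaluated.

% Print a line to the worker's command window output so that the start
% of each task can be traced. The message text is illustrative only.
fprintf('Starting task %d\n', task.ID);
end
```

Because this function runs before every task, it is best kept short; setup that is needed only once per job fits better in jobStartup.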
The results of each task's evaluation are stored in that task object's OutputArguments property as a cell array. Use the function fetchOutputs to retrieve the results from all the tasks in the job.
wait(job1)
results = fetchOutputs(job1);
Display the results from each task.
results{1:5}
    0.9501    0.4860    0.4565
    0.2311    0.8913    0.0185
    0.6068    0.7621    0.8214

    0.4447    0.9218    0.4057
    0.6154    0.7382    0.9355
    0.7919    0.1763    0.9169

    0.4103    0.3529    0.1389
    0.8936    0.8132    0.2028
    0.0579    0.0099    0.1987

    0.6038    0.0153    0.9318
    0.2722    0.7468    0.4660
    0.1988    0.4451    0.4186

    0.8462    0.6721    0.6813
    0.5252    0.8381    0.3795
    0.2026    0.0196    0.8318
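Because fetchOutputs throws an error if any task in the job failed, it can help to look at the tasks first. The sketch below is one possible way to do that, assuming job1 has been submitted; it relies on each task's ErrorMessage property being empty when the task ran without error, and the variable names are illustrative.

```matlab
wait(job1);   % block until the job has finished

% Collect the IDs of any tasks that reported an error message.
tasks = job1.Tasks;
failed = [];
for k = 1:numel(tasks)
    if ~isempty(tasks(k).ErrorMessage)
        failed(end+1) = tasks(k).ID; %#ok<AGROW>
    end
end

if isempty(failed)
    % No task reported an error, so all outputs can be fetched.
    results = fetchOutputs(job1);
else
    fprintf('Tasks with errors: %s\n', mat2str(failed));
end
```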
Because all the data of jobs and tasks resides in the cluster job storage location, these objects continue to exist even if the client session that created them has ended. The following sections describe how to access these objects and how to permanently remove them:
When you close the client session of Parallel Computing Toolbox software, all of the objects in the workspace are cleared. However, the objects in MATLAB Distributed Computing Server™ software or other cluster resources remain in place. When the client session ends, only the local reference objects are lost, not the actual job and task data in the cluster.
Therefore, if you have submitted your job to the cluster job queue for execution, you can quit your client session of MATLAB, and the job will be executed by the cluster. You can retrieve the job results later in another client session.
A client session of Parallel Computing Toolbox software can access any of the objects in MATLAB Distributed Computing Server software, whether the current client session or another client session created these objects.
You create cluster objects in the client session by using the parcluster function.
c = parcluster('MyProfile');
When you have access to the cluster by the object c, you can create objects that reference all the jobs contained in that cluster. The jobs are accessible in the cluster object's Jobs property, which is an array of job objects:
all_jobs = c.Jobs
You can index through the array all_jobs to locate a specific job.
Alternatively, you can use the findJob function to search in a cluster for any jobs or a particular job identified by any of its properties, such as its State.
all_jobs = findJob(c);
finished_jobs = findJob(c,'State','finished')
The second command returns an array of job objects that reference all finished jobs on the cluster c.
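Putting these pieces together, a new client session might recover the results of an earlier job roughly as follows. This is a sketch only; it assumes the job of interest has ID 1, as in the job created earlier in this section, and the variable name oldJob is illustrative.

```matlab
c = parcluster('MyProfile');

% Look up the job by its ID. ID 1 matches the job created earlier in
% this section; substitute the ID of your own job.
oldJob = findJob(c, 'ID', 1);

% Fetch results only if the job was found and has finished.
if ~isempty(oldJob) && strcmp(oldJob.State, 'finished')
    results = fetchOutputs(oldJob);
end
```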
When restarting a client session, you lose the settings of any callback properties (for example, the FinishedFcn property) on jobs or tasks. These properties are commonly used to get notifications in the client session of state changes in their objects. When you create objects in a new client session that reference existing jobs or tasks, you must reset these callback properties if you intend to use them.
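As a rough illustration of resetting a callback, the sketch below reattaches a FinishedFcn to jobs that are still queued. It assumes an MJS cluster (where these callback properties apply) and that the callback is invoked with the job object and an event data argument; the message printed is illustrative.

```matlab
c = parcluster('MyProfile');

% Find jobs that are still waiting in the queue.
queuedJobs = findJob(c, 'State', 'queued');

% Reattach a callback that reports when each job finishes.
for k = 1:numel(queuedJobs)
    queuedJobs(k).FinishedFcn = @(job, eventData) ...
        fprintf('Job %d finished\n', job.ID);
end
```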
Jobs in the cluster continue to exist even after they are finished, and after the MJS is stopped and restarted. The ways to permanently remove jobs from the cluster are explained in the following sections:
Delete Selected Objects. From the command line in the MATLAB client session, you can call the delete function for any job or task object. If you delete a job, you also remove all tasks contained in that job.
For example, find and delete all finished jobs in your cluster that belong to the user joep.
c = parcluster('MyProfile')
finished_jobs = findJob(c,'State','finished','Username','joep')
delete(finished_jobs)
clear finished_jobs
The delete function permanently removes these jobs from the cluster. The clear function removes the object references from the local MATLAB workspace.
Start an MJS from a Clean State. When an MJS starts, by default it starts so that it resumes its former session with all jobs intact. Alternatively, an MJS can start from a clean state with all its former history deleted. Starting from a clean state permanently removes all job and task data from the MJS of the specified name on a particular host.
As a network administration feature, the -clean flag of the startjobmanager script is described in Start in a Clean State in the MATLAB Distributed Computing Server System Administrator's Guide.