Because the tasks of a job are evaluated on different machines, each machine must have access to all the files needed to evaluate its tasks. The following sections explain the basic mechanisms for sharing code and data with workers.
If the workers all have access to the same drives on the network, they can access the necessary files that reside on these shared resources. This is the preferred method for sharing data, as it minimizes network traffic.
You must define each worker session’s search path so that it looks for files in the right places. You can define the path:
By using the job’s AdditionalPaths property. This is the preferred method for setting the path, because it is specific to the job.
AdditionalPaths identifies folders to be added to the top of the command search path of worker sessions for this job. If you also specify AttachedFiles, the AttachedFiles are above AdditionalPaths on the workers’ path.
When you specify AdditionalPaths at the time of creating a job, the settings are combined with those specified in the applicable cluster profile. Setting AdditionalPaths on a job object after it is created does not combine the new setting with the profile settings, but overwrites existing settings for that job.
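The two ways of setting AdditionalPaths can be compared in a short sketch, assuming the default cluster profile. The folder names are placeholders, and the final contents of jobA.AdditionalPaths depend on what your profile already specifies.

```matlab
c = parcluster();   % cluster object for the default profile

% Specified at creation: combined with any AdditionalPaths in the profile
jobA = createJob(c, 'AdditionalPaths', {'/central/funcs'});

% Set after creation: replaces every existing entry for jobB,
% including any entries that came from the profile
jobB = createJob(c);
jobB.AdditionalPaths = {'/dept1/funcs'};

disp(jobA.AdditionalPaths)
disp(jobB.AdditionalPaths)
```

Jobs created this way stay in the cluster's job storage until you remove them with delete(jobA) and delete(jobB).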
AdditionalPaths is empty by default. In a mixed-platform environment, its entries (character vectors) can specify both UNIX® and Microsoft® Windows® style paths; settings that are not appropriate or not found for a particular machine generate warnings and are ignored.
This example sets the MATLAB® worker path in a mixed-platform environment to use functions in both the central repository /central/funcs and the department archive /dept1/funcs, each of which also has a Windows UNC path.
```matlab
c = parcluster(); % Use default profile
job1 = createJob(c);
ap = {'/central/funcs','/dept1/funcs', ...
      '\\OurDomain\central\funcs','\\OurDomain\dept1\funcs'};
job1.AdditionalPaths = ap;
```
By putting the path command in any of the appropriate startup files for the worker:
matlabroot\toolbox\local\startup.m
matlabroot\toolbox\distcomp\user\jobStartup.m
matlabroot\toolbox\distcomp\user\taskStartup.m
You can give the workers access to these files through the job’s AttachedFiles or AdditionalPaths property. Otherwise, the worker uses the version of each file that is highest on its path.
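As an illustration of the startup-file approach, a jobStartup.m along these lines adds the folders from the AdditionalPaths example itself. This is a sketch: the folder names are placeholders for locations your workers can actually reach, and ispc is used here only to choose the Windows UNC form on Windows workers.

```matlab
function jobStartup(job)
% jobStartup  Runs on a worker before the first task of JOB that it evaluates.
%   Sketch: put shared function folders at the top of the worker's path.
%   The folder names are placeholders; JOB is not used here.
if ispc
    folders = {'\\OurDomain\central\funcs', '\\OurDomain\dept1\funcs'};
else
    folders = {'/central/funcs', '/dept1/funcs'};
end
for k = 1:numel(folders)
    if isfolder(folders{k})
        addpath(folders{k});   % addpath prepends to the search path by default
    else
        warning('jobStartup:missingFolder', 'Folder not found: %s', folders{k});
    end
end
end
```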
Access to files among shared resources can depend upon permissions based on the user name. You can set the user name under which the MJS and worker services of MATLAB Distributed Computing Server™ software run by setting the MDCEUSER value in the mdce_def file before starting the services. On Microsoft Windows operating systems, you can also set MDCEPASS to provide the account password for the specified user. For an explanation of service default settings and the mdce_def file, see Define Script Defaults (MATLAB Distributed Computing Server) in the MATLAB Distributed Computing Server System Administrator's Guide.
A number of properties on task and job objects are designed for passing code or data from client to scheduler to worker, and back. This information can include MATLAB code needed for task evaluation, input data to process, or output data resulting from task evaluation. The following properties facilitate this communication:
InputArguments: This property of each task contains the input data you specified when creating the task. This data gets passed into the function when the worker performs its evaluation.
OutputArguments: This property of each task contains the results of the function’s evaluation.
JobData: This property of the job object contains data that gets sent to every worker that evaluates tasks for that job. This property works efficiently because the data is passed to a worker only once per job, saving time if that worker is evaluating more than one task for the job. (Note: Do not confuse this property with the UserData property on any objects in the MATLAB client. Information in UserData is available only in the client, and is not available to the scheduler or workers.)
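A minimal end-to-end sketch of these three properties, assuming the default cluster profile can run jobs: each cell array passed to createTask becomes that task's InputArguments, and fetchOutputs gathers the OutputArguments of all tasks. The JobData value 42 is made up; task code on a worker could read it through getCurrentJob.

```matlab
c = parcluster();

% JobData is sent once to each worker that runs tasks of this job.
% Task code on a worker could read it with: job = getCurrentJob(); job.JobData
job = createJob(c, 'JobData', 42);

% The cell array of inputs becomes each task's InputArguments
t1 = createTask(job, @sum, 1, {[1 2 3]});
t2 = createTask(job, @sum, 1, {[4 5 6]});

submit(job);
wait(job);

% One row per task, one column per output argument
results = fetchOutputs(job);
disp(results)
disp(t2.OutputArguments)
```

When you no longer need the job, delete(job) removes it and its data from the cluster's job storage.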
AttachedFiles: This property of the job object is a cell array in which you manually specify all the folders and files that get sent to the workers. On the worker, the files are installed and the entries specified in the property are added to the search path of the worker session.
AttachedFiles contains a list of folders and files that the workers need to access to evaluate a job’s tasks. The value of the property (empty by default) is defined in the cluster profile or in the client session. You set the value for the property as a cell array of character vectors, each of which is an absolute or relative path name to a folder or file. (Note: If these files or folders change while they are being transferred, or if any of the folders are empty, a failure or error can result. If you specify a path name that does not exist, an error is generated.)
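Setting the property can be sketched as follows. The folder and file names are hypothetical, and the example creates them in tempdir only so that the paths exist; in practice you list your own code and data folders, using absolute paths or paths relative to the client's current folder.

```matlab
% Hypothetical folder and data file, created so that the paths exist
dataDir = fullfile(tempdir, 'myJobData');
if ~isfolder(dataDir)
    mkdir(dataDir);
end
dataFile = fullfile(dataDir, 'params.csv');
writematrix([1 2; 3 4], dataFile);

c = parcluster();
% Each entry is an absolute or relative path to a folder or a file.
% Attaching the folder puts it (but not its subfolders) on the workers' path.
job = createJob(c, 'AttachedFiles', {dataDir});
disp(job.AttachedFiles)
```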
The first time a worker evaluates a task for a particular job, the scheduler passes to the worker the files and folders in the AttachedFiles property. On the worker machine, a folder structure is created that is exactly the same as that accessed on the client machine where the property was set. The entries listed in the property value are added to the top of the command search path in the worker session. (Subfolders of the entries are not added to the path, even though they are included in the folder structure.) To find out where the files are placed on the worker machine, use the function getAttachedFilesFolder in code that runs on the worker.
When the worker runs subsequent tasks for the same job, it uses the folder structure already set up by the job’s AttachedFiles property for the first task it ran for that job.
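Code that runs on the worker can then locate attached data. In this hypothetical task function, the data file is assumed to sit at the top level of an attached folder, so that folder is on the worker's path and which resolves the full file name. getAttachedFilesFolder is called only to report where the attached files were placed if the lookup fails.

```matlab
function total = sumAttachedData(fileName)
% sumAttachedData  Hypothetical task function that reads a data file
% delivered to the worker through the job's AttachedFiles property.
%   Assumes the file sits at the top level of an attached folder, which
%   is therefore on the worker's search path.
fullName = which(fileName);
if isempty(fullName)
    disp(getAttachedFilesFolder());   % where attached files live on this worker
    error('sumAttachedData:notFound', 'Cannot find %s on the path.', fileName);
end
data = readmatrix(fullName);
total = sum(data(:));
end
```

The function file itself must also reach the workers, for example by listing it in AttachedFiles as well or by enabling AutoAttachFiles, before a call such as createTask(job, @sumAttachedData, 1, {'params.csv'}) can succeed.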
When you specify AttachedFiles at the time of creating a job, the settings are combined with those specified in the applicable profile. Setting AttachedFiles on a job object after it is created does not combine the new setting with the profile settings, but overwrites the existing settings for that job.
The transfer of AttachedFiles occurs for each worker running a task for that particular job on a machine, regardless of how many workers run on that machine. Normally, the attached files are deleted from the worker machine when the job is completed, or when the next job begins.
AutoAttachFiles: This property of the job object uses a logical value to specify that you want MATLAB to analyze the task functions in the job and any manually attached files to determine which code files the workers need, and to send those files to the workers automatically. You can set this property value in a cluster profile using the Profile Manager, or you can set it programmatically on a job object at the command line.
```matlab
c = parcluster();
j = createJob(c);
j.AutoAttachFiles = true;
```
The supported code file formats for automatic attachment are MATLAB files (.m extension), P-code files (.p), and MEX-files (.mex). Note that AutoAttachFiles does not include data files for your job; use the AttachedFiles property to explicitly transfer these files to the workers.
Use listAutoAttachedFiles to get a listing of the code files that are automatically attached to a job.
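Combining automatic code attachment with an explicitly attached data file might look like the sketch below. The data file lookup.csv and the task function myAnalysis are hypothetical; myAnalysis stands in for a function on the client path whose dependencies the analysis would discover.

```matlab
% Hypothetical data file; AutoAttachFiles does not pick up data files,
% so it is attached explicitly
dataFile = fullfile(tempdir, 'lookup.csv');
writematrix(magic(3), dataFile);

c = parcluster();
job = createJob(c, 'AutoAttachFiles', true, 'AttachedFiles', {dataFile});

% myAnalysis is a hypothetical function on the client path; dependency
% analysis attaches it and the code files it calls
createTask(job, @myAnalysis, 1, {'lookup.csv'});

% Display the code files that were attached automatically
listAutoAttachedFiles(job)
```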
If the AutoAttachFiles setting is true for the cluster profile used when starting a parallel pool, MATLAB performs an analysis on spmd blocks, parfor-loops, and other attached files to determine what other code files are necessary for execution, then automatically attaches those files to the parallel pool so that the code is available to the workers.
There is a default maximum amount of data that can be sent in a single call for setting properties. This limit applies to the OutputArguments property as well as to data passed into a job as input arguments or AttachedFiles. If the limit is exceeded, you get an error message. For more information about this data transfer size limit, see Attached Files Size Limitations.
As a session of MATLAB, a worker session executes its startup.m file each time it starts. You can place the startup.m file in any folder on the worker’s MATLAB search path, such as toolbox/distcomp/user.
These additional files can initialize and clean up a worker session as it begins or completes evaluations of tasks for a job:
jobStartup.m automatically executes on a worker when the worker runs its first task of a job.
taskStartup.m automatically executes on a worker each time the worker begins evaluation of a task.
poolStartup.m automatically executes on a worker each time the worker is included in a newly started parallel pool.
taskFinish.m automatically executes on a worker each time the worker completes evaluation of a task.
Empty versions of these files are provided in the folder:
matlabroot/toolbox/distcomp/user
You can edit these files to include whatever MATLAB code you want the worker to execute at the indicated times.
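For instance, an edited taskStartup.m could log which task is starting, which can make worker output easier to follow. This is a sketch that assumes the task object's Parent property refers to its job and that both objects expose an ID property.

```matlab
function taskStartup(task)
% taskStartup  Runs on a worker each time it begins evaluating a task.
%   Sketch: print a line identifying the task and its job.
fprintf('Starting task %d of job %d at %s\n', ...
    task.ID, task.Parent.ID, char(datetime('now')));
end
```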
Alternatively, you can create your own versions of these files and pass them to the job as part of the AttachedFiles property, or include the path names to their locations in the AdditionalPaths property.
The worker gives precedence to the versions provided in the AttachedFiles property, then to those pointed to in the AdditionalPaths property. If any of these files is not included in these properties, the worker uses the version of the file in the toolbox/distcomp/user folder of the worker’s MATLAB installation.
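A job-specific startup file can be supplied through AttachedFiles, as in this sketch. The folder myJobHooks is hypothetical, and the example writes a minimal taskStartup.m there only so that it is self-contained.

```matlab
% Hypothetical folder holding a job-specific taskStartup.m
hookDir = fullfile(tempdir, 'myJobHooks');
if ~isfolder(hookDir)
    mkdir(hookDir);
end

% Write a minimal taskStartup.m so the example is self-contained
fid = fopen(fullfile(hookDir, 'taskStartup.m'), 'w');
fprintf(fid, 'function taskStartup(task)\n');
fprintf(fid, 'disp(''Job-specific taskStartup'');\n');
fprintf(fid, 'end\n');
fclose(fid);

c = parcluster();
% The attached version takes precedence over one found through
% AdditionalPaths and over the copy in the worker's installation
job = createJob(c, 'AttachedFiles', {fullfile(hookDir, 'taskStartup.m')});
```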