This section offers advice on solving problems you might encounter with MATLAB® Distributed Computing Server™ software.
When starting a MATLAB worker, a licensing problem might result in the message
License checkout failed. No such FEATURE exists. License Manager Error -5
There are many reasons why you might receive this error:
This message usually indicates that you are trying to use a product for which you are not licensed. Look at your license.dat file located within your MATLAB installation to see if you are licensed to use this product.
If you are licensed for this product, this error may be the result of having extra carriage returns or tabs in your license file. To avoid this, ensure that each line begins with either #, SERVER, DAEMON, or INCREMENT.
After fixing your license.dat file, restart your license manager and MATLAB should work properly.
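The line-start rule above can be checked mechanically. The following is a minimal sketch, not a MathWorks tool: the function name find_suspect_lines is hypothetical, and it assumes that the four prefixes listed above are the only valid line starts, that blank lines are harmless, and that a Windows-style line ending does not count as an extra carriage return.

```python
import re

# Line starts that the text above lists as valid for a license file.
ALLOWED_PREFIXES = ("#", "SERVER", "DAEMON", "INCREMENT")


def find_suspect_lines(text):
    """Return (line_number, reason) pairs for lines that break the rule above."""
    problems = []
    # Accept both "\n" and "\r\n" as line endings; any "\r" left over is stray.
    for number, line in enumerate(re.split(r"\r?\n", text), start=1):
        if "\t" in line:
            problems.append((number, "contains a tab"))
        if "\r" in line:
            problems.append((number, "contains a carriage return"))
        stripped = line.strip()
        # Assumption: blank lines are ignored rather than reported.
        if stripped and not stripped.startswith(ALLOWED_PREFIXES):
            problems.append((number, "unexpected line start"))
    return problems
```

To use it, read your license.dat file as text and pass the contents to the function. An empty result means that no line was flagged under these assumptions; it does not prove that the license itself is valid.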
This error may also be the result of an incorrect system date. If your system date is earlier than the date on which your license was issued, you will get this error.
If you receive this error when starting a worker with MATLAB Distributed Computing Server software:
You may be calling the startworker command from an installation that does not have access to a worker license. For example, starting a worker from a client installation of the Parallel Computing Toolbox™ product causes the following error:
The mdce service on the host hostname returned the following error:
Problem starting the MATLAB worker.
The cause of this problem is:
==============================================================
Most likely, the MATLAB worker failed to start due to a licensing problem, or MATLAB crashed during startup. Check the worker log file
/tmp/mdce_user/node_node_worker_05-11-01_16-52-03_953.log
for more detailed information. The mdce log file
/tmp/mdce_user/mdce-service.log
may also contain some additional information.
===============================================================
In the worker log files, you see the following information:
License checkout failed.
License Manager Error -15
MATLAB is unable to connect to the license server.
Check that the license manager has been started, and that the MATLAB client machine can communicate with the license server.
Troubleshoot this issue by visiting:
http://www.mathworks.com/support/lme/R2009a/15
Diagnostic Information:
Feature: MATLAB_Distrib_Comp_Engine
License path: /apps/matlab/etc/license.dat
FLEXnet Licensing error: -15,570. System Error: 115
If you installed only the Parallel Computing Toolbox product, and you are attempting to run a worker on the same machine, you will receive this error because the MATLAB Distributed Computing Server product is not installed, and therefore the worker cannot obtain a license.
If the number of threads created by the server services on a machine running a UNIX® operating system (Linux® or Macintosh) exceeds the limitation set by the maxproc value, the services fail and generate an out-of-memory error. Check your maxproc value on a UNIX operating system with the limit command. (Different versions of UNIX software might have different names for this property.)
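As an alternative to the limit command, the limit can be read from a script. This sketch assumes that maxproc corresponds to the RLIMIT_NPROC resource limit on your system, which might not hold on every UNIX variant; the function names are illustrative only.

```python
import resource


def describe_limit(soft, hard):
    """Format a (soft, hard) limit pair, spelling out 'unlimited'."""
    def show(value):
        return "unlimited" if value == resource.RLIM_INFINITY else str(value)
    return f"soft={show(soft)} hard={show(hard)}"


def process_limit():
    """Return the (soft, hard) process limit that applies to this session."""
    # Assumption: RLIMIT_NPROC is the limit that the text calls maxproc.
    return resource.getrlimit(resource.RLIMIT_NPROC)


if __name__ == "__main__":
    print(describe_limit(*process_limit()))
```

Run the script as the user that owns the mdce service, because the limit that applies can differ from one user account to another.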
Many networks are configured not to allow LocalSystem to have access to UNC or mapped network shares. In this case, run the mdce process under a different user with rights to log on as a service. See Set the User.
BASE_PORT. The mdce_def file specifies and describes the ports required by the job manager and all workers. See the following file in the MATLAB installation used for each cluster process:
matlabroot/toolbox/distcomp/bin/mdce_def.sh (on UNIX operating systems)
matlabroot\toolbox\distcomp\bin\mdce_def.bat (on Windows® operating systems)
Communicating Jobs. On worker machines running a UNIX operating system, the ports required by MPICH for running communicating jobs range from BASE_PORT + 1000 to BASE_PORT + 2000.
Before the worker processes start, you can control the range of ports used by the workers for communicating jobs by defining the environment variable MPICH_PORT_RANGE with the value minport:maxport.
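The port arithmetic above can be written out as a short sketch. The function names are hypothetical, and the base port in the usage note is an arbitrary example value, not a documented default.

```python
def default_mpich_ports(base_port):
    """Return the (lowest, highest) MPICH port implied by BASE_PORT."""
    return base_port + 1000, base_port + 2000


def mpich_port_range_value(min_port, max_port):
    """Build a minport:maxport string for the MPICH_PORT_RANGE variable."""
    if not 0 < min_port <= max_port <= 65535:
        raise ValueError("ports must satisfy 0 < min_port <= max_port <= 65535")
    return f"{min_port}:{max_port}"
```

For an example base port of 30000, default_mpich_ports(30000) gives the range 31000 to 32000, and mpich_port_range_value(31000, 32000) gives the string to assign to MPICH_PORT_RANGE in the environment from which the workers start.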
With the pctconfig function, you specify the ports used by the client. If the default ports cannot be used, this function allows you to configure ports separately for communication with the job scheduler and communication with pmode or a parallel pool.
If you use the job manager on a cluster of nodes running Windows operating systems, you must make sure that a large number of ephemeral TCP ports are available on the job manager machine. By default, the maximum valid ephemeral TCP port number on a Windows operating system is 5000, but transfers of large data sets might fail if this setting is not increased. In particular, if your cluster has 32 or more workers, you should increase the maximum valid ephemeral TCP port number using the following procedure:
Start the Registry Editor.
Locate the following subkey in the registry, and click Parameters:
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters
On the Registry Editor window, select Edit > New > DWORD Value.
In the list of entries on the right, change the new value name to MaxUserPort and press Enter.
Right-click on the MaxUserPort entry name and select Modify.
In the Edit DWORD Value dialog, enter 65534 in the Value data field. Select Decimal for the Base value. Click OK.
This parameter controls the maximum port number that is used when a program requests any available user port from the system. Typically, ephemeral (short-lived) ports are allocated between the values of 1024 and 5000 inclusive. This action allows allocation for port numbers up to 65534.
Quit the Registry Editor.
Reboot your machine.
If a worker is not able to make a connection with its MATLAB job scheduler (MJS, or job manager), or if a client session cannot validate a profile that uses that scheduler, this might indicate communications problems between nodes.
First, be sure that the machines in question agree on their IP resolutions. The IP address for a particular host should be the same for itself as it is from the perspective of another host. For example, if a process on hostB cannot connect to one on hostA, find out the hostA IP address for itself, then see what the IP address for hostA is from hostB. They should be the same.
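One way to collect the two answers is to run the same small script on each host and compare what it prints. This is a sketch using the Python standard library, not part of the MATLAB products; the function names are illustrative.

```python
import socket
import sys


def resolve(hostname):
    """Return the IPv4 address this machine resolves hostname to, or None."""
    try:
        return socket.gethostbyname(hostname)
    except OSError:
        # Covers lookup failures such as an unknown host name.
        return None


def same_view(local_answer, remote_answer):
    """True only if both machines resolved the name to the same address."""
    return local_answer is not None and local_answer == remote_answer


if __name__ == "__main__":
    # Usage: run on each host with the host names to check as arguments.
    for name in sys.argv[1:]:
        print(name, resolve(name))
```

Run it on hostA and on hostB with hostA as the argument. If the two printed addresses differ, for example because hostA resolves its own name to a loopback address, the machines do not agree on their IP resolutions.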
If the machines can identify each other, the nodestatus command can be useful for diagnosing problems between their processes. Use this command to determine what MATLAB Distributed Computing Server processes are running on the local host, and which are accessible from remote hosts. If a worker on hostA cannot register with its job manager on hostB, run nodestatus on both hosts to see what each can see on hostB.
On hostB, execute:
nodestatus -remotehost hostB
Then on hostA, run exactly the same command:
nodestatus -remotehost hostB
The results should be the same, showing the same listing of job managers and workers.
If the output indicates problems, run the command again with a higher information level to receive more detailed information:
nodestatus -remotehost hostB -infolevel 3
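If you save the nodestatus output from each host to a text file, the comparison can be automated. This sketch treats the output as opaque text and makes no assumption about its format; the function name and the host labels are illustrative.

```python
import difflib


def output_differences(output_a, output_b, name_a="hostA", name_b="hostB"):
    """Return unified-diff lines between two saved nodestatus outputs."""
    diff = difflib.unified_diff(
        output_a.splitlines(),
        output_b.splitlines(),
        fromfile=name_a,
        tofile=name_b,
        lineterm="",
    )
    # An empty list means the two outputs are identical line for line.
    return list(diff)
```

Any lines in the result point to job managers or workers that one host can see and the other cannot.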
You can diagnose some communications problems using Admin Center.
If you cannot successfully add hosts to the listing by specifying host name, you can use their IP addresses instead (see Add Hosts). If you suspect any communications problems, in the Admin Center GUI click Test Connectivity (see Test Connectivity). This testing verifies that the nodes can identify each other and allow their processes to communicate with each other.
To use the cluster discovery capabilities in Parallel Computing Toolbox, your network must be configured with at least one of the following: a DNS SRV record or multicast.
Using DNS for cluster discovery requires that you have a DNS SRV record of the following general form:
_mdcs._tcp.domainname.com SSSS IN SRV PPPP WWWW MJS_PORT MJS_FQDN_HOSTNAME
The parts of this record are:
_mdcs._tcp. The record must start with this text, followed by your domain name (like company.com or university.edu) that the client machine searches.
SSSS indicates how long (in seconds) the DNS record can be cached; 3600 is recommended.
IN SRV is required as shown, indicating that this is a service record.
PPPP and WWWW indicate priority and weight values. These are not used, so 0 is recommended for each.
MJS_PORT is the port on which you connect to the MJS server. The default is 27350, but if you change it for the server you must change it here accordingly.
MJS_FQDN_HOSTNAME is the fully qualified domain name for the host serving the MJS. For example, mjs-1.company.com.
A valid DNS SRV record for the company.com network running an MJS on machine mjs-1 might look like this:
_mdcs._tcp.company.com 3600 IN SRV 0 0 27350 mjs-1.company.com
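The record layout described above can be expressed as a pair of helper functions. This is an illustrative sketch only; the function names are hypothetical, and the defaults are the values the text recommends (3600 for the cache time, 0 for the two unused fields, 27350 for the port).

```python
SRV_PREFIX = "_mdcs._tcp."


def format_srv_record(domain, mjs_host, mjs_port=27350, ttl=3600,
                      priority=0, weight=0):
    """Assemble a record of the general form shown above."""
    return (f"{SRV_PREFIX}{domain} {ttl} IN SRV "
            f"{priority} {weight} {mjs_port} {mjs_host}")


def parse_srv_record(record):
    """Split a record of that form into named fields, validating its shape."""
    fields = record.split()
    if len(fields) != 8 or fields[2:4] != ["IN", "SRV"]:
        raise ValueError("expected: name TTL IN SRV priority weight port host")
    if not fields[0].startswith(SRV_PREFIX):
        raise ValueError("record name must start with " + SRV_PREFIX)
    return {
        "domain": fields[0][len(SRV_PREFIX):],
        "ttl": int(fields[1]),
        "priority": int(fields[4]),
        "weight": int(fields[5]),
        "port": int(fields[6]),
        "host": fields[7],
    }
```

Calling format_srv_record("company.com", "mjs-1.company.com") reproduces the example record, and parse_srv_record can be used to sanity-check a record before you hand it to your DNS administrator.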
For your network, create the appropriate DNS SRV record using the standard procedure for your DNS system. Then you can verify that your network is configured with the necessary DNS SRV records by using standard utilities, such as the nslookup command. For example, this system command indicates the existence of the applicable DNS SRV records:
nslookup -type=SRV _mdcs._tcp.company.com
To use multicast, it must be enabled on both the head node running the MATLAB job scheduler (MJS) and the client system.
Multicast, unlike TCP/IP or UDP, is a subscription-based protocol where a number of machines on a network indicate to the network their interest in particular packets originating somewhere on that network. By contrast, both UDP and TCP packets are always bound for a single machine, usually indicated by its IP address.
The main tools for investigating this type of packet are:
tcpdump for UNIX operating systems
winpcap and ethereal for Microsoft® Windows operating systems
A Java® class included with the parallel computing products.
The Java class is called com.mathworks.toolbox.distcomp.test.MulticastTester. Both its static main method and its constructor take two input arguments: the multicast group to join and the port number to use.
This Java class has a number of simple methods to attempt to join a specified multicast group. Once the class has successfully joined the group, it has methods to send messages to the group, listen for messages from the group, and display what it receives. You can use this class both from a command-line call to Java software and inside MATLAB.
From a shell prompt (assuming that java is on your path), type
java -cp distcomp.jar com.mathworks.toolbox.distcomp.test.MulticastTester
You should see output similar to this:
0 : host1name : 0
1 : host2name : 0
The following example shows how to use the Java class inside MATLAB.
Start MATLAB on two machines (e.g., host1name and host2name) for which you want to test multicast. In each MATLAB session, enter the following commands:
m = com.mathworks.toolbox.distcomp.test.MulticastTester('239.1.1.1', 9999);
m.startSendingThread;
m.startListeningThread;
These instructions cause each MATLAB session to issue a stream of multicast test packets, and to listen for test packets. If multicast is working between the machines, you see a stream of lines like the following:
0 : host1name : 0
1 : host2name : 0
2 : host2name : 1
3 : host2name : 2
The number on the left in each string is the line number for the received packet. The text in the center is the host from which the packet is received. The number on the right is the packet number sent by the sending host. It is normal for a host to report a test packet from itself.
If either machine does not receive a stream of test packets, or if the remote host is not included in either stream, then multicast communication is not operating properly.
To terminate the test stream, execute the following in both MATLAB sessions:
m.stopSendingThread;
m.stopListeningThread;
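If neither Java nor MATLAB is available on a machine, a comparable check can be sketched with the Python standard library. This is not the MulticastTester class and does not interoperate with it: the payload format (host name, a colon, then the packet number) and all function names are assumptions made for this illustration. The report line mimics the layout of the output shown above.

```python
import socket
import struct


def format_report(line_number, host, packet_number):
    """Render one received packet in the same layout as the output above."""
    return f"{line_number} : {host} : {packet_number}"


def parse_payload(data):
    """Split this sketch's own 'host:packet_number' payload format."""
    host, _, packet = data.decode().rpartition(":")
    return host, int(packet)


def open_listener(group, port):
    """Bind a UDP socket to port and join the given multicast group."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    # Group address followed by 0.0.0.0, which selects the default interface.
    membership = socket.inet_aton(group) + socket.inet_aton("0.0.0.0")
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, membership)
    return sock


def send_packet(group, port, packet_number):
    """Send one test packet, tagged with this host's name, to the group."""
    payload = f"{socket.gethostname()}:{packet_number}".encode()
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        # A TTL of 1 keeps the packet on the local network segment.
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL,
                        struct.pack("b", 1))
        sock.sendto(payload, (group, port))


def listen(group, port, count):
    """Print a report line for each of the next count packets received."""
    sock = open_listener(group, port)
    try:
        for line_number in range(count):
            data, _ = sock.recvfrom(1024)
            host, packet_number = parse_payload(data)
            print(format_report(line_number, host, packet_number))
    finally:
        sock.close()
```

To try it, call listen('239.1.1.1', 9999, 10) on one machine, then call send_packet('239.1.1.1', 9999, n) repeatedly on the other with increasing values of n. If no report lines appear on the listening machine, multicast traffic is probably not reaching it.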