Supported Platform: Linux® only.
Deploying an application with the MATLAB API for Spark consists of two parts:
Creating your application using the MATLAB API for Spark and packaging it as a standalone application in the MATLAB desktop environment.
Executing the standalone application against a Spark enabled cluster from a Linux shell.
While creating your application using the MATLAB API for Spark, you can use Spark functions such as flatMap, mapPartitions, aggregate, and others in your MATLAB code.
The API exposes the Spark programming model to MATLAB, allowing for MATLAB implementations of numerous Spark functions. Many of these MATLAB implementations accept function handles or anonymous functions as inputs to perform various types of analyses.
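As a minimal sketch of how functions are passed to these methods, the following snippet hands an anonymous function to map and a function handle to reduce. It assumes the MATLAB environment is already configured for interactive debugging and uses a local master. The application name handleDemo, the property value, and the sample data are illustrative and not part of the API.

```matlab
% Sketch only: assumes an environment configured for interactive debugging.
% 'handleDemo', the property value, and the sample data are illustrative.
sparkProp = containers.Map({'spark.executor.cores'}, {'1'});
conf = matlab.compiler.mlspark.SparkConf( ...
    'AppName','handleDemo', ...
    'Master','local[1]', ...
    'SparkProperties',sparkProp);
sc = matlab.compiler.mlspark.SparkContext(conf);

% Build an RDD from a cell array of values.
numbers = sc.parallelize({1,2,3,4});

% Anonymous function as input: square each element.
squares = numbers.map(@(x) x^2);

% Function handle as input: combine the elements pairwise with plus.
total = squares.reduce(@plus);
disp(total)
```

The same pattern applies to the other methods that take a function input, such as flatMap, mapPartitions, and aggregate, although the expected signature of the function differs for each method.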
The API lets you interactively run your application from within the MATLAB desktop environment in a nondistributed mode on a single machine. A second MATLAB session on the same machine serves as a worker. This functionality can be helpful in debugging your application before you deploy it on a Spark enabled cluster. Before you can debug interactively, you must configure your MATLAB environment for use with the MATLAB API for Spark. For more information, see Configure Environment for Interactive Debugging.
The general workflow for using the MATLAB API for Spark is as follows:
Specify Spark properties.
Create a SparkConf object.
Create a SparkContext object.
Create an RDD object from the data.
Perform operations on the RDD object.
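The five steps above can be sketched as follows. This is an illustrative outline rather than a complete application: the application name workflowDemo, the local[1] master, the single Spark property, and the in-memory sample data are all assumptions chosen to keep the example small.

```matlab
% Sketch of the general workflow; all names and values are illustrative.

% 1. Specify Spark properties as key-value pairs.
sparkProp = containers.Map({'spark.executor.cores'}, {'1'});

% 2. Create a SparkConf object from the application name, master URL,
%    and properties.
conf = matlab.compiler.mlspark.SparkConf( ...
    'AppName','workflowDemo', ...
    'Master','local[1]', ...
    'SparkProperties',sparkProp);

% 3. Create a SparkContext object to initialize the connection.
sc = matlab.compiler.mlspark.SparkContext(conf);

% 4. Create an RDD object from the data (here, a small cell array).
rdd = sc.parallelize({1,2,3,4,5});

% 5. Perform operations on the RDD object: a transformation (map)
%    followed by an action (collect) that returns the results to MATLAB.
doubled = rdd.map(@(x) x*2);
result = doubled.collect();
disp(result)
```

When targeting a real cluster, the master URL and Spark properties would be replaced with values appropriate to that cluster.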
You can package an application created with this API into a standalone application using the mcc command or deploytool. You can then run the application on a Spark enabled cluster from a Linux shell.
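As a rough illustration of the packaging step, the command below builds a standalone application from a script at the MATLAB command prompt. The file name sparkApp.m is hypothetical, and a real application might need additional mcc options (for example, to attach data files) that are not shown here.

```matlab
% Sketch only: package the hypothetical script sparkApp.m as a
% standalone application using the mcc command syntax.
mcc -m sparkApp.m
```

On Linux, mcc normally also generates a launcher script alongside the executable (here it would be run_sparkApp.sh) that takes the MATLAB Runtime installation folder as its first argument. That script is the usual way to start the application from a Linux shell.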
MATLAB applications developed using the MATLAB API for Spark cannot be deployed if they contain tall arrays.
For a complete example, see Example on Deploying Applications to Spark Using the MATLAB API for Spark. You can follow the same instructions to deploy applications created using the MATLAB API for Spark to Cloudera® CDH.
matlab.compiler.mlspark.SparkConf | Interface class to configure an application with Spark parameters as key-value pairs |
matlab.compiler.mlspark.SparkContext | Interface class to initialize a connection to a Spark enabled cluster |
matlab.compiler.mlspark.RDD | Interface class to represent a Spark Resilient Distributed Dataset (RDD) |
Configure Environment for Interactive Debugging
Configure your MATLAB environment to interactively make calls and debug your application using the MATLAB API for Spark.
Learn basic Apache Spark™ concepts and see how these concepts relate to deploying MATLAB applications to Spark.