Big Data

Accelerate mapreduce and datastore programs by running on a parallel pool or Hadoop® cluster

Parallel Computing Toolbox™ extends the capabilities of MATLAB® MapReduce and Datastore, so that you can run big data applications on a parallel pool for improved performance. MATLAB Distributed Computing Server™ also supports running parallel MapReduce programs on Hadoop clusters.
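
As a rough illustration of running mapreduce on a parallel pool, the following minimal sketch computes a mean arrival delay from the airlinesmall.csv sample file that ships with MATLAB. The function names delayMapper and delayReducer and the output keys are hypothetical, chosen only for this example; the sketch assumes each function is saved in its own file on the path so that pool workers can find it.

```matlab
% --- script: run mapreduce on a parallel pool (illustrative sketch) ---
ds = datastore('airlinesmall.csv', ...
    'TreatAsMissing', 'NA', ...
    'SelectedVariableNames', 'ArrDelay');

pool = gcp;               % current pool; starts one if preferences allow it
mr = mapreducer(pool);    % use the pool as the mapreduce execution environment

result = mapreduce(ds, @delayMapper, @delayReducer, mr);
readall(result)           % display the output key-value pairs as a table

mapreducer(0);            % switch back to serial execution in the client

% --- delayMapper.m (hypothetical name, saved as its own file) ---
function delayMapper(data, ~, intermKVStore)
    % Emit a partial [count, sum] for one chunk of the datastore.
    delays = data.ArrDelay(~isnan(data.ArrDelay));
    add(intermKVStore, 'PartialCountSum', [numel(delays), sum(delays)]);
end

% --- delayReducer.m (hypothetical name, saved as its own file) ---
function delayReducer(~, intermValIter, outKVStore)
    % Combine the partial [count, sum] pairs into one overall mean.
    total = [0, 0];
    while hasnext(intermValIter)
        total = total + getnext(intermValIter);
    end
    add(outKVStore, 'MeanArrDelay', total(2) / total(1));
end
```

Passing the mapreducer object explicitly as the fourth argument keeps the choice of execution environment visible at the call site; calling mapreducer(0) afterwards returns later mapreduce calls to the client session.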

Functions

mapreduce	Programming technique for analyzing data sets that do not fit in memory
mapreducer	Define parallel execution environment for mapreduce

partition	Partition a datastore
numpartitions	Number of partitions

parpool	Create parallel pool on cluster
gcp	Get current parallel pool

Classes

parallel.Pool	Access parallel pool
parallel.cluster.Hadoop	Hadoop cluster for mapreducer

Examples and How To

Run mapreduce on a Parallel Pool
Run mapreduce on a Hadoop Cluster
Partition a Datastore in Parallel

Concepts

Parallel Pools
Parallel Preferences
Clusters and Cluster Profiles

Related Information

MapReduce
Datastore
