partition

Partition a datastore

Syntax

  • subds = partition(ds,N,index)
  • subds = partition(ds,'Files',index)
  • subds = partition(ds,'Files',filename)

Description

subds = partition(ds,N,index) partitions datastore ds into N parts and returns the partition at position index.

subds = partition(ds,'Files',index) partitions the datastore by files and returns the partition corresponding to the file at position index in the Files property.

subds = partition(ds,'Files',filename) partitions the datastore by files and returns the partition corresponding to the file specified by filename.

Examples

Partition Datastore into Specific Number of Parts

Create a datastore from the sample file, airlinesmall.csv, which contains tabular data.

ds = datastore('airlinesmall.csv');

Partition the datastore into three parts and return the first part.

subds = partition(ds,3,1)
subds = 
  TabularTextDatastore with properties:

                      Files: {
                             ' ...\matlab\toolbox\matlab\demos\airlinesmall.csv'
                             }
          ReadVariableNames: true
              VariableNames: {'Year', 'Month', 'DayofMonth' ... and 26 more}

  Text Format Properties:
             NumHeaderLines: 0
                  Delimiter: ','
               RowDelimiter: '\r\n'
             TreatAsMissing: ''
               MissingValue: NaN

  Advanced Text Format Properties:
            TextscanFormats: {'%f', '%f', '%f' ... and 26 more}
         ExponentCharacters: 'eEdD'
               CommentStyle: ''
                 Whitespace: ' \b\t'
    MultipleDelimitersAsOne: false

  Properties that control the table returned by preview, read, readall:
      SelectedVariableNames: {'Year', 'Month', 'DayofMonth' ... and 26 more}
            SelectedFormats: {'%f', '%f', '%f' ... and 26 more}
                   ReadSize: 20000 rows
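
The call above returns only the first of the three parts. As a minimal sketch of how all three parts might be processed in turn, the loop below reads each partition and counts its rows. The variable name rowsPerPart and the restriction to the Year variable are choices made for this illustration, not part of the original example.

```matlab
% Sketch: visit every partition of the datastore and count its rows.
ds = datastore('airlinesmall.csv');
ds.SelectedVariableNames = {'Year'};   % assumption: read one numeric column only

N = 3;
rowsPerPart = zeros(1,N);              % hypothetical name, one counter per part
for k = 1:N
    subds = partition(ds,N,k);         % k-th of N partitions
    while hasdata(subds)
        t = read(subds);               % table holding the next chunk of rows
        rowsPerPart(k) = rowsPerPart(k) + height(t);
    end
end
```

Each call to partition returns a new datastore, so ds itself is not advanced by the loop. If the partitions cover the data without overlap, the entries of rowsPerPart should sum to the number of rows in the whole file.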

Partition Datastore into Default Number of Parts

Create a datastore from the sample file, mapredout.mat, which is the output file of the mapreduce function.

ds = datastore('mapredout.mat');

Get the default number of partitions for ds.

N = numpartitions(ds);

Partition the datastore into the default number of partitions and return the datastore corresponding to the first partition.

subds = partition(ds,N,1);

Read the data in subds.

while hasdata(subds)
	data = read(subds);
end

Partition Datastore by Files

Create a datastore that contains three image files.

ds = datastore({'street1.jpg','peppers.png','corn.tif'},...
'Type','image','FileExtensions',{'.jpg','.png','.tif'})
ds = 

  ImageDatastore with properties:

      Files: {
             ' ...\matlab\toolbox\matlab\demos\street1.jpg';
             ' ...\matlab\toolbox\matlab\imagesci\peppers.png';
             ' ...\matlab\toolbox\matlab\imagesci\corn.tif'
             }
    ReadFcn: @readDatastoreImage

Partition the datastore by files and return the part corresponding to the second file.

subds = partition(ds,'Files',2)
subds = 

  ImageDatastore with properties:

      Files: {
             ' ...\matlab\toolbox\matlab\imagesci\peppers.png'
             }
    ReadFcn: @readDatastoreImage

subds contains one file.
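
The same partition can also be requested by file name rather than by index. The following is a minimal sketch that takes the name directly from the Files property, on the assumption that the name passed to partition has to match the stored entry; the variable name is introduced only for this illustration.

```matlab
% Sketch: partition by file name instead of by index.
ds = datastore({'street1.jpg','peppers.png','corn.tif'},...
    'Type','image','FileExtensions',{'.jpg','.png','.tif'});

name = ds.Files{2};                    % name exactly as stored in the datastore
subds = partition(ds,'Files',name);    % partition containing only that file
```

Reading the name from ds.Files avoids retyping a path that may differ between installations.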

Partition Data in Parallel

Create a datastore from the sample file, mapredout.mat, which is the output file of the mapreduce function.

ds = datastore('mapredout.mat');

Partition the datastore into three parts on three workers in a parallel pool.

N = 3;
p = parpool('local',N);

parfor ii=1:N
    subds = partition(ds,N,ii);
    while hasdata(subds)
        data = read(subds);
    end
end
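
The loop above reads each partition but discards the data. As a rough sketch of how a per-partition result might be returned from the workers, the variant below counts the rows read from each partition and stores the count in a sliced output variable. The names counts and c are hypothetical, and the sketch assumes that read returns a table for this datastore.

```matlab
% Sketch: collect one result per partition from a parfor loop.
ds = datastore('mapredout.mat');
N = 3;
counts = zeros(1,N);                   % sliced output, one entry per partition

parfor ii = 1:N
    subds = partition(ds,N,ii);
    c = 0;                             % temporary variable local to each iteration
    while hasdata(subds)
        c = c + height(read(subds));   % assumes read returns a table
    end
    counts(ii) = c;
end
```

A partition of a small file can be empty, in which case its entry in counts stays at zero.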

Input Arguments

ds — Input datastore
datastore

Input datastore. To create a datastore object from your data, use the datastore function.

N — Number of partitions
positive integer

Number of partitions, specified as a positive integer.

Example: 3

Data Types: double

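index — Index
positive integer

Index of the partition to return, specified as a positive integer. When you partition into N parts, index selects one of those parts. When you partition by 'Files', index is the position of the file in the Files property.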

Example: 1

Data Types: double

filename — File name
string

File name, specified as a string. The value should match the corresponding entry in the Files property of the datastore.

Example: 'file1.csv'

Example: '../dir/data/file1.csv'

Example: 'hdfs://myserver:7867/data/file1.txt'

Output Arguments

subds — Output datastore
datastore

Output datastore. The output datastore is of the same type as the input datastore ds.

Introduced in R2015a