This example shows how arrayfun
can be used to run a MATLAB® function natively on the GPU. When the MATLAB function contains many element-wise operations, arrayfun
can provide improved performance when compared to simply executing the MATLAB function directly on the GPU with gpuArray input data. The MATLAB function can be in its own file or can be a nested or anonymous function. It must contain only scalar operations and arithmetic.
We put the example into a function to allow nested functions:
function paralleldemo_gpu_arrayfun
Horner's rule allows the efficient evaluation of power series expansions. We will use it to calculate the first 10 terms of the power series expansion for the exponential function exp
. We can implement this as a MATLAB function.
function y = horner(x) %HORNER - series expansion for exp(x) using Horner's rule y = 1 + x.*(1 + x.*((1 + x.*((1 + ... x.*((1 + x.*((1 + x.*((1 + x.*((1 + ... x.*((1 + x./9)./8))./7))./6))./5))./4))./3))./2)); end
horner
for the GPUTo run this function on the GPU with minimal code changes, we could pass a gpuArray
object as input to the horner
function. Since horner
contains only individual element-wise operations, we might not realize very good performance on the GPU when performing each operation one at a time. However, we can improve the performance by executing all of the element-wise operations in the horner
function at one time using arrayfun
.
To run this function on the GPU using arrayfun
, we use a handle to the horner
function. horner
automatically adapts to different size and type inputs. We can compare the results computed on the GPU using both gpuArray
objects and arrayfun
with standard MATLAB CPU execution simply by evaluating the function directly.
hornerFcn = @horner;
We create some inputs of different types and sizes, and use gpuArray
to send them to the GPU.
data1 = rand( 2000, 'single' ); data2 = rand( 1000, 'double' ); gdata1 = gpuArray( data1 ); gdata2 = gpuArray( data2 );
horner
on the GPUTo evaluate the horner
function on the GPU, we have two choices. With minimal code changes we can evaluate the original function on the GPU by providing a gpuArray
object as input. However, to improve the performance on the GPU call arrayfun
, using the same calling convention as the original MATLAB function.
We can compare the accuracy of the results by evaluating the original function directly in MATLAB on the CPU. We expect some slight numerical differences because the floating-point arithmetic on the GPU does not precisely match the arithmetic performed on the CPU.
gresult1 = arrayfun( hornerFcn, gdata1 ); gresult2 = arrayfun( hornerFcn, gdata2 ); comparesingle = max( max( abs( gresult1 - horner( data1 ) ) ) ); comparedouble = max( max( abs( gresult2 - horner( data2 ) ) ) );
fprintf( 'Maximum discrepancy for single precision: %g\n', comparesingle ); fprintf( 'Maximum discrepancy for double precision: %g\n', comparedouble );
Maximum discrepancy for single precision: 2.38419e-07 Maximum discrepancy for double precision: 0
We can compare the performance of the GPU versions to the native MATLAB CPU version. Current generation GPUs have much better performance in single precision, so we compare that.
% CPU execution tic hornerFcn( data1 ); tcpu = toc; % GPU execution using only gpuArray objects tgpuObject = gputimeit(@() hornerFcn(gdata1)); % GPU execution using gpuArray objects with arrayfun tgpuArrayfun = gputimeit(@() arrayfun(hornerFcn, gdata1)); fprintf( 'Speed-up achieved using gpuArray objects only: %g\n',... tcpu / tgpuObject ); fprintf( 'Speed-up achieved using gpuArray objects with arrayfun: %g\n',... tcpu / tgpuArrayfun );
Speed-up achieved using gpuArray objects only: 24.6764 Speed-up achieved using gpuArray objects with arrayfun: 98.3555
end