Run Built-In Functions on a GPU

MATLAB Functions with gpuArray Arguments

Many MATLAB® built-in functions support gpuArray input arguments. Whenever any of these functions is called with at least one gpuArray as an input argument, the function executes on the GPU and generates a gpuArray as the result. You can mix gpuArray and MATLAB array inputs in the same function call; the MATLAB arrays are transferred to the GPU for the function execution. Supported functions include the discrete Fourier transform (fft), matrix multiplication (mtimes), and left matrix division (mldivide).
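
As a minimal sketch of mixing input types (the variable names are illustrative only, and a supported GPU is assumed), the following calls combine one gpuArray with one ordinary MATLAB array. The results are expected to be gpuArrays:

```matlab
% Mix a gpuArray and an ordinary MATLAB array in the same call.
A = rand(4, 'gpuArray');   % created directly on the GPU
B = rand(4);               % ordinary MATLAB array in host memory

C = A * B;                 % mtimes runs on the GPU; B is transferred automatically
D = fft(A + B);            % plus and fft also run on the GPU

disp(class(C))             % gpuArray
disp(class(D))             % gpuArray
```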

The following functions and their symbol operators are enhanced to accept gpuArray input arguments so that they execute on the GPU:

abs
acos
acosd
acosh
acot
acotd
acoth
acsc
acscd
acsch
accumarray
all
and
angle
any
arrayfun
asec
asecd
asech
asin
asind
asinh
assert
atan
atan2
atan2d
atand
atanh
bandwidth
besselj
bessely
beta
betainc
betaincinv
betaln
bicg
bicgstab
bitand
bitcmp
bitget
bitor
bitset
bitshift
bitxor
blkdiag
bounds
bsxfun
cart2pol
cart2sph
cast
cat
cdf2rdf
ceil
cgs
chol
circshift
classUnderlying
colon

compan
complex
cond
conj
conv
conv2
convn
corrcoef
cos
cosd
cosh
cot
cotd
coth
cov
cross
csc
cscd
csch
ctranspose
cummax
cummin
cumprod
cumsum
deg2rad
del2
det
detrend
diag
diff
discretize
disp
display
dot
double
eig
eps
eq
erf
erfc
erfcinv
erfcx
erfinv
exp
expint
expm
expm1
eye
factorial
false
fft
fft2
fftn
fftshift
filter
filter2
find
fix

flip
fliplr
flipud
floor
fprintf
full
gamma
gammainc
gammaincinv
gammaln
gather
ge
gmres
gradient
gt
hankel
head
histcounts
horzcat
hsv2rgb
hypot
idivide
ifft
ifft2
ifftn
ifftshift
imag
ind2sub
Inf
inpolygon
int16
int2str
int32
int64
int8
interp1
interp2
interp3
interpn
intersect
inv
ipermute
isaUnderlying
isbanded
iscolumn
isdiag
isempty
isequal
isequaln
isfinite
isfloat
ishermitian
isinf
isinteger
islogical
ismatrix
ismember
ismembertol

isnan
isnumeric
isreal
isrow
issorted
issparse
issymmetric
istril
istriu
isvector
kron
ldivide
le
legendre
length
log
log10
log1p
log2
logical
lsqr
lt
lu
mat2str
max
median
mean
meshgrid
min
minus
mldivide
mod
mode
movmean
movstd
movsum
movvar
mpower
mrdivide
mtimes
NaN
ndgrid
ndims
ne
nextpow2
nnz
nonzeros
norm
normest
not
nthroot
null
num2str
numel
ones
or
orth
pagefun

pcg
perms
permute
pinv
planerot
plot (and related)
plus
pol2cart
poly
polyarea
polyder
polyfit
polyint
polyval
polyvalm
pow2
power
prod
psi
qmr
qr
rad2deg
rand
randi
randn
randperm
rank
rdivide
real
reallog
realpow
realsqrt
rectint
rem
repelem
repmat
reshape
rgb2hsv
roots
rot90
round
sec
secd
sech
setdiff
setxor
shiftdim
sign
sin
sind
single
sinh
size
sort
sortrows
spconvert
spdiags

sph2cart
sprand
sprandn
sprandsym
sprintf
sqrt
squeeze
std
sub2ind
subsasgn
subsindex
subspace
subsref
sum
superiorfloat
svd
svds
swapbytes
tail
tan
tand
tanh
times
toeplitz
trace
transpose
trapz
tril
triu
true
typecast
uint16
uint32
uint64
uint8
uminus
union
unique
uniquetol
unwrap
uplus
vander
var
vertcat
xor
zeros

See the release notes for information about updates to individual functions.

To get information about any restrictions or limitations concerning the support of any of these functions for gpuArray objects, type:

help gpuArray/functionname

For example, to see the help on the overload of lu, type:

help gpuArray/lu

In most cases, if any of the input arguments to these functions is a gpuArray, any output arrays are gpuArrays. If the output is always scalar, it is returned as ordinary MATLAB data in the workspace. If the result is a gpuArray of complex data and all the imaginary parts are zero, the imaginary parts are retained and the data remains complex. This can affect the results of functions such as sort and isreal.
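
The following sketch illustrates these output rules under the assumption that a supported GPU is available. The variable names are hypothetical, and the final check only restates the behavior described above rather than documenting a guaranteed result:

```matlab
G = gpuArray([3 1 2]);

s = sum(G);        % output is not always scalar in general, so s stays on the GPU
n = numel(G);      % output is always scalar, so n is ordinary MATLAB data
disp(class(s))     % gpuArray
disp(class(n))     % double

% Complex gpuArray whose imaginary parts all become zero after subtraction.
Z = gpuArray(complex([3 1 2], [1 1 1]));
R = Z - 1i;
disp(isreal(R))    % per the text above, R is expected to remain complex
```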

Example: Functions with gpuArray Input and Output

This example uses the fft and real functions, along with the arithmetic operators + and *. All the calculations are performed on the GPU, then gather retrieves the data from the GPU back to the MATLAB workspace.

Ga = rand(1000,'single','gpuArray');
Gfft = fft(Ga); 
Gb = (real(Gfft) + Ga) * 6;
G = gather(Gb);

The whos command is instructive for showing where each variable's data is stored.

whos

 Name       Size         Bytes  Class

 G       1000x1000     4000000  single
 Ga      1000x1000         108  gpuArray
 Gb      1000x1000         108  gpuArray
 Gfft    1000x1000         108  gpuArray

Notice that all the arrays are stored on the GPU (gpuArray), except for G, which is the result of the gather function.

Sparse Arrays on a GPU

The following functions support sparse gpuArrays.

abs
angle
bicg
bicgstab
ceil
classUnderlying
conj
ctranspose
deg2rad
end
expm1
find
fix
floor
full
gmres
gpuArray.speye
imag
isaUnderlying
isdiag
isempty

isequal
isequaln
isfloat
isinteger
islogical
isnumeric
isreal
issparse
istril
istriu
length
log1p
minus
mtimes
ndims
nextpow2
nnz
nonzeros
numel
nzmax
pcg

plus
rad2deg
real
realsqrt
round
sign
size
sparse
spfun
spones
sprandsym
sqrt
sum
transpose
tril
triu
uminus
uplus

You can create a sparse gpuArray either by calling sparse with a gpuArray input, or by calling gpuArray with a sparse input. For example,

x = [0 1 0 0 0; 0 0 0 0 1]

     0     1     0     0     0
     0     0     0     0     1

s = sparse(x)

   (1,2)        1
   (2,5)        1

g = gpuArray(s);   % g is a sparse gpuArray
gt = transpose(g); % gt is a sparse gpuArray
f = full(gt)       % f is a full gpuArray
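
The example above takes the second route (gpuArray called with a sparse input). A minimal sketch of the first route, calling sparse with a gpuArray input, might look like this; the variable names are illustrative only:

```matlab
x  = [0 1 0 0 0; 0 0 0 0 1];
gx = gpuArray(x);     % full gpuArray
gs = sparse(gx);      % sparse gpuArray, created from data already on the GPU

disp(issparse(gs))    % 1
disp(nnz(gs))         % 2 nonzero elements
```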

Considerations for Complex Numbers

If the output of a function running on the GPU could potentially be complex, you must explicitly specify its input arguments as complex. This applies to gpuArray or to functions called in code run by arrayfun.

For example, if you create a gpuArray that might have negative elements, use G = gpuArray(complex(p)); you can then successfully execute sqrt(G).

Or, within a function passed to arrayfun, if x is a vector of real numbers, and some elements have negative values, sqrt(x) will generate an error; instead you should call sqrt(complex(x)).
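
Both cases can be sketched as follows, assuming a supported GPU. The data in p and the anonymous function f are hypothetical and serve only as an illustration:

```matlab
p = [-4 1 9];                     % real data with a negative element

% Case 1: mark the gpuArray as complex before calling sqrt.
G  = gpuArray(complex(p));
r1 = sqrt(G);

% Case 2: convert to complex inside the function passed to arrayfun.
f  = @(x) sqrt(complex(x));
r2 = arrayfun(f, gpuArray(p));

disp(gather(r1))                  % square roots, with 2i for the element -4
disp(gather(r2))
```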

The following table lists the functions that might return complex data, along with the input range over which the output remains real.

Function	Input Range for Real Output
`acos(x)`	`abs(x) <= 1`
`acosh(x)`	`x >= 1`
`acoth(x)`	`abs(x) >= 1`
`acsc(x)`	`abs(x) >= 1`
`asec(x)`	`abs(x) >= 1`
`asech(x)`	`0 <= x <= 1`
`asin(x)`	`abs(x) <= 1`
`atanh(x)`	`abs(x) <= 1`
`log(x)`	`x >= 0`
`log1p(x)`	`x >= -1`
`log10(x)`	`x >= 0`
`log2(x)`	`x >= 0`
`power(x,y)`	`x >= 0`
`reallog(x)`	`x >= 0`
`realsqrt(x)`	`x >= 0`
`sqrt(x)`	`x >= 0`
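
One way to use the table is to test the input range before the call and convert to complex only when needed. The following is a sketch for acos under that assumption, not a prescribed pattern; the variable names are hypothetical:

```matlab
x = gpuArray([0.5 -0.25 1.5]);    % 1.5 lies outside abs(x) <= 1

% gather brings the scalar logical result back to the MATLAB workspace.
if gather(all(abs(x) <= 1))
    y = acos(x);                  % output is guaranteed to be real
else
    y = acos(complex(x));         % output can be complex, so mark the input as complex
end

disp(isreal(y))                   % 0, because the complex branch was taken
```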

Acknowledgments

MAGMA is a library of linear algebra routines that take advantage of GPU acceleration. Linear algebra functions implemented for gpuArrays in Parallel Computing Toolbox™ leverage MAGMA to achieve high performance and accuracy.
