Run Built-In Functions on a GPU

MATLAB Functions with gpuArray Arguments

Many MATLAB® built-in functions support gpuArray input arguments. Whenever any of these functions is called with at least one gpuArray as an input argument, the function executes on the GPU and generates a gpuArray as the result. You can mix inputs using both gpuArray and MATLAB arrays in the same function call; the MATLAB arrays are transferred to the GPU for the function execution. Supporting functions include the discrete Fourier transform (fft), matrix multiplication (mtimes), and left matrix division (mldivide).

The following functions and their symbol operators are enhanced to accept gpuArray input arguments so that they execute on the GPU:

See the release notes for information about updates to individual functions.

To get information about any restrictions or limitations concerning the support of any of these functions for gpuArray objects, type:

help gpuArray/functionname

For example, to see the help on the overload of lu, type

help gpuArray/lu

In most cases, if any of the input arguments to these functions is a gpuArray, any output arrays are gpuArrays. If the output is always scalar, it returns as MATLAB data in the workspace. If the result is a gpuArray of complex data and all the imaginary parts are zero, these parts are retained and the data remains complex. This could have an impact when using sort, isreal, etc.

Example: Functions with gpuArray Input and Output

This example uses the fft and real functions, along with the arithmetic operators + and *. All the calculations are performed on the GPU, then gather retrieves the data from the GPU back to the MATLAB workspace.

Ga = rand(1000,'single','gpuArray');
Gfft = fft(Ga); 
Gb = (real(Gfft) + Ga) * 6;
G = gather(Gb);

The whos command is instructive for showing where each variable's data is stored.

whos
 Name       Size         Bytes  Class

 G       1000x1000     4000000  single
 Ga      1000x1000         108  gpuArray
 Gb      1000x1000         108  gpuArray
 Gfft    1000x1000         108  gpuArray

Notice that all the arrays are stored on the GPU (gpuArray), except for G, which is the result of the gather function.

Sparse Arrays on a GPU

The following functions support sparse gpuArrays.

abs
angle
bicg
bicgstab
ceil
classUnderlying
conj
ctranspose
deg2rad
end
expm1
find
fix
floor
full
gmres
gpuArray.speye
imag
isaUnderlying
isdiag
isempty
isequal
isequaln
isfloat
isinteger
islogical
isnumeric
isreal
issparse
istril
istriu
length
log1p
minus
mtimes
ndims
nextpow2
nnz
nonzeros
numel
nzmax
pcg
plus
rad2deg
real
realsqrt
round
sign
size
sparse
spfun
spones
sprandsym
sqrt
sum
transpose
tril
triu
uminus
uplus  

You can create a sparse gpuArray either by calling sparse with a gpuArray input, or by calling gpuArray with a sparse input. For example,

x = [0 1 0 0 0; 0 0 0 0 1]
     0     1     0     0     0
     0     0     0     0     1
s = sparse(x)
   (1,2)        1
   (2,5)        1
g = gpuArray(s);   % g is a sparse gpuArray
gt = transpose(g); % gt is a sparse gpuArray
f = full(gt)       % f is a full gpuArray
     0     0
     1     0
     0     0
     0     0
     0     1

Considerations for Complex Numbers

If the output of a function running on the GPU could potentially be complex, you must explicitly specify its input arguments as complex. This applies to gpuArray or to functions called in code run by arrayfun.

For example, if creating a gpuArray which might have negative elements, use G = gpuArray(complex(p)), then you can successfully execute sqrt(G).

Or, within a function passed to arrayfun, if x is a vector of real numbers, and some elements have negative values, sqrt(x) will generate an error; instead you should call sqrt(complex(x)).

The following table lists the functions that might return complex data, along with the input range over which the output remains real.

FunctionInput Range for Real Output
acos(x)abs(x) <= 1
acosh(x)x >= 1
acoth(x)abs(x) >= 1
acsc(x)abs(x) >= 1
asec(x)abs(x) >= 1
asech(x)0 <= x <= 1
asin(x)abs(x) <= 1
atanhabs(x) <= 1
log(x)x >= 0
log1p(x)x >= -1
log10(x)x >= 0
log2(x)x >= 0
power(x,y)x >= 0
reallog(x)x >= 0
realsqrt(x)x >= 0
sqrt(x)x >= 0

Acknowledgments

MAGMA is a library of linear algebra routines that take advantage of GPU acceleration. Linear algebra functions implemented for gpuArrays in Parallel Computing Toolbox™ leverage MAGMA to achieve high performance and accuracy.

See Also

| | |

Was this topic helpful?