As with any MEX-files, those containing CUDA® code have
a single entry point, known as mexFunction
. The
MEX-function contains the host-side code that interacts with gpuArray
objects from MATLAB® and launches the CUDA code. The CUDA code
in the MEX-file must conform to the CUDA runtime API.
You should call the function mxInitGPU
at
the entry to your MEX-file. This ensures that the GPU device is properly
initialized and known to MATLAB.
The interface you use to write a MEX-file for gpuArray objects is different from the MEX interface for standard MATLAB arrays.
You can see an example of a MEX-file containing CUDA code at:
matlabroot
/toolbox/distcomp/gpu/extern/src/mex/mexGPUExample.cumatlabroot
/toolbox/distcomp/gpu/extern/src/mex/mexGPUExample.cu
This file contains the following CUDA device function:
void __global__ TimesTwo(double const * const A, double * const B, int const N) { int i = blockDim.x * blockIdx.x + threadIdx.x; if (i < N) B[i] = 2.0 * A[i]; }
It contains the following lines to determine the array size and launch a grid of the proper size:
N = (int)(mxGPUGetNumberOfElements(A)); blocksPerGrid = (N + threadsPerBlock - 1) / threadsPerBlock; TimesTwo<<<blocksPerGrid, threadsPerBlock>>>(d_A, d_B, N);
To compile CUDA code you must have installed the CUDA toolkit
version consistent with the ToolkitVersion
property
of the GPUDevice object.
Use the mexcuda
command
in MATLAB to compile a MEX-file containing the CUDA code.
You can compile the example file using the command:
mexcuda mexGPUExample.cu
If mexcuda
has trouble locating the NVIDIA
compiler (nvcc
), it might be installed in a non-default
location. You can specify the location of nvcc
on
your system by storing it in the environment variable MW_NVCC_PATH
.
You can set this variable using the MATLAB setenv
command.
For example,
setenv('MW_NVCC_PATH','/usr/local/CUDA/bin')
The MEX-function in this example multiplies every element in the input array by 2 to get the values in the output array. To test it, start with a gpuArray in which every element is 1:
x = ones(4,4,'gpuArray');
y = mexGPUExample(x)
y = 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
Both the input and output arrays are gpuArray objects:
disp(['class(x) = ',class(x),', class(y) = ',class(y)])
class(x) = gpuArray, class(y) = gpuArray
Parallel Computing Toolbox also supports CUDAKernel
objects
that can be used to integrate CUDA code with MATLAB. Consider the
following when choosing the MEX-file approach versus the CUDAKernel
approach:
MEX-files can interact with host-side libraries, such as the NVIDIA Performance Primitives (NPP) or CUFFT libraries, and can also contain calls from the host to functions in the CUDA runtime library.
MEX-files can analyze the size of the input and allocate
memory of a different size, or launch grids of a different size, from
C or C++ code. In comparison, MATLAB code that calls CUDAKernel
objects
must pre-allocated output memory and determine the grid size.
Complex data on a GPU device is stored in interleaved complex
format. That is, for a complex gpuArray A
, the
real and imaginary parts of element i
are stored
in consecutive addresses. MATLAB uses CUDA built-in vector types
to store complex data on the device (see the NVIDIA CUDA C Programming
Guide).
Depending on the needs of your kernel, you can cast the pointer to complex data either as the real type or as the built-in vector type. For example, in MATLAB, suppose you create a matrix:
a = complex(ones(4,'gpuArray'),ones(4,'gpuArray'));
If you pass a gpuArray to a MEX-function as the first argument (prhs[0]), then you can get a pointer to the complex data by using the calls:
mxGPUArray const * A = mxGPUCreateFromMxArray(prhs[0]); mwSize numel_complex = mxGPUGetNumberOfElements(A); double2 * d_A = (double2 const *)(mxGPUGetDataReadOnly(A));
To treat the array as a real double-precision array of twice the length, you could do it this way:
mxGPUArray const * A = mxGPUCreateFromMxArray(prhs[0]); mwSize numel_real =2*mxGPUGetNumberOfElements(A); double * d_A = (double const *)(mxGPUGetDataReadOnly(A));
Various functions exist to convert data between complex and
real formats on the GPU. These operations require a copy to interleave
the data. The function mxGPUCreateComplexGPUArray
takes
two real mxGPUArrays and interleaves their elements to produce a single
complex mxGPUArray of the same length. The functions mxGPUCopyReal
and mxGPUCopyImag
each copy either the real
or the imaginary elements into a new real mxGPUArray. (There is no
equivalent of the mxGetImagData
function
for mxGPUArray objects.)