As with any MEX-files, those containing CUDA® code have
a single entry point, known as mexFunction
. The
MEX-function contains the host-side code that interacts with gpuArray
objects from MATLAB® and launches the CUDA code. The CUDA code
in the MEX-file must conform to the CUDA runtime API.
You should call the function mxInitGPU
at
the entry to your MEX-file. This ensures that the GPU device is properly
initialized and known to MATLAB.
The interface you use to write a MEX-file for gpuArray objects is different from the MEX interface for standard MATLAB arrays.
You can see an example of a MEX-file containing CUDA code at:
This file contains the following CUDA device function:
void __global__ TimesTwo(double const * const A, double * const B, int const N) { int i = blockDim.x * blockIdx.x + threadIdx.x; if (i < N) B[i] = 2.0 * A[i]; }
It contains the following lines to determine the array size and launch a grid of the proper size:
N = (int)(mxGPUGetNumberOfElements(A)); blocksPerGrid = (N + threadsPerBlock - 1) / threadsPerBlock; TimesTwo<<<blocksPerGrid, threadsPerBlock>>>(d_A, d_B, N);
The MEX-function in this example multiplies every element in the input array by 2 to get the values in the output array. To test it, start with a gpuArray in which every element is 1:
x = ones(4,4,'gpuArray');
y = mexGPUExample(x)
y = 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
Both the input and output arrays are gpuArray objects:
disp(['class(x) = ',class(x),', class(y) = ',class(y)])
class(x) = gpuArray, class(y) = gpuArray
Parallel Computing Toolbox also supports CUDAKernel
objects
that can be used to integrate CUDA code with MATLAB. Consider the
following when choosing the MEX-file approach versus the CUDAKernel
approach:
MEX-files can interact with host-side libraries, such as the NVIDIA Performance Primitives (NPP) or CUFFT libraries, and can also contain calls from the host to functions in the CUDA runtime library.
MEX-files can analyze the size of the input and allocate
memory of a different size, or launch grids of a different size, from
C or C++ code. In comparison, MATLAB code that calls CUDAKernel
objects
must preallocate output memory and determine the grid size.
Complex data on a GPU device is stored in interleaved complex
format. That is, for a complex gpuArray A
, the
real and imaginary parts of element i
are stored
in consecutive addresses. MATLAB uses CUDA built-in vector types
to store complex data on the device (see the NVIDIA CUDA C Programming
Guide).
Depending on the needs of your kernel, you can cast the pointer to complex data either as the real type or as the built-in vector type. For example, in MATLAB, suppose you create a matrix:
a = complex(ones(4,'gpuArray'),ones(4,'gpuArray'));
If you pass a gpuArray to a MEX-function as the first argument (prhs[0]), then you can get a pointer to the complex data by using the calls:
mxGPUArray const * A = mxGPUCreateFromMxArray(prhs[0]); mwSize numel_complex = mxGPUGetNumberOfElements(A); double2 * d_A = (double2 const *)(mxGPUGetDataReadOnly(A));
To treat the array as a real double-precision array of twice the length, you could do it this way:
mxGPUArray const * A = mxGPUCreateFromMxArray(prhs[0]); mwSize numel_real =2*mxGPUGetNumberOfElements(A); double * d_A = (double const *)(mxGPUGetDataReadOnly(A));
Various functions exist to convert data between complex and
real formats on the GPU. These operations require a copy to interleave
the data. The function mxGPUCreateComplexGPUArray
takes
two real mxGPUArrays and interleaves their elements to produce a single
complex mxGPUArray of the same length. The functions mxGPUCopyReal
and mxGPUCopyImag
each copy either the real
or the imaginary elements into a new real mxGPUArray. (There is no
equivalent of the mxGetImagData
function
for mxGPUArray objects.)
To compile CUDA code you must have installed the CUDA toolkit version consistent with the
ToolkitVersion
property of the GPUDevice object.
Use the mexcuda
command in MATLAB to compile a MEX-file containing the CUDA code. You can compile the example file using the command:
mexcuda mexGPUExample.cu
If mexcuda
has trouble locating the NVIDIA compiler
(nvcc
), it might be installed in a non-default location. You
can specify the location of nvcc
on your system by storing it in
the environment variable MW_NVCC_PATH
. You can set this variable
using the MATLAB setenv
command. For
example,
setenv('MW_NVCC_PATH','/usr/local/CUDA/bin')
Only a subset of Visual Studio® compilers is supported for mexcuda. For details, consult the NVIDIA toolkit documentation.