CUDAKernel

Kernel executable on GPU

Constructor

parallel.gpu.CUDAKernel

Description

A CUDAKernel object represents a CUDA kernel that can execute on a GPU. You create the kernel object from compiled PTX code and its CU source, as described in Run CUDA or PTX Code on GPU.

Methods

existsOnGPU	Determine if gpuArray or CUDAKernel is available on GPU
feval	Evaluate kernel on GPU
setConstantMemory	Set some constant memory on GPU
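
The following is a minimal sketch of creating a kernel and calling these methods, not a definitive recipe. It assumes a hypothetical source file `add2.cu` containing a kernel with the prototype `__global__ void add2(double *v1, const double *v2)` that adds `v2` to `v1` element by element, and a matching `add2.ptx` produced beforehand (for example with `nvcc -ptx add2.cu`). The file names and the kernel are illustrative assumptions; a supported GPU is required to run it.

```matlab
% Hypothetical files: add2.cu and its compiled PTX, add2.ptx.
k = parallel.gpu.CUDAKernel('add2.ptx', 'add2.cu');

% Use one thread per element in a single thread block.
N = 128;
k.ThreadBlockSize = N;

% Create the input data directly on the GPU.
in1 = ones(N, 1, 'gpuArray');
in2 = ones(N, 1, 'gpuArray');

% Evaluate the kernel. The non-const pointer argument v1 is an
% input/output, so it is returned as the single left-hand side result.
out = feval(k, in1, in2);

% Copy the result back to host memory.
result = gather(out);

% Check that the kernel object is still valid on the selected device.
tf = existsOnGPU(k);

% If the CU file declared a constant symbol, for example a hypothetical
% "__constant__ double offset;", it could be set before calling feval:
%   setConstantMemory(k, 'offset', 3.5);
```

Under these assumptions, every element of `result` should be 2, because each element of `in2` is added to the corresponding element of `in1`.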

Properties

A CUDAKernel object has the following properties:

Property Name	Description
`ThreadBlockSize`	Size of block of threads on the kernel. This can be an integer vector of length 1, 2, or 3 (since thread blocks can be up to 3-dimensional). The product of the elements of `ThreadBlockSize` must not exceed the `MaxThreadsPerBlock` for this kernel, and no element of `ThreadBlockSize` can exceed the corresponding element of the `GPUDevice` property `MaxThreadBlockSize`.
`MaxThreadsPerBlock`	Maximum number of threads permissible in a single block for this CUDA kernel. The product of the elements of `ThreadBlockSize` must not exceed this value.
`GridSize`	Size of grid (effectively the number of thread blocks that will be launched independently by the GPU). This is an integer vector of length 3. None of the elements of this vector can exceed the corresponding element in the vector of the `MaxGridSize` property of the `GPUDevice` object.
`SharedMemorySize`	The amount of dynamic shared memory (in bytes) that each thread block can use. Each thread block has an available shared memory region. The size of this region is limited in current cards to ~16 kB, and is shared with registers on the multiprocessors. As with all memory, this needs to be allocated before the kernel is launched. It is also common for the size of this shared memory region to be tied to the size of the thread block. Setting this value on the kernel ensures that each thread in a block can access this available shared memory region.
`EntryPoint`	(read-only) A character vector containing the actual entry point name in the PTX code that this kernel is going to call. An example might look like `'_Z13returnPointerPKfPy'`.
`MaxNumLHSArguments`	(read-only) The maximum number of left-hand side arguments that this kernel supports. It cannot be greater than the number of right-hand side arguments, and it is smaller if any inputs are constant or scalar.
`NumRHSArguments`	(read-only) The number of right-hand side arguments required to call this kernel. Each input must define either the scalar value of an input, the elements for a vector input/output, or the size of an output argument.
`ArgumentTypes`	(read-only) Cell array of character vectors, with `NumRHSArguments` elements. Each character vector indicates the expected MATLAB type for that input: a numeric type such as `uint8`, `single`, or `double`, followed by the word `scalar` or `vector` to indicate whether the argument is passed by value or by reference, respectively. In addition, if the argument is only an input to the kernel, it is prefixed by `in`; if it is an input/output, it is prefixed by `inout`. This allows you to decide how to efficiently call the kernel with both MATLAB arrays and gpuArray, and to see which of the kernel inputs are being treated as outputs.
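
The size constraints on `ThreadBlockSize` and `GridSize` can be checked with a little arithmetic before a launch. The sketch below picks a one-dimensional launch configuration for a given number of elements. The numeric limits are assumed placeholder values chosen for illustration; on a real system they would be read from `k.MaxThreadsPerBlock` and from the `MaxThreadBlockSize` and `MaxGridSize` properties of the object returned by `gpuDevice`.

```matlab
% Assumed limits for illustration only. With a real kernel k and device:
%   maxThreadsPerBlock = k.MaxThreadsPerBlock;
%   dev = gpuDevice;
%   maxBlockSize = dev.MaxThreadBlockSize;
%   maxGridSize  = dev.MaxGridSize;
maxThreadsPerBlock = 1024;
maxBlockSize = [1024 1024 64];
maxGridSize  = [2147483647 65535 65535];

numElements = 1e6;   % number of data elements to process

% One-dimensional block: as many threads as allowed, but no more than needed.
blockSize = [min(numElements, maxThreadsPerBlock) 1 1];

% Enough blocks to cover every element (the last block may be partly idle).
gridSize = [ceil(numElements / blockSize(1)) 1 1];

% Check the constraints described in the property table.
assert(prod(blockSize) <= maxThreadsPerBlock);
assert(all(blockSize <= maxBlockSize));
assert(all(gridSize <= maxGridSize));

% With a kernel object k, the values would then be applied as:
%   k.ThreadBlockSize = blockSize;
%   k.GridSize = gridSize;
```

Because the grid is rounded up, the total number of launched threads can exceed `numElements`, so the kernel code itself needs to guard against out-of-range indices.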

See Also

gpuArray, GPUDevice

Introduced in R2011b
