Kernel executable on GPU
A CUDAKernel object represents a CUDA kernel that can execute on a GPU. You create the kernel when you compile
PTX or CU code, as described in Run CUDA or PTX Code on GPU.
Function | Description |
---|---|
existsOnGPU | Determine if gpuArray or CUDAKernel is available on GPU |
feval | Evaluate kernel on GPU |
setConstantMemory | Set some constant memory on GPU |
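
As a minimal sketch of how these functions fit together, the following creates a kernel object and evaluates it. The file names `test.cu` and `test.ptx` and the kernel `add1` are hypothetical and used only for illustration; the sketch assumes the PTX file has already been produced (for example with `nvcc -ptx test.cu`) and that a supported GPU is selected.

```matlab
% Assumed CUDA source in test.cu (hypothetical, not part of this page):
%   __global__ void add1(double *v1, const double *v2)
%   {
%       int idx = threadIdx.x;
%       v1[idx] += v2[idx];
%   }

% Build the kernel object from the compiled PTX and the CU prototype.
k = parallel.gpu.CUDAKernel('test.ptx', 'test.cu');

N = 128;
k.ThreadBlockSize = N;            % one thread per element, in a single block

in1 = ones(N, 1, 'gpuArray');     % inout argument (non-const pointer)
in2 = ones(N, 1, 'gpuArray');     % in argument (const pointer)

result = feval(k, in1, in2);      % result is a gpuArray holding the updated v1
tf = existsOnGPU(k);              % true while the kernel is available on the GPU

hostResult = gather(result);      % copy the result back to host memory
```

Because `v1` is a non-const pointer, it is treated as an output, so `feval` returns one array here. The block size equals the number of elements only because the assumed kernel performs no bounds check.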
A CUDAKernel object has the following properties:
Property Name | Description |
---|---|
ThreadBlockSize | Size of the block of threads for the kernel. This can be an integer vector of length 1, 2, or 3 (since thread blocks can be up to 3-dimensional). The product of the elements of ThreadBlockSize must not exceed the MaxThreadsPerBlock for this kernel, and no element of ThreadBlockSize can exceed the corresponding element of the GPUDevice property MaxThreadBlockSize. |
MaxThreadsPerBlock | Maximum number of threads permissible in a single block for this CUDA kernel. The product of the elements of ThreadBlockSize must not exceed this value. |
GridSize | Size of grid (effectively the number of thread blocks that will be launched independently by the GPU). This is an integer vector of length 3. None of the elements of this vector can exceed the corresponding element in the vector of the MaxGridSize property of the GPUDevice object. |
SharedMemorySize | The amount of dynamic shared memory (in bytes) that each thread block can use. Each thread block has an available shared memory region. The size of this region is limited in current cards to ~16 kB, and is shared with registers on the multiprocessors. As with all memory, this needs to be allocated before the kernel is launched. It is also common for the size of this shared memory region to be tied to the size of the thread block. Setting this value on the kernel ensures that each thread in a block can access this available shared memory region. |
EntryPoint | (read-only) A character vector containing the actual entry point name in the PTX code that this kernel is going to call. An example might look like '_Z13returnPointerPKfPy'. |
MaxNumLHSArguments | (read-only) The maximum number of left hand side arguments that this kernel supports. It cannot be greater than the number of right hand side arguments, and if any inputs are constant or scalar it will be less. |
NumRHSArguments | (read-only) The required number of right hand side arguments needed to call this kernel. All inputs need to define either the scalar value of an input, the elements for a vector input/output, or the size of an output argument. |
ArgumentTypes | (read-only) Cell array of character vectors, with one element per right hand side argument (its length equals NumRHSArguments). Each character vector indicates the expected MATLAB type for that input: a numeric type such as uint8, single, or double, followed by the word scalar or vector to indicate whether the argument is passed by value or by reference, respectively. In addition, if that argument is only an input to the kernel, it is prefixed by in; and if it is an input/output, it is prefixed by inout. This allows you to decide how to efficiently call the kernel with both MATLAB arrays and gpuArray objects, and to see which of the kernel inputs are being treated as outputs. |
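
The size constraints in the table above can be sketched as a launch-configuration calculation. The file names `addVectors.ptx` and `addVectors.cu` are hypothetical, and the sketch assumes a one-dimensional kernel that ignores any thread whose global index is beyond the number of elements. It illustrates one way to pick sizes and is not a recommended configuration.

```matlab
% Hypothetical kernel files; assumed to be compiled already with nvcc -ptx.
k   = parallel.gpu.CUDAKernel('addVectors.ptx', 'addVectors.cu');
dev = gpuDevice;                      % currently selected GPUDevice object

numElements = 1e6;                    % illustrative problem size

% Largest 1-D block allowed by both the kernel and the device.
threadsPerBlock = min(k.MaxThreadsPerBlock, dev.MaxThreadBlockSize(1));

% Enough blocks to cover every element; the last block may be partly idle.
numBlocks = ceil(numElements / threadsPerBlock);
assert(numBlocks <= dev.MaxGridSize(1), 'Grid is too large for this device.');

k.ThreadBlockSize = [threadsPerBlock 1 1];
k.GridSize        = [numBlocks 1 1];

% Inspect how the kernel expects to be called.
fprintf('Entry point: %s\n', k.EntryPoint);
fprintf('Inputs: %d, maximum outputs: %d\n', ...
    k.NumRHSArguments, k.MaxNumLHSArguments);
disp(k.ArgumentTypes);
```

Reading MaxThreadsPerBlock from the kernel rather than from the device matters because a kernel that uses many registers can have a lower per-block thread limit than the device maximum.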