parallel.gpu.CUDAKernel

Create GPU CUDA kernel object from PTX and CU code

Syntax

KERN = parallel.gpu.CUDAKernel(PTXFILE,CPROTO)
KERN = parallel.gpu.CUDAKernel(PTXFILE,CPROTO,FUNC)
KERN = parallel.gpu.CUDAKernel(PTXFILE,CUFILE)
KERN = parallel.gpu.CUDAKernel(PTXFILE,CUFILE,FUNC)

Description

KERN = parallel.gpu.CUDAKernel(PTXFILE,CPROTO) and KERN = parallel.gpu.CUDAKernel(PTXFILE,CPROTO,FUNC) create a CUDAKernel object that you can use to call a CUDA kernel on the GPU. PTXFILE is either the name of the file that contains the PTX code or the contents of a PTX file as a character vector. CPROTO is the C prototype for the kernel call that KERN represents. If specified, FUNC must be a character vector that unambiguously identifies the appropriate kernel entry point in the PTX file. If FUNC is omitted, the PTX file must contain only a single entry point.
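As a sketch of these two syntaxes, the following assumes a supported GPU, a file simpleEx.ptx compiled from the simpleEx.cu example on this page, and that the name addToVector is enough to identify its entry point unambiguously. It shows PTXFILE given as a file name, PTXFILE given as the PTX text itself, and FUNC naming the entry point explicitly.

```matlab
% PTXFILE as a file name, with an explicit C prototype.
kern1 = parallel.gpu.CUDAKernel('simpleEx.ptx', 'float *,float,int');

% PTXFILE as the contents of the PTX file (a character vector).
ptxText = fileread('simpleEx.ptx');
kern2 = parallel.gpu.CUDAKernel(ptxText, 'float *,float,int');

% FUNC selects the entry point; 'addToVector' is assumed to match only one entry.
kern3 = parallel.gpu.CUDAKernel('simpleEx.ptx', 'float *,float,int', 'addToVector');
```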

KERN = parallel.gpu.CUDAKernel(PTXFILE,CUFILE) and KERN = parallel.gpu.CUDAKernel(PTXFILE,CUFILE,FUNC) also create a CUDAKernel object that you can use to call a CUDA kernel on the GPU. Instead of taking an explicit prototype, they read the CUDA source file CUFILE and look for a kernel definition starting with '__global__' to find the function prototype for the CUDA kernel defined in PTXFILE.
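To illustrate how a '__global__' definition corresponds to a CPROTO string, the following sketch recovers the prototype from CUDA source text with regexp. This is not how the toolbox implements the CUFILE syntaxes; the parsing is an assumption for illustration only and handles just a single, simple '__global__ void' declaration such as the one in simpleEx.cu.

```matlab
% Source text of a kernel (in practice, src = fileread('simpleEx.cu')).
src = sprintf(['__global__ void addToVector(float * pi, float c, int vecLen)  {\n', ...
               '   int idx = blockIdx.x * blockDim.x + threadIdx.x;\n', ...
               '   if (idx < vecLen) {\n', ...
               '       pi[idx] += c;\n', ...
               '   }\n', ...
               '}\n']);

% Capture the kernel name and its parameter list from the first __global__ void declaration.
tok = regexp(src, '__global__\s+void\s+(\w+)\s*\(([^)]*)\)', 'tokens', 'once');
funcName = tok{1};                          % 'addToVector'
params   = strtrim(strsplit(tok{2}, ','));  % {'float * pi','float c','int vecLen'}

% Drop each trailing parameter name, keeping only the type.
types  = regexprep(params, '\s*\w+$', '');
cproto = strjoin(types, ',');               % 'float *,float,int'
```

The resulting cproto and funcName have the form expected by the CPROTO and FUNC arguments, for example parallel.gpu.CUDAKernel('simpleEx.ptx', cproto, funcName).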

For information on executing your kernel object, see Run a CUDAKernel.

Examples

If simpleEx.cu contains the following:

/*
* Add a constant to a vector.
*/
__global__ void addToVector(float * pi, float c, int vecLen)  {
   int idx = blockIdx.x * blockDim.x + threadIdx.x;
   if (idx < vecLen) {
       pi[idx] += c;
   }
}

and simpleEx.ptx contains the PTX code generated by compiling simpleEx.cu (for example, with nvcc -ptx simpleEx.cu), both of the following statements return a kernel object that you can use to call the addToVector CUDA kernel.

kern = parallel.gpu.CUDAKernel('simpleEx.ptx', ...
                                             'simpleEx.cu');
kern = parallel.gpu.CUDAKernel('simpleEx.ptx', ...
                                     'float *,float,int');
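As a sketch of calling the resulting kernel object (Run a CUDAKernel describes the full rules), the following assumes a supported GPU and simpleEx.ptx in the current folder. The launch configuration of 256 threads per block is an arbitrary choice for illustration. Because pi is a non-const pointer, feval returns its updated contents as an output.

```matlab
kern = parallel.gpu.CUDAKernel('simpleEx.ptx', 'simpleEx.cu');

n  = 1000;
x0 = rand(n, 1, 'single');   % float * corresponds to single data
c  = single(2.5);            % float corresponds to single

% One thread per element: 256 threads per block, enough blocks to cover n.
kern.ThreadBlockSize = [256, 1, 1];
kern.GridSize = [ceil(n / 256), 1];

% int vecLen is passed as int32; with a gpuArray input, the output is a gpuArray.
y = feval(kern, gpuArray(x0), c, int32(n));

result = gather(y);          % copy back to host memory
```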

Introduced in R2010b