Create GPU CUDA kernel object from PTX and CU code
KERN = parallel.gpu.CUDAKernel(PTXFILE,CPROTO)
KERN = parallel.gpu.CUDAKernel(PTXFILE,CPROTO,FUNC)
KERN = parallel.gpu.CUDAKernel(PTXFILE,CUFILE)
KERN = parallel.gpu.CUDAKernel(PTXFILE,CUFILE,FUNC)
KERN = parallel.gpu.CUDAKernel(PTXFILE,CPROTO) and
KERN = parallel.gpu.CUDAKernel(PTXFILE,CPROTO,FUNC) create a CUDAKernel object
that you can use to call a CUDA kernel on the GPU. PTXFILE is either
the name of a file that contains PTX code or the contents of
a PTX file as a string. CPROTO is the C prototype
for the kernel call that KERN represents. If specified, FUNC must
be a string that unambiguously identifies the kernel entry point
in the PTX file. If FUNC is omitted, the
PTX file must contain exactly one entry point.
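As an illustration of the FUNC argument, the following is a minimal sketch and not part of the original example. It assumes a hypothetical file multiKernels.ptx that contains two entry points, one whose name includes addToVector and one whose name includes scaleVector, both with the prototype shown. It also assumes that a plain kernel name is enough to identify the entry point unambiguously, even when the compiler has mangled the name in the PTX.

```matlab
% Hypothetical PTX file with two entry points (an assumption for illustration).
ptxFile = 'multiKernels.ptx';
cProto  = 'float *,float,int';

% Because the PTX file holds more than one entry point, FUNC selects the kernel.
kernAdd   = parallel.gpu.CUDAKernel(ptxFile, cProto, 'addToVector');
kernScale = parallel.gpu.CUDAKernel(ptxFile, cProto, 'scaleVector');

% The read-only EntryPoint property reports the entry name that was matched.
disp(kernAdd.EntryPoint)
disp(kernScale.EntryPoint)
```

Omitting FUNC for a file like this would be ambiguous, because the constructor could not tell which of the two kernels KERN should represent.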
KERN = parallel.gpu.CUDAKernel(PTXFILE,CUFILE) and
KERN = parallel.gpu.CUDAKernel(PTXFILE,CUFILE,FUNC) also create a
kernel object that you can use to call a CUDA kernel on the GPU. Instead
of taking the prototype as an argument, they read the CUDA source file CUFILE
and look for a kernel definition starting with '__global__' to
find the function prototype for the CUDA kernel that is defined in PTXFILE.
For information on executing your kernel object, see Run a CUDAKernel.
If simpleEx.cu contains the following:
/*
 * Add a constant to a vector.
 */
__global__ void addToVector(float * pi, float c, int vecLen) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < vecLen) {
        pi[idx] += c;
    }
}
and simpleEx.ptx contains the PTX code generated by compiling
simpleEx.cu, both of the
following statements return a kernel object that you can use to call
the addToVector CUDA kernel.
kern = parallel.gpu.CUDAKernel('simpleEx.ptx', ...
                               'simpleEx.cu');
kern = parallel.gpu.CUDAKernel('simpleEx.ptx', ...
                               'float *,float,int');
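To show how such a kernel object might then be used, here is a hedged sketch of configuring and launching it. It assumes simpleEx.ptx and simpleEx.cu are on the MATLAB path and that a supported GPU is available. The vector length N and the block size of 256 are illustrative choices, not values from the original example.

```matlab
% Sketch only: N and the block size are assumptions made for illustration.
kern = parallel.gpu.CUDAKernel('simpleEx.ptx', 'simpleEx.cu');

N = 1024;                                  % number of vector elements
blockSize = 256;                           % should not exceed kern.MaxThreadsPerBlock
kern.ThreadBlockSize = [blockSize 1 1];    % threads per block
kern.GridSize = [ceil(N / blockSize) 1];   % enough blocks to cover all N elements

v = gpuArray(single(1:N));                 % float * pi  -> single-precision data
c = single(5);                             % float c     -> single scalar

% pi is a non-const pointer, so it acts as an in/out argument and
% feval returns the updated vector as its output.
result = feval(kern, v, c, int32(N));

out = gather(result);                      % copy the result back to host memory
```

The bounds check on idx inside the kernel means the grid may safely contain more threads than elements, which is why the number of blocks is rounded up.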
arrayfun | existsOnGPU | feval | gpuArray | reset