Python CUDA GpuMat upload() function, strange warm-up required?

Firstly, fantastic MRE. Unfortunately, in this case it was probably unnecessary, but it's still great.

OK, when you first call an OpenCV CUDA function (e.g. d_src.upload(img)), the CUDA context is initialized, which incurs a significant delay. A “standard” convention for initializing the CUDA context in OpenCV is to call cuda::setDevice() during the initialization of your program; however, because OpenCV uses the CUDA runtime API, calling any CUDA function will have the same effect.
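As a rough sketch of that convention in Python, assuming an OpenCV build with CUDA support: the helper name `init_cuda` and the image size are made up for illustration. It also performs a tiny dummy upload after `cv2.cuda.setDevice()`, as an extra precaution in case the runtime defers context creation until the first call that needs it.

```python
import time


def init_cuda(device_id=0):
    """Hypothetical helper: create the CUDA context up front.

    Returns True on success, False if OpenCV is missing, was built
    without CUDA, or the requested device is not available.
    """
    try:
        import cv2
        import numpy as np
    except ImportError:
        return False
    try:
        # Returns 0 (no CUDA support) or -1 (driver problem) when unusable.
        if cv2.cuda.getCudaEnabledDeviceCount() <= device_id:
            return False
        cv2.cuda.setDevice(device_id)
        # Tiny upload so the context definitely exists before real work.
        cv2.cuda_GpuMat().upload(np.zeros((1, 1), np.uint8))
    except (AttributeError, cv2.error):
        return False
    return True


def demo():
    if not init_cuda():
        print("CUDA not available; nothing to time")
        return
    import cv2
    import numpy as np

    img = np.zeros((1080, 1920, 3), np.uint8)  # arbitrary test image
    d_src = cv2.cuda_GpuMat()
    start = time.perf_counter()
    d_src.upload(img)  # should no longer include context creation
    print(f"upload after warm-up: {(time.perf_counter() - start) * 1e3:.2f} ms")


if __name__ == "__main__":
    demo()
```

With the warm-up done at start-up, the first "real" upload() should take about as long as later ones.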

Additionally, if you see a further initialization delay the first time you call a CUDA function that launches a CUDA kernel, this will most likely be due to the driver loading that code onto the device. You can check for this by timing the same function again directly afterwards.
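One way to run that check is to time several back-to-back calls and compare the first against the rest. This is only a sketch: `time_calls` is a hypothetical helper, and `cv2.cuda.cvtColor` is used purely as an example of a function that launches a kernel (it assumes a CUDA-enabled build that includes the cudaimgproc module).

```python
import time


def time_calls(fn, repeats=3):
    """Call fn() `repeats` times and return each duration in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return durations


def demo():
    try:
        import cv2
        import numpy as np
        if cv2.cuda.getCudaEnabledDeviceCount() < 1:
            raise RuntimeError("no CUDA device")
    except Exception as exc:  # missing OpenCV, non-CUDA build, no device
        print(f"skipping CUDA timing: {exc}")
        return

    d_src = cv2.cuda_GpuMat()
    d_src.upload(np.zeros((1080, 1920, 3), np.uint8))  # also creates the context

    # Without a stream argument the call blocks until the kernel finishes,
    # so wall-clock timing is meaningful here.
    timings = time_calls(lambda: cv2.cuda.cvtColor(d_src, cv2.COLOR_BGR2GRAY))
    for i, t in enumerate(timings):
        print(f"call {i}: {t * 1e3:.2f} ms")


if __name__ == "__main__":
    demo()
```

If only the first call is slow and the following ones are fast, the extra time is a one-off load cost rather than the cost of the operation itself.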

Although the link to the build script is not working, so I can’t be sure, I do not suspect this is a PTX compilation delay, because the delay is small and consistent over multiple runs. A JIT compilation would normally be slow on the first run only and then be served from the cache (your JIT cache, see CUDA_CACHE_MAXSIZE, should be big enough not to have to evict JIT-compiled PTX code between runs).
