Where is ptr pointing for a cuda::GpuMat?

I doubt it, since nvcc will heavily optimize your code and the resulting instructions may well be identical in both cases. If you really want to know, you can check by examining the generated PTX (for example, by compiling with nvcc -ptx).

I am not sure why you can't use cv::cuda::PtrStepSzf, because that is how device data is passed to most of the CUDA kernels in OpenCV (at least from my quick search).
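To illustrate why I would not expect a difference, here is a rough host-side sketch of what pitched row access boils down to. PitchedViewF and rawAt are hypothetical names I made up for this post, not OpenCV's classes; the assumption is only that the data pointer marks the start of the first row and that consecutive rows are step bytes apart, which is how GpuMat exposes its data and step members.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a pitched 2D float view (NOT OpenCV's actual class).
struct PitchedViewF {
    unsigned char* data;  // start of the first row
    std::size_t step;     // bytes between consecutive rows; may exceed cols * sizeof(float)
    int rows;
    int cols;

    // Row y starts y * step bytes after data.
    float* ptr(int y) const { return reinterpret_cast<float*>(data + y * step); }
    float& operator()(int y, int x) const { return ptr(y)[x]; }
};

// The same element access written with a raw pointer and an explicit pitch.
inline float& rawAt(unsigned char* data, std::size_t step, int y, int x) {
    return reinterpret_cast<float*>(data + y * step)[x];
}

// Both access styles should resolve to the same address in a padded buffer.
inline bool sameElement() {
    const int rows = 3, cols = 5, paddedCols = 8;  // 3 floats of padding per row
    std::vector<float> buf(rows * paddedCols, 0.0f);
    PitchedViewF v{reinterpret_cast<unsigned char*>(buf.data()),
                   paddedCols * sizeof(float), rows, cols};
    v(2, 4) = 7.5f;
    assert(buf[2 * paddedCols + 4] == 7.5f);
    return &v(2, 4) == &rawAt(v.data, v.step, 2, 4);
}
```

Since the wrapper is just this arithmetic behind inline accessors, a compiler can usually reduce both forms to the same address computation, but the PTX is the only way to be sure for your kernel.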