I have not been able to find much information on `cv::cuda::PtrStepSzf` apart from a couple of articles (one in Japanese) that I am reading. They basically say what you wrote.
My question is about speed: is there any performance penalty for using the simpler `dOutput(iRow, iCol)` rather than a `GpuMat` with its `ptr` and `step`?
I am trying to get as much speed as I can, and although I would prefer to use `PtrStepSzf`, I cannot afford to lose performance.
Edit: It turns out I cannot call `GpuMat::ptr` inside a kernel; I have to pass `data` and `step` to the kernel directly. I only just found this out.