Yes, I’ve timed a few calls with high_resolution_clock, namely cv::mean() and SimpleBlobDetector::detect(), and it seems that when it is just a single function call, the UMat version is slower. I assume that’s because of the time spent copying to GPU memory. However, when a bunch of functions run in a row on the same UMat (GaussianBlur, cvtColor, minMaxLoc, threshold, erode, dilate, findContours, etc.), then I do notice the speedup.
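For reference, here is a minimal sketch of the kind of timing harness described above. The helper name `timeMs` and its `reps`/`warmup` parameters are made up for illustration and are not part of OpenCV; the warm-up loop is there on the assumption that the first OpenCL call may include one-time kernel compilation that would otherwise skew the result.

```cpp
#include <chrono>
#include <cstddef>

// Hypothetical helper (not an OpenCV API): returns the average wall-clock
// time of one call to fn(), in milliseconds, measured over `reps` calls.
template <typename Fn>
double timeMs(Fn&& fn, std::size_t reps = 100, std::size_t warmup = 5) {
    using clock = std::chrono::high_resolution_clock;

    // Untimed warm-up calls, so one-time setup costs are not counted.
    for (std::size_t i = 0; i < warmup; ++i) fn();

    const auto t0 = clock::now();
    for (std::size_t i = 0; i < reps; ++i) fn();
    const auto t1 = clock::now();

    // Total elapsed milliseconds divided by the number of timed calls.
    return std::chrono::duration<double, std::milli>(t1 - t0).count()
           / static_cast<double>(reps);
}
```

With that in place, a comparison would look something like `timeMs([&] { cv::mean(mat); })` versus `timeMs([&] { cv::mean(umat); })`, where `mat` and `umat` hold the same image. For calls that do not hand a result back to the CPU, it is probably worth calling `cv::ocl::finish()` inside the lambda so that queued GPU work is actually included in the measurement.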
However, I have a lot of those one-line calls. I would think that keeping UMats around for them would waste a lot of time copying memory back and forth between CPU and GPU, which would cancel out whatever speed the GPU gains.
For now I’m looking for clusters of OCV functions where OCL might come in handy. Going line by line through the code is proving to be a real pain. That mean() function, for instance, surprised me. I had forgotten about it. Ran the speed test and decided to leave it as it is.
Rather than going through the whole list of OCV functions and trying to jog my memory of possible things to look for, I was hoping for a shorter list of the CPU-intensive functions.