I ran some tests comparing OpenCV performance on a few basic operations, with and without CUDA.
I just threw in a few simple operators: greyscale conversion, thresholding, morphological operators, resizing.
To my surprise, the CUDA code was 50-60 times slower than the CPU code! I tested on my laptop (Core i7 vs. GeForce MX130) and on an Nvidia Nano (ARM CPU), with similar results. The CUDA code took 0.6 s on my laptop, which is a lot for a 5 MP image.
CUDA 10.1/10.2 was used, and OpenCV 4.5.2 was compiled locally in both cases.
The C++ and Python versions performed similarly in both cases.
Do you have any idea what I am doing wrong?
Here is my code for testing:
import time

import cv2

# `filename` is the path to the 5 MP test image
im = cv2.imread(filename)

# **** CPU implementation ****
start_t = time.perf_counter()
gray = cv2.cvtColor(im, cv2.COLOR_BGR2GRAY)
retval, thr = cv2.threshold(gray, 128, 255, cv2.THRESH_BINARY)
morph_kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (7, 7))
morph = cv2.dilate(thr, morph_kernel)
morph = cv2.resize(morph, (640, 480))
end_t = time.perf_counter()
print("CPU processing time : {}".format(end_t - start_t))
# **** GPU implementation ****
# Note: the timed region includes the upload to and download from the GPU
start_t = time.perf_counter()
gpu_frame = cv2.cuda_GpuMat()
gpu_frame.upload(im)
gpu_gray = cv2.cuda.cvtColor(gpu_frame, cv2.COLOR_BGR2GRAY)
retval, gpu_thr = cv2.cuda.threshold(gpu_gray, 128, 255, cv2.THRESH_BINARY)
morph_kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (7, 7))
morph_filter = cv2.cuda.createMorphologyFilter(cv2.MORPH_DILATE, cv2.CV_8U, morph_kernel)
gpu_morph = morph_filter.apply(gpu_thr)
gpu_morph = cv2.cuda.resize(gpu_morph, (640, 480))
res = gpu_morph.download()
end_t = time.perf_counter()
print("GPU processing time : {}".format(end_t - start_t))
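
One way to check whether the 0.6 s is a one-time startup cost or a steady-state cost would be to run the same pipeline several times and report the first call separately from the later ones. The following is only a minimal sketch of such a timing helper: `time_runs` and its `repeats` parameter are hypothetical names that are not part of OpenCV, and the workload in the demo is a stand-in rather than the actual image pipeline.

```python
import time


def time_runs(fn, repeats=10):
    """Call fn() `repeats` times and return (first_call_s, mean_of_rest_s).

    Hypothetical helper for illustration: the first call is reported on its
    own because it may include one-time setup that later calls do not pay.
    """
    if repeats < 2:
        raise ValueError("repeats must be at least 2")
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    first = durations[0]
    rest = durations[1:]
    return first, sum(rest) / len(rest)


if __name__ == "__main__":
    # Stand-in workload; a real test would pass the CPU or GPU pipeline here.
    first, steady = time_runs(lambda: sum(range(100_000)), repeats=5)
    print("First call : {:.6f} s".format(first))
    print("Later calls: {:.6f} s (mean)".format(steady))
```

To apply this to the test code, the CPU and GPU sections would each need to be wrapped in a function (for example `run_cpu(im)` and `run_gpu(im)`, names assumed here) and passed in as `lambda: run_gpu(im)`.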