Python CUDA operations on matrices not working with scalars

Per-element CUDA operations such as cv::cuda::divide or cv::cuda::multiply in C++ can be applied to a matrix-matrix input as well as to a matrix-scalar input.
In fact, the following C++ code compiles and runs without problems:

cv::Mat test;
cv::cuda::GpuMat cu_test(4, 4, CV_8UC1, 16);
cu_test.download(test);
std::cout << test << std::endl;
cv::cuda::divide(cu_test, 2, cu_test);
cu_test.download(test);
std::cout << test << std::endl;

And, as expected, it outputs:

[ 16,  16,  16,  16;
  16,  16,  16,  16;
  16,  16,  16,  16;
  16,  16,  16,  16]
[  8,   8,   8,   8;
   8,   8,   8,   8;
   8,   8,   8,   8;
   8,   8,   8,   8]

On the other hand, the same logic doesn’t work in Python.
For example, the Python equivalent of the above C++ code:

import cv2

# 4x4 single-channel 8-bit matrix on the GPU, filled with 16
cu_test = cv2.cuda_GpuMat(4, 4, cv2.CV_8UC1, 16)
test = cu_test.download()
print(test)
cu_test = cv2.cuda.divide(cu_test, 2)
test = cu_test.download()
print(test)

fails at cv2.cuda.divide:

[[16 16 16 16]
 [16 16 16 16]
 [16 16 16 16]
 [16 16 16 16]]
Traceback (most recent call last):
  File "test.py", line 7, in <module>
    cu_test = cv2.cuda.divide(cu_test, 2)
cv2.error: OpenCV(4.10.0) :-1: error: (-5:Bad argument) in function 'divide'
> Overload resolution failed:
>  - src1 is not a numpy array, neither a scalar
>  - Expected Ptr<cv::cuda::GpuMat> for argument 'src2'
>  - Expected Ptr<cv::UMat> for argument 'src1'

According to help(cv2.cuda.divide), a matrix-scalar division is supposed to work:

divide(...)
    divide(src1, src2[, dst[, scale[, dtype[, stream]]]]) -> dst
    .   @brief Computes a matrix-matrix or matrix-scalar division.
    .   
    .   @param src1 First source matrix or a scalar.
    .   @param src2 Second source matrix or scalar.
    .   @param dst Destination matrix that has the same size and number of channels as the input array(s).
    .   The depth is defined by dtype or src1 depth.
    .   @param scale Optional scale factor.
    .   @param dtype Optional depth of the output array.
    .   @param stream Stream for the asynchronous version.
    .   
    .   This function, in contrast to divide, uses a round-down rounding mode.
    .   
    .   @sa divide

Is there anything I’m missing on how to provide the scalar as a proper scalar to cv2.cuda.divide?

Please note that I could get around this issue using a constant matrix that acts as the scalar:

cu_test = cv2.cuda_GpuMat(4, 4, cv2.CV_8UC1, 16)
cu_fake_scalar = cv2.cuda_GpuMat(4, 4, cv2.CV_8UC1, 2)
cu_test = cv2.cuda.divide(cu_test, cu_fake_scalar)

However, this seems inefficient when working with high-resolution images.
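To put a rough number on that concern, here is a minimal sketch that estimates how much extra device memory the constant "fake scalar" matrix needs. The helper name constant_mat_bytes and the frame sizes are illustrative assumptions, and the figure ignores any row padding the GPU allocator may add.

```python
def constant_mat_bytes(rows, cols, channels=1, bytes_per_channel=1):
    """Minimum bytes for a dense rows x cols matrix (ignores GPU row padding)."""
    return rows * cols * channels * bytes_per_channel


# Illustrative frame sizes for CV_8UC1 (1 channel, 1 byte per element).
frames = [("4x4 test", (4, 4)), ("1080p", (1080, 1920)), ("4K UHD", (2160, 3840))]
for label, (rows, cols) in frames:
    print(f"{label}: {constant_mat_bytes(rows, cols)} bytes")
```

For a single-channel 4K frame this comes to about 8.3 MB for a matrix whose only job is to carry one value, and that matrix presumably has to be read from device memory on every call.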

I tested with Python 3.6.9 and OpenCV versions 4.5.4 and also 4.10.0.

Thanks in advance!
Massimo

All the functions which use a scalar in cudaarithm look like they need additional overloads. I’ll post a fix tomorrow.

I haven’t had a chance to submit a PR yet, but the fix for divide is just a one-liner. Add

CV_EXPORTS_W void inline divideWithScalar(InputArray src1, Scalar src2, OutputArray dst, double scale = 1, int dtype = -1, Stream& stream = Stream::Null()) {
    divide(src1, src2, dst, scale, dtype, stream);
}

underneath the existing definition of divide.

Then you can call divideWithScalar instead of divide from Python, as shown below.

cu_test = cv2.cuda_GpuMat(4, 4, cv2.CV_8UC1, 16)
test = cu_test.download()
print(test)
cu_test = cv2.cuda.divideWithScalar(cu_test, 2)
test = cu_test.download()
print(test)
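If the same script has to run on builds with and without this wrapper, one option is to check for it at runtime and fall back to the constant-matrix workaround. The sketch below is only an illustration: divide_by_scalar is a hypothetical helper, the cv parameter is an injection point added so the selection logic can be tried without a GPU, and whether a given build exposes divideWithScalar is an assumption to verify locally.

```python
try:
    import cv2  # real use needs an OpenCV build with the CUDA modules
except ImportError:
    cv2 = None  # the helper can still be exercised with a stand-in module


def divide_by_scalar(cu_src, value, cv=None):
    """Divide a GpuMat by a scalar, using the scalar binding when it exists.

    `cv` is a hypothetical injection point; it defaults to the imported cv2.
    """
    cv = cv2 if cv is None else cv
    scalar_divide = getattr(cv.cuda, "divideWithScalar", None)
    if scalar_divide is not None:
        # This build exposes the wrapper proposed above.
        return scalar_divide(cu_src, value)

    # Otherwise fall back to the constant-matrix workaround.
    cols, rows = cu_src.size()  # Size comes back as (width, height)
    cu_const = cv.cuda_GpuMat(rows, cols, cu_src.type(), value)
    return cv.cuda.divide(cu_src, cu_const)
```

With a CUDA-enabled build, divide_by_scalar(cu_test, 2) would replace the direct call in the snippet above.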

For full fix see

PR submitted: `cudaarithm`: fix python bindings for binary ops involving scalars by cudawarped · Pull Request #3815 · opencv/opencv_contrib · GitHub
