Python CUDA operations on matrices not working with scalars

mminervini · October 22, 2024, 1:24pm

Per-element CUDA operations such as cv::cuda::divide or cv::cuda::multiply in C++ can be applied to matrix-matrix input as well as to matrix-scalar input.
In fact, the following C++ code compiles and runs without problems:

cv::Mat test;
cv::cuda::GpuMat cu_test(4, 4, CV_8UC1, 16);
cu_test.download(test);
std::cout << test << std::endl;
cv::cuda::divide(cu_test, 2, cu_test);
cu_test.download(test);
std::cout << test << std::endl;

And, as expected, it outputs:

[ 16,  16,  16,  16;
  16,  16,  16,  16;
  16,  16,  16,  16;
  16,  16,  16,  16]
[  8,   8,   8,   8;
   8,   8,   8,   8;
   8,   8,   8,   8;
   8,   8,   8,   8]

On the other hand, the same logic doesn’t work in Python.
For example, the Python equivalent of the above C++ code:

import cv2

cu_test = cv2.cuda_GpuMat(4, 4, cv2.CV_8UC1, 16)
test = cu_test.download()
print(test)
cu_test = cv2.cuda.divide(cu_test, 2)
test = cu_test.download()
print(test)

Fails at cv2.cuda.divide:

[[16 16 16 16]
 [16 16 16 16]
 [16 16 16 16]
 [16 16 16 16]]
Traceback (most recent call last):
  File "test.py", line 7, in <module>
    cu_test = cv2.cuda.divide(cu_test, 2)
cv2.error: OpenCV(4.10.0) :-1: error: (-5:Bad argument) in function 'divide'
> Overload resolution failed:
>  - src1 is not a numpy array, neither a scalar
>  - Expected Ptr<cv::cuda::GpuMat> for argument 'src2'
>  - Expected Ptr<cv::UMat> for argument 'src1'
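
The three lines under "Overload resolution failed" can be read as one rejection per candidate overload: the first candidate rejects src1 (a GpuMat is not a numpy array or scalar), the GpuMat candidate rejects src2 (a plain number), and the UMat candidate rejects src1 again. The sketch below is only a toy model of that behaviour in plain Python. It is not how the OpenCV bindings are implemented, and the class names and the OVERLOADS table are stand-ins invented for illustration.

```python
# Toy model only: the real bindings are generated C++ code, not this table.
class NumpyArray:  # stand-in for numpy.ndarray
    pass

class GpuMat:  # stand-in for cv2.cuda_GpuMat
    pass

class UMat:  # stand-in for cv2.UMat
    pass

# One entry per candidate overload: argument name -> accepted Python types.
OVERLOADS = [
    {"src1": (NumpyArray, int, float), "src2": (NumpyArray, int, float)},
    {"src1": (GpuMat,), "src2": (GpuMat,)},
    {"src1": (UMat,), "src2": (UMat,)},
]

def first_mismatch(overload, args):
    """Return the name of the first argument the overload rejects, or None."""
    for name, accepted in overload.items():
        if not isinstance(args[name], accepted):
            return name
    return None

def resolve(args):
    """Try each overload in order.

    Returns (index of the matching overload or None, rejected argument names so far).
    """
    failures = []
    for index, overload in enumerate(OVERLOADS):
        bad = first_mismatch(overload, args)
        if bad is None:
            return index, failures
        failures.append(bad)
    return None, failures

matched, failures = resolve({"src1": GpuMat(), "src2": 2})
print(matched, failures)  # None ['src1', 'src2', 'src1']
```

In this model no candidate accepts the pair (GpuMat, number), which mirrors the order of complaints in the traceback: src1, then src2, then src1.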

According to help(cv2.cuda.divide), a matrix-scalar division is supposed to work:

divide(...)
    divide(src1, src2[, dst[, scale[, dtype[, stream]]]]) -> dst
    .   @brief Computes a matrix-matrix or matrix-scalar division.
    .   
    .   @param src1 First source matrix or a scalar.
    .   @param src2 Second source matrix or scalar.
    .   @param dst Destination matrix that has the same size and number of channels as the input array(s).
    .   The depth is defined by dtype or src1 depth.
    .   @param scale Optional scale factor.
    .   @param dtype Optional depth of the output array.
    .   @param stream Stream for the asynchronous version.
    .   
    .   This function, in contrast to divide, uses a round-down rounding mode.
    .   
    .   @sa divide

Is there anything I’m missing on how to provide the scalar as a proper scalar to cv2.cuda.divide?

Please note that I could get around this issue using a constant matrix that acts as the scalar:

cu_test = cv2.cuda_GpuMat(4, 4, cv2.CV_8UC1, 16)
cu_fake_scalar = cv2.cuda_GpuMat(4, 4, cv2.CV_8UC1, 2)
cu_test = cv2.cuda.divide(cu_test, cu_fake_scalar)

However, this seems inefficient when working with high-resolution images.
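
To put a rough number on that concern, the sketch below estimates the extra device memory the constant matrix needs, since it has to be as large as the image itself. The resolutions are example values chosen here, not figures from this thread, and the result is a lower bound because it ignores any row padding the GPU allocator may add.

```python
def matrix_bytes(rows, cols, channels=1, bytes_per_channel=1):
    """Lower bound on the memory held by a rows x cols matrix (no row padding)."""
    return rows * cols * channels * bytes_per_channel

# Example sizes for a single-channel 8-bit matrix (CV_8UC1).
examples = [
    ("4x4 test matrix", 4, 4),
    ("1080p frame", 1080, 1920),
    ("4K UHD frame", 2160, 3840),
]

for name, rows, cols in examples:
    print(f"{name}: {matrix_bytes(rows, cols)} bytes for the fake scalar")
```

Beyond the allocation, the kernel also has to read that second matrix from device memory for every element, whereas a true scalar operand would not need it.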

I tested with Python 3.6.9 and OpenCV versions 4.5.4 and also 4.10.0.

Thanks in advance!
Massimo

cudawarped · October 22, 2024, 7:30pm

All the functions which use a scalar in cudaarithm look like they need additional overloads. I’ll post a fix tomorrow.

cudawarped · October 23, 2024, 4:59pm

I haven’t had a chance to submit a PR, but the fix for divide is just a one-liner. Add

CV_EXPORTS_W void inline divideWithScalar(InputArray src1, Scalar src2, OutputArray dst, double scale = 1, int dtype = -1, Stream& stream = Stream::Null()) {
    divide(src1, src2, dst, scale, dtype, stream);
}

underneath the existing definition for divide

github.com

opencv/opencv_contrib/blob/80f1ca2442982ed518076cd88cf08c71155b30f6/modules/cudaarithm/include/opencv2/cudaarithm.hpp#L131

CV_EXPORTS_W void divide(InputArray src1, InputArray src2, OutputArray dst, double scale = 1, int dtype = -1, Stream& stream = Stream::Null());

Then you can call divideWithScalar instead of divide, see below.

cu_test = cv2.cuda_GpuMat(4, 4, cv2.CV_8UC1, 16)
test = cu_test.download()
print(test)
cu_test = cv2.cuda.divideWithScalar(cu_test, 2)
test = cu_test.download()
print(test)
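
Since divideWithScalar only exists in builds that include this change, a script that has to run on both patched and unpatched builds could check for it at run time and fall back to the constant-matrix workaround from the question. The following is a minimal sketch of that idea. The helper name divide_by_scalar and the make_constant callback are hypothetical, and the module is passed in as a parameter so the logic can be exercised here with plain-Python stubs instead of a CUDA build.

```python
from types import SimpleNamespace

def divide_by_scalar(cuda, src, value, make_constant):
    """Divide src by a scalar, preferring the scalar overload when the build has it.

    cuda          -- the cv2.cuda module (or a stub with the same attributes)
    make_constant -- hypothetical callback returning a matrix shaped like src,
                     filled with value (the "fake scalar" workaround)
    """
    scalar_divide = getattr(cuda, "divideWithScalar", None)
    if scalar_divide is not None:
        return scalar_divide(src, value)
    # Older build: fall back to matrix-matrix division.
    return cuda.divide(src, make_constant(src, value))

# Stubs standing in for an unpatched and a patched build; lists stand in for GpuMat.
old_build = SimpleNamespace(divide=lambda a, b: [x // y for x, y in zip(a, b)])
new_build = SimpleNamespace(
    divide=old_build.divide,
    divideWithScalar=lambda a, s: [x // s for x in a],
)

def fill_like(src, value):
    return [value] * len(src)

print(divide_by_scalar(old_build, [16, 16], 2, fill_like))  # [8, 8]
print(divide_by_scalar(new_build, [16, 16], 2, fill_like))  # [8, 8]
```

With a real build, cv2.cuda would be passed as the first argument and make_constant would construct the cv2.cuda_GpuMat in the same way as the workaround in the question.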

For full fix see

cudawarped · October 24, 2024, 7:49am

PR submitted: `cudaarithm`: fix python bindings for binary ops involving scalars by cudawarped · Pull Request #3815 · opencv/opencv_contrib · GitHub
