Per-element CUDA operations in C++, such as `cv::cuda::divide` or `cv::cuda::multiply`, accept either two matrices or a matrix and a scalar as input.

For example, the following C++ code compiles and runs without problems:

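```
#include <iostream>
#include <opencv2/core.hpp>
#include <opencv2/core/cuda.hpp>
#include <opencv2/cudaarithm.hpp>

int main()
{
    cv::Mat test;
    cv::cuda::GpuMat cu_test(4, 4, CV_8UC1, 16);
    cu_test.download(test);
    std::cout << test << std::endl;
    // The plain number 2 is accepted as the scalar divisor
    cv::cuda::divide(cu_test, 2, cu_test);
    cu_test.download(test);
    std::cout << test << std::endl;
    return 0;
}
```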

As expected, it prints:

```
[ 16, 16, 16, 16;
16, 16, 16, 16;
16, 16, 16, 16;
16, 16, 16, 16]
[ 8, 8, 8, 8;
8, 8, 8, 8;
8, 8, 8, 8;
8, 8, 8, 8]
```

On the other hand, the same logic doesn’t work in Python.

For example, the Python equivalent of the above C++ code:

```
import cv2

cu_test = cv2.cuda_GpuMat(4, 4, cv2.CV_8UC1, 16)
test = cu_test.download()
print(test)
cu_test = cv2.cuda.divide(cu_test, 2)  # raises the cv2.error shown below
test = cu_test.download()
print(test)
```

fails at the `cv2.cuda.divide` call:

```
[[16 16 16 16]
[16 16 16 16]
[16 16 16 16]
[16 16 16 16]]
Traceback (most recent call last):
File "test.py", line 7, in <module>
cu_test = cv2.cuda.divide(cu_test, 2)
cv2.error: OpenCV(4.10.0) :-1: error: (-5:Bad argument) in function 'divide'
> Overload resolution failed:
> - src1 is not a numpy array, neither a scalar
> - Expected Ptr<cv::cuda::GpuMat> for argument 'src2'
> - Expected Ptr<cv::UMat> for argument 'src1'
```

According to `help(cv2.cuda.divide)`, matrix-scalar division is supposed to be supported:

```
divide(...)
divide(src1, src2[, dst[, scale[, dtype[, stream]]]]) -> dst
. @brief Computes a matrix-matrix or matrix-scalar division.
.
. @param src1 First source matrix or a scalar.
. @param src2 Second source matrix or scalar.
. @param dst Destination matrix that has the same size and number of channels as the input array(s).
. The depth is defined by dtype or src1 depth.
. @param scale Optional scale factor.
. @param dtype Optional depth of the output array.
. @param stream Stream for the asynchronous version.
.
. This function, in contrast to divide, uses a round-down rounding mode.
.
. @sa divide
```

Is there something I’m missing about how to pass a scalar to `cv2.cuda.divide` so that it is recognized as one?

Note that I can work around the issue by using a constant matrix in place of the scalar:

```
import cv2

cu_test = cv2.cuda_GpuMat(4, 4, cv2.CV_8UC1, 16)
# Constant matrix with the same size and type as cu_test, filled with the divisor
cu_fake_scalar = cv2.cuda_GpuMat(4, 4, cv2.CV_8UC1, 2)
cu_test = cv2.cuda.divide(cu_test, cu_fake_scalar)
```

However, this seems inefficient for high-resolution images, since a second full-size matrix has to be allocated, filled, and read just to hold a single constant.

I tested this with Python 3.6.9 and OpenCV 4.5.4 as well as 4.10.0.

Thanks in advance!

Massimo