Using OpenCL on Image processing. Error in WarpAffine


First of all, I should say that I’m not an OpenCV expert…
From what I have read, when you use UMat objects instead of Mat, OpenCV runs image processing on the GPU (through OpenCL) instead of the CPU, which is what it uses with Mat objects.

My GPU is an integrated Intel UHD Graphics 620

I have been doing operations like rotation and drawing lines, and they work well. The problem appears when I use WarpAffine to zoom the image.

Here is my code and the resulting image. As you can see, with UMat it generates unexpected results.

Thanks for your help

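import os
import cv2 as cv
img_srcFile = "C:\\Users\\usr\\Downloads\\demo.jpg"
# Load the same image twice: once as a Mat (NumPy array), once as a UMat
srcMat = cv.imread(img_srcFile, cv.IMREAD_UNCHANGED)
srcUMat = cv.UMat(cv.imread(img_srcFile, cv.IMREAD_UNCHANGED))
cv.imshow("SrcMat", srcMat)
cv.imshow("SrcUMat", srcUMat)
height, width = srcMat.shape[:2]
point = ((width - 1 )/ 2, (height - 1 )/ 2)
# Angle 0 and scale 1.3: no rotation, 1.3x zoom about the image center
matrixTransfrom = cv.getRotationMatrix2D(point, 0, 1.3)
# In-place call on the Mat (source and destination are the same object)
cv.warpAffine(srcMat, matrixTransfrom, (width, height), srcMat)
cv.imshow("SrcMat -> Zoom", srcMat)
# Same in-place call on the UMat: this is the one that gives the unexpected result
cv.warpAffine(srcUMat, matrixTransfrom, (width, height), srcUMat)
cv.imshow("SrcUMat -> Zoom", srcUMat)
# Keep the windows open until a key is pressed
cv.waitKey(0)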

Here is the build information:

General configuration for OpenCV 4.9.0 =====================================
  Version control:               unknown
    Timestamp:                   2024-04-17T06:33:37Z
    Host:                        Windows 10.0.19044 AMD64
    CMake:                       3.29.1
    CMake generator:             Visual Studio 17 2022
    CMake build tool:            C:/Program Files/Microsoft Visual Studio/2022/Professional/MSBuild/Current/Bin/amd64/MSBuild.exe
    MSVC:                        1939
    Configuration:               Debug Release
  CPU/HW features:
    Baseline:                    SSE SSE2 SSE3
      requested:                 SSE3
    Dispatched code generation:  SSE4_1 SSE4_2 FP16 AVX AVX2 AVX512_SKX
      requested:                 SSE4_1 SSE4_2 AVX FP16 AVX2 AVX512_SKX
      SSE4_1 (18 files):         + SSSE3 SSE4_1
      SSE4_2 (2 files):          + SSSE3 SSE4_1 POPCNT SSE4_2
      FP16 (1 files):            + SSSE3 SSE4_1 POPCNT SSE4_2 FP16 AVX
      AVX (9 files):             + SSSE3 SSE4_1 POPCNT SSE4_2 AVX
      AVX2 (38 files):           + SSSE3 SSE4_1 POPCNT SSE4_2 FP16 FMA3 AVX AVX2
      AVX512_SKX (8 files):      + SSSE3 SSE4_1 POPCNT SSE4_2 FP16 FMA3 AVX AVX2 AVX_512F AVX512_COMMON AVX512_SKX
    Built as dynamic libs?:      YES
    C++ standard:                11
    C++ Compiler:                C:/Program Files/Microsoft Visual Studio/2022/Professional/VC/Tools/MSVC/14.39.33519/bin/Hostx64/x64/cl.exe  (ver 19.39.33523.0)
    C++ flags (Release):         /DWIN32 /D_WINDOWS /W4 /GR  /D _CRT_SECURE_NO_DEPRECATE /D _CRT_NONSTDC_NO_DEPRECATE /D _SCL_SECURE_NO_WARNINGS /Gy /bigobj /Oi  /fp:precise     /EHa /wd4127 /wd4251 /wd4324 /wd4275 /wd4512 /wd4589 /wd4819 /MP  /O2 /Ob2 /DNDEBUG
    C++ flags (Debug):           /DWIN32 /D_WINDOWS /W4 /GR  /D _CRT_SECURE_NO_DEPRECATE /D _CRT_NONSTDC_NO_DEPRECATE /D _SCL_SECURE_NO_WARNINGS /Gy /bigobj /Oi  /fp:precise     /EHa /wd4127 /wd4251 /wd4324 /wd4275 /wd4512 /wd4589 /wd4819 /MP  /Zi /Ob0 /Od /RTC1
    C Compiler:                  C:/Program Files/Microsoft Visual Studio/2022/Professional/VC/Tools/MSVC/14.39.33519/bin/Hostx64/x64/cl.exe
    C flags (Release):           /DWIN32 /D_WINDOWS /W3  /D _CRT_SECURE_NO_DEPRECATE /D _CRT_NONSTDC_NO_DEPRECATE /D _SCL_SECURE_NO_WARNINGS /Gy /bigobj /Oi  /fp:precise     /MP   /O2 /Ob2 /DNDEBUG
    C flags (Debug):             /DWIN32 /D_WINDOWS /W3  /D _CRT_SECURE_NO_DEPRECATE /D _CRT_NONSTDC_NO_DEPRECATE /D _SCL_SECURE_NO_WARNINGS /Gy /bigobj /Oi  /fp:precise     /MP /Zi /Ob0 /Od /RTC1
    Linker flags (Release):      /machine:x64  /INCREMENTAL:NO
    Linker flags (Debug):        /machine:x64  /debug /INCREMENTAL
    ccache:                      NO
    Precompiled headers:         YES
    Extra dependencies:
    3rdparty dependencies:
  OpenCV modules:
    To be built:                 calib3d core dnn features2d flann gapi highgui imgcodecs imgproc ml objdetect photo python3 stitching ts video videoio
    Disabled:                    world
    Disabled by dependency:      -
    Unavailable:                 java python2
    Applications:                tests perf_tests apps
    Documentation:               NO
    Non-free algorithms:         NO
  Windows RT support:            NO
  GUI:                           WIN32UI
    Win32 UI:                    YES
    VTK support:                 NO
  Media I/O:
    ZLib:                        build (ver 1.3)
    JPEG:                        build-libjpeg-turbo (ver 2.1.3-62)
      SIMD Support Request:      YES
      SIMD Support:              NO
    WEBP:                        build (ver encoder: 0x020f)
    PNG:                         build (ver 1.6.37)
    TIFF:                        build (ver 42 - 4.2.0)
    JPEG 2000:                   build (ver 2.5.0)
    OpenEXR:                     build (ver 2.3.0)
    HDR:                         YES
    SUNRASTER:                   YES
    PXM:                         YES
    PFM:                         YES
  Video I/O:
    DC1394:                      NO
    FFMPEG:                      YES (prebuilt binaries)
      avcodec:                   YES (58.134.100)
      avformat:                  YES (58.76.100)
      avutil:                    YES (56.70.100)
      swscale:                   YES (5.9.100)
      avresample:                YES (4.0.0)
    GStreamer:                   YES (1.24.2)
    DirectShow:                  YES
    Media Foundation:            YES
      DXVA:                      YES
  Parallel framework:            Concurrency
  Trace:                         YES (with Intel ITT)
  Other third-party libraries:
    Intel IPP:                   2021.11.0 [2021.11.0]
           at:                   C:/CMake/opencv/opencv/build/3rdparty/ippicv/ippicv_win/icv
    Intel IPP IW:                sources (2021.11.0)
              at:                C:/CMake/opencv/opencv/build/3rdparty/ippicv/ippicv_win/iw
    Lapack:                      NO
    Eigen:                       YES (ver 3.4.0)
    Custom HAL:                  NO
    Protobuf:                    build (3.19.1)
    Flatbuffers:                 builtin/3rdparty (23.5.9)
  OpenCL:                        YES (NVD3D11)
    Include path:                C:/CMake/opencv/opencv/3rdparty/include/opencl/1.2
    Link libraries:              Dynamic load
  Python 3:
    Interpreter:                 C:/Program Files/Python312/python.exe (ver 3.12.2)
    Libraries:                   C:/Program Files/Python312/libs/python312.lib (ver 3.12.2)
    numpy:                       C:/Users/JuanDYB/AppData/Roaming/Python/Python312/site-packages/numpy/core/include (ver 1.26.4)
    install path:                C:/Program Files/Python312/Lib/site-packages/cv2/python-3.12
  Python (for build):            C:/Program Files/Python312/python.exe
    ant:                         NO
    Java:                        NO
    JNI:                         NO
    Java wrappers:               NO
    Java tests:                  NO
  Install to:                    C:/CMake/opencv/opencv/build/install

looks like you’re trying to make it work in-place. and it might have trouble with that.

let it return a new UMat or provide a separate UMat for the result.

if that fixes it, I’d say in-place is the issue. in that case, you should check for existing issues and maybe submit one about this.

make sure that this is reproducible on the latest release, 4.10.


Thanks for your help. You are right: returning a new UMat works fine.

dstImg = cv.warpAffine(srcUMat, matrixTransfrom, (width, height))
cv.imshow("SrcUMat -> Zoom", dstImg)

I’m on version 4.9. I’ll build 4.10 and check whether it still fails. Since I’m using OpenCV with the GStreamer backend, I have to rebuild from source.

With regards,

eh, you can just install a prebuilt package. no need for gstreamer for this test. create a virtual env and pip install opencv-python


I have updated to EmguCV 4.9 and also to OpenCV 4.10, and I have tested the issue in both versions.

It seems the issue is present in both versions. As you said, using a different UMat for the output works around the bug.

dstImg = cv.warpAffine(srcUMat, matrixTransfrom, (width, height))
cv.imshow("SrcUMat -> Zoom", dstImg)

I have also opened an issue on GitHub.

Hi @crackwitz

Thanks for posting on the issue!

Is it bad practice to use the same UMat for input and output?

I’m doing this because I apply multiple operations to the same frame (WarpAffine, Rotate, DrawLines, …).

If I use a different input and output, I have to create more intermediate variables or continuously change the reference to the input one…

With regards,

in-place is suitable for few algorithms.

for most algorithms, “asking for” in-place is impossible, even in principle.

warpAffine is one such function. if it supports in-place, that’ll surely translate to an internal allocation.
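
to illustrate why in-place can't work in principle for a geometric transform, here is a toy sketch in plain Python. it is not how OpenCV implements warpAffine; the 1D "row", the nearest-neighbour 2x zoom anchored at index 0, and both function names are made up for the example. the point is only that each output pixel reads from some other input position, so writing into the same buffer destroys values that later outputs still need.

```python
def zoom2x(src):
    """Nearest-neighbour 2x zoom, written into a separate output list."""
    return [src[i // 2] for i in range(len(src))]


def zoom2x_in_place(buf):
    """Same mapping, but reading and writing the same list."""
    for i in range(len(buf)):
        # buf[i // 2] may already have been overwritten by an earlier step
        buf[i] = buf[i // 2]
    return buf


row = [0, 1, 2, 3, 4, 5, 6, 7]
separate = zoom2x(row)
in_place = zoom2x_in_place(list(row))

print(separate)  # [0, 0, 1, 1, 2, 2, 3, 3]
print(in_place)  # [0, 0, 0, 0, 0, 0, 0, 0]
```

on a GPU the order of reads and writes is not even fixed, so the damage can look different from run to run. a function that accepts the same object for input and output has to copy the input internally first.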

you can keep buffers (mats, umats) around for as long as you need. the simplest way forward is to have two buffers, and use them in turn.