Usage of UMat and Mat

Mustafa_Bay · August 18, 2021, 2:54pm

Hi Everyone,

I am using OpenCV 4.1.2 and I want to understand how UMat and Mat are used together. When I retrieve a UMat from a Mat with the getUMat() function, what happens if I modify the UMat? Is the Mat changed at the same time (i.e. is the same operation applied to the Mat as well)? This matters to me because I have some issues with the CPU usage of my application. To understand it, I ran some tests with the Google Benchmark library: I performed resize operations and measured their timing. I expected an operation on a UMat retrieved from a Mat to take longer than, or about as long as, the same operation on a normal UMat created with the constructor. However, I got some interesting results.

You can find my code below.

#include <opencv2/core.hpp>
#include <opencv2/imgproc.hpp>
#include <opencv2/imgcodecs.hpp>
#include <opencv2/core/ocl.hpp>
#include <opencv2/highgui.hpp>
#include <string>
#include <iostream>
#include <fstream>
#include <chrono>
#include <benchmark/benchmark.h>

constexpr int inHeight = 1200;
constexpr int inWidth = 1920;
constexpr int outHeight = 600;
constexpr int outWidth = 960;

void resize(cv::UMat umat){
  cv::resize(umat, umat, cv::Size(outWidth, outHeight));
}

void resize(cv::Mat mat){
  cv::resize(mat, mat, cv::Size(outWidth, outHeight));
}

static void BM_ResizeUMatRetrievedFromMat(benchmark::State& state) {
  // Setup that only needs to run once
  cv::ocl::setUseOpenCL(true);
  for (auto _ : state) {
    // Per-iteration setup, excluded from the timing
    state.PauseTiming();
    // cv::Mat takes (rows, cols), i.e. (height, width)
    cv::Mat mat(inHeight, inWidth, CV_8UC3, cv::Scalar(0,0,0));
    cv::UMat umat = mat.getUMat(cv::ACCESS_RW,
                                cv::USAGE_ALLOCATE_SHARED_MEMORY);
    state.ResumeTiming();
    // Timed: the resize call, plus the destruction of umat and mat
    // at the end of this iteration
    resize(umat);
  }
}

static void BM_ResizeUMat(benchmark::State& state) {
  // Setup that only needs to run once
  cv::ocl::setUseOpenCL(true);
  for (auto _ : state) {
    // Per-iteration setup, excluded from the timing
    state.PauseTiming();
    cv::UMat umat(inHeight, inWidth, CV_8UC3, cv::Scalar(0,0,0),
                 cv::USAGE_ALLOCATE_SHARED_MEMORY);
    state.ResumeTiming();
    // Timed: the resize call, plus the destruction of umat
    resize(umat);
  }
}

static void BM_ResizeMat(benchmark::State& state) {
  // Setup that only needs to run once (has no effect on cv::Mat)
  cv::ocl::setUseOpenCL(true);
  for (auto _ : state) {
    // Per-iteration setup, excluded from the timing
    state.PauseTiming();
    cv::Mat mat(inHeight, inWidth, CV_8UC3, cv::Scalar(0,0,0));
    state.ResumeTiming();
    // Timed: the resize call, plus the destruction of mat
    resize(mat);
  }
}

// Register the function as a benchmark
BENCHMARK(BM_ResizeUMatRetrievedFromMat);
// Register the function as a benchmark
BENCHMARK(BM_ResizeUMat);
// Register the function as a benchmark
BENCHMARK(BM_ResizeMat);

// Run the benchmark
BENCHMARK_MAIN();

These are my results:

As you can see from the results, my expectation was wrong.

To sum up: can anybody explain why the BM_ResizeUMat benchmark took much longer than BM_ResizeUMatRetrievedFromMat? And when I retrieve a UMat from a Mat with getUMat(), what happens to the Mat if I modify the UMat?

berak · August 18, 2021, 6:29pm

not at all. Mat is cpu memory, you're manipulating a UMat copy on the gpu

crackwitz · August 18, 2021, 6:31pm

specifically, those calls cause a copy from device to host memory, or from host memory to device memory

Mustafa_Bay · August 20, 2021, 5:38am

However, when I tested this (I retrieved a UMat from a Mat and performed some operations on it), I checked the Mat before and after the operation and saw that the operation had also been applied to the Mat. You can find my code below:

#include <opencv2/core.hpp>
#include <opencv2/imgproc.hpp>
#include <opencv2/imgcodecs.hpp>
#include <opencv2/core/ocl.hpp>
#include <opencv2/highgui.hpp>
#include <string>
#include <iostream>
#include <fstream>
#include <chrono>

constexpr int inHeight = 1200;
constexpr int inWidth = 1920;

// Copies a 100x100 region of source into the same region of target
void copyTo(const cv::UMat &source, cv::UMat &target){
  cv::Rect roi(100,100,100,100);
  source(roi).copyTo(target(roi));
}

int main(){
  // White source image; cv::UMat and cv::Mat take (rows, cols), i.e. (height, width)
  cv::UMat img(inHeight, inWidth, CV_8UC3, cv::Scalar(255,255,255),
               cv::USAGE_ALLOCATE_SHARED_MEMORY);
  // Black scene image on the host, plus a UMat retrieved from it
  cv::Mat sceneMat(inHeight, inWidth, CV_8UC3, cv::Scalar(0,0,0));
  cv::UMat sceneUMat = sceneMat.getUMat(cv::ACCESS_RW,
                                        cv::USAGE_ALLOCATE_SHARED_MEMORY);
  cv::imshow("sceneBefore", sceneMat);
  copyTo(img, sceneUMat);          // only the UMat is written to
  cv::imshow("img", img);
  cv::imshow("scene", sceneMat);   // the copied region is observed here as well
  cv::waitKey(20000);
}

How is this possible?

imagine · August 31, 2021, 12:04pm

It is possible because of shared memory between the host (CPU) and the device (GPU). As far as I understand, this is an optional OpenCL feature that depends on the hardware you use. However, you cannot rely on it, and (to me at least) it is not transparent how OpenCV uses this feature.

kallaballa · November 11, 2022, 9:17pm

Are you talking about OpenCL-SVM?

imagine · April 24, 2023, 3:13pm

Yes, this is called Shared Virtual Memory (SVM)
