Usage of UMat and Mat

Hi everyone,

I am using OpenCV 4.1.2 and I want to understand how UMat and Mat are meant to be used. When I get a UMat from a Mat with the getUMat() function and then modify the UMat, what happens to the Mat? Is the Mat changed at the same time (i.e., is the same operation also applied to the Mat) or not? This matters to me because I have some issues with the CPU usage percentage of my application. To understand it, I ran some tests with the Google Benchmark library: I performed some resize operations and measured their time. I expected that resizing a UMat obtained from a Mat would take longer than, or about as long as, resizing a regular UMat created with the constructor. However, I got some interesting results.

You can find my code below.

#include <opencv2/core.hpp>
#include <opencv2/imgproc.hpp>
#include <opencv2/imgcodecs.hpp>
#include <opencv2/core/ocl.hpp>
#include <opencv2/highgui.hpp>
#include <string>
#include <iostream>
#include <fstream>
#include <chrono>
#include <benchmark/benchmark.h>

constexpr int inHeight = 1200;
constexpr int inWidth = 1920;
constexpr int outHeight = 600;
constexpr int outWidth = 960;

void resize(cv::UMat umat){
  cv::resize(umat, umat, cv::Size(outWidth, outHeight));
}

void resize(cv::Mat mat){
  cv::resize(mat, mat, cv::Size(outWidth, outHeight));
}

static void BM_ResizeUMatRetrievedFromMat(benchmark::State& state) {
  // Perform setup here
  cv::ocl::setUseOpenCL(true);
  for (auto _ : state) {
    // This code gets timed
    state.PauseTiming();
    // cv::Mat takes (rows, cols, type): rows = height, cols = width
    cv::Mat mat(inHeight, inWidth, CV_8UC3, cv::Scalar(0,0,0));
    cv::UMat umat = mat.getUMat(cv::ACCESS_RW,
                                cv::USAGE_ALLOCATE_SHARED_MEMORY);
    state.ResumeTiming();
    resize(umat);
  }
}

static void BM_ResizeUMat(benchmark::State& state) {
  // Perform setup here
  cv::ocl::setUseOpenCL(true);
  for (auto _ : state) {
    // This code gets timed
    state.PauseTiming();
    // cv::UMat takes (rows, cols, type): rows = height, cols = width
    cv::UMat umat(inHeight, inWidth, CV_8UC3, cv::Scalar(0,0,0),
                  cv::USAGE_ALLOCATE_SHARED_MEMORY);
    state.ResumeTiming();
    resize(umat);
  }
}

static void BM_ResizeMat(benchmark::State& state) {
  // Perform setup here
  cv::ocl::setUseOpenCL(true);
  for (auto _ : state) {
    // This code gets timed
    state.PauseTiming();
    // cv::Mat takes (rows, cols, type): rows = height, cols = width
    cv::Mat mat(inHeight, inWidth, CV_8UC3, cv::Scalar(0,0,0));
    state.ResumeTiming();
    resize(mat);
  }
}

// Register the function as a benchmark
BENCHMARK(BM_ResizeUMatRetrievedFromMat);
// Register the function as a benchmark
BENCHMARK(BM_ResizeUMat);
// Register the function as a benchmark
BENCHMARK(BM_ResizeMat);

// Run the benchmark
BENCHMARK_MAIN();

These are my results:
[Image: benchmark results for BM_ResizeUMatRetrievedFromMat, BM_ResizeUMat and BM_ResizeMat]

As the results show, my expectation was wrong.

To sum up: can anybody explain why BM_ResizeUMat took much longer than BM_ResizeUMatRetrievedFromMat? And when I get a UMat from a Mat with getUMat() and modify the UMat, what happens to the Mat?

Not at all. Mat is CPU memory; you're manipulating a UMat copy on the GPU.

Specifically, those calls cause a copy from device to host memory, or from host memory to device memory.

However, that is not what I see in my test. I got a UMat from a Mat, performed an operation on the UMat, and checked the Mat before and after: the operation had also been applied to the Mat. You can find my code below:

#include <opencv2/core.hpp>
#include <opencv2/imgproc.hpp>
#include <opencv2/imgcodecs.hpp>
#include <opencv2/core/ocl.hpp>
#include <opencv2/highgui.hpp>
#include <string>
#include <iostream>
#include <fstream>
#include <chrono>
#include <benchmark/benchmark.h>

constexpr int inHeight = 1200;
constexpr int inWidth = 1920;

void copyTo(const cv::UMat &source, cv::UMat &target){
  cv::Rect roi(100,100,100,100);
  source(roi).copyTo(target(roi));
}

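int main(){
  // cv::Mat/cv::UMat take (rows, cols, type): rows = height, cols = width
  cv::UMat img(inHeight, inWidth, CV_8UC3, cv::Scalar(255,255,255),
               cv::USAGE_ALLOCATE_SHARED_MEMORY);
  cv::Mat sceneMat(inHeight, inWidth, CV_8UC3, cv::Scalar(0,0,0));
  cv::UMat sceneUMat = sceneMat.getUMat(cv::ACCESS_RW,
                                        cv::USAGE_ALLOCATE_SHARED_MEMORY);
  cv::imshow("sceneBefore", sceneMat);
  copyTo(img, sceneUMat);
  cv::imshow("img", img);
  // sceneUMat is still alive here, so reading sceneMat on the host is not
  // guaranteed to reflect the copy on every OpenCL device
  cv::imshow("scene", sceneMat);
  cv::waitKey(20000);
}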

How is this possible?

This can happen because of shared memory between the host (CPU) and the device (GPU). As far as I understand, this is an optional OpenCL feature that depends on the hardware you use. However, you cannot rely on it, and (to me) it is not transparent how OpenCV uses this feature.