OpenCV and iGPU shared memory

For the widely used systems with Intel® Processor Graphics (iGPU), Intel promotes "zero copy" sharing of system memory between CPU and iGPU, as described in Getting the Most from OpenCL™ 1.2: How to Increase Performance by...

I am using OpenCV in a context where the images reside in external memory provided by our proprietary framework. Usually,

cv::Mat::Mat (int rows, int cols, int type, void *data, size_t step=AUTO_STEP)

is enough to access the external memory from within OpenCV. However, as far as I understand, there is no direct way to use this external memory as "zero copy" shared memory with a cv::UMat.
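To make the role of the step argument in that constructor concrete, here is a minimal sketch of the row-major addressing it implies. It uses only the standard library; the helper names pixelPtr and bufferBytes are hypothetical and not part of OpenCV. It assumes step is the distance between the starts of consecutive rows in bytes, so it must be at least cols * elemSize.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not an OpenCV function): address of pixel (row, col)
// in a row-major buffer whose rows start `step` bytes apart.
inline std::uint8_t* pixelPtr(std::uint8_t* data, std::size_t step,
                              std::size_t elemSize, int row, int col)
{
    return data + static_cast<std::size_t>(row) * step
                + static_cast<std::size_t>(col) * elemSize;
}

// Hypothetical helper: number of bytes spanned by an image with `rows` rows
// that are `step` bytes apart. This is the size an external buffer must have.
inline std::size_t bufferBytes(std::size_t step, int rows)
{
    return step * static_cast<std::size_t>(rows);
}
```

For a tightly packed 8-bit single-channel image, step equals cols, so a 512 x 400 image spans 512 * 400 bytes.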

With

void cv::ocl::convertFromBuffer (void *cl_mem_buffer, size_t step, int rows, int cols, int type, UMat &dst)

OpenCV in principle provides an interface to use this shared memory with a cv::UMat…

Is this a valid and intended way to use iGPU shared memory?

In my tests this does seem to speed up the operations, but the results are not stable (sometimes the external memory is not written as expected), and sometimes the program even crashes inside OpenCL operations.

I am using a self-compiled version of OpenCV 4.5.1 with Microsoft Visual Studio 2017.

My code looks like this:

#include <memory>
#include <stdexcept>

#include <boost/align/aligned_alloc.hpp>

#include <opencv2/cvconfig.h>
#include <opencv2/core.hpp>
#include <opencv2/imgproc.hpp>
#include <opencv2/imgcodecs.hpp>

#include <opencv2/core/ocl.hpp>
#include <opencv2/core/opencl/runtime/opencl_core.hpp>


// Deleter so that std::unique_ptr can own memory from boost::alignment::aligned_alloc
struct AlignedDeleter
{
  void operator()(void* ptr) const
  {
    boost::alignment::aligned_free(ptr);
  }
};

// Wraps the host memory of m in an OpenCL buffer (no copy) and attaches it to um
void mapAlignedMemToUMat(cl_context ctx, const cv::Mat& m, cv::UMat& um, cl_mem_flags readWriteFlag = CL_MEM_READ_WRITE)
{
  cl_int status = CL_SUCCESS;
  cl_mem oclMem = clCreateBuffer(ctx, CL_MEM_USE_HOST_PTR | readWriteFlag, m.step[0] * m.rows, m.data, &status);
  if (status != CL_SUCCESS) throw std::runtime_error("Error in clCreateBuffer");
  cv::ocl::convertFromBuffer(oclMem, m.step[0], m.rows, m.cols, m.type(), um);  // calls clRetain... and thus increments ref-counter for oclMem
  status = clReleaseMemObject(oclMem);
  if (status != CL_SUCCESS) throw std::runtime_error("Error in clReleaseMemObject");
}


int main()
{
  int cols = 512;
  int rows = 400; // cols * rows is a multiple of 64
  
  std::unique_ptr<unsigned char, AlignedDeleter> dataRead;
  dataRead.reset(static_cast<unsigned char*>(boost::alignment::aligned_alloc(4096, cols * rows)));
  std::unique_ptr<unsigned char, AlignedDeleter> dataWrite;
  dataWrite.reset(static_cast<unsigned char*>(boost::alignment::aligned_alloc(4096, cols * rows)));

  // ... read the image data ...
  
  {  // within this scope the shared mem is not accessed by the host
    // The last argument is the row stride in bytes, which is cols for a packed CV_8U image (not rows)
    cv::Mat mRead(rows, cols, CV_8U, dataRead.get(), cols);   // just used to hold the image format
    cv::Mat mWrite(rows, cols, CV_8U, dataWrite.get(), cols);

    cv::ocl::Context CvCtx = cv::ocl::Context::getDefault();
    auto ctx = reinterpret_cast<cl_context>(CvCtx.ptr());
  
    cv::UMat uRead, uWrite;
    mapAlignedMemToUMat(ctx, mRead, uRead, CL_MEM_READ_ONLY | CL_MEM_HOST_NO_ACCESS);
    mapAlignedMemToUMat(ctx, mWrite, uWrite, CL_MEM_WRITE_ONLY | CL_MEM_HOST_NO_ACCESS);
  
    // ... do some fancy image processing with uRead as cv::InputArray and uWrite as cv::OutputArray, e.g. uRead.copyTo(uWrite) ...
  }
  // ... check the results ...
}
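The 4096-byte alignment and the "multiple of 64" comment in the code above follow the conditions Intel's article gives for a CL_MEM_USE_HOST_PTR buffer to be shared without a copy. A minimal sketch of a sanity check for those two conditions is shown below. The function name looksZeroCopyEligible and the constants are hypothetical; passing this check does not guarantee that the driver actually avoids the copy.

```cpp
#include <cstddef>
#include <cstdint>

// Assumed conditions (taken from Intel's zero-copy guidance): the host pointer
// is aligned to a 4096-byte page and the size is a multiple of a 64-byte cache line.
constexpr std::size_t kPageAlignment = 4096;
constexpr std::size_t kCacheLine = 64;

// Hypothetical helper: returns true if (ptr, sizeBytes) satisfies both conditions.
inline bool looksZeroCopyEligible(const void* ptr, std::size_t sizeBytes)
{
    const auto addr = reinterpret_cast<std::uintptr_t>(ptr);
    return ptr != nullptr
        && sizeBytes != 0
        && addr % kPageAlignment == 0
        && sizeBytes % kCacheLine == 0;
}
```

Such a check could be called on m.data and m.step[0] * m.rows before clCreateBuffer, so that a misaligned or oddly sized buffer is noticed early instead of silently falling back to a copy.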

I also do not know how OpenCV intends to support sharing system memory between CPU and iGPU. However, I ran some tests with the code above, and reading system RAM in the manner described appears to be stable and significantly improves performance (compared to "cv::Mat::copyTo(cv::UMat)"). Writing to RAM through a UMat that references it as an OutputArray, however, does not work.

I am only an OpenCV user without deep insight into its internals, but the write problems appear to be related to synchronisation and to the fact that a UMat may create a new OpenCL handle for write operations.


Depending on what you actually want to achieve, you could use OpenCL interop extensions, which are available on recent Intel platforms via an alternative compute runtime. I am not aware of OpenCV making any direct use of UMA, but the graphics stack might do it for you.
