For the widely used systems equipped with Intel® Processor Graphics (iGPU), Intel promotes "zero copy" sharing of system memory between CPU and iGPU, as described in Getting the Most from OpenCL™ 1.2: How to Increase Performance by...
I am using OpenCV in a context where the images reside in external memory provided by our proprietary framework. So usually
cv::Mat::Mat (int rows, int cols, int type, void *data, size_t step=AUTO_STEP)
does the job of accessing the external memory from within OpenCV. However, as far as I understand, there is no direct way to use the external memory as "zero copy" shared memory with a cv::UMat.
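To make the role of the step argument in that constructor concrete, here is a minimal sketch in plain C++ that does not use OpenCV. It follows the documented cv::Mat convention that step is the number of bytes per row, including any padding. The ImageLayout struct and the helper names are hypothetical and only serve the illustration.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the layout information held by a cv::Mat header.
struct ImageLayout
{
    int rows;
    int cols;
    std::size_t elemSize; // bytes per pixel, 1 for CV_8U
    std::size_t step;     // bytes per row, including any padding
};

// Smallest valid step: one densely packed row.
inline std::size_t minStep(const ImageLayout& l)
{
    return static_cast<std::size_t>(l.cols) * l.elemSize;
}

// Byte offset of pixel (row, col) relative to the start of the external buffer.
inline std::size_t pixelOffset(const ImageLayout& l, int row, int col)
{
    assert(l.step >= minStep(l));
    assert(row >= 0 && row < l.rows && col >= 0 && col < l.cols);
    return static_cast<std::size_t>(row) * l.step
         + static_cast<std::size_t>(col) * l.elemSize;
}

// Number of bytes the external buffer must provide for this layout.
inline std::size_t bufferBytes(const ImageLayout& l)
{
    return l.step * static_cast<std::size_t>(l.rows);
}
```

For a densely packed 512 x 400 image of type CV_8U, the step is therefore 512 (the row length in bytes), and the buffer has to hold 512 * 400 bytes.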
With
void cv::ocl::convertFromBuffer (void *cl_mem_buffer, size_t step, int rows, int cols, int type, UMat &dst)
OpenCV in principle provides an interface to use this shared memory with a cv::UMat…
Is this a valid and intended way to use iGPU shared memory?
In fact, in my tests this seems to accelerate the operations, but the results are not stable (sometimes the external memory is not written as expected), and sometimes the program even crashes within OpenCL operations.
I am using a self-compiled version of OpenCV 4.5.1 with Microsoft Visual Studio 2017.
My code snippets look like this:
#include <memory>
#include <stdexcept>
#include <boost/align/aligned_alloc.hpp>
#include <opencv2/cvconfig.h>
#include <opencv2/core.hpp>
#include <opencv2/imgproc.hpp>
#include <opencv2/imgcodecs.hpp>
#include <opencv2/core/ocl.hpp>
#include <opencv2/core/opencl/runtime/opencl_core.hpp>

// Deleter for memory obtained from boost::alignment::aligned_alloc
struct AlignedDeleter
{
    void operator()(void* ptr) const
    {
        boost::alignment::aligned_free(ptr);
    }
};

// Wraps the host memory of m in an OpenCL buffer (CL_MEM_USE_HOST_PTR) and binds that buffer to um
void mapAlignedMemToUMat(cl_context ctx, const cv::Mat& m, cv::UMat& um, cl_mem_flags readWriteFlag = CL_MEM_READ_WRITE)
{
    cl_int status = CL_SUCCESS;
    cl_mem oclMem = clCreateBuffer(ctx, CL_MEM_USE_HOST_PTR | readWriteFlag, m.step[0] * m.rows, m.data, &status);
    if (status != CL_SUCCESS) throw std::runtime_error("Error in clCreateBuffer");
    cv::ocl::convertFromBuffer(oclMem, m.step[0], m.rows, m.cols, m.type(), um); // calls clRetain... and thus increments ref-counter for oclMem
    status = clReleaseMemObject(oclMem);
    if (status != CL_SUCCESS) throw std::runtime_error("Error in clReleaseMemObject");
}

int main()
{
    const int cols = 512;
    const int rows = 400; // cols * rows is a multiple of 64
    std::unique_ptr<unsigned char, AlignedDeleter> dataRead;
    dataRead.reset(static_cast<unsigned char*>(boost::alignment::aligned_alloc(4096, cols * rows)));
    std::unique_ptr<unsigned char, AlignedDeleter> dataWrite;
    dataWrite.reset(static_cast<unsigned char*>(boost::alignment::aligned_alloc(4096, cols * rows)));
    // ... read the image data ...
    { // within this scope the shared mem is not accessed by the host
        cv::Mat mRead(rows, cols, CV_8U, dataRead.get(), cols); // just used to hold the image format; step is the row size in bytes (cols for CV_8U)
        cv::Mat mWrite(rows, cols, CV_8U, dataWrite.get(), cols);
        cv::ocl::Context CvCtx = cv::ocl::Context::getDefault();
        auto ctx = reinterpret_cast<cl_context>(CvCtx.ptr());
        cv::UMat uRead, uWrite;
        mapAlignedMemToUMat(ctx, mRead, uRead, CL_MEM_READ_ONLY | CL_MEM_HOST_NO_ACCESS);
        mapAlignedMemToUMat(ctx, mWrite, uWrite, CL_MEM_WRITE_ONLY | CL_MEM_HOST_NO_ACCESS);
        // ... do some fancy image processing with uRead as cv::InputArray and uWrite as cv::OutputArray, e.g. uRead.copyTo(uWrite) ...
    }
    // ... check the results ...
}
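
The snippet above allocates both buffers with 4096-byte alignment and notes that cols * rows is a multiple of 64. As far as I understand Intel's guidance, these are the two preconditions for a CL_MEM_USE_HOST_PTR buffer to be shared without a copy. The following is a minimal sketch of how they could be checked before calling clCreateBuffer. The constants and helper names are hypothetical, the values are assumptions taken from the snippet, and passing the check does not guarantee that the driver actually avoids the copy.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>

// Assumed values, taken from the snippet in this post:
constexpr std::size_t kZeroCopyAlignment = 4096; // required alignment of the base address in bytes
constexpr std::size_t kZeroCopyGranularity = 64; // buffer size must be a multiple of this

// Hypothetical helper: true if the host buffer meets both preconditions.
inline bool isZeroCopyCandidate(const void* ptr, std::size_t bytes)
{
    const auto address = reinterpret_cast<std::uintptr_t>(ptr);
    return ptr != nullptr
        && bytes != 0
        && address % kZeroCopyAlignment == 0
        && bytes % kZeroCopyGranularity == 0;
}

// Hypothetical helper: rounds a byte count up to the next multiple of 64,
// e.g. to size an allocation for an image whose size is not a multiple of 64.
inline std::size_t roundUpToGranularity(std::size_t bytes)
{
    assert(bytes <= std::numeric_limits<std::size_t>::max() - (kZeroCopyGranularity - 1));
    return (bytes + kZeroCopyGranularity - 1) / kZeroCopyGranularity * kZeroCopyGranularity;
}
```

In the code above, such a check would be applied to m.data and m.step[0] * m.rows inside mapAlignedMemToUMat, so that a buffer that does not qualify is detected before it reaches OpenCL.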