Reusing Mat memory as much as possible when decoding a sequence of images

Is there a way to reuse allocated Mat memory in cv::imdecode(InputArray buf, int flags, Mat * dst) when the memory allocated in dst is equal to or larger than the memory needed for the image being decoded?

I know that the cv::imdecode() above will reuse memory when dst->data has exactly the same size as the pixel array of the image being decoded, but I’m also interested in preventing re-allocation when dst->data is larger than the decoded pixel array. Is there a way to make this happen?

I want to do this because my application decodes a sequence of encoded images of varying sizes. Each image is then processed on the GPU, which is much faster if I page-lock (pin) the memory into which the images are decoded and reuse that pinned memory for every image. By re-allocating only when needed (when the next image is larger), I minimize the number of times I perform the costly allocation and pinning of memory.

Any ideas? Thanks a lot.

If you decode into a new Mat and then subregion-copy it into the “page-locked” one, how does that perform?

Using a Mat backed by page-locked memory is faster because uploading it to the GPU does not require an intermediate copy into page-locked memory: CUDA — Memory Model | by Raj Prasanna Ponnuraj | Analytics Vidhya | Medium.

Given that, won’t allocating a new Mat (with pageable memory) and copying its data into an existing page-locked buffer perform about the same as uploading that new Mat directly to the GPU?

After some more research, I now see a couple of options to do what I need:

  1. Replicate some of the logic in imdecode_() to look ahead and determine the size and type of the image being decoded. Then create a Mat header with that size and type, point it at the pinned memory I already have (and know is large enough), and pass this Mat as the dst parameter of imdecode().

  2. Create a custom allocator inheriting from the cv::MatAllocator abstract class like here, instantiate it with some pinned memory, and set the instance as the allocator of the dst Mat. The custom allocator would allocate new memory only when the requested size exceeds what it already holds.

Option (1) looks straightforward but is limited to imdecode(). Option (2) may not be as straightforward, but it changes the behavior of every Mat::create() call on the Mat it’s applied to, so it’s not limited to imdecode().

Thoughts or recommendations?

P.S. I found an old post that asked for the same feature I’m seeking: Does Mat::create() reallocate when new size is smaller? - OpenCV Q&A Forum

If you are not already using CUDA streams, you could try them, provided the CUDA functions you call support them. Although this won’t let you overlap host-to-device memcopies with GPU processing, it “should” (depending on the functions you use) allow you to overlap the decode operation on the host with your device functions, which “could” offer more of a speed-up than pinning the memory.
