cv::undistort GPU acceleration

Hi,
in my app I use camera calibration and cv::undistort to compensate for the camera's massive barrel distortion. It works, but it's too slow to run in real time at an acceptable frame rate.

I’d expect a CUDA version of this function, but I haven’t been able to find anything of the sort. Is it really a CPU-only function, or am I missing something?

Thanks
Jan

look for remap
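the idea: compute the undistortion maps once, then per frame you only pay for the lookup. a rough sketch (placeholder names, assuming you already have cameraMatrix/distCoeffs and the frame size from your calibration):

cv::Mat map1, map2;
cv::initUndistortRectifyMap(cameraMatrix, distCoeffs, cv::Mat(),
                            cameraMatrix, frameSize, CV_32FC1, map1, map2);   // once, at startup

cv::remap(frame, undistorted, map1, map2, cv::INTER_LINEAR);                  // every frame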

besides, a lot of stuff in non-CUDA OpenCV can use OpenCL… which runs on GPUs. you just need to wrap cv::Mat in cv::UMat. usual caveats apply: talking to a GPU costs latency. if you are a newbie to GPU programming, you will have no idea how to make it perform well (that’s generally true about GPGPU programming, not just for OpenCV)
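to illustrate, with the maps from above it's literally just the container type that changes (rough sketch):

cv::UMat uframe, uout;
frame.copyTo(uframe);                                     // upload to the OpenCL device (if one is available)
cv::remap(uframe, uout, map1, map2, cv::INTER_LINEAR);    // same call, now dispatched through the T-API
cv::Mat out;
uout.copyTo(out);                                         // download only when the CPU actually needs the pixels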

Hm, that’s a good point. I hadn’t realized that. So I’ve been going over how OpenCL integration within OpenCV works and I think that just might help…

Also, is there a way I can tell whether the code actually runs on the GPU on the target platform? I read there’s some runtime heuristic that decides whether to run the code on the CPU or the GPU (if at all possible, of course). Background: I am developing on a Windows PC with a GTX 1080, but the target device is an NVIDIA Jetson Xavier NX.

I know what you mean about GPGPU calculations and newbies, and I fully agree. Yes, I’d describe myself as a newbie, but at least I know what you’re talking about. I had a university course on CUDA back in the day, but I have no intuition built up for the Jetson whatsoever.

hmmmm I don’t know if OpenCV has ways to tell you where it’s running the T-API stuff (“Transparent” API i.e. UMat and every OpenCV function accepting those). it might, I just don’t know. the docs would say somewhere. IIRC there’s the cv::ocl namespace but I don’t know if that’s still supposed to be used or not.
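if you want to poke at it from code, the cv::ocl module does have some query functions. rough sketch (I believe these still exist, but check the docs for your OpenCV version):

#include <opencv2/core/ocl.hpp>
#include <iostream>

std::cout << "have OpenCL:  " << cv::ocl::haveOpenCL() << "\n";
std::cout << "using OpenCL: " << cv::ocl::useOpenCL() << "\n";
if (cv::ocl::haveOpenCL())
    std::cout << "device: " << cv::ocl::Device::getDefault().name() << "\n";
// cv::ocl::setUseOpenCL(false);   // handy for A/B timing against the CPU path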

I’d advise using some system tool to monitor CPU and GPU utilization. that switch between “just run on CPU” and “worth running on GPU” should make the obvious decision if you have anything but trivial amounts of data.

a university course is a very solid foundation. even if you remember no specifics at all (and I wouldn’t), the vague sense that there were issues to be aware of is valuable. and it’s more than most people have when they show up wanting to GPU-accelerate something and go for clickbaity youtube tutorials and blog posts geared towards… clicks :wink:

Wow, this is an order of magnitude faster, thanks!

I employed both optimizations you suggested: I precompute the undistortion maps once and call cv::remap per frame instead of cv::undistort, and I wrap the frames in cv::UMat so the work runs through OpenCL.

The result is a perfectly smooth video feed. Massive thanks!

As for the university course… Yep, I remember pulling my hair out over why the project still ran so slowly, and not even the teacher was able to tell why. He just said that it should run pretty fast and that he couldn’t see where the problem was. My solution was only about 4 times faster than the reference CPU version, but the best guy’s was about 50 times faster. What I learned is that you have to build an intuition for the specific platform. All those different CUDA architectures, different cache hierarchies… also the communication speed with the ARM CPU would make a difference. So yeah, I get the point.

If you need even more performance you might try using fixed-point maps. This assumes you are currently using CV_32FC1 maps (or a single CV_32FC2 map). You can do this by passing CV_16SC2 as the map type to initUndistortRectifyMap, or with convertMaps() after the fact. I think on my platform I saved about 30-40% of the processing time when applying the maps.
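In case it helps, here is roughly what both variants look like (just a sketch; the variable names are mine):

// Option 1: ask initUndistortRectifyMap for fixed-point maps directly
cv::Mat map1, map2;
cv::initUndistortRectifyMap(m_camMat, m_distCoeffs, cv::Mat(), m_camMat,
                            imageSize, CV_16SC2, map1, map2);

// Option 2: convert maps you already built as CV_32FC1
cv::Mat fixedMap1, fixedMap2;
cv::convertMaps(floatMap1, floatMap2, fixedMap1, fixedMap2, CV_16SC2);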

Also, if your output image can stand to be lower resolution (or if it is getting downsampled anyway), you can set up the initUndistortRectifyMap call to handle that all at once, and the smaller the output image, the faster it will be.

-Steve

Hi Steve,
thanks for the suggestion. Can you share a link where this is explained? I’ve been learning OpenCV for some time now, but this particular area (camera calibration, remapping…) seems to be poorly covered by both the docs and the examples…

As for the final resampling… Well, sort of. I don’t downsample the final image; rather, I upsample it a bit. The source resolution is 808x620 and I need to display it on a 1024x768 screen (with small black bars on the sides). Given that OpenCV on the NVIDIA Jetson is built without OpenGL, I need to upsample the image before calling “imshow” on it. Otherwise it uses the CPU for the upsampling and the performance goes to hell. Calling resize beforehand, however, upsamples the image using OpenCL, which is way faster, and imshow then doesn’t throttle it. If I understand it correctly, your mapping suggestion still applies, right?
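In case it matters, per frame I currently do roughly this (just a sketch; the names and the display size are placeholders for my setup):

cv::UMat uframe, uwarped, udisplay;
frame.copyTo(uframe);
cv::remap(uframe, uwarped, map1, map2, cv::INTER_LINEAR);      // undistort via the precomputed maps
cv::resize(uwarped, udisplay, cv::Size(displayW, displayH));   // upscale through OpenCL before imshow
cv::imshow("preview", udisplay);                               // no OpenGL, so hand it the final size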

For the map type (to improve remap() performance) you can just look at the documentation for either initUndistortRectifyMap or convertMaps. I found that in my case this was “free” performance - I compared the remapping results against the float-based maps and there was essentially zero image quality difference between the two. The benefit is faster remap() times and a smaller memory footprint for the maps.

I don’t know if I have a link that describes the second part - I’m sure I found something somewhere once upon a time, but this might help:

cv::initUndistortRectifyMap(m_camMat, m_distCoeffs, cv::Mat(),
                            perspective*m_camMat, cv::Size(destWidth, destHeight), CV_16SC2,
                            warpMap1, warpMap2);

The key parts of getting it to map to a different output image size are the 4th argument (perspective * m_camMat) and the 5th argument (the size of the output image). In my case I am actually using the full perspective transform since I’m remapping a world plane (which has perspective distortion that I want to correct) into my image plane. For your case I think you only care about the scale and maybe cropping, but you can still use this method to achieve what you want.

To get the perspective transform, set up two lists of point correspondences - the first in your source image coordinates, the second in your destination image coordinates.

std::vector<cv::Point2f> sourcePoints;
sourcePoints.push_back(cv::Point2f(0,0));
sourcePoints.push_back(cv::Point2f(sourceWidth,0));
sourcePoints.push_back(cv::Point2f(sourceWidth,sourceHeight));
sourcePoints.push_back(cv::Point2f(0,sourceHeight));

std::vector<cv::Point2f> destPoints;
destPoints.push_back(cv::Point2f(0,0));
destPoints.push_back(cv::Point2f(destWidth,0));
destPoints.push_back(cv::Point2f(destWidth,destHeight));
destPoints.push_back(cv::Point2f(0,destHeight));

then:
cv::Mat perspective = cv::getPerspectiveTransform(sourcePoints, destPoints);

These source/dest points can be wherever you want them to be in the source/dest images (well, they should probably form a convex quadrilateral in image space). The point is that you can crop the source image by adjusting the source points, and you can also adjust where the result lands in the dest image by adjusting the destination points. The magic gets encoded in the warp maps, so all you have to do is call remap with the source/dest images.
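Putting it together, the per-frame work is then just the single remap call (reusing the names from the snippet above):

cv::remap(sourceImage, destImage, warpMap1, warpMap2, cv::INTER_LINEAR);
// undistortion plus the scale/crop are all baked into warpMap1/warpMap2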

I hope that helps.

Hi Steve,
thanks for the info. That looks promising.

I am just leaving for a week, so I cannot try it right away, but I will do so when I am back.

Thanks again!
Jan