cv::fillPoly cuda implementation

jeremy · February 10, 2022, 6:57am

I want to speed up cv::fillPoly and am considering CUDA code. Is there a CUDA implementation of the cv::fillPoly function? I could only find the CPU code in the OpenCV source, and it seems no cv::cuda::fillPoly is implemented. Could anyone give me a hint?
sincerely

jeremy · February 11, 2022, 9:59am

anyone help, please~

berak · February 11, 2022, 10:28am

afaik, there is no cuda or even ocl (UMat) implementation for this

crackwitz · February 11, 2022, 12:09pm

perhaps CUDA expects you to use a graphics API to draw this (OpenGL/Vulkan/D3D)

berak · February 11, 2022, 2:32pm

most drawing functions require “random access” to the pixels, and thus can't be nicely vectorized

crackwitz · February 12, 2022, 10:55am

GPUs were originally made to accelerate rasterization of triangles. granted, it’s still considered part of the “fixed function” parts of GPUs. so either one has to replicate it with shaders, or the fixed function is somehow accessible
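
For readers wondering what "replicating it" outside the fixed-function rasterizer could look like: one option is to turn the fill into a gather, where every pixel tests itself against the polygon with the even-odd rule, so no pixel writes anywhere but to its own location. This is an assumption of this note, not something proposed in the thread, and it is not claimed to match cv::fillPoly's exact edge coverage (it samples pixel centres). The sketch below is a plain C++ CPU reference; `Pt`, `insidePolygon` and `makePolygonMask` are hypothetical names. In a CUDA port, the body of the inner loop would be the work of one thread.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Pt { float x, y; };  // hypothetical vertex type

// Even-odd (crossing number) test: true if (px, py) lies inside the polygon.
bool insidePolygon(const std::vector<Pt>& poly, float px, float py) {
    bool inside = false;
    const std::size_t n = poly.size();
    for (std::size_t i = 0, j = n - 1; i < n; j = i++) {
        const Pt& a = poly[i];
        const Pt& b = poly[j];
        // Does edge (a, b) straddle the horizontal line y = py?
        if ((a.y > py) != (b.y > py)) {
            // x coordinate at which the edge crosses that line.
            const float xCross = a.x + (py - a.y) * (b.x - a.x) / (b.y - a.y);
            if (px < xCross) inside = !inside;
        }
    }
    return inside;
}

// Builds a width*height mask: 255 inside the polygon, 0 outside.
std::vector<std::uint8_t> makePolygonMask(const std::vector<Pt>& poly,
                                          int width, int height) {
    assert(width >= 0 && height >= 0);
    std::vector<std::uint8_t> mask(static_cast<std::size_t>(width) * height, 0);
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            // Each pixel is independent of all others: one GPU thread each.
            if (insidePolygon(poly, x + 0.5f, y + 0.5f)) {
                mask[static_cast<std::size_t>(y) * width + x] = 255;
            }
        }
    }
    return mask;
}
```

The cost per pixel grows with the number of polygon edges, so this trade only makes sense for polygons with few vertices or when the mask is computed once and reused.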

cudawarped · February 13, 2022, 11:11am

Are you already performing processing on the GPU?

I’m asking because, if not, a GPU-based solution may not be of any help: upload/fill/download could be slower than processing on the CPU. If the data is already on the GPU and you have implemented a way to display directly from GPU memory without downloading, then you should see a speed boost. That said, if you are displaying from GPU memory using OpenGL, then I would perform the fill operation using that API.

jeremy · February 14, 2022, 3:30am

Yes, the input data is already on the GPU and the output data stays on the GPU; after the fill, the data is processed by other custom APIs. The fill operation may not vectorize nicely, just as @berak said, and in my use case the points forming the polygon are fixed. So a tricky way to do this is: first, I use cv::fillPoly to create a mask and dump it to a .dat file; then I load it during initialization and use a kernel function to do the set-value operation.
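
The set-value step described here can be sketched as follows. This is not the poster's kernel (which is not shown in the thread) but a minimal plain C++ reference of the same idea, assuming an interleaved 8-bit image and a one-byte-per-pixel mask; `maskedSet` is a hypothetical name. The loop body over `p` is what a single GPU thread would execute.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Sets every channel of each pixel to `value` wherever the mask is non-zero.
// `img` is interleaved (e.g. BGRBGR...); `mask` holds one byte per pixel.
void maskedSet(std::vector<std::uint8_t>& img,
               const std::vector<std::uint8_t>& mask,
               int channels, std::uint8_t value) {
    assert(channels > 0);
    assert(img.size() == mask.size() * static_cast<std::size_t>(channels));
    for (std::size_t p = 0; p < mask.size(); ++p) {
        if (mask[p] == 0) continue;  // pixel lies outside the polygon
        for (int c = 0; c < channels; ++c) {
            img[p * static_cast<std::size_t>(channels) + c] = value;
        }
    }
}
```

Because the mask is precomputed, the per-frame work no longer depends on the polygon's shape, only on the image size.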

jeremy · February 14, 2022, 3:31am

thank you for the materials

cudawarped · February 14, 2022, 10:52am

That’s great. If the mask is fixed I suspect setTo() will be as quick as a GPU based fillPoly. Is the dat file with a custom kernel quicker than GpuMat::setTo() with a mask image?

jeremy · February 15, 2022, 3:07am

Comparing just my set-value kernel against the setTo function, my code is about 0.1 ms slower.
kernel time cost:
Time: 1.00208 ms
Time: 0.686521 ms
Time: 0.660184 ms
Time: 0.718618 ms
Time: 0.690938 ms
Time: 0.672665 ms
Time: 0.70521 ms
Time: 0.669944 ms
Time: 0.659448 ms
Time: 0.669049 ms
Time: 0.676728 ms
Time: 0.662809 ms
Time: 0.663865 ms
Time: 0.6726 ms
Time: 0.661401 ms
Time: 0.676665 ms
Time: 0.666073 ms
Time: 0.667033 ms
Time: 0.66812 ms
Time: 0.668121 ms

setTo time cost:
Time: 0.9825 ms
Time: 0.609975 ms
Time: 0.571445 ms
Time: 0.56626 ms
Time: 0.559541 ms
Time: 0.59071 ms
Time: 0.554164 ms
Time: 0.563669 ms
Time: 0.559029 ms
Time: 0.563604 ms
Time: 0.572565 ms
Time: 0.545396 ms
Time: 0.556789 ms
Time: 0.547284 ms
Time: 0.563509 ms
Time: 0.586357 ms
Time: 0.569365 ms
Time: 0.564341 ms
Time: 0.565397 ms
Time: 0.567381 ms

jeremy · February 15, 2022, 3:35am

input img data size: 3840x3160x3

cudawarped · February 15, 2022, 10:37am

Interesting. Maybe it's just the grid/block size you are using, or a timing error.

If your mask is always fixed and is only applied to a smaller sub-ROI of your original image, then you should be able to speed things up further. That is, if your masked area is entirely contained within cv::Rect maskRoi(x,y,w,h), then you can use setTo() as
src(maskRoi).setTo(0, mask(maskRoi));
I checked this on your image dimensions (3840x3160x3) with Rect(500,500,500,500). The original time (src.setTo(0, mask)) was ~0.4ms and the time isolating only the ROI dropped to ~0.06ms. Obviously if your mask covers the whole image this will be of no help to you.
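
Why the ROI helps can be seen from a plain C++ reference of the same operation: only the pixels inside the rectangle are ever visited. This is a sketch under stated assumptions, not OpenCV's implementation; `Roi` is a hypothetical stand-in for cv::Rect and `maskedSetRoi` is a hypothetical name. It returns the number of pixels visited so the reduction in work is visible.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Roi { int x, y, w, h; };  // hypothetical stand-in for cv::Rect

// Masked set restricted to `roi`; returns how many pixels were visited.
// `img` is interleaved with `channels` bytes per pixel, `mask` one byte per pixel.
std::size_t maskedSetRoi(std::vector<std::uint8_t>& img,
                         const std::vector<std::uint8_t>& mask,
                         int width, int height, int channels,
                         Roi roi, std::uint8_t value) {
    assert(width > 0 && height > 0 && channels > 0);
    assert(roi.x >= 0 && roi.y >= 0 && roi.w >= 0 && roi.h >= 0);
    assert(roi.x + roi.w <= width && roi.y + roi.h <= height);
    assert(mask.size() == static_cast<std::size_t>(width) * height);
    assert(img.size() == mask.size() * static_cast<std::size_t>(channels));

    std::size_t visited = 0;
    for (int y = roi.y; y < roi.y + roi.h; ++y) {
        for (int x = roi.x; x < roi.x + roi.w; ++x) {
            const std::size_t p = static_cast<std::size_t>(y) * width + x;
            ++visited;
            if (mask[p] == 0) continue;  // inside the ROI but outside the polygon
            for (int c = 0; c < channels; ++c) {
                img[p * static_cast<std::size_t>(channels) + c] = value;
            }
        }
    }
    return visited;
}
```

For the dimensions quoted above, a 500x500 ROI covers 250,000 of the image's 12,134,400 pixels, roughly 2%. The measured speedup is smaller than that ratio would suggest, which is plausible since fixed per-call overhead does not shrink with the ROI.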

jeremy · February 16, 2022, 1:45am

ok, thank you again for your advice. I’ll try it later when I get time.
