cv::fillPoly CUDA implementation

I want to speed up cv::fillPoly and am considering CUDA code. Is there a CUDA implementation of the cv::fillPoly function? I could only find CPU code in the OpenCV source, and it seems no cv::cuda::fillPoly is implemented. Could anyone give me a hint?

Can anyone help, please?

AFAIK, there is no CUDA or even OpenCL (UMat) implementation of this.

perhaps CUDA expects you to use a graphics API to draw this (OpenGL/Vulkan/D3D)

most drawing functions require “random access” to the pixels, and thus can't be nicely vectorized

GPUs were originally made to accelerate rasterization of triangles. granted, it’s still considered part of the “fixed function” parts of GPUs. so either one has to replicate it with shaders, or the fixed function is somehow accessible
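For readers wondering what "replicating it" might look like: one common way to avoid scattered writes is to flip the problem around and let every pixel test itself against the polygon. Below is a minimal CPU sketch in plain C++ of an even-odd (crossing number) test. The names `Pt`, `insidePolygon` and `fillMask` are hypothetical and not part of OpenCV, and the edge rule (sampling pixel centres) is an assumption that will not match cv::fillPoly exactly on boundary pixels. Because each pixel is independent, the body of `insidePolygon` is the kind of work one CUDA thread or fragment shader invocation could do.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Pt { float x, y; };  // hypothetical vertex type, not an OpenCV type

// Even-odd test for a single sample point. It reads only the polygon and
// its own coordinates, so pixels can be evaluated in any order or in parallel.
bool insidePolygon(const std::vector<Pt>& poly, float px, float py) {
    bool inside = false;
    const std::size_t n = poly.size();
    for (std::size_t i = 0, j = n - 1; i < n; j = i++) {
        const Pt a = poly[i];
        const Pt b = poly[j];
        // Does edge (a, b) cross the horizontal line through py?
        if ((a.y > py) != (b.y > py)) {
            const float xCross = a.x + (py - a.y) * (b.x - a.x) / (b.y - a.y);
            if (px < xCross) inside = !inside;  // crossing lies to the right
        }
    }
    return inside;
}

// Build a single-channel mask (255 inside, 0 outside) by sampling pixel centres.
std::vector<std::uint8_t> fillMask(const std::vector<Pt>& poly, int width, int height) {
    assert(width >= 0 && height >= 0);
    std::vector<std::uint8_t> mask(static_cast<std::size_t>(width) * height, 0);
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x)
            if (insidePolygon(poly, x + 0.5f, y + 0.5f))
                mask[static_cast<std::size_t>(y) * width + x] = 255;
    return mask;
}
```

The cost is proportional to pixels times edges, so this only makes sense for polygons with few vertices or when restricted to the polygon's bounding box.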

Are you already performing processing on the GPU?

I’m asking because if not, a GPU-based solution may not be of any help: upload/fill/download could be slower than processing on the CPU. If the data is already on the GPU and you have implemented a way to display directly from GPU memory without downloading, then you should see a speed boost. That said, if you are displaying from GPU memory using OpenGL, I would perform the fill operation using that API.

Yes, the input data is already on the GPU and the output data stays on the GPU too; after the fill, the data is processed by other custom APIs. This fill operation may not vectorize nicely, just as @berak said. In my use case the points forming the polygon are fixed, so a tricky way to do this is: first, I use cv::fillPoly to create a mask and dump it to a .dat file; then I load it during initialization and use a kernel function to do the set-value operation. :slight_smile:
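The set-value step described above can be sketched as follows. This is a plain C++ stand-in, not the poster's actual kernel: `maskedSet` is a hypothetical name, the image is assumed to be 8-bit with interleaved channels, and the mask is assumed to be already loaded into memory as one byte per pixel. Each iteration of the outer loop corresponds to what one CUDA thread would do for its pixel.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Write `value` into every channel of each pixel whose mask byte is non-zero.
// Assumes interleaved channel layout (e.g. BGRBGR...) and a mask with the
// same pixel count as the image.
void maskedSet(std::vector<std::uint8_t>& image,
               const std::vector<std::uint8_t>& mask,
               int channels, std::uint8_t value) {
    assert(channels > 0);
    assert(image.size() == mask.size() * static_cast<std::size_t>(channels));
    for (std::size_t p = 0; p < mask.size(); ++p) {  // one "thread" per pixel
        if (mask[p] == 0) continue;                  // outside the polygon
        for (int c = 0; c < channels; ++c)
            image[p * static_cast<std::size_t>(channels) + c] = value;
    }
}
```

Since the polygon never changes, the rasterization cost is paid once offline and the per-frame work is a pure memory-bound pass over the image.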

thank you for the materials

That’s great. If the mask is fixed I suspect setTo() will be as quick as a GPU based fillPoly. Is the dat file with a custom kernel quicker than GpuMat::setTo() with a mask image?

Comparing just the set-value operation with the setTo function, my code is about 0.1 ms slower. :face_exhaling:
kernel time cost:
Time: 1.00208 ms
Time: 0.686521 ms
Time: 0.660184 ms
Time: 0.718618 ms
Time: 0.690938 ms
Time: 0.672665 ms
Time: 0.70521 ms
Time: 0.669944 ms
Time: 0.659448 ms
Time: 0.669049 ms
Time: 0.676728 ms
Time: 0.662809 ms
Time: 0.663865 ms
Time: 0.6726 ms
Time: 0.661401 ms
Time: 0.676665 ms
Time: 0.666073 ms
Time: 0.667033 ms
Time: 0.66812 ms
Time: 0.668121 ms

setTo time cost:
Time: 0.9825 ms
Time: 0.609975 ms
Time: 0.571445 ms
Time: 0.56626 ms
Time: 0.559541 ms
Time: 0.59071 ms
Time: 0.554164 ms
Time: 0.563669 ms
Time: 0.559029 ms
Time: 0.563604 ms
Time: 0.572565 ms
Time: 0.545396 ms
Time: 0.556789 ms
Time: 0.547284 ms
Time: 0.563509 ms
Time: 0.586357 ms
Time: 0.569365 ms
Time: 0.564341 ms
Time: 0.565397 ms
Time: 0.567381 ms

input img data size: 3840x3160x3

Interesting. Maybe it's just the grid/block size you are using, or a timing error.
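One quick sanity check on the timing side is to drop the first iteration of each run, which is noticeably slower in both lists and probably includes warm-up cost, and compare the means of the rest. The sketch below does that with the numbers posted above; `steadyStateMean` is a hypothetical helper, and treating exactly one sample as warm-up is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Mean of the samples after skipping the first `warmup` entries.
double steadyStateMean(const std::vector<double>& ms, std::size_t warmup = 1) {
    assert(ms.size() > warmup);
    const double sum = std::accumulate(
        ms.begin() + static_cast<std::ptrdiff_t>(warmup), ms.end(), 0.0);
    return sum / static_cast<double>(ms.size() - warmup);
}

// Timings in milliseconds, copied from the post above.
const std::vector<double> kernelMs = {
    1.00208,  0.686521, 0.660184, 0.718618, 0.690938, 0.672665, 0.70521,
    0.669944, 0.659448, 0.669049, 0.676728, 0.662809, 0.663865, 0.6726,
    0.661401, 0.676665, 0.666073, 0.667033, 0.66812,  0.668121};

const std::vector<double> setToMs = {
    0.9825,   0.609975, 0.571445, 0.56626,  0.559541, 0.59071,  0.554164,
    0.563669, 0.559029, 0.563604, 0.572565, 0.545396, 0.556789, 0.547284,
    0.563509, 0.586357, 0.569365, 0.564341, 0.565397, 0.567381};
```

With these samples the steady-state means come out at roughly 0.67 ms for the custom kernel and 0.57 ms for setTo, so the reported gap of about 0.1 ms is consistent across the run rather than caused by a single outlier.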

If your mask is always fixed and it is only applied to a smaller sub-ROI of your original image, then you should be able to speed things up further. That is, if your masked area is entirely contained within cv::Rect maskRoi(x,y,w,h), then you can use setTo() as
src(maskRoi).setTo(0, mask(maskRoi));
I checked this on your image dimensions (3840x3160x3) with Rect(500,500,500,500). The original time (src.setTo(0, mask)) was ~0.4ms and the time isolating only the ROI dropped to ~0.06ms. Obviously if your mask covers the whole image this will be of no help to you.
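To see why the ROI helps, here is a plain C++ sketch of a masked set that only walks the sub-rectangle. `Roi` and `maskedSetRoi` are hypothetical names (with `Roi` standing in for cv::Rect), and an 8-bit interleaved image is assumed. The function returns the number of pixels it visited so the saving is easy to check.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Roi { int x, y, w, h; };  // hypothetical stand-in for cv::Rect

// Masked set restricted to `roi`; pixels outside the rectangle are never read.
std::size_t maskedSetRoi(std::vector<std::uint8_t>& image,
                         const std::vector<std::uint8_t>& mask,
                         int width, int height, int channels,
                         Roi roi, std::uint8_t value) {
    assert(roi.x >= 0 && roi.y >= 0 && roi.w >= 0 && roi.h >= 0);
    assert(roi.x + roi.w <= width && roi.y + roi.h <= height);
    std::size_t visited = 0;
    for (int y = roi.y; y < roi.y + roi.h; ++y) {
        for (int x = roi.x; x < roi.x + roi.w; ++x) {
            ++visited;
            // Index into the full-size buffers using the full image width.
            const std::size_t p = static_cast<std::size_t>(y) * width + x;
            if (mask[p] == 0) continue;
            for (int c = 0; c < channels; ++c)
                image[p * static_cast<std::size_t>(channels) + c] = value;
        }
    }
    return visited;
}
```

For the sizes mentioned above, the full image has 3840 x 3160 = 12,134,400 pixels while a 500 x 500 ROI has 250,000, about 48 times fewer. The measured speed-up (~0.4 ms to ~0.06 ms) is smaller than that ratio, presumably because some per-call overhead does not shrink with the ROI.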

ok, thank you again for your advice. I’ll try it later when I get time.