Yes, the input data is already on the GPU and the output stays on the GPU too; after the fill, the data is processed by other custom APIs. As @berak said, this fill operation may not vectorize nicely. In my use case the points that form the polygon are fixed, so a workaround is: first, I use cv::fillPoly once to create a mask and dump it to a .dat file. Then I load that file during initialization, and a kernel function does the set-value operation using the mask.
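For anyone trying the same thing, here is a rough CPU-side sketch of the dump / load / masked-set steps. It is only an illustration, not my actual code: the names `dumpMask`, `loadMask`, `setIfMasked` and `fillMasked` are made up, the mask is assumed to be a raw 8-bit buffer (one byte per pixel, as cv::fillPoly would produce on a CV_8UC1 image), and the data is assumed to be float. On the GPU the loop in `fillMasked` would be replaced by a kernel launch where each thread runs the body of `setIfMasked` for one index.

```cpp
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// Write a precomputed 8-bit mask (e.g. the result of cv::fillPoly) as raw bytes.
bool dumpMask(const std::string& path, const std::vector<std::uint8_t>& mask) {
    std::ofstream out(path, std::ios::binary);
    if (!out) return false;
    out.write(reinterpret_cast<const char*>(mask.data()),
              static_cast<std::streamsize>(mask.size()));
    return static_cast<bool>(out);
}

// Load the mask once at init. Returns an empty vector if the file cannot be
// opened or holds fewer than expectedSize bytes.
std::vector<std::uint8_t> loadMask(const std::string& path, std::size_t expectedSize) {
    std::ifstream in(path, std::ios::binary);
    std::vector<std::uint8_t> mask(expectedSize);
    if (!in || !in.read(reinterpret_cast<char*>(mask.data()),
                        static_cast<std::streamsize>(expectedSize))) {
        return {};
    }
    return mask;
}

// Per-element work: what a single GPU thread would do for pixel index i.
inline void setIfMasked(std::size_t i, const std::uint8_t* mask,
                        float* data, float value) {
    if (mask[i] != 0) data[i] = value;
}

// CPU stand-in for the kernel launch: visit every pixel covered by both buffers.
void fillMasked(const std::vector<std::uint8_t>& mask,
                std::vector<float>& data, float value) {
    const std::size_t n = mask.size() < data.size() ? mask.size() : data.size();
    for (std::size_t i = 0; i < n; ++i) {
        setIfMasked(i, mask.data(), data.data(), value);
    }
}
```

Since the polygon never changes, the mask only has to be generated once offline; at runtime the cost is a single file read at startup plus one branch per pixel, with no polygon rasterization on the GPU.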