It depends: why are you filling the polygon? Is it for eventual display? If so, depending on the runtimes of your other CUDA functions and your processing pipeline, you could do the easy thing: download the image asynchronously as early in your pipeline as possible, then run the CPU (or maybe OpenCL with UMat) version of cv::fillConvexPoly() asynchronously with your remaining CUDA routines. This is my approach, because all my image display routines are passed host images. Obviously this is not the most efficient approach; it would be much more efficient to display directly through the GPU using OpenGL etc. But if you are also displaying your output by passing host images to OpenCV or other display functions, there “may” be little benefit to drawing on your images using the GPU.
So I would guess that if that approach isn’t efficient enough, you would need to write (or find) a CUDA implementation, or use OpenGL for display and draw the polygon there.
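
If you do end up writing your own, here is a rough sketch of one way it could look. It is plain C++ rather than CUDA, and it is not how cv::fillConvexPoly() works internally; it just tests every pixel in the polygon's bounding box against each edge. I picked that because the body of the inner loop has no dependencies between pixels, so it should map fairly directly onto one CUDA thread per pixel. The names `Pt`, `insideConvex` and `fillConvexPolyNaive` are made up for this example, and the image is assumed to be single-channel, 8-bit, row-major with no padding.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical point type for this sketch (integer pixel coordinates).
struct Pt { int x; int y; };

// Returns true if pixel (px, py) lies inside or on the border of a convex
// polygon. The cross product of each edge with the vector to the pixel must
// never change sign, so either winding order is accepted.
inline bool insideConvex(const std::vector<Pt>& poly, int px, int py) {
    const std::size_t n = poly.size();
    if (n < 3) return false;
    bool pos = false, neg = false;
    for (std::size_t i = 0; i < n; ++i) {
        const Pt& a = poly[i];
        const Pt& b = poly[(i + 1) % n];
        const long long cross =
            static_cast<long long>(b.x - a.x) * (py - a.y) -
            static_cast<long long>(b.y - a.y) * (px - a.x);
        if (cross > 0) pos = true;
        if (cross < 0) neg = true;
        if (pos && neg) return false;  // pixel is on different sides of two edges
    }
    return true;
}

// Fills a convex polygon into a single-channel 8-bit row-major image.
// Assumption: img holds at least width * height bytes with no row padding.
void fillConvexPolyNaive(std::vector<std::uint8_t>& img, int width, int height,
                         const std::vector<Pt>& poly, std::uint8_t value) {
    if (poly.size() < 3 || width <= 0 || height <= 0) return;
    if (img.size() < static_cast<std::size_t>(width) * height) return;

    // Bounding box of the polygon, clipped to the image.
    int minX = poly[0].x, maxX = poly[0].x;
    int minY = poly[0].y, maxY = poly[0].y;
    for (const Pt& p : poly) {
        minX = std::min(minX, p.x); maxX = std::max(maxX, p.x);
        minY = std::min(minY, p.y); maxY = std::max(maxY, p.y);
    }
    minX = std::max(minX, 0); maxX = std::min(maxX, width - 1);
    minY = std::max(minY, 0); maxY = std::min(maxY, height - 1);

    // In a CUDA port, each (x, y) here would be handled by one thread.
    for (int y = minY; y <= maxY; ++y) {
        for (int x = minX; x <= maxX; ++x) {
            if (insideConvex(poly, x, y)) {
                img[static_cast<std::size_t>(y) * width + x] = value;
            }
        }
    }
}
```

To use it, allocate a `std::vector<std::uint8_t>` of `width * height` zeros and call `fillConvexPolyNaive(img, width, height, points, 255)`. The cost is roughly (bounding-box area × number of vertices), which is wasteful on a CPU but may be acceptable on a GPU for small polygons. I have not benchmarked it against the download-and-draw-on-host approach.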