Use YuNet with CUDA

Hi, I have a large collection of videos (~95 GB). Each video is around 20 minutes long and has faces in every frame. The aspect ratio is consistent within each video. I am using YuNet to detect the faces, but it is far too slow for a corpus this size. How can I increase the detection speed? I looked through the OpenCV CUDA documentation but couldn't find any CUDA support for YuNet. Should I just parallelize it with Python's multiprocessing library?
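
For context, here is a rough sketch of the multiprocessing approach I have in mind. It is untested on the full corpus. The model path, the `shard_paths` and `process_shard` helpers, and the choice to record only a face count per frame are my own placeholders, not anything prescribed by OpenCV. Each worker process gets its own subset of videos and its own `cv2.FaceDetectorYN` instance.

```python
import multiprocessing as mp

# Assumed location of the YuNet ONNX model; adjust to your setup.
MODEL_PATH = "face_detection_yunet_2023mar.onnx"


def shard_paths(paths, n_shards):
    """Split the video list round-robin into n_shards lists (hypothetical helper)."""
    return [paths[i::n_shards] for i in range(n_shards)]


def process_shard(video_paths):
    """Run YuNet on every frame of every video in this shard.

    Returns {video_path: [face_count_per_frame, ...]}.
    """
    import cv2  # imported here so each worker initialises OpenCV itself

    # Keep OpenCV single-threaded per worker to avoid oversubscribing the CPU.
    cv2.setNumThreads(1)

    detector = None
    results = {}
    for path in video_paths:
        cap = cv2.VideoCapture(path)
        size = (
            int(cap.get(cv2.CAP_PROP_FRAME_WIDTH)),
            int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT)),
        )
        if detector is None:
            detector = cv2.FaceDetectorYN.create(MODEL_PATH, "", size)
        else:
            # Frame size can differ between videos, so update it per video.
            detector.setInputSize(size)

        counts = []
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            _, faces = detector.detect(frame)
            counts.append(0 if faces is None else len(faces))
        cap.release()
        results[path] = counts
    return results


def run(video_paths, processes):
    """Distribute the videos over `processes` workers and merge the results."""
    merged = {}
    with mp.Pool(processes=processes) as pool:
        for partial in pool.map(process_shard, shard_paths(video_paths, processes)):
            merged.update(partial)
    return merged
```

I would call `run(list_of_video_paths, processes=8)` from inside an `if __name__ == "__main__":` guard. Sharding by whole video keeps each worker's decoding sequential, which seemed simpler than splitting a single video across processes.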