NVIDIA notebook 3050 Ti and i7-12700 give the same performance for debayering RAW to RGB

My end goal is to get 60 FPS for 4K RAW-to-RGB conversion, but I am only getting around 40 to 42 FPS when debayering with cv2.cvtColor() on the CPU and cv2.cuda.cvtColor() on the NVIDIA GPU.
The FPS is the same with the CPU (12th-gen i7-12700H) and with the CUDA-capable GPU (RTX 3050 Ti).
Is there any other method to increase the FPS (performance)?
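
For reference, here is a minimal version of both code paths (a sketch; I am assuming a BayerRG pattern here, the actual conversion code depends on the camera):

```python
import cv2
import numpy as np

# Dummy 4K Bayer frame standing in for the camera output.
raw = np.zeros((2160, 3840), dtype=np.uint8)

# CPU path
bgr_cpu = cv2.cvtColor(raw, cv2.COLOR_BayerRG2BGR)

# GPU path: upload -> convert -> download
gpu_raw = cv2.cuda.GpuMat()
gpu_raw.upload(raw)
gpu_bgr = cv2.cuda.cvtColor(gpu_raw, cv2.COLOR_BayerRG2BGR)
bgr_gpu = gpu_bgr.download()
```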

How are you timing the GPU execution? Are you including the upload of the 4K RAW frame and the download of the 4K RGB frame in the timing?
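
For example (a rough sketch, using a dummy frame and a placeholder Bayer code), the two measurements below can give very different numbers:

```python
import time
import cv2
import numpy as np

raw = np.zeros((2160, 3840), dtype=np.uint8)  # stand-in 4K Bayer frame

gpu_raw = cv2.cuda.GpuMat()
gpu_raw.upload(raw)
# Warm-up call so CUDA context creation isn't included in the timing.
gpu_bgr = cv2.cuda.cvtColor(gpu_raw, cv2.COLOR_BayerRG2BGR)

# 1) Conversion only (calls on the default stream block until the kernel finishes).
t0 = time.perf_counter()
cv2.cuda.cvtColor(gpu_raw, cv2.COLOR_BayerRG2BGR, dst=gpu_bgr)
print('convert only: %.2f ms' % ((time.perf_counter() - t0) * 1000))

# 2) Upload + conversion + download.
t0 = time.perf_counter()
gpu_raw.upload(raw)
cv2.cuda.cvtColor(gpu_raw, cv2.COLOR_BayerRG2BGR, dst=gpu_bgr)
bgr = gpu_bgr.download()
print('upload+convert+download: %.2f ms' % ((time.perf_counter() - t0) * 1000))
```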

If you are including them, you could try using CUDA streams and working one frame behind to hide this overhead.

Hi cudawarped,
I tested the timings without including the upload and download.
Regarding CUDA streams: in my project the input comes from a camera, and I use multiprocessing so that I can get the maximum real-time FPS (~60). So I haven't found a solution where I can combine multiprocessing with CUDA only; somewhere I have to involve the CPU.

Can you share your timing code and your results for 4K frames on your GPU and CPU, please?

I just quickly ran the performance tests on a 3070 Ti vs. a 12700H, and the preliminary results at 4K (3840x2160) indicate that the GPU is 10x faster, so I would estimate your 3050 Ti should be at least 3x faster than your CPU.

How are you hiding the overhead of the upload of the RAW frame (and, if you are using it, the download of the RGB frame)? My estimate (without running any proper tests) is that the conversion itself should take less than 0.8 ms, which would equate to over 1000 FPS for the conversion alone. If that is right, some other operations in your pipeline must be limiting you to 40-42 FPS.

Hi,
Thanks for the reply.
You were right: there were a few additional lines of code that were adding to the overall time and dragging down the pipeline FPS.
I removed all of those overheads, and here are my new results:
Time: ~7 ms for cv2.cvtColor() on the CPU (i7)
Time: ~3.6 ms for cv2.cuda.cvtColor() on the GPU (3050 Ti)
Time: ~10 ms for upload + cvtColor + download on the GPU (3050 Ti)

But as I am streaming in real time from a camera, I have to include the upload and download code for the GPU.
Is it possible to avoid uploading every frame? I.e., create the GpuMat only once and then just update the frame data it references before calling cv2.cuda.cvtColor(), so that only the download has to happen per frame to send the result to the display in real time.

You could use CUDA streams and run one frame behind to hide some of the overhead of the upload/download by overlapping the conversion with the transfer. See the following for an example:
https://jamesbowley.co.uk/accelerating-opencv-with-cuda-streams-in-python/
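
A minimal sketch of the one-frame-behind idea (cv2.VideoCapture and the Bayer code are placeholders for your actual RAW source and sensor pattern; for the copies to truly overlap with the kernel, the host buffers should also be page-locked, which the link above covers):

```python
import cv2

cap = cv2.VideoCapture(0)  # stand-in for the camera delivering 8-bit Bayer frames

stream = cv2.cuda.Stream()
gpu_raw = cv2.cuda.GpuMat()  # reused every frame
gpu_bgr = cv2.cuda.GpuMat()

prev_bgr = None
while True:
    ok, raw = cap.read()            # grab the next frame on the CPU...
    if not ok:
        break
    if prev_bgr is not None:
        stream.waitForCompletion()  # ...then collect the previous frame's result
        cv2.imshow('bgr', prev_bgr)
        cv2.waitKey(1)
    # Queue this frame's upload, conversion and download; all three calls
    # return immediately and run asynchronously on the stream.
    gpu_raw.upload(raw, stream)
    cv2.cuda.cvtColor(gpu_raw, cv2.COLOR_BayerRG2BGR, dst=gpu_bgr, stream=stream)
    prev_bgr = gpu_bgr.download(stream)
```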

I don’t follow, but if you are asking whether you can avoid uploading the frame, the answer is no.
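
What you can do is allocate the GpuMats and the download destination once and reuse them for every frame, so the per-frame cost is just the two copies plus the conversion, e.g. (a sketch, again with the Bayer code as a placeholder):

```python
import cv2
import numpy as np

rows, cols = 2160, 3840
gpu_raw = cv2.cuda.GpuMat(rows, cols, cv2.CV_8UC1)  # allocated once
gpu_bgr = cv2.cuda.GpuMat(rows, cols, cv2.CV_8UC3)
bgr = np.empty((rows, cols, 3), dtype=np.uint8)     # reused download target

def debayer(raw):
    gpu_raw.upload(raw)   # the copy itself is unavoidable...
    cv2.cuda.cvtColor(gpu_raw, cv2.COLOR_BayerRG2BGR, dst=gpu_bgr)
    gpu_bgr.download(bgr) # ...but no new memory is allocated per frame
    return bgr
```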

I will look into the shared link.
And yes, I meant avoiding uploading again and again.
Thanks for the help and the quick response!!