In my scenario, I want to decode a video stream to BGR frames on the GPU. Ideally both the decoding and the YUV-to-BGR conversion would happen on the GPU.
Does cv2.VideoCapture work like this now? Or does it decode to YUV on the GPU and convert to BGR on the CPU?
cv2.VideoCapture will only output host/CPU frames. I am not sure exactly how the hardware acceleration works internally.
cudacodec.VideoReader decodes directly to device/GPU memory. If you build from the master branch you now have the option to output BGR, BGRA, GRAY or NV12 (YUV), with the default being BGRA. The decoder currently decodes everything to NV12, so if you choose BGR output it will run an extra CUDA kernel over each frame to perform the conversion.
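For reference, that extra kernel is conceptually just the per-pixel YUV-to-BGR math applied to the NV12 planes. A minimal pure-Python sketch of the idea (the BT.601 full-range coefficients are an assumption here; OpenCV may use a different matrix or range internally):

```python
# NV12 stores a full-resolution Y (luma) plane followed by a half-resolution
# interleaved UV (chroma) plane. Converting to BGR applies, per pixel,
# the standard YUV -> RGB matrix (BT.601 full-range assumed):

def yuv_to_bgr(y, u, v):
    """Convert one full-range BT.601 YUV sample to a (b, g, r) triple."""
    b = y + 1.772 * (u - 128)
    g = y - 0.344136 * (u - 128) - 0.714136 * (v - 128)
    r = y + 1.402 * (v - 128)
    clamp = lambda c: max(0, min(255, round(c)))
    return clamp(b), clamp(g), clamp(r)
```

With neutral chroma (u = v = 128) the output is the grayscale value itself, which is a quick sanity check on the coefficients.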
In my scenario, I am using a Tesla V100 to decode from RTSP and encode back to RTSP.
After I use cudacodec to get the GpuMat, is there any way to run inference on this GpuMat with PyTorch, or must I download it to the CPU, make it a torch.Tensor and then run inference?
I have a video stream which is 2560x1920, and I wrote a pipeline that decodes it, then encodes it and pushes it to another server. Yes, it is like remuxing, but I will modify the frames, so I must decode and encode.
I am decoding the BGR frames with:
cap = cv2.cudacodec.createVideoReader(url)
while True:
    ret, frame = cap.nextFrame()
    if ret:
        image = frame.download()
        image = cv2.cvtColor(image, cv2.COLOR_BGRA2BGR)
        pushing(image)
And I am pushing the BGR frames to a pipe and on to the server with the command below.
Not that I am aware of. There was talk of an external allocator for PyTorch etc., but I still don’t think that would work with OpenCV. In the future the OpenCV CUDA DNN backend should support GpuMat input, so if this is just for inference, once that modification is made you could export your model from PyTorch to ONNX and just use OpenCV. In the meantime you could try to hide the overhead of the download by using CUDA streams to overlap the download with some CPU work. I know this is very inefficient (downloading from the device in OpenCV and then uploading to the device in PyTorch) but I can’t think of another way.
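Hardware aside, the idea of hiding the download latency is a two-stage pipeline: while the CPU processes frame N, frame N+1 is already being fetched. A minimal stdlib-only sketch of that pattern (the `fetch_frame` and `process` callables are hypothetical stand-ins for the GpuMat download and the PyTorch inference):

```python
import queue
import threading

def pipelined(fetch_frame, process, n_frames, depth=2):
    """Overlap fetching (e.g. GpuMat.download on a CUDA stream) with CPU work.

    fetch_frame(i) -> frame and process(frame) -> result are placeholders
    for the real download and inference steps.
    """
    q = queue.Queue(maxsize=depth)

    def producer():
        for i in range(n_frames):
            q.put(fetch_frame(i))   # runs ahead while the consumer works
        q.put(None)                 # sentinel: no more frames

    threading.Thread(target=producer, daemon=True).start()

    results = []
    while (frame := q.get()) is not None:
        results.append(process(frame))
    return results
```

The `depth` bound keeps the producer from running arbitrarily far ahead, which in the real case also bounds how much device/host memory is in flight.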
Unfortunately, if you are altering the frames and need to encode, there isn’t much I can suggest because cudacodec.VideoWriter() is out of action.
I am not sure what you mean by 0.8x for pushing to the pipe? Normal speed for what, processing your 2560x1920 video on a Tesla V100?
I can only guess, but it is possible that you are not requesting frames at the source fps due to the resolution of the video combined with the overhead of the calls to download(), cvtColor() and pushing(). Have you tried streaming by just calling ret, frame = cap.nextFrame() to see if the issue disappears? Alternatively you could try the new allowFrameDrop flag to see if that works. If so, you need to remember that this is just a convenience flag for prototyping and not really for production, because you are dropping frames.
params = cv2.cudacodec.VideoReaderInitParams()
params.allowFrameDrop = True
cap = cv2.cudacodec.createVideoReader(url, [], params)
What I would suggest is removing the call to cv2.cvtColor and setting the output from cudacodec::VideoReader to BGR.
The latest test code for me is the one below. Is this the same as what you suggest? The issue appears as before.
cap = cv2.cudacodec.createVideoReader(url)
ret, frame = cap.nextFrame()
while ret:
    frame = cv2.cuda.cvtColor(frame, cv2.COLOR_BGRA2BGR)
    image = frame.download()
    pushing(image)
    ret, frame = cap.nextFrame()
Yes, I mean the speed of pushing the stream with the ffmpeg command line on a Tesla V100, writing each frame to a subprocess which runs
ffmpeg -hwaccel cuvid -y -f rawvideo -pix_fmt bgr24 -s wxh -i - -c:v h264_nvenc -pix_fmt yuv420p -f rtsp -rtsp_transport tcp rtsp://xxxxxx
to push the frames.
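That pushing step can be sketched as a subprocess pipe. A minimal sketch assuming the same ffmpeg arguments as above (`width`, `height` and `push_url` are placeholders for your stream's values):

```python
import subprocess

def ffmpeg_push_cmd(width, height, push_url):
    """Build the ffmpeg argv for piping raw BGR frames to an RTSP server."""
    return ["ffmpeg", "-hwaccel", "cuvid", "-y",
            "-f", "rawvideo", "-pix_fmt", "bgr24",
            "-s", f"{width}x{height}", "-i", "-",
            "-c:v", "h264_nvenc", "-pix_fmt", "yuv420p",
            "-f", "rtsp", "-rtsp_transport", "tcp", push_url]

# proc = subprocess.Popen(ffmpeg_push_cmd(2560, 1920, "rtsp://..."),
#                         stdin=subprocess.PIPE)
# pushing(image) then amounts to: proc.stdin.write(image.tobytes())
```

Each write blocks whenever ffmpeg's stdin buffer is full, so a slow encoder will also slow down the decode loop feeding it.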
Whatever that did, please do this instead. I only rearranged the code to use a single nextFrame call per iteration. while-True loops are okay.
cap = cv2.cudacodec.createVideoReader(url)
while True:
    (ret, frame) = cap.nextFrame()
    if not ret:
        break
    frame = cv2.cuda.cvtColor(frame, cv2.COLOR_BGRA2BGR)
    image = frame.download()
    pushing(image)
If you use the latest commit from master you can slightly modify the code above from crackwitz: it should automatically drop frames for you if you are requesting frames more slowly than they arrive, and output BGR instead of BGRA frames.
params = cv2.cudacodec.VideoReaderInitParams()
params.allowFrameDrop = True
cap = cv2.cudacodec.createVideoReader(url, [], params)
cap.set(cv2.cudacodec.ColorFormat_BGR)
while True:
    (ret, frame) = cap.nextFrame()
    if not ret:
        break
    image = frame.download()
    pushing(image)
In my experience the quoted performance is realistic. To achieve it you need to call nextFrame() fast enough to saturate the decoder; that is, to get 4x30 = 120 fps (H.264 8-bit on a V100) you would have to be calling nextFrame() faster than once every 8 ms, which you won’t be doing if you are calling download(), cvtColor() and pushing() after nextFrame(). As I said above, try
cap = cv2.cudacodec.createVideoReader(url)
while True:
    (ret, frame) = cap.nextFrame()
    if not ret:
        break
to see what fps you are getting first, and to check there are no other issues present.
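To put a number on it, you can time that bare loop. A small sketch of the measurement (`next_frame` is a hypothetical stand-in for `cap.nextFrame`, so the timing helper itself stays testable without a GPU):

```python
import time

def measure_fps(next_frame, max_frames=500):
    """Time a bare decode loop: call next_frame() repeatedly, return (count, fps).

    next_frame() -> (ret, frame) stands in for cap.nextFrame().
    """
    start = time.perf_counter()
    n = 0
    while n < max_frames:
        ret, _ = next_frame()
        if not ret:
            break
        n += 1
    elapsed = time.perf_counter() - start
    return n, (n / elapsed if elapsed > 0 else 0.0)
```

If the measured fps of the bare loop is already below the source fps, the bottleneck is the decode itself; if it is well above, the slowdown comes from the download/convert/push steps.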