Camera Supports 255 in FFMPEG, but Only getting ~185 in OpenCV

thedead · November 17, 2024, 5:51am

Hi All,

I have a cheap ELP camera that outputs the following formats per FFMPEG.

[dshow @ ] vcodec=mjpeg min s=640x360 fps=260.004 max s=640x360 fps=260.004
[dshow @ ] vcodec=mjpeg min s=640x360 fps=260.004 max s=640x360 fps=260.004 (pc, bt470bg/bt709/unknown, center)
[dshow @ ] vcodec=mjpeg min s=1280x720 fps=120 max s=1280x720 fps=120
[dshow @ ] vcodec=mjpeg min s=1280x720 fps=120 max s=1280x720 fps=120 (pc, bt470bg/bt709/unknown, center)
[dshow @ ] vcodec=mjpeg min s=1920x1080 fps=60.0002 max s=1920x1080 fps=60.0002
[dshow @ ] vcodec=mjpeg min s=1920x1080 fps=60.0002 max s=1920x1080 fps=60.0002 (pc, bt470bg/bt709/unknown, center)

When I use FFMPEG directly, I can get 255 FPS using MJPEG + yuvj422p. I can also get 255 using their native windows tool.

Using OpenCV with the code below, I average only ~185 FPS while reading to memory and doing no other operations. Overall CPU/memory utilization is very low during the run. The system is a Xeon E5-1650 with 16 GB of RAM.

import cv2
import time

cam = cv2.VideoCapture(0, cv2.CAP_DSHOW)
cam.set(cv2.CAP_PROP_FRAME_WIDTH, 640)
cam.set(cv2.CAP_PROP_FRAME_HEIGHT, 360)
cam.set(cv2.CAP_PROP_FOURCC, cv2.VideoWriter_fourcc(*"MJPG"))


count = 0

while count < 500:  # Warmup
    rval, frame = cam.read()
    count = count + 1

count = 0
start_time = time.time()
while (time.time() - start_time) < 5:
    rval, frame = cam.read()
    count = count + 1
end_time = time.time()
print("Total Time: {}".format(end_time - start_time))
print("Total Frames: {}".format(count))
print("Average FPS: {}".format(count / (end_time - start_time)))

Results:

Total Time: 5.003589630126953
Total Frames: 929
Average FPS: 185.66670504040297

Total Time: 5.007279634475708
Total Frames: 937
Average FPS: 187.12755595845798

Total Time: 5.00612735748291
Total Frames: 929
Average FPS: 185.57258608520556

I then looked at the time between frames, and it seems that one read in every three takes longer:

read frame processed : 12.14289665222168 ms
read frame processed : 2.002239227294922 ms
read frame processed : 1.9993782043457031 ms
read frame processed : 11.974811553955078 ms
read frame processed : 2.0225048065185547 ms
read frame processed : 1.9757747650146484 ms
read frame processed : 11.033296585083008 ms
read frame processed : 0.9982585906982422 ms
read frame processed : 2.0055770874023438 ms
read frame processed : 11.561155319213867 ms
read frame processed : 1.9941329956054688 ms
read frame processed : 1.9996166229248047 ms
read frame processed : 12.000560760498047 ms
read frame processed : 2.0232200622558594 ms
read frame processed : 1.9807815551757812 ms

I tried to take FFMPEG and pipe it into OpenCV, but the pixel format is yuvj422p, not BGR24, and when switching the codec in FFMPEG from MJPEG to rawvideo, FFMPEG drops to 100 FPS and the bitrate skyrockets.
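
One way around the pixel-format problem, sketched here under assumptions: keep the stream as MJPEG (`-c:v copy`) so FFMPEG does no decoding, write it to stdout, and split the byte stream into individual JPEG images on the Python side. The device name `"USB Camera"`, the `-framerate` value, and the helper names `start_ffmpeg` and `pop_jpeg_frames` are hypothetical, and splitting on the JPEG start/end markers assumes the camera does not embed thumbnails in its frames.

```python
import subprocess

SOI = b"\xff\xd8"  # JPEG start-of-image marker
EOI = b"\xff\xd9"  # JPEG end-of-image marker


def start_ffmpeg(device_name="USB Camera"):
    """Launch FFMPEG so it copies the camera's MJPEG stream to stdout.

    device_name is a placeholder; use the name FFMPEG lists for the camera.
    The frame rate requested here is an assumption and may need adjusting.
    """
    cmd = [
        "ffmpeg", "-loglevel", "error",
        "-f", "dshow", "-vcodec", "mjpeg",
        "-video_size", "640x360", "-framerate", "260",
        "-i", f"video={device_name}",
        "-c:v", "copy", "-f", "mjpeg", "pipe:1",
    ]
    return subprocess.Popen(cmd, stdout=subprocess.PIPE)


def pop_jpeg_frames(buffer):
    """Remove and return every complete JPEG found in a bytearray."""
    frames = []
    while True:
        start = buffer.find(SOI)
        if start < 0:
            break
        end = buffer.find(EOI, start + 2)
        if end < 0:
            break  # last image is still incomplete; wait for more bytes
        frames.append(bytes(buffer[start:end + 2]))
        del buffer[:end + 2]
    return frames
```

In a read loop, chunks from `proc.stdout.read(65536)` would be appended to a `bytearray` and passed to `pop_jpeg_frames`. Each returned item can then be decoded with `cv2.imdecode(np.frombuffer(jpeg, np.uint8), cv2.IMREAD_COLOR)`, which yields a BGR image. The decode still costs CPU time per frame, so whether this keeps up with 255 FPS would need to be measured.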

Any idea what would cause one of 3 frames to take longer to capture?

Is there a way to use OpenCV to capture frames at ~250 FPS?

crackwitz · November 17, 2024, 10:47am

set CAP_PROP_FOURCC. examples all over the place. that should do the trick.

that is junk because you used time.time(). you should have used time.perf_counter_ns() or time.perf_counter()

besides, that merely measures when your program executes, not when the frames are made.

cameras make frames at their own rate and put them into a queue.
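
A minimal sketch of per-read timing with `time.perf_counter_ns()`, as suggested above. The helper name `time_reads` is hypothetical; it takes any callable, so it can be tried without a camera. As noted, it measures how long each call blocks, not when the camera produced the frame.

```python
import time


def time_reads(read_fn, n):
    """Call read_fn n times and return each call's duration in milliseconds."""
    durations_ms = []
    for _ in range(n):
        t0 = time.perf_counter_ns()
        read_fn()
        t1 = time.perf_counter_ns()
        durations_ms.append((t1 - t0) / 1e6)
    return durations_ms
```

With the capture from the original post this would be called as `time_reads(cam.read, 500)`, and the returned list can be inspected for the slow reads.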

thedead · November 17, 2024, 2:32pm

Thanks! I added this line to the cap and no change (updated original post too):
cam.set(cv2.CAP_PROP_FOURCC, cv2.VideoWriter_fourcc(*"MJPG"))

I get the exact same results using perf_counter_ns

I have confirmed with FFMPEG and another DSHOW app that I can pull 255 FPS with this camera.

crackwitz · November 17, 2024, 4:38pm

change up the order. it might require being first or last, where one of those will work and the other won’t.

if none of those work, pass the cap props in the constructor instead of using set() calls.

thedead · November 18, 2024, 1:43am

Tried in multiple places, no difference. I tried these two variations as well.

cam.set(cv2.CAP_PROP_FOURCC, cv2.VideoWriter_fourcc(*"MJPG"))

cam.set(cv2.CAP_PROP_FOURCC, cv2.VideoWriter_fourcc("M", "J", "P", "G"))

What’s the best way to pass the cap props in the constructor here? I couldn’t figure out the syntax.
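
For reference, a sketch of the constructor syntax, assuming OpenCV 4.5.2 or later, where `cv2.VideoCapture` accepts a flat list of property/value pairs as a third argument. The helper names `fourcc` and `open_camera` are hypothetical, and whether the DirectShow backend treats these parameters any differently from `set()` calls has not been verified here.

```python
def fourcc(code):
    """Pack a four-character code the way cv2.VideoWriter_fourcc does."""
    a, b, c, d = (ord(ch) for ch in code)
    return a | (b << 8) | (c << 16) | (d << 24)


def open_camera(index=0, width=640, height=360, codec="MJPG"):
    """Open a DirectShow capture with properties passed to the constructor."""
    import cv2  # imported lazily so fourcc() can be used without OpenCV

    # Flat list of pairs: [prop1, value1, prop2, value2, ...]
    params = [
        cv2.CAP_PROP_FOURCC, fourcc(codec),
        cv2.CAP_PROP_FRAME_WIDTH, width,
        cv2.CAP_PROP_FRAME_HEIGHT, height,
    ]
    return cv2.VideoCapture(index, cv2.CAP_DSHOW, params)
```

Usage would be `cam = open_camera()`, followed by the same read loop as before.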

Any idea why, when timing each frame read, it’s always every 3rd one that takes about 5x longer? I think if we can get that read to take the same time as the rest, it would probably work as expected, especially given that ffmpeg and another Windows app are able to pull 255 frames.

cudawarped · November 18, 2024, 8:11am

It probably won’t make any difference, but just to rule it out, try passing a pre-allocated frame to cam.read() to avoid allocating a new one on every invocation, e.g.

rval, _ = cam.read(frame)

crackwitz · November 18, 2024, 8:43am

you’re saying “low” CPU usage, but that still could be using an entire core…

set CAP_PROP_CONVERT_RGB to 0 and see how that affects the frame rate

thedead · November 25, 2024, 4:27am

Same results…

Total Frames: 917
Average FPS: 183.33326669092452

thedead · November 25, 2024, 4:34am

I ran the script multiple times and none of the cores seem to come close to max, picture attached:

no change at all.

crackwitz · November 25, 2024, 8:39am

the scheduler makes a process/thread hop around. it still uses “an entire core”, but spread around, and you won’t see it in that graph. you’ll only see it in a per-process/per-thread accounting.

that per-core graph looks to me like the CPU is fairly busy, i.e. there’s probably something taking up an entire core’s worth of time.

don’t expect parallelization. there might be some, in some parts, but there might not. processing doesn’t simply use multiple cores on its own. someone has to put effort into writing the algorithm to use parallelism. and that’s speaking generally.

you’re only doing a video capture. the only “heavy lifting” here is (1) data transfer, which is cheap and doesn’t take up CPU (2) VideoCapture’s color space conversion, which can be disabled (explained earlier). that too shouldn’t take up much time, but at high frame rates, it might become significant.

cudawarped · November 25, 2024, 10:02am

One possibility is that your read thread is sleeping or in a wait state, which could be the case here, see

https://github.com/opencv/opencv/blob/7095cb6904e682a5a6c1c1229e3a519dc9d91f63/modules/videoio/src/cap_dshow.cpp#L1551

          // ----------------------------------------------------------------------
          
          bool videoInput::getPixels(int id, unsigned char * dstBuffer, bool flipRedAndBlue, bool flipImage){
          
              bool success = false;
          
              if(isDeviceSetup(id)){
                  if(bCallback){
                      //callback capture
          
                      DWORD result = WaitForSingleObject(VDList[id]->sgCallback->hEvent, 1000);
                      if( result != WAIT_OBJECT_0) return false;
          
                      //double paranoia - mutexing with both event and critical section
                      EnterCriticalSection(&VDList[id]->sgCallback->critSection);
          
                          unsigned char * src = VDList[id]->sgCallback->pixels;
                          unsigned char * dst = dstBuffer;
                          int height             = VDList[id]->height;
                          int width              = VDList[id]->width;

That is, the long waits occur because you arrive before a frame is ready and the thread then enters a wait state, to be woken up when the frame is ready. The extra delay you are seeing is then due to the precision (10-15 ms on Windows) of the system timers. Because you waited so long, the next two frames are already ready when you call VideoCapture::read() and are processed immediately, taking ~2 ms.
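
A quick check of that explanation against the timings posted earlier: if each group of three reads is dominated by one long wait, the group time should account for the measured rate. The values below are the posted ones rounded to two decimals; nothing else is assumed.

```python
# Read durations in ms, copied (rounded) from the timings posted earlier.
read_ms = [
    12.14, 2.00, 2.00,
    11.97, 2.02, 1.98,
    11.03, 1.00, 2.01,
    11.56, 1.99, 2.00,
    12.00, 2.02, 1.98,
]

# Group into the repeating slow-fast-fast pattern.
groups = [read_ms[i:i + 3] for i in range(0, len(read_ms), 3)]
group_ms = [sum(g) for g in groups]

mean_group_ms = sum(group_ms) / len(group_ms)
implied_fps = 3 * 1000.0 / mean_group_ms   # three frames per group
camera_period_ms = 1000.0 / 255            # frame spacing at 255 FPS

print(f"mean time per 3 reads: {mean_group_ms:.2f} ms")
print(f"implied rate: {implied_fps:.0f} FPS")
print(f"frame period at 255 FPS: {camera_period_ms:.2f} ms")
```

The three reads take about 15.5 ms together, which works out to roughly 193 FPS and is close to the measured ~185 FPS, whereas three frames at 255 FPS span only about 11.8 ms.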

I haven’t checked but I would have thought you could see if changing

DWORD result = WaitForSingleObject(VDList[id]->sgCallback->hEvent, 1000);
if( result != WAIT_OBJECT_0) return false;

to

while (WaitForSingleObject(VDList[id]->sgCallback->hEvent, 0) != WAIT_OBJECT_0) continue;

makes a difference.

Either way, from reading this thread it looks like the implementation in OpenCV, which uses the DirectShow API rather than FFmpeg, is probably what is causing the difference you are observing.

crackwitz · November 25, 2024, 10:48am

you can try VideoCapture::waitAny(). that’s supposed to be able to poll.

still, there’s nothing you could do with that gained time except poll again and again.

thedead · January 2, 2025, 5:08pm

If it’s a Windows timer precision thing, would trying this on Linux help before trying to custom compile a version on Windows?

cudawarped · January 2, 2025, 5:30pm

Yes, if that’s easier; however, I have no idea if this will make any difference.

thedead · January 3, 2025, 4:20am

That was it… I’m able to get 255 FPS on Linux with the exact same code.

Thanks for the help! Is there any info I can share to help get this solved on Windows, or is it a Windows limitation?
