I’m working on code that captures images from an instrument, analyzes them, then displays the images on a monitor with some information (lines, numbers, text) added to the frames. The images are coming from a USB camera, 2592 x 1944. To analyze I need all the pixels, but for subsequent display it’s OK to shrink by a linear factor of 2 to 3.
I first wrote code that grabbed frames with cv.VideoCapture(), did the analysis with numpy and scipy, made a reduced-size frame for display with cv.resize(), drew geometry on the frames, annotated them with cv.putText() and lastly displayed them with cv.imshow(). It was very slow (2 frames/sec) and the latency was on the order of seconds.
This sped up to about 4 frames/sec when I changed the backend to V4L2. I then did some experiments to try to understand what was limiting the speed.
If cv.VideoCapture() is in a busy loop, I can get 18 frames/sec from the camera. So I put this into a thread by itself. I then put the frame analysis in a second thread, and the imshow() in a third thread. (Hardware is 4 cores.)
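A minimal sketch of one way to hand frames between such threads so that a slow stage always works on the newest frame instead of an old queued one. It is only an illustration: `LatestSlot`, `fake_capture` and `run_demo` are hypothetical names, and `fake_capture` is a stand-in for a loop around the real capture call, so no camera or OpenCV is needed to run it. Whether stale frames piling up between threads is actually the cause of the delay here is an assumption, not something the measurements above establish.

```python
import threading
import time


class LatestSlot:
    """Single-item mailbox: a new put() overwrites any unread item."""

    def __init__(self):
        self._cond = threading.Condition()
        self._item = None
        self._fresh = False
        self.dropped = 0  # items overwritten before any reader took them

    def put(self, item):
        with self._cond:
            if self._fresh:
                self.dropped += 1
            self._item = item
            self._fresh = True
            self._cond.notify()

    def get(self, timeout=None):
        """Block until an unread item exists; return None on timeout."""
        with self._cond:
            if not self._cond.wait_for(lambda: self._fresh, timeout):
                return None
            self._fresh = False
            return self._item


def fake_capture(slot, n_frames, period):
    # Stand-in for the capture thread; each "frame" is just its index.
    for i in range(n_frames):
        slot.put(i)
        time.sleep(period)


def run_demo(n_frames=100, capture_period=0.001, work_time=0.02):
    """Fast producer, slow consumer: the consumer skips stale frames."""
    slot = LatestSlot()
    producer = threading.Thread(
        target=fake_capture, args=(slot, n_frames, capture_period)
    )
    producer.start()
    seen = []
    while True:
        item = slot.get(timeout=0.1)
        if item is None:
            if not producer.is_alive():
                break
            continue
        seen.append(item)
        time.sleep(work_time)  # stand-in for analysis or display work
    producer.join()
    return seen, slot.dropped


if __name__ == "__main__":
    seen, dropped = run_demo()
    print(f"processed {len(seen)} frames, dropped {dropped}")
```

The design choice is to drop frames rather than queue them: with an unbounded queue between a fast producer and a slow consumer, every queued frame adds one consumer period of delay.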
In the end, I was able to get about 10 frames/sec, with 260% load on a 4-core CPU. That’s not quite as good as I would like, but it’s OK. (Note: I’m using OpenCV compiled for all cores, so cv.getNumThreads() returns 4 and cv.checkHardwareSupport(100) returns True, where 100==CV_CPU_NEON.)
The problem: there is far too much latency – about 2 seconds (not a typo!). So if I jiggle the instrument, it takes 2 seconds at 10 fps before the image reacts. I’m not sure where that latency is coming from – reducing the buffer size in VideoCapture() does not help.
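One way to narrow down where the delay sits is to stamp each frame as it passes each stage and look at the gaps. The sketch below is a hypothetical helper (`Stamped` and `frames_in_flight` are illustrative names, not part of OpenCV). Note the limitation: a stamp taken when the capture call returns cannot see time the frame already spent in camera or driver buffers, so a small total here together with a large observed delay would point upstream of Python.

```python
import time
from dataclasses import dataclass, field


@dataclass
class Stamped:
    """A frame plus the monotonic time at which it passed each stage."""

    frame: object
    marks: list = field(default_factory=list)  # [(stage_name, seconds), ...]

    def mark(self, stage, now=None):
        # 'now' can be passed explicitly for testing; default is the clock.
        t = time.monotonic() if now is None else now
        self.marks.append((stage, t))
        return self

    def delays(self):
        """Seconds between consecutive marks, keyed by 'a->b'."""
        return {
            f"{a}->{b}": tb - ta
            for (a, ta), (b, tb) in zip(self.marks, self.marks[1:])
        }

    def total(self):
        """Seconds from the first mark to the last."""
        return self.marks[-1][1] - self.marks[0][1]


def frames_in_flight(latency_s, fps):
    """Little's law: average items in the pipeline = throughput * latency."""
    return latency_s * fps


if __name__ == "__main__":
    # Made-up stage times, chosen only to show the output format.
    s = Stamped(frame=None)
    s.mark("captured", 0.0).mark("analyzed", 0.25).mark("shown", 2.0)
    print(s.delays())
    # With the figures from the question: 2 s of latency at 10 frames/sec.
    print(frames_in_flight(2.0, 10))
```

With the numbers above, 2 seconds at 10 frames/sec corresponds to roughly 20 frames sitting somewhere between the sensor and the screen, which is more than the per-frame processing time alone would explain.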
My question: is this latency coming from the Python layer? Or will shifting to C++ not fix it?
Before someone says “it’s not possible on your hardware, you need better hardware”, I have proof that the hardware is sufficient. I have downloaded and installed guvcview on the same system (Raspberry Pi 4 B, four 64-bit NEON cores). This program, guvcview, uses the same V4L2 library as OpenCV, and displays 17 fps from the same camera, in real time at 2592 x 1944 with no observable latency, using one core pegged to 100%.