I am loading a YOLO model with OpenCV in Python using cv2.dnn_DetectionModel(cfg, weights) and then calling net.detect(img). I think I can get a per-image speed-up by using batches, but I don’t see any support for a batch size other than one.
img is a (h, w, 3) uint8 NumPy array. No amount of fiddling with stacking, concatenating, putting images in lists, etc. allowed me to pass more than one image into net.detect.
yep it does not accept batches. ;(
(same problem as in the “low level” sample )
unfortunately, it’s not as simple as calling blobFromImages() alone:
the whole post-processing (result parsing, NMS) has to be repeated for every image in the batch
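to make that concrete, here is a rough sketch of what batching by hand could look like: one forward pass via blobFromImages(), then result parsing and NMS once per image. The helper names (decode_rows, detect_batch), the 416x416 input size, the thresholds and the assumed output layout (Darknet-style rows of [cx, cy, w, h, objectness, class scores...], grouped per image) are my assumptions, not taken from the sample, so check out.shape on your build before relying on it. NMS here is class-agnostic for brevity.

```python
import numpy as np


def decode_rows(rows, img_w, img_h, conf_threshold=0.5):
    """Turn one image's raw YOLO rows into pixel boxes, scores and class ids.

    Assumes Darknet-style rows: [cx, cy, w, h, objectness, class scores...],
    with coordinates normalised to 0..1 (an assumption; verify for your model).
    """
    rows = np.asarray(rows, dtype=np.float32)
    class_ids = rows[:, 5:].argmax(axis=1)
    scores = rows[np.arange(len(rows)), 5 + class_ids]
    keep = scores > conf_threshold
    rows, class_ids, scores = rows[keep], class_ids[keep], scores[keep]

    # Centre/size (normalised) -> top-left/size (pixels), as NMSBoxes expects.
    w = rows[:, 2] * img_w
    h = rows[:, 3] * img_h
    x = rows[:, 0] * img_w - w / 2
    y = rows[:, 1] * img_h - h / 2
    boxes = np.stack([x, y, w, h], axis=1).round().astype(int)
    return boxes.tolist(), scores.tolist(), class_ids.tolist()


def detect_batch(net, images, input_size=(416, 416),
                 conf_threshold=0.5, nms_threshold=0.4):
    """Run one forward pass for all images, then post-process each image."""
    import cv2  # local import: only this function needs OpenCV

    blob = cv2.dnn.blobFromImages(images, 1 / 255.0, input_size,
                                  swapRB=True, crop=False)
    net.setInput(blob)
    outs = net.forward(net.getUnconnectedOutLayersNames())

    results = []
    for i, img in enumerate(images):
        img_h, img_w = img.shape[:2]
        # Assumption: every output holds the same number of rows per image,
        # grouped image by image, so it can be split along a batch axis.
        rows = np.concatenate(
            [out.reshape(len(images), -1, out.shape[-1])[i] for out in outs],
            axis=0)
        boxes, scores, class_ids = decode_rows(rows, img_w, img_h,
                                               conf_threshold)
        keep = cv2.dnn.NMSBoxes(boxes, scores, conf_threshold, nms_threshold)
        keep = np.array(keep).flatten()  # older builds return an Nx1 array
        results.append([(class_ids[k], scores[k], boxes[k]) for k in keep])
    return results
```

usage would be something like results = detect_batch(net, [img1, img2, img3]) with net coming from cv2.dnn.readNet(weights, cfg); results[i] then holds (class id, score, [x, y, w, h]) tuples for image i.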
one thing you might want to try is to profile your batch code there against multiple forward() calls with a single image each (testing all backends/targets you have)
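a minimal timing harness for that comparison could look like the sketch below. ms_per_image and the infer callable are hypothetical names of mine: infer stands for whatever runs your full pipeline (blob creation, forward pass, post-processing) on a list of images. Calling it with batch_size=1 gives the "multiple forward() calls" baseline.

```python
import time


def ms_per_image(infer, images, batch_size, repeats=5):
    """Average wall-clock milliseconds per image for a given batch size.

    `infer` is any callable that takes a list of images and runs the whole
    pipeline on it (hypothetical; plug in your own batch code).
    """
    batches = [images[i:i + batch_size]
               for i in range(0, len(images), batch_size)]
    infer(batches[0])  # warm-up: the first call often includes setup cost

    start = time.perf_counter()
    for _ in range(repeats):
        for batch in batches:
            infer(batch)
    elapsed = time.perf_counter() - start
    return 1000.0 * elapsed / (repeats * len(images))
```

to cover the backends/targets, you would set them on the net before each measurement with net.setPreferableBackend() and net.setPreferableTarget(), for example cv2.dnn.DNN_BACKEND_INFERENCE_ENGINE with cv2.dnn.DNN_TARGET_OPENCL_FP16, and then compare ms_per_image(...) for batch sizes 1, 2, 3 and so on.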
I did profile it. The backends available to me are OpenCV and OpenVINO, and the targets are CPU, Intel integrated GPU with 32-bit inference, and Intel integrated GPU with 16-bit inference. On CPU, OpenVINO is faster and batch size doesn’t matter. On GPU, to my knowledge, OpenVINO is required. 16-bit inference is faster than 32-bit, and throughput is significantly higher with batch size > 1. For example, with the images and model I am using, inference takes 70 ms/image at batch size 1 (on the Intel GPU with 16-bit inference) and 50 ms/image at batch size 3.
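For anyone who prefers images per second, this small snippet converts the two figures above; the function name is just something I made up for illustration.

```python
def images_per_second(ms_per_image):
    """Convert a per-image time in milliseconds to images per second."""
    return 1000.0 / ms_per_image


batch1 = images_per_second(70)  # about 14.3 images/s at batch size 1
batch3 = images_per_second(50)  # 20 images/s at batch size 3
speedup = batch3 / batch1       # about 1.4x
```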