I did profile it. The backends available to me are OpenCV and OpenVINO, and the targets are CPU, Intel integrated GPU with FP32 inference, and Intel integrated GPU with FP16 inference. On CPU, OpenVINO is faster and batch size doesn't matter. On GPU, to my knowledge, the OpenVINO backend is required. FP16 is faster than FP32, and per-image throughput improves significantly with batch size > 1: for the images and model I am using, inference takes 70 ms/image at batch size 1 (Intel GPU, FP16) and 50 ms/image at batch size 3.
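In OpenCV's DNN module, that configuration looks roughly like the sketch below. The model filenames and the 224×224 input size are placeholders (assumptions, not from my actual setup); swap in your own network and preprocessing:

```python
import cv2
import numpy as np

# Placeholder model files -- replace with your own network.
net = cv2.dnn.readNet("model.xml", "model.bin")

# Select the OpenVINO (Inference Engine) backend...
net.setPreferableBackend(cv2.dnn.DNN_BACKEND_INFERENCE_ENGINE)
# ...targeting the Intel integrated GPU with FP16 inference.
net.setPreferableTarget(cv2.dnn.DNN_TARGET_OPENCL_FP16)
# (Use cv2.dnn.DNN_TARGET_OPENCL for GPU FP32,
#  or cv2.dnn.DNN_TARGET_CPU for CPU inference.)

# Batch size 3: blobFromImages stacks the images into a single
# NCHW blob, so one forward() call processes the whole batch --
# this is what gave the ~70 -> ~50 ms/image improvement above.
images = [np.zeros((224, 224, 3), dtype=np.uint8) for _ in range(3)]
blob = cv2.dnn.blobFromImages(images, scalefactor=1.0 / 255,
                              size=(224, 224))
net.setInput(blob)
out = net.forward()  # one inference pass over all three images
```

Note that the FP16 target only changes the inference precision on the device; the input blob stays float32 on the host side.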