The model files are all copies of the same ONNX file.
I also ran a version of the script in which all threads shared a single ONNX model file, and got the same results. I had wondered whether some kind of lock on the model file was causing the unexpected result, but using duplicate files made no difference.
I have just tried changing the test to avoid OpenCV and use a simple sleep, with a new runjob():
def runjob(i):
    time.sleep(0.01)
    print("job exe time=", (time.time() - start) * 1000, "ms")
This gave the following output:
job exe time= 10.088205337524414 ms
job exe time= 20.206689834594727 ms
job exe time= 30.312061309814453 ms
job exe time= 40.41624069213867 ms
total serial exe time= 40.44032096862793 ms
job exe time= 23.56123924255371 ms
job exe time= 23.647069931030273 ms
job exe time= 23.77796173095703 ms
job exe time= 23.76723289489746 ms
total parallel exe time= 24.188995361328125 ms
This result is what I would expect. However, it hints at the problem: Python adds overhead when setting up the threads. Even for a simple time.sleep(0.01), each threaded job finishes at about 23.6 ms rather than the roughly 10 ms the sleep itself takes, so about 13 ms is added to the execution of the job.
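For anyone who wants to reproduce the sleep-only comparison, here is a minimal sketch of a timing harness. It assumes the jobs are launched with threading.Thread, since the launching code is not shown above; the helper names run_serial and run_parallel and the results list are hypothetical, and time.perf_counter() is swapped in for time.time() because it is better suited to short intervals.

```python
import threading
import time


def runjob(i, start, results):
    # Stand-in workload: sleep for 10 ms, then record the elapsed time in ms.
    time.sleep(0.01)
    results[i] = (time.perf_counter() - start) * 1000


def run_serial(n):
    # Hypothetical helper: run the jobs one after another in the main thread.
    results = [0.0] * n
    start = time.perf_counter()
    for i in range(n):
        runjob(i, start, results)
    total = (time.perf_counter() - start) * 1000
    return results, total


def run_parallel(n):
    # Hypothetical helper: assumes one threading.Thread per job.
    results = [0.0] * n
    start = time.perf_counter()
    threads = [
        threading.Thread(target=runjob, args=(i, start, results))
        for i in range(n)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    total = (time.perf_counter() - start) * 1000
    return results, total


if __name__ == "__main__":
    for label, runner in (("serial", run_serial), ("parallel", run_parallel)):
        times, total = runner(4)
        for t in times:
            print("job exe time=", t, "ms")
        print("total", label, "exe time=", total, "ms")
```

Running it several times and comparing the spread is worthwhile, because a single run of a 10 ms job is sensitive to scheduler and timer noise.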
I then modified my original script to put a sleep into runjob(), e.g.:
def runjob(i):
    time.sleep(1)
    prob, out = processimg(models[i], images[i])
    class_id = np.argmax(prob)
    print("predict=", class_names[class_id], " confidence=", out[class_id] * 100, "%", " exe time=",
          (time.time() - start) * 1000, "ms")
This results in the following:
predict= none confidence= 98.39194416999817 % exe time= 1010.5865001678467 ms
predict= has confidence= 99.57807064056396 % exe time= 2023.1900215148926 ms
predict= none confidence= 95.81201076507568 % exe time= 3034.2249870300293 ms
predict= back confidence= 99.99748468399048 % exe time= 4048.3639240264893 ms
total serial exe time= 4048.428535461426 ms
predict= none confidence= 98.39194416999817 % exe time= 1070.512056350708 ms
predict= back confidence= 99.99748468399048 % exe time= 1070.6267356872559 ms
predict= none confidence= 95.81201076507568 % exe time= 1071.1612701416016 ms
predict= has confidence= 99.57807064056396 % exe time= 1072.0620155334473 ms
total parallel exe time= 1072.5388526916504 ms
So clearly the issue is the overhead Python adds to do the threading: each threaded job finishes at about 1070 ms, compared with about 1010 ms for the first serial job, so around 60 ms of execution time is added for setup.
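One way to check how much of that time is thread setup alone is to time threads that do no work at all. The following is only a sketch under assumptions: it uses threading.Thread, and the names noop and thread_overhead_ms are hypothetical and not part of the original script. It takes the median over many repeats to reduce noise.

```python
import threading
import time


def noop():
    # Empty job: any time measured around it is thread create/start/join cost.
    pass


def thread_overhead_ms(n_threads=4, repeats=100):
    # Hypothetical helper: median time in ms to create, start and join
    # n_threads threads that do nothing.
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        threads = [threading.Thread(target=noop) for _ in range(n_threads)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        samples.append((time.perf_counter() - start) * 1000)
    samples.sort()
    return samples[len(samples) // 2]


if __name__ == "__main__":
    print("median thread setup overhead=", thread_overhead_ms(), "ms")
```

If this figure comes out much smaller than the extra time seen in the threaded run, the remainder would have to come from something other than thread creation, such as the jobs contending with each other while they run.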