Model Parallelism/Multithreading DNN Model is slower than sequential execution

What’s the difference between them?

What about batching your images using cv2.dnn.blobFromImages?
(And try to use OpenCV’s internal (data-based) parallelization instead of wrapping your own thread/task-based parallelism around it.)
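
A rough sketch of what batching could look like, in case it helps. The helper names (`chunked`, `run_batched`), the batch size, and the preprocessing values (scale factor, `swapRB`) are assumptions for illustration, not something from your code; adjust them to whatever your model expects. It also assumes your model accepts a batch dimension larger than 1, which not every exported model does.

```python
from typing import Iterator, List, Sequence, Tuple


def chunked(items: Sequence, batch_size: int) -> Iterator[List]:
    """Yield consecutive slices of `items`, each at most `batch_size` long."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


def run_batched(net, images: Sequence, input_size: Tuple[int, int],
                batch_size: int = 8) -> List:
    """Run `net` on `images` in batches; returns one output array per batch."""
    # Imported here so chunked() stays usable without OpenCV installed.
    import cv2

    outputs = []
    for batch in chunked(images, batch_size):
        # One 4-D NCHW blob for the whole batch: (len(batch), C, H, W).
        blob = cv2.dnn.blobFromImages(
            batch,
            scalefactor=1.0 / 255.0,  # assumed preprocessing, model dependent
            size=input_size,          # (width, height)
            swapRB=True,              # assumed: model wants RGB, images are BGR
            crop=False,
        )
        net.setInput(blob)
        # A single forward pass per batch; OpenCV parallelizes internally.
        outputs.append(net.forward())
    return outputs
```

You would call it along the lines of `run_batched(cv2.dnn.readNet("model.onnx"), frames, (224, 224))`, where the model path and input size are placeholders. The number of threads OpenCV uses internally can be set with `cv2.setNumThreads(n)`, so there is no need to spawn your own threads around `net.forward()`.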