Running DNN models in different threads on different GPU devices

I do this inside each DNN algorithm class constructor:

this->cuda_id = cuda_id;
net = cv::dnn::readNetFromCaffe(model_deploy, model_bin);

and initialize them inside each thread's run function. My PC has four RTX 3080 GPUs. When the batch count is large (e.g. > 4), the process crashes. However, it works fine with batch = 16 on a single GPU, and running four independent single-GPU processes on different devices also works.
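For reference, the setup looks roughly like this (a simplified sketch; the `DnnWorker` class name, the exact placement of the CUDA calls, and the model paths are placeholders, not my real code):

```cpp
#include <opencv2/dnn.hpp>
#include <opencv2/core/cuda.hpp>
#include <string>

// Rough sketch of one per-GPU worker. One instance is created per
// device, and run() is executed in its own std::thread.
class DnnWorker {
public:
    DnnWorker(int cuda_id,
              const std::string& model_deploy,
              const std::string& model_bin) {
        this->cuda_id = cuda_id;
        // Net is created in the constructor (called from the main thread).
        net = cv::dnn::readNetFromCaffe(model_deploy, model_bin);
    }

    // Called inside each thread's run function.
    void run() {
        // Select this worker's GPU, then enable the CUDA backend/target.
        cv::cuda::setDevice(cuda_id);
        net.setPreferableBackend(cv::dnn::DNN_BACKEND_CUDA);
        net.setPreferableTarget(cv::dnn::DNN_TARGET_CUDA);

        // ... per-thread inference loop:
        // net.setInput(blob); cv::Mat out = net.forward(); ...
    }

private:
    int cuda_id;
    cv::dnn::Net net;
};
```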

What else needs to be done to run this in a single process with multiple threads?