that I need to use a per-thread cv::cuda::Stream object to safely call cv::cuda routines from multiple threads. However, I’m unable to confirm whether I need to pass any CUDA flags when creating those Stream objects if I want asynchronous calls. The doc under cv::cuda::Stream::Stream(const size_t cudaFlags) seems to indicate that I need to pass the cudaStreamNonBlocking flag. Is that correct?
Conversely, if I do not pass any flags, i.e., if I use the default constructor cv::cuda::Stream(), will all calls made using that Stream object be blocked?
Using cv::cuda::Stream does not by itself guarantee any safety in multithreaded code. Depending on what you are doing, you could use the same stream in multiple threads, but you probably don’t want to. For example, if you are processing n videos in the same way in n threads, I would use n streams to run them “concurrently” (so they don’t block each other). However, if you have a producer thread reading a video and a consumer thread processing it, you would want to use the same stream in both threads. If you are doing this for n videos, you would probably want to use n streams with 2n threads.
No, you only need to pass that flag if you want your streams to run concurrently with the default stream. If you are creating a stream per thread, there won’t be any work on the default stream, so this is unlikely to be what you want. See the excerpt below, taken from the CUDA docs:
Description
Creates a new asynchronous stream. The flags argument determines the behaviors of the stream. Valid values for flags are
cudaStreamNonBlocking: Specifies that work running in the created stream may run concurrently with work in stream 0 (the NULL stream), and that the created stream should perform no implicit synchronization with stream 0.