Which CUDA stream do OpenCV cv::cuda functions use?


Every CUDA kernel runs in some CUDA stream, either the default stream (id 0) or a user-created one.

So I’m wondering: when OpenCV executes a CUDA function, which stream does it use? Does it create a new stream of its own, or does it just use the default stream 0?


If you don’t specify a stream, the default stream is used.

Be aware that if you don’t specify a stream, cudaDeviceSynchronize() is called internally in each function, because OpenCV assumes that when you don’t explicitly pass a stream you want the functions to be synchronous with respect to the host.

Also be aware that not all OpenCV CUDA functions that accept a stream parameter are completely asynchronous with respect to the host. Some need to synchronize internally to obtain intermediate results.

@cudawarped Thank you for your answer.
If I pass a cv::cuda::Stream object to cv::cuda functions, which makes them asynchronous, how do I synchronize the stream manually using cudaStreamSynchronize()? What argument would I supply to that function?

Or does cv::cuda::Stream::waitForCompletion() have the same effect?

Exactly, you want to call waitForCompletion() on your stream object to synchronize with the host.

In my last post I meant cudaDeviceSynchronize (not cudaStreamSynchronize) is called internally when the stream argument is omitted.

@cudawarped Thank you so much. That answers my original question.

However, I don’t understand why cudaDeviceSynchronize is needed since you only have to synchronize the default stream.

The default stream is implicit, so there is no stream object to call waitForCompletion() on.

@cudawarped Hmm. Then I think creating a stream called procStream and calling procStream.waitForCompletion() is more advantageous. The non-stream overloads would place everything on stream 0 and perform a cudaDeviceSynchronize, which is rather expensive.

Exactly, my advice would be to always explicitly pass in the stream you want to use, to avoid calls to cudaDeviceSynchronize.

Thank you very much for a really helpful discussion.