Exactly: you want to call waitForCompletion() on your stream object to synchronize with the host.
In my last post I meant that cudaDeviceSynchronize (not cudaStreamSynchronize) is called internally when the stream argument is omitted.