When I use cuda::SparsePyrLKOpticalFlow in a multi-threaded application, it fails to calculate optical flow properly.
I run it on different cuda::GpuMat objects, in separate cuda::Stream objects, with separate Algorithm instances.
I create a new instance with the cuda::SparsePyrLKOpticalFlow::create() method in each thread, but I still have the same problem.
If you are getting different results when using separate threads, that would point to some global memory usage (this used to be common in functions which called CUDA NPP under the hood). The other possibility is that you are not calling waitForCompletion() on your stream after cuda::SparsePyrLKOpticalFlow::calc.
In either case, it might be worth testing with the default stream (cuda::Stream::Null()) or without streams (which should have the same effect) to see if you get consistent results across threads. If not, the problem may be caused by something else.
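One way to run that comparison is sketched below in plain C++ (it compiles without OpenCV). The helper name `resultsConsistentAcrossThreads` is hypothetical: it calls the same callable once on the current thread as a reference, then concurrently on several threads, and reports whether every thread matched the reference. In the real application the callable would, as an assumption, create its own `cuda::SparsePyrLKOpticalFlow` instance and stream, run the upload/calc/download sequence, call `waitForCompletion()`, and return the downloaded points as a `std::vector`, so that `==` compares them element by element. Exact equality assumes the GPU computation is deterministic for identical inputs.

```cpp
#include <cstddef>
#include <thread>
#include <type_traits>
#include <vector>

// Hypothetical helper: returns true if work() gives the same result on
// numThreads concurrent threads as it does in a single-threaded reference run.
// Assumptions: work is safe to call concurrently (e.g. a non-mutable lambda
// that creates all of its own state), and its result type is
// default-constructible and comparable with ==.
// Avoid bool results: std::vector<bool> packs bits, so concurrent writes to
// different elements would race.
template <typename Work>
bool resultsConsistentAcrossThreads(const Work& work, int numThreads)
{
    using Result = std::invoke_result_t<const Work&>;

    const Result reference = work();  // single-threaded baseline

    std::vector<Result> results(static_cast<std::size_t>(numThreads));
    std::vector<std::thread> threads;
    threads.reserve(results.size());
    for (std::size_t i = 0; i < results.size(); ++i) {
        // Each thread writes only to its own slot, so no lock is needed.
        threads.emplace_back([&results, &work, i] { results[i] = work(); });
    }
    for (std::thread& t : threads) {
        t.join();
    }

    for (const Result& r : results) {
        if (!(r == reference)) {
            return false;
        }
    }
    return true;
}
```

A pure computation passes the check, while anything that depends on which thread runs it (or on state shared between threads) fails it.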
Thank you for your response.
I call waitForCompletion() after calling the calc() method, and I use a separate stream in each thread.
Could it be a bug similar to opencv/opencv issue #18051 on GitHub, "CUDA GoodFeaturesToTrackDetector is not ThreadSafe"?
Here is my code, which runs in separate threads:
```cpp
// Forward pass: track allFeaturePoints from mLastFrame to frame.
cuda::GpuMat GpuFrame;
GpuFrame.upload(frame, mForwardOpticalFlowStream);
cuda::GpuMat GpuLastFrame;
GpuLastFrame.upload(mLastFrame, mForwardOpticalFlowStream);
cuda::GpuMat gOldFeaturePoints, gNewFeaturePoints, gStatus, gErrors;
trackerUtils::uploadVector(allFeaturePoints, gOldFeaturePoints, CV_32FC2, mForwardOpticalFlowStream);
mSparseLK->calc(GpuLastFrame, GpuFrame, gOldFeaturePoints, gNewFeaturePoints, gStatus, gErrors, mForwardOpticalFlowStream);
trackerUtils::downloadVector(gStatus, status, CV_8UC1, mForwardOpticalFlowStream);
trackerUtils::downloadVector(gErrors, errors, CV_32FC1, mForwardOpticalFlowStream);
trackerUtils::downloadVector(gNewFeaturePoints, newFeaturePoints, CV_32FC2, mForwardOpticalFlowStream);
// Block until the forward uploads, calc() and downloads have finished.
mForwardOpticalFlowStream.waitForCompletion();

// Backward pass: track the forward results from frame back to mLastFrame.
// (The frames are uploaded a second time here; GpuFrame and GpuLastFrame
// are already complete at this point and could be reused.)
cuda::GpuMat GpuFrameBackward;
GpuFrameBackward.upload(frame, mBackwardOpticalFlowStream);
cuda::GpuMat GpuLastFrameBackward;
GpuLastFrameBackward.upload(mLastFrame, mBackwardOpticalFlowStream);
cuda::GpuMat gNewBackwardFeaturePoints, gBackwardStatus, gBackwardErrors;
mSparseLKBackward->calc(GpuFrameBackward, GpuLastFrameBackward, gNewFeaturePoints, gNewBackwardFeaturePoints, gBackwardStatus, gBackwardErrors, mBackwardOpticalFlowStream);
trackerUtils::downloadVector(gBackwardStatus, backwardStatus, CV_8UC1, mBackwardOpticalFlowStream);
trackerUtils::downloadVector(gBackwardErrors, backwardErrors, CV_32FC1, mBackwardOpticalFlowStream);
trackerUtils::downloadVector(gNewBackwardFeaturePoints, newBackwardFeaturePoints, CV_32FC2, mBackwardOpticalFlowStream);
mBackwardOpticalFlowStream.waitForCompletion();
```
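The thread doesn't show what the backward pass is used for. A common reason to run LK forward and then backward is a forward-backward consistency check: a point is trusted only if tracking it forward and then back lands close to where it started. The sketch below is plain C++ under that assumption; `Point2f` is a stand-in for `cv::Point2f`, and `forwardBackwardMask` and its `maxFbError` threshold are hypothetical names, not part of OpenCV or of the poster's code.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Minimal stand-in for cv::Point2f so this sketch compiles without OpenCV.
struct Point2f {
    float x;
    float y;
};

// Hypothetical forward-backward filter. All four vectors are assumed to have
// the same length (one entry per tracked point). A point is kept only if both
// LK passes report success and the back-tracked position is within
// maxFbError pixels of the original position.
std::vector<bool> forwardBackwardMask(const std::vector<Point2f>& original,
                                      const std::vector<Point2f>& backTracked,
                                      const std::vector<std::uint8_t>& forwardStatus,
                                      const std::vector<std::uint8_t>& backwardStatus,
                                      float maxFbError)
{
    std::vector<bool> keep(original.size(), false);
    for (std::size_t i = 0; i < original.size(); ++i) {
        if (forwardStatus[i] == 0 || backwardStatus[i] == 0) {
            continue;  // either pass lost the point
        }
        const float dx = original[i].x - backTracked[i].x;
        const float dy = original[i].y - backTracked[i].y;
        keep[i] = std::hypot(dx, dy) <= maxFbError;
    }
    return keep;
}
```

With the poster's variables this would be called, assuming they are host vectors of points and uchar flags, as forwardBackwardMask(allFeaturePoints, newBackwardFeaturePoints, status, backwardStatus, threshold) after both waitForCompletion() calls have returned.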
That doesn’t look like a bug; the routine wasn’t thread safe (the algorithm was using global memory, as I mentioned above) until the author submitted this PR.
Are mSparseLK and mSparseLKBackward created inside each thread?
I’ll try to take a look later.
mSparseLK and mSparseLKBackward are both in one thread.
Do you create a separate instance of them in each thread, or create one instance of each and pass those to each thread?
I create a separate instance of them in each thread
I just checked: SparsePyrLKOpticalFlow is definitely not thread safe. It uses constant and texture memory, which are both globally defined.
There appear to be versions which don’t use texture memory for both the short and int types, so I would guess, but cannot confirm, that these should be thread safe as long as you use the same window size and number of iterations in every thread.
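To see why separate Algorithm instances don't help, here is a plain C++ model (not OpenCV's actual source) of a routine whose parameters live in file-scope storage, the way CUDA `__constant__` variables and legacy texture references are declared. All names are hypothetical. The interleaving is done by hand so the outcome is deterministic; with real threads the same interleaving can happen at any moment. The second case also illustrates the guess above: when every instance writes identical parameters, the overwrite is harmless. An image bound to a global texture reference differs per thread, so the same reasoning would not rescue the texture-based paths.

```cpp
// Hypothetical parameter block, standing in for values such as the window
// size and iteration count that a kernel reads from constant memory.
struct LkParams {
    int winSize;
    int iters;
};

// Stands in for a file-scope __constant__ symbol: one copy shared by
// every instance in the process.
LkParams g_constantParams{};

class GlobalStateTracker {
public:
    explicit GlobalStateTracker(LkParams p) : params_(p) {}

    // Step 1: copy this instance's parameters into the shared global
    // (analogous to copying to a __constant__ symbol before a launch).
    void uploadParams() const { g_constantParams = params_; }

    // Step 2: "launch". Reads whatever is in the global now, which another
    // instance may have overwritten since step 1.
    int launch() const { return g_constantParams.winSize * g_constantParams.iters; }

private:
    LkParams params_;  // per-instance copy that the launch never reads
};
```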
So, in conclusion, you could try the short or int versions if applicable, rewrite the routine to avoid textures (or maybe use texture objects), use cuda::Stream::Null(), which will hurt performance, or maybe see how slow NVIDIA’s hardware-accelerated dense optical flow is for your problem.
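The "rewrite the routine" option amounts to moving that state out of globals. Below is a plain C++ model of the idea with hypothetical names: each call receives its parameters as an argument, much as a CUDA texture object (`cudaTextureObject_t`) is passed to a kernel as an ordinary argument, so concurrent callers cannot see each other's settings. It is a sketch of the design only, not a patch for OpenCV.

```cpp
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical parameter block (window size, iteration count).
struct LkParams {
    int winSize;
    int iters;
};

// Stand-in for a kernel: everything it needs arrives as an argument,
// so there is no shared global for another caller to overwrite.
int launchWithParams(const LkParams& p) { return p.winSize * p.iters; }

class PerInstanceTracker {
public:
    explicit PerInstanceTracker(LkParams p) : params_(p) {}
    int launch() const { return launchWithParams(params_); }

private:
    LkParams params_;  // owned by this instance only
};

// Runs one tracker per thread concurrently; each result depends only on
// that thread's own parameters.
std::vector<int> runConcurrently(const std::vector<LkParams>& perThread)
{
    std::vector<int> out(perThread.size());
    std::vector<std::thread> threads;
    threads.reserve(perThread.size());
    for (std::size_t i = 0; i < perThread.size(); ++i) {
        threads.emplace_back([&out, &perThread, i] {
            out[i] = PerInstanceTracker(perThread[i]).launch();
        });
    }
    for (std::thread& t : threads) {
        t.join();
    }
    return out;
}
```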
I use the same window size and number of iterations in all threads.
What are the short and int types? Pixel data types?
Yes, pixel data types.
I would start by confirming that using the default stream, cv::cuda::Stream::Null(), gives you the same result in each thread. If it does, check whether its execution time is acceptable. If it isn't, try different data types.
Yes, the problem was resolved by converting the pixel format from CV_8U to CV_16U:
```cpp
frame.convertTo(shortFrame, CV_16U);
```
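For reference, cv::Mat::convertTo computes saturate_cast<rtype>(alpha * src + beta) with alpha = 1 and beta = 0 by default, so the call above widens each 8-bit value without rescaling it (255 stays 255 rather than becoming 65535). The plain C++ model below, with the hypothetical name `convertTo16U`, mirrors that behaviour for a single-channel buffer; passing alpha = 257.0 would instead stretch 0..255 to the full 0..65535 range.

```cpp
#include <cmath>
#include <cstdint>
#include <vector>

// Plain C++ model of convertTo(dst, CV_16U, alpha, 0) for one channel:
// scale, round to nearest, then clamp to the 16-bit range (saturation).
// std::lround rounds exact .5 ties away from zero; OpenCV's cvRound may
// treat ties differently, which only matters for non-integer alpha.
std::vector<std::uint16_t> convertTo16U(const std::vector<std::uint8_t>& src,
                                        double alpha = 1.0)
{
    std::vector<std::uint16_t> dst;
    dst.reserve(src.size());
    for (std::uint8_t v : src) {
        long scaled = std::lround(v * alpha);
        if (scaled < 0) scaled = 0;
        if (scaled > 65535) scaled = 65535;
        dst.push_back(static_cast<std::uint16_t>(scaled));
    }
    return dst;
}
```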