I have been experimenting with the asynchronous feature detection calls, but when I time them it seems they still block the CPU for a significant amount of time.
I tried to detect ORB features with code similar to the following:
#include <opencv2/cudafeatures2d.hpp>

void detectFeatures(cv::cuda::GpuMat& greyscaleImage)
{
    // nfeatures = 300, otherwise default pyramid/FAST settings, blurForDescriptor = true
    cv::Ptr<cv::cuda::ORB> orb = cv::cuda::ORB::create(300, 1.2f, 8, 31, 0, 2, 0, 31, 20, true);
    cv::cuda::Stream stream;
    cv::cuda::GpuMat keypoints, descriptors;

    // Enqueue detection + description on the stream, then wait for the GPU to finish.
    orb->detectAndComputeAsync(greyscaleImage, cv::noArray(), keypoints, descriptors, false, stream);
    stream.waitForCompletion();
}
I would expect the asynchronous call to defer most of the CPU time to stream.waitForCompletion(); however, only around 2 ms is spent on that line, while detectAndComputeAsync itself still takes around 12 ms.
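For reference, the timing split comes from wrapping the two calls roughly like this (a minimal sketch; the ORB parameters are trimmed to just nfeatures here):

#include <chrono>
#include <iostream>
#include <opencv2/cudafeatures2d.hpp>

// Minimal timing sketch: how long does the "async" enqueue itself block the
// CPU, versus how long do we actually wait on the stream afterwards?
void timeDetection(const cv::cuda::GpuMat& greyscaleImage)
{
    cv::Ptr<cv::cuda::ORB> orb = cv::cuda::ORB::create(300);
    cv::cuda::Stream stream;
    cv::cuda::GpuMat keypoints, descriptors;

    const auto t0 = std::chrono::steady_clock::now();
    orb->detectAndComputeAsync(greyscaleImage, cv::noArray(), keypoints, descriptors, false, stream);
    const auto t1 = std::chrono::steady_clock::now();
    stream.waitForCompletion();
    const auto t2 = std::chrono::steady_clock::now();

    using ms = std::chrono::duration<double, std::milli>;
    std::cout << "detectAndComputeAsync: " << ms(t1 - t0).count() << " ms, "
              << "waitForCompletion: " << ms(t2 - t1).count() << " ms" << std::endl;
}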
I tried separating the calls into detectAsync and computeAsync, and it looks like the blocking time is mostly spent in the detection part. I also tried FAST feature detection and found a similar issue, with the majority of the time spent on the CPU. Turning off non-max suppression reduced the blocking time, but there doesn’t seem to be an option for this on the ORB detector, and disabling it would presumably reduce the quality of the features.
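The FAST variant was along these lines (a rough sketch; the threshold of 20 is just an example, and the second argument to create() is the non-max suppression switch):

#include <opencv2/cudafeatures2d.hpp>

// FAST comparison: same asynchronous call pattern as above, with non-max
// suppression toggled via the second argument to create().
void detectFast(const cv::cuda::GpuMat& greyscaleImage, bool nonmaxSuppression)
{
    cv::Ptr<cv::cuda::FastFeatureDetector> fast =
        cv::cuda::FastFeatureDetector::create(20, nonmaxSuppression);

    cv::cuda::Stream stream;
    cv::cuda::GpuMat keypoints;

    // FAST produces no descriptors, so only the detection call is issued here.
    fast->detectAsync(greyscaleImage, keypoints, cv::noArray(), stream);
    stream.waitForCompletion();
}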
I’ve also tried running the detection multiple times (so first-call initialisation is excluded), changing the detector options, and preallocating the GpuMat memory for the keypoints and descriptors, but nothing seems to help.
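The preallocation attempt was roughly this pattern, keeping the detector, stream and output GpuMats alive between frames so their buffers can be reused (a sketch; the class and member names are just illustrative):

#include <opencv2/cudafeatures2d.hpp>

// Preallocation/reuse attempt: keep the detector, stream and output GpuMats
// alive between frames instead of recreating them on every call.
class OrbDetector
{
public:
    OrbDetector()
        : m_orb(cv::cuda::ORB::create(300, 1.2f, 8, 31, 0, 2, 0, 31, 20, true))
    {
    }

    void detect(const cv::cuda::GpuMat& greyscaleImage)
    {
        // m_keypoints / m_descriptors are reused across calls; OpenCV only
        // reallocates them if the required size or type changes.
        m_orb->detectAndComputeAsync(greyscaleImage, cv::noArray(),
                                     m_keypoints, m_descriptors, false, m_stream);
        m_stream.waitForCompletion();
    }

private:
    cv::Ptr<cv::cuda::ORB> m_orb;
    cv::cuda::Stream m_stream;
    cv::cuda::GpuMat m_keypoints, m_descriptors;
};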
Is there anything else I can look at in my setup, or is this expected behaviour and these functions are not really asynchronous?