I would like some help understanding how the CUDA FAST feature detector (cv::cuda::FastFeatureDetector) works.
After instantiating an object of this class, I can detect keypoints with the function
detect, which takes the arguments (image, vector). (I suppose the image is a GpuMat.) However, I have been told here that this operation downloads the keypoints and processes them on the CPU.
I am interested in performing one more operation on the GPU (I am starting to write the kernel) that needs the keypoints. I suppose I could convert the vector (I am not really sure how, but that is not the point of this question), upload it to the device and work with that, but all this host <-> device transfer seems unnecessary.
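For concreteness, the host-side conversion I would rather avoid might look roughly like the sketch below. KeyPointLike is a hypothetical stand-in for cv::KeyPoint (only the fields I would need) so that the snippet compiles without OpenCV, and flattenKeypoints is my own name, not a library function.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for cv::KeyPoint, holding only the fields used here
// (cv::KeyPoint exposes them as pt.x, pt.y and response).
struct KeyPointLike {
    float x;
    float y;
    float response;
};

// Pack the keypoint coordinates into one contiguous buffer laid out as
// [x0, y0, x1, y1, ...], so it can be copied to the device in a single call.
std::vector<float> flattenKeypoints(const std::vector<KeyPointLike>& keypoints) {
    std::vector<float> flat;
    flat.reserve(keypoints.size() * 2);
    for (const KeyPointLike& kp : keypoints) {
        flat.push_back(kp.x);
        flat.push_back(kp.y);
    }
    return flat;
}
```

The flat buffer could then be copied to the device in one transfer (for example with cudaMemcpy and cudaMemcpyHostToDevice), but that is exactly the download-then-upload round trip I am hoping to skip.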
So is there a way to perform the FAST detection and keep the keypoints in device memory?
I was recommended to use detectAsync(). How does this work? And, is it really “async”? (that might complicate things)
detectAsync takes the arguments (InputArray, OutputArray), so I guess the second one is a GpuMat of keypoints? I suppose that is still in device memory?
Perhaps I can use that, but why is it Async?
My idea would be something like:
cuda::GpuMat image, keypoints;
image.upload(cpuimage);
Cudadetector->detectAsync(image, keypoints);
// a kernel launch returns nothing, so any result has to be written to device memory
my_own_process<<<N, 1>>>(image, keypoints);
As you can see, after the detection I plan to launch a CUDA kernel with the keypoints, and I assume that my_own_process
will run after detectAsync. The “Async” part alarms me a bit.
Is there another way to do this?