SURF_CUDA performance

Helios · October 6, 2022, 12:54am

Using the following code:

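const int N = 1000;
// hrc is not defined in the posted snippet; assumed here to be the factor
// that converts high_resolution_clock ticks to milliseconds.
const double hrc = 1000.0 * std::chrono::high_resolution_clock::period::num
                          / std::chrono::high_resolution_clock::period::den;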

void test_surf_performance1(){
	auto surf = SURF::create();
	surf->setUpright(true);
	Mat src = imread("aloe.png", IMREAD_GRAYSCALE);
	std::uint64_t sum = 0;
	volatile auto t0 = std::chrono::high_resolution_clock::now().time_since_epoch().count();
	for (int i = N; i--;){
		std::vector<KeyPoint> keypoints;
		std::vector<float> descriptors;
		surf->detectAndCompute(src, Mat(), keypoints, descriptors);
		sum += descriptors.size();
	}
	volatile auto t1 = std::chrono::high_resolution_clock::now().time_since_epoch().count();
	std::cout << sum << std::endl;
	std::cout << (t1 - t0) * hrc / N << std::endl;
}

void test_surf_performance2(){
	SURF_CUDA surf;
	surf.upright = true;
	surf.extended = false;
	GpuMat img_gpu;
	GpuMat keypoints_gpu;
	GpuMat descriptors_gpu;
	auto img = imread("aloe.png", IMREAD_GRAYSCALE);
	img_gpu.upload(img);
	std::uint64_t sum = 0;
	volatile auto t0 = std::chrono::high_resolution_clock::now().time_since_epoch().count();
	for (int i = N; i--;){
		surf(img_gpu, GpuMat(), keypoints_gpu, descriptors_gpu);
		std::vector<float> descriptors;
		surf.downloadDescriptors(descriptors_gpu, descriptors);
		sum += descriptors.size();
	}
	volatile auto t1 = std::chrono::high_resolution_clock::now().time_since_epoch().count();
	std::cout << sum << std::endl;
	std::cout << (t1 - t0) * hrc / N << std::endl;
}

the measured time is 4.34 ms per call on the CPU and 2.13 ms per call on the GPU. Commenting out downloadDescriptors() changes the GPU figure by 0.1 ms.
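
For anyone who wants to reproduce the per-call figures, the timing pattern can be factored out of the two test functions. This is only a minimal sketch using the standard library: `average_ms_per_call` and `speedup` are hypothetical helper names, not OpenCV functions, and the optional warm-up count is my own addition, on the assumption that the first GPU calls may include one-time setup cost.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper (not part of OpenCV): calls `fn` `iterations` times
// and returns the mean wall-clock time per call in milliseconds.
// `warmup` untimed calls are made first (assumption: this keeps one-time
// initialization cost out of the average).
template <typename Fn>
double average_ms_per_call(Fn&& fn, int iterations, int warmup = 0) {
    assert(iterations > 0);
    for (int i = 0; i < warmup; ++i) fn();
    const auto t0 = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) fn();
    const auto t1 = std::chrono::steady_clock::now();
    // duration<double, std::milli> converts clock ticks to milliseconds.
    const std::chrono::duration<double, std::milli> elapsed = t1 - t0;
    return elapsed.count() / iterations;
}

// Ratio of two per-call times, e.g. CPU time divided by GPU time.
inline double speedup(double baseline_ms, double candidate_ms) {
    return baseline_ms / candidate_ms;
}
```

The body of each loop above would be passed in as a lambda, for example `average_ms_per_call([&]{ surf(img_gpu, GpuMat(), keypoints_gpu, descriptors_gpu); }, N)`. With the numbers reported here, `speedup(4.34, 2.13)` comes out at roughly 2.04, so the GPU path is about twice as fast as the CPU path in this test.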
