Blur is not a member of cv::cuda

Are you measuring only a single iteration of this?
(On the first call, kernels need to be compiled, caches warmed, etc., so one run is not representative.)
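
One way to take the first-call overhead out of the measurement is to run the operation a few times untimed and then average over many timed runs. Below is a minimal sketch of that idea in plain C++. The helper name `average_ms` and the `warmup` / `iters` parameters are made up for illustration and are not part of OpenCV.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Hypothetical helper (not an OpenCV API): calls `op` `warmup` times without
// timing, then times `iters` further calls and returns the mean in milliseconds.
// Caveat: if `op` launches asynchronous GPU work, it should wait for that work
// to finish before returning, otherwise only the launch cost is measured.
double average_ms(const std::function<void()>& op, int warmup, int iters) {
    assert(iters > 0);

    // Warm-up runs: pay for kernel compilation, cache warming, etc. here.
    for (int i = 0; i < warmup; ++i) {
        op();
    }

    // Timed runs, using a monotonic clock.
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iters; ++i) {
        op();
    }
    const auto stop = std::chrono::steady_clock::now();

    const std::chrono::duration<double, std::milli> elapsed = stop - start;
    return elapsed.count() / iters;
}
```

You would wrap your own GPU call in a lambda, e.g. `average_ms([&] { run_my_gpu_blur(); }, 5, 100)`, where `run_my_gpu_blur` is a placeholder for whatever you are timing.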

Are there more GPU ops in your pipeline?
(Uploading and downloading between CPU and GPU is expensive, so a single GPU op on its own may not pay off.)