cv::boxFilter and cv::addWeighted perform worse on GPU than on CPU


In my vision algorithm, I am using OpenCV's UMat-based APIs.
The APIs are: boxFilter, addWeighted, multiply, divide, subtract, compare, etc.

I am running these calls on an Adreno GPU.

Except for boxFilter and addWeighted, the APIs perform well, at around 1 millisecond.
But boxFilter takes 30 milliseconds and addWeighted takes 49 milliseconds.

When I run boxFilter and addWeighted on the CPU, these two take 7 milliseconds. In fact, when I run these two APIs on Nvidia and Intel platforms, they perform well, at around 1 millisecond.
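
Timings like these depend heavily on how they are measured. As a minimal sketch (not the code used for the numbers above), the helper below runs an operation a few times as warm-up, then times it while waiting for the device before stopping the clock. The names `medianMillis`, `op` and `sync` are hypothetical. With OpenCV's OpenCL backend, `sync` would typically call cv::ocl::finish(), and the first call to a UMat function typically also pays for OpenCL kernel compilation, which is why it is excluded here.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <functional>
#include <vector>

// Returns the median wall-clock time of `op` in milliseconds
// (the upper median when timedRuns is even).
// `sync` must block until all queued device work has finished. With the
// OpenCL backend that would be cv::ocl::finish(); this is an assumption,
// since the post does not show how the timings were taken.
double medianMillis(const std::function<void()>& op,
                    const std::function<void()>& sync,
                    int warmupRuns, int timedRuns)
{
    assert(warmupRuns >= 0 && timedRuns > 0);

    // Warm-up: the first calls usually include kernel compilation and
    // buffer allocation, so they are kept out of the measurement.
    for (int i = 0; i < warmupRuns; ++i) {
        op();
        sync();
    }

    std::vector<double> samples;
    samples.reserve(static_cast<std::size_t>(timedRuns));
    for (int i = 0; i < timedRuns; ++i) {
        const auto t0 = std::chrono::steady_clock::now();
        op();
        sync();  // without this, only the enqueue cost may be timed
        const auto t1 = std::chrono::steady_clock::now();
        samples.push_back(
            std::chrono::duration<double, std::milli>(t1 - t0).count());
    }

    std::sort(samples.begin(), samples.end());
    return samples[samples.size() / 2];
}
```

For the boxFilter case, `op` could be a lambda wrapping the boxFilter call on the UMat objects and `sync` a lambda calling cv::ocl::finish(). Comparing the first-call time with the median shows how much of the 30 ms is one-time setup.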

Could you help us understand why performance is so poor on the GPU (Adreno 650), and only for these two UMat APIs?

Thanks and Regards
Rajesh Chanda


Please post minimal but runnable code and the necessary data to reproduce the issue.

float32_t srcAlpha = 0.5f;
float32_t srcBeta = 0.5f;
Mat opencvblenddstUMATToMat;  // host-side copy of the blended result
// Using UMat
UMat opencvblendinputUMAT1, opencvblendinputUMAT2, opencvblenddstUMAT;



cv::imwrite("/storage/emulated/0/opencvTesting/opencvblenddstUMATToMat.jpg", opencvblenddstUMATToMat);

The code above is for addWeighted, i.e. blending two images.
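
For reference, addWeighted computes dst = saturate(src1*alpha + src2*beta + gamma) for every element. The sketch below spells that out in plain C++ for 8-bit data, without OpenCV, to show how little arithmetic each pixel needs. `blendPixel` and `blend` are hypothetical names, and the rounding of exact .5 values may differ from OpenCV's.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Blend one 8-bit sample: round to nearest, then clamp to [0, 255].
// Tie-breaking on exact .5 values may not match OpenCV's saturate_cast.
inline std::uint8_t blendPixel(std::uint8_t a, std::uint8_t b,
                               double alpha, double beta, double gamma)
{
    const double v = a * alpha + b * beta + gamma;
    const long r = std::lround(v);
    return static_cast<std::uint8_t>(std::min(255L, std::max(0L, r)));
}

// Blend two equally sized 8-bit buffers element by element.
std::vector<std::uint8_t> blend(const std::vector<std::uint8_t>& src1,
                                const std::vector<std::uint8_t>& src2,
                                double alpha, double beta, double gamma = 0.0)
{
    assert(src1.size() == src2.size());
    std::vector<std::uint8_t> dst(src1.size());
    for (std::size_t i = 0; i < src1.size(); ++i) {
        dst[i] = blendPixel(src1[i], src2[i], alpha, beta, gamma);
    }
    return dst;
}
```

With alpha = beta = 0.5, as in the snippet above, each output sample is the average of the two inputs: one multiply-add pair and two reads per pixel.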

UMat opencvfilterinputUMAT,opencvfilterdstUMAT;
Mat opencvfilterdstUMATToMat;
Point point1 = Point(-1, -1);

boxFilter(opencvfilterinputUMAT, opencvfilterdstUMAT, -1, cv::Size(16,16), point1, true, BORDER_DEFAULT);

cv::imwrite("/storage/emulated/0/opencvTesting/opencvcvboxfilterUMATToMat.jpg", opencvfilterdstUMATToMat);

The code above is for boxFilter.