OpenCV 4 on arm64: a problem with using UMat

I want to use the GPU to speed up my code, so I tried UMat. It is fast at first, but after a few dozen iterations it slows down. Why?

CODE:

#include <iostream>
#include <opencv2/core.hpp>
#include <opencv2/imgproc.hpp>

using namespace cv;
using namespace std;

int OclTest(void)
{
    int size = 2000;
    Mat mat(size, size, CV_8UC1, cv::Scalar(0));
    Mat matout(size, size, CV_8UC1, cv::Scalar(0));
    UMat uMat(size, size, CV_8UC1, cv::Scalar(0));
    UMat uMatout(size, size, CV_8UC1, cv::Scalar(0));

    // Runs forever: each pass times one CPU (Mat) and one OpenCL (UMat) threshold.
    while (1)
    {
        double timeMat = 0;
        {
            TickMeter tm;
            tm.start();
            cv::threshold(mat, matout, 127, 255, cv::THRESH_BINARY);
            tm.stop();
            timeMat = tm.getTimeMilli();
        }

        double timeUMat = 0;
        {
            TickMeter tm;
            tm.start();
            cv::threshold(uMat, uMatout, 127, 255, cv::THRESH_BINARY);
            tm.stop();
            timeUMat = tm.getTimeMilli();
        }

        cout << "Time using Mat: " << timeMat << " ms" << std::endl;
        cout << "Time using UMat: " << timeUMat << " ms" << std::endl;
    }

    // Never reached because of the infinite loop above.
    mat.release();
    matout.release();
    uMat.release();
    uMatout.release();
    return 0;
}
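
One thing that may be worth checking in the timing itself: OpenCV's OpenCL path generally enqueues a kernel and returns without waiting for it, so the UMat timer above may be measuring submission rather than execution. Below is a minimal sketch of timing with an explicit synchronization step. `timeMs` is a hypothetical helper, not an OpenCV API, and the sketch assumes `cv::ocl::finish()` from `opencv2/core/ocl.hpp` would be passed as the `sync` callback.

```cpp
#include <chrono>
#include <functional>

// Hypothetical helper (not part of OpenCV): runs `work`, then `sync`,
// and returns the elapsed wall-clock time in milliseconds.
// For a UMat call, `sync` is meant to wait for the OpenCL queue to drain,
// so the measurement covers execution and not only submission.
double timeMs(const std::function<void()>& work,
              const std::function<void()>& sync)
{
    const auto t0 = std::chrono::steady_clock::now();
    work();
    sync();
    const auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}
```

With OpenCV this could be called as `timeMs([&]{ cv::threshold(uMat, uMatout, 127, 255, cv::THRESH_BINARY); }, []{ cv::ocl::finish(); })`, and with an empty `sync` lambda for the Mat version. If the synchronized UMat time turns out to be roughly constant, the later slowdown might just be the point where the command queue fills up and the call starts to block. That is a guess to verify, not a confirmed diagnosis.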

Platform:

Print:

[ WARN:0@6.942] global filesystem.cpp:489 getCacheDirectory Using world accessible cache directory. This may be not secure: /var/tmp/
Time using Mat: 1.419628 ms
Time using UMat: 2.664990 ms
Time using Mat: 1.458756 ms
Time using UMat: 0.333905 ms
Time using Mat: 1.522763 ms
Time using UMat: 0.464917 ms
Time using Mat: 1.825664 ms
Time using UMat: 0.306903 ms
Time using Mat: 1.979428 ms
Time using UMat: 0.321904 ms
Time using Mat: 1.796787 ms
Time using UMat: 0.353782 ms
Time using Mat: 2.060436 ms
Time using UMat: 0.340781 ms
Time using Mat: 1.527763 ms
Time using UMat: 0.386285 ms
Time using Mat: 1.727906 ms
Time using UMat: 0.323654 ms
Time using Mat: 1.495510 ms
Time using UMat: 0.443415 ms
Time using Mat: 1.594269 ms
Time using UMat: 0.356657 ms
Time using Mat: 1.654524 ms
Time using UMat: 0.420413 ms
Time using Mat: 1.566516 ms
Time using UMat: 0.351532 ms
Time using Mat: 2.014806 ms
Time using UMat: 0.357908 ms
Time using Mat: 1.463882 ms
Time using UMat: 0.354907 ms
Time using Mat: 1.495760 ms
Time using UMat: 0.450666 ms
Time using Mat: 1.766784 ms
Time using UMat: 0.310403 ms
Time using Mat: 1.883670 ms
Time using UMat: 0.341906 ms
Time using Mat: 1.572766 ms
Time using UMat: 0.366158 ms
Time using Mat: 1.481384 ms
Time using UMat: 0.436414 ms
Time using Mat: 1.426254 ms
Time using UMat: 0.351531 ms
Time using Mat: 1.641773 ms
Time using UMat: 0.524922 ms
Time using Mat: 1.568892 ms
Time using UMat: 0.344781 ms
Time using Mat: 1.743907 ms
Time using UMat: 0.347656 ms
Time using Mat: 1.569267 ms
Time using UMat: 0.362782 ms
Time using Mat: 2.155444 ms
Time using UMat: 0.320654 ms
Time using Mat: 1.813038 ms
Time using UMat: 0.381409 ms
Time using Mat: 1.495760 ms
Time using UMat: 0.422913 ms
Time using Mat: 1.921299 ms
Time using UMat: 0.365283 ms
Time using Mat: 1.553140 ms
Time using UMat: 0.313904 ms
Time using Mat: 1.476508 ms
Time using UMat: 0.332155 ms
Time using Mat: 1.714529 ms
Time using UMat: 0.380535 ms
Time using Mat: 1.485133 ms
Time using UMat: 0.347156 ms
Time using Mat: 1.620646 ms
Time using UMat: 0.346656 ms
Time using Mat: 1.466257 ms
Time using UMat: 0.407162 ms
Time using Mat: 1.537263 ms
Time using UMat: 0.316528 ms
Time using Mat: 1.627521 ms
Time using UMat: 0.316404 ms
Time using Mat: 1.564766 ms
Time using UMat: 0.364407 ms
Time using Mat: 1.600644 ms
Time using UMat: 0.352907 ms
Time using Mat: 1.564891 ms
Time using UMat: 2.377965 ms
Time using Mat: 1.521637 ms
Time using UMat: 4.548035 ms
Time using Mat: 1.571516 ms
Time using UMat: 4.390145 ms
Time using Mat: 1.638272 ms
Time using UMat: 3.435310 ms
Time using Mat: 1.760534 ms
Time using UMat: 4.451526 ms
Time using Mat: 1.661775 ms
Time using UMat: 3.688832 ms
Time using Mat: 1.945300 ms
Time using UMat: 3.412308 ms
Time using Mat: 1.687402 ms
Time using UMat: 4.822185 ms
Time using Mat: 2.090813 ms
Time using UMat: 3.409682 ms
Time using Mat: 1.643398 ms
Time using UMat: 3.396806 ms
Time using Mat: 1.608520 ms
Time using UMat: 4.376019 ms
Time using Mat: 1.621147 ms
Time using UMat: 4.536783 ms
Time using Mat: 1.674026 ms
Time using UMat: 3.408307 ms
Time using Mat: 1.748782 ms
Time using UMat: 4.660920 ms
Time using Mat: 1.637273 ms
Time using UMat: 3.423308 ms
Time using Mat: 1.995555 ms
Time using UMat: 4.399272 ms
Time using Mat: 1.526887 ms
Time using UMat: 4.458152 ms
Time using Mat: 1.668400 ms
Time using UMat: 3.605450 ms
Time using Mat: 1.691527 ms
Time using UMat: 4.488155 ms
Time using Mat: 1.792037 ms
Time using UMat: 3.444185 ms
Time using Mat: 1.583017 ms
Time using UMat: 3.464563 ms
Time using Mat: 2.003930 ms
Time using UMat: 5.407988 ms
Time using Mat: 1.657900 ms
Time using UMat: 3.494690 ms
Time using Mat: 1.713155 ms
Time using UMat: 3.405057 ms
Time using Mat: 1.724905 ms
Time using UMat: 4.429024 ms
Time using Mat: 1.669650 ms
Time using UMat: 3.424684 ms
Time using Mat: 1.879670 ms
Time using UMat: 4.409147 ms
Time using Mat: 1.736031 ms
Time using UMat: 3.397431 ms
Time using Mat: 1.535389 ms
Time using UMat: 4.377894 ms
Time using Mat: 1.565016 ms
Time using UMat: 4.416898 ms
Time using Mat: 1.543139 ms
Time using UMat: 4.439525 ms
Time using Mat: 1.690777 ms
Time using UMat: 4.421274 ms

Could your computer’s CPU and GPU be getting hot and then being throttled?

It’s common for processing units to “sprint” until they hit their thermal limits.

You might want to plot these times; that makes them easier to understand.
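
To get the times into a plotting tool, one option is to collect them in vectors instead of printing them and dump them as CSV. This is only a sketch; `timingsToCsv` and the column names are made up for illustration.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: formats per-iteration timings as CSV text,
// one row per iteration, ready to be written to a file and plotted.
std::string timingsToCsv(const std::vector<double>& matMs,
                         const std::vector<double>& umatMs)
{
    std::ostringstream out;
    out << "iteration,mat_ms,umat_ms\n";
    // Only emit rows for which both measurements exist.
    const std::size_t n =
        matMs.size() < umatMs.size() ? matMs.size() : umatMs.size();
    for (std::size_t i = 0; i < n; ++i)
        out << i << ',' << matMs[i] << ',' << umatMs[i] << '\n';
    return out.str();
}
```

Writing the returned string to a file and opening it in a spreadsheet or gnuplot should make the jump in the UMat column easy to see.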

Thank you. My embedded Linux system has no CPU temperature tool, so I touched the CPU (the GPU is on the same chip) and felt no noticeable change in temperature. In the output above, “Time using Mat:” is the CPU run time and “Time using UMat:” is the GPU run time. Only the GPU run time slowed down; the CPU run time stayed at about 1.7 ms throughout. So I think the cause is probably not the chip’s thermal limit.
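
Even without a dedicated tool, many Linux kernels expose temperatures through sysfs, usually as millidegrees Celsius. Here is a small sketch for reading one. The path `/sys/class/thermal/thermal_zone0/temp` is an assumption (some boards have no thermal zone, or number them differently), and both function names are made up for illustration.

```cpp
#include <fstream>
#include <string>

// Converts a sysfs thermal reading in millidegrees Celsius (e.g. 45000)
// to degrees Celsius.
double milliCelsiusToCelsius(long milli)
{
    return static_cast<double>(milli) / 1000.0;
}

// Reads a temperature from a sysfs file such as
// /sys/class/thermal/thermal_zone0/temp (assumed path; boards differ).
// Returns false if the file is missing or does not start with a number;
// `celsius` is left untouched in that case.
bool readTemperatureC(const std::string& path, double& celsius)
{
    std::ifstream in(path);
    long milli = 0;
    if (!(in >> milli))
        return false;
    celsius = milliCelsiusToCelsius(milli);
    return true;
}
```

Logging this value next to each pair of timings would show whether the temperature actually changes around the iteration where the UMat time jumps.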