Mat.at() very slow writing to pixels

OCV 4.5; C++; Visual Studio 2017; MFC; Windows 10

I am having a timing problem that does not seem right and I believe that I must be doing something wrong.

I can go through a 1920 x 1080 HSV Mat image pixel by pixel, using the standard code below, in about 10 ms:

cv::Mat HSVImg(mMyImage.size(), mMyImage.type());
cv::cvtColor(mMyImage, HSVImg, cv::COLOR_BGR2HSV);
for (int row = 0; row < HSVImg.size().height; row++)
{
	for (int col = 0; col < HSVImg.size().width; col++)
	{
		cv::Vec3b value = HSVImg.at<cv::Vec3b>(row, col);
		// Do some math calculations and then assign the new value with
		HSVImg.at<cv::Vec3b>(row, col) = value;
	}
}

However, on a Mat of this type

cv::Mat mMaskImg(mMyImage.size(), CV_64F, Scalar(1));

with values only from 0 to 1 it takes 130ms to go through the pixels with this standard code

for (int i = 0; i < mMaskImg.rows; i++)
{
	for (int j = 0; j < mMaskImg.cols; j++)
	{
		// Do some math and then assign the value with
		mMaskImg.at<double>(i, j) = temp_s; // temp_s is a double with a value between 0 and 1
	}
}

I have timed the lines containing the math and they are not the problem. The at() assignment is the problem.

I have tried making mMaskImg a CV_32F Mat and using only floats in the math, but when I try to assign the new value with

mMaskImg.at<float>(i, j) = temp_s;

it will crash each time no matter what I have tried.

My main problem is the speed. I tried the float only because I was wondering if dealing with 1920 X 1080 Mat of 32 bit floats would be faster than a Mat of 64bit doubles.

Regardless, can anyone explain why the mMaskImg.at<double>(i, j) is running 13 times slower than the HSVImg.at<cv::Vec3b>(row, col)?

PS the less than - float - greater than and less than - double - greater than are not coming out in the quoted lines or even here but you know what they should be

Thanks

please use ctrl-e to format code here, not “block quote”

apart from that, please state concisely what’s happening in your loops.
you should NOT write code like that (for loops), instead you should try to vectorize your ops, to make it fast (even on the CPU, and especially with large images)
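To illustrate what "vectorize" means here, a minimal sketch in plain C++ (no OpenCV needed): instead of calling an accessor once per pixel, run a single pass over the contiguous buffer. The name `scaleAll` and the `std::vector<double>` standing in for a single-channel CV_64F image are assumptions for illustration only. In OpenCV the same idea usually means whole-matrix calls such as `cv::multiply`, `cv::sqrt` or `cv::pow` rather than nested loops with `at()`.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical stand-in for a single-channel CV_64F image:
// rows * cols doubles stored contiguously, row after row.
// Multiplies every element by `factor` in one linear pass.
void scaleAll(std::vector<double>& pixels, double factor)
{
    std::transform(pixels.begin(), pixels.end(), pixels.begin(),
                   [factor](double v) { return v * factor; });
}
```

Whether a whole-array form is actually faster for a given operation still has to be measured.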

I fixed your post’s formatting. you are supposed to use the CODE button, not the QUOTE button.

you should present a minimal reproducible example. as it is, this isn’t reproducible.

you will understand if we don’t take your word for it.

vec3b is three bytes. one double is eight bytes. that’s 2.7 times as much data. and it probably involves floating point math, not integer math (which is faster).
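The arithmetic behind that ratio, as a small sketch for the 1920 x 1080 image in the question (`imageBytes` is a hypothetical helper, not an OpenCV function). It works out to roughly 6.2 MB for the Vec3b image against roughly 16.6 MB for the double image, so data volume on its own accounts for a factor of about 2.7, not 13.

```cpp
#include <cstddef>

// Bytes needed for one image of the given size and per-pixel byte count.
constexpr std::size_t imageBytes(std::size_t width, std::size_t height,
                                 std::size_t bytesPerPixel)
{
    return width * height * bytesPerPixel;
}

// Three uchar channels per pixel (cv::Vec3b).
constexpr std::size_t kVec3bBytes = imageBytes(1920, 1080, 3);
// One double per pixel (CV_64F); sizeof(double) is 8 on common platforms.
constexpr std::size_t kDoubleBytes = imageBytes(1920, 1080, sizeof(double));
```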

part of professionalism is canning the assertions and presenting the cold hard truth. assertions merely hide the issues. the issues always stem from faulty understanding. assertions imply that understanding can’t be questioned.
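In that spirit, one way to get numbers is to time the store and the math separately. A minimal sketch, assuming plain C++ with `std::chrono`; `timeMs`, `fillConstant` and the `std::vector<double>` buffer are hypothetical names standing in for the real Mat and loop.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// Runs `work` once and returns the elapsed wall-clock time in milliseconds.
template <typename F>
double timeMs(F&& work)
{
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Store-only loop: writes the same value to every pixel, with no math.
void fillConstant(std::vector<double>& pixels, int rows, int cols, double value)
{
    for (int r = 0; r < rows; ++r) {
        double* row = pixels.data() + static_cast<std::size_t>(r) * cols;
        for (int c = 0; c < cols; ++c)
            row[c] = value;
    }
}
```

Comparing `timeMs([&] { fillConstant(buf, 1080, 1920, 0.5); })` with the time for the full loop shows how much of the total is the store and how much is the math. Measure a Release build; Debug builds add checks and skip optimizations, which can distort the numbers.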

Sorry about that. I assume that is the preformatted text button. At least that is what it shows as when I hold my mouse over it.

Here is the program, starting with its 2 small helper functions:

double dist(cv::Point a, cv::Point b)
{
	return sqrt(pow((double)(a.x - b.x), 2) + pow((double)(a.y - b.y), 2));
}
double getMaxDisFromCorners(const cv::Size& imgSize, const cv::Point& center)
{
	std::vector<cv::Point> corners(4);
	corners[0] = cv::Point(0, 0);
	corners[1] = cv::Point(imgSize.width, 0);
	corners[2] = cv::Point(0, imgSize.height);
	corners[3] = cv::Point(imgSize.width, imgSize.height);

	double maxDis = 0;
	for (int i = 0; i < 4; ++i)
	{
		double dis = dist(corners[i], center);
		if (maxDis < dis)
			maxDis = dis;
	}

	return maxDis;
}
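For reference, the same two helpers can be written without `pow` and without the temporary vector. This is a sketch in plain C++: the `Pt` struct is a hypothetical stand-in for `cv::Point`, and `std::hypot` replaces the `sqrt(pow(...) + pow(...))` expression.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical stand-in for cv::Point (integer pixel coordinates).
struct Pt { int x; int y; };

// Euclidean distance between two points.
double distHypot(Pt a, Pt b)
{
    return std::hypot(static_cast<double>(a.x - b.x),
                      static_cast<double>(a.y - b.y));
}

// Distance from `center` to the farthest of the four image corners.
double maxDistFromCorners(int width, int height, Pt center)
{
    const Pt corners[4] = { {0, 0}, {width, 0}, {0, height}, {width, height} };
    double maxDis = 0.0;
    for (const Pt& corner : corners)
        maxDis = std::max(maxDis, distHypot(corner, center));
    return maxDis;
}
```

This is only a compact equivalent; no claim is made here that `std::hypot` is faster than the original expression.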

Here is the program. The original program has parameters passed to it so I modified it to show the memory variables.

Here are some memory variables that are publicly available to the function

auto aTotalTimeView = system_clock::now() - system_clock::now(); // get as close to zero as possible
auto aTotalTimeView_ms = duration_cast<milliseconds>(aTotalTimeView);
double dGradientRadius = 0.4; 
double dGradientPower = 0.9; 
int iCyclesView = 0;

//void generateGradient(cv::Mat& mask, cv::Point firstPt)
void generateGradient()
{
	cv::Mat mask1(1080, 1920, CV_64F, cv::Scalar(1));
	cv::Point firstPt1 = cv::Point(500, 500);
	double maxImageRad = dGradientRadius * getMaxDisFromCorners(mask1.size(), firstPt1);
	// Start timing clock
	auto start1 = high_resolution_clock::now();  // Start Time
	iCyclesView++;
	for (int iRows = 0; iRows < mask1.rows; iRows++)
	{
		for (int iCols = 0; iCols < mask1.cols; iCols++)
		{
			double temp = dist(firstPt1, cv::Point(iCols, iRows)) / maxImageRad;
			temp = temp * dGradientPower;
			double temp_s = pow(cos(temp), 4);
			mask1.at<double>(iRows, iCols) = temp_s;
		}
	}
	// Gather timing data
	auto stop1 = high_resolution_clock::now();  // Stop Time
	auto duration1 = duration_cast<milliseconds>(stop1 - start1);
	aTotalTimeView_ms += duration1;
	// Log the average time once, on cycle 101.
	if (iCyclesView == 101) {
		// Cast before dividing so the average is not truncated by integer division.
		float fAvgTime = static_cast<float>(aTotalTimeView_ms.count()) / iCyclesView;
		CString csTime;
		csTime.Format(_T("1 generateGradient with double internal for loops i and j average time is %.2f ms \n"), fAvgTime);
		//log_write(csTime);
		//std::fflush(g_hLog);
		// log results for me were
		// generateGradient with double internal for loops i and j average time is 118.00 ms
		Beep(2021, 1000);  // So I will know when to stop running the test.
	}

} // end function generateGradient
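
A possible restructuring of the loop above, offered as a sketch rather than a measured fix: hoist the per-row term out of the inner loop, replace `pow(x, 2)` and `pow(x, 4)` with multiplications, and write through a row pointer instead of one accessor call per pixel. The buffer is a plain `std::vector<double>` standing in for the CV_64F mask (with a `cv::Mat`, the row pointer would come from `mask1.ptr<double>(iRows)`); `fillGradient` and its parameters are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Fills a rows x cols buffer with cos(d / maxImageRad * power)^4, where d is
// the distance of each pixel from (centerX, centerY).
void fillGradient(std::vector<double>& mask, int rows, int cols,
                  int centerX, int centerY, double maxImageRad, double power)
{
    assert(mask.size() == static_cast<std::size_t>(rows) * cols);
    const double scale = power / maxImageRad;   // loop-invariant factor
    for (int r = 0; r < rows; ++r) {
        const double dy = static_cast<double>(r - centerY);
        const double dy2 = dy * dy;             // constant along the row
        double* row = mask.data() + static_cast<std::size_t>(r) * cols;
        for (int c = 0; c < cols; ++c) {
            const double dx = static_cast<double>(c - centerX);
            const double cosine = std::cos(std::sqrt(dx * dx + dy2) * scale);
            const double sq = cosine * cosine;
            row[c] = sq * sq;                   // cos^4 without pow
        }
    }
}
```

The results can differ from the original in the last bits because the division is folded into `scale`. Any speed-up should be confirmed by timing both versions in a Release build.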