Very simple frame-grabbing loop consumes far too much CPU

Hi all,
I have a problem with OpenCV and I hope I can solve the problem with the help of this forum.

I have a very simple loop that grabs a frame and processes it. The problem is that the overhead of this loop seems to consume almost a whole CPU core! So something must be wrong.

I’m working on a Jetson Nano board with a USB camera connected. I isolated and measured the problem as follows:

I run a C++ program on an isolated CPU core, so no other process runs on that core. This is the main() of that program:

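#include <chrono>
#include <iostream>
#include <opencv2/opencv.hpp>

using namespace cv;
using namespace std;
using namespace std::chrono;

// setupPin12(), SET_PIN12 and CLR_PIN12 are my own GPIO helpers (not shown here)

int main(int argc, char **argv)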
{
   setupPin12();
	
   VideoCapture cap1;
   Mat CameraFrame1;

   cap1.open(0);
   if (!cap1.isOpened())
   {
       cout << "***Could not initialize capturing...***\n";
       return -1;
   }
   cap1.set(cv::CAP_PROP_FRAME_WIDTH, 640);
   cap1.set(cv::CAP_PROP_FRAME_HEIGHT, 480);

   auto start = high_resolution_clock::now();

   unsigned long long sum = 0;   // 64-bit, because the dummy loop total does not fit in an int
   int frame = 0;
   for (;;)
   {
       cap1 >> CameraFrame1;

       if (CameraFrame1.empty())
           break;

       SET_PIN12

       // dummy processing
       for (int i=0 ; i<2000000 ; i++)
       {
           sum += i;
       }

       CLR_PIN12

       frame++;
       if (frame == 100) {
           auto stop = high_resolution_clock::now();
           auto duration = duration_cast<microseconds>(stop - start);
           cout << duration.count()/100 << endl;
           frame = 0;
           start = stop;
       }

       // Wait for Escape keyevent to exit from loop
       char keypressed = (char)waitKey(10);
       if (keypressed == 27)
           break;
   }

   cout << sum << endl;

   cap1.release();

   return 0;
}

So it is basically a loop that grabs a frame and does some dummy processing.
I can change the processing time by changing the loop count of the dummy loop.
Just before the dummy processing I drive pin 12 of the Jetson board high, and right after it I drive pin 12 low (using macros that write directly to the GPIO registers). I measure the processing time as the pulse width on an oscilloscope.

The frame time is measured by timing 100 frames and taking the mean.
Note that the camera frame rate is 60 fps and the resolution is set to 640x480.

So this is what I observe:

When the processing time is below 3.8 ms, I get a frame time of 15.7 ms.
But when the processing time is higher, the frame time increases, by almost the same amount as the processing time is increased.

So my conclusion is that the CPU core is about 75% busy with something. Once the remaining 25% is filled with the dummy processing, the core is 100% loaded and the frame time increases.
But why does the frame-grabbing loop take 75% of a CPU core (running at 1.5 GHz)?
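
The 75% figure follows from the two measurements above: 3.8 ms of each 15.7 ms frame period is free for processing, so the rest is taken by something else. A minimal sketch of that arithmetic is shown here. The function name estimateOverheadFraction is hypothetical, and the estimate assumes that everything outside the measured headroom is spent busy on the same core rather than waiting.

```cpp
#include <cassert>

// Fraction of one frame period that is NOT available for user processing.
// framePeriodMs: measured mean time per frame (15.7 ms in this post).
// headroomMs:    longest processing time that leaves the frame period
//                unchanged (3.8 ms in this post).
// Assumption: the remaining time is CPU-bound work on the same core.
double estimateOverheadFraction(double framePeriodMs, double headroomMs)
{
    assert(framePeriodMs > 0.0);
    assert(headroomMs >= 0.0 && headroomMs <= framePeriodMs);
    return (framePeriodMs - headroomMs) / framePeriodMs;
}
```

Calling estimateOverheadFraction(15.7, 3.8) gives (15.7 - 3.8) / 15.7, which is roughly 0.76, in line with the "about 75%" estimate.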

I hope somebody can point me in the right direction to solve this problem.
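
One way to narrow this down might be to compare wall-clock time with CPU time for a single step of the loop, since a call that only waits for the camera uses wall time but very little CPU time. The following is a sketch under assumptions, not a definitive tool: LoopCost and measureCost are hypothetical names, and std::clock() reports CPU time for the whole process, so CPU time used by any helper threads is counted as well.

```cpp
#include <cassert>
#include <chrono>
#include <ctime>

// Wall-clock time and process CPU time spent in one piece of code.
struct LoopCost {
    double wallMs;  // elapsed real time
    double cpuMs;   // CPU time charged to the process (all threads)
};

// Runs body() once and reports how long it took in wall time and CPU time.
template <typename Fn>
LoopCost measureCost(Fn&& body)
{
    const auto wallStart = std::chrono::steady_clock::now();
    const std::clock_t cpuStart = std::clock();

    body();

    const std::clock_t cpuStop = std::clock();
    const auto wallStop = std::chrono::steady_clock::now();

    LoopCost cost;
    cost.wallMs =
        std::chrono::duration<double, std::milli>(wallStop - wallStart).count();
    cost.cpuMs = 1000.0 * static_cast<double>(cpuStop - cpuStart) / CLOCKS_PER_SEC;
    return cost;
}
```

For example, wrapping the grab as measureCost([&] { cap1 >> CameraFrame1; }) and printing both fields would show whether the grab is mostly waiting (cpuMs far below wallMs) or really computing (cpuMs close to wallMs).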

I found out that a conversion from 8-bit grey to RGB is taking place, and that this causes the high CPU load. I have no idea why this happens, but that is a topic for another thread.