CUDA support for calcCovarMatrix in OpenCV

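I would try something along these lines: subtract the mean from the samples to get an intermediate matrix, then compute the covariances from that matrix with a gemm.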

There may be more efficient, fused ways to compute the intermediate mean-subtracted matrix, but since most of the computation is in the covariance calculation itself (the gemm), it probably wouldn't make any significant difference.
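To make the two steps concrete, here is a minimal CPU reference in plain C++. It is only a sketch of the arithmetic, not OpenCV code: the helper names `centerRows` and `covarFromCentered` are hypothetical, samples are assumed to be stored one per row in a row-major buffer, and the scale factor is left to the caller. On the GPU, assuming an OpenCV build with CUDA and cuBLAS, the mean would typically come from something like `cv::cuda::reduce` and the final product from `cv::cuda::gemm`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Step 1: compute the per-column mean and subtract it from every row.
// x holds n samples (rows) of d features (columns), row-major.
// Hypothetical helper, not part of OpenCV.
std::vector<double> centerRows(const std::vector<double>& x,
                               std::size_t n, std::size_t d,
                               std::vector<double>& mean)
{
    assert(n > 0 && x.size() == n * d);

    mean.assign(d, 0.0);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < d; ++j)
            mean[j] += x[i * d + j];
    for (std::size_t j = 0; j < d; ++j)
        mean[j] /= static_cast<double>(n);

    std::vector<double> centered(x.size());
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < d; ++j)
            centered[i * d + j] = x[i * d + j] - mean[j];
    return centered;
}

// Step 2: covariance = scale * centered^T * centered.
// This triple loop is the gemm, and it is where most of the work goes.
// Hypothetical helper, not part of OpenCV.
std::vector<double> covarFromCentered(const std::vector<double>& c,
                                      std::size_t n, std::size_t d,
                                      double scale)
{
    assert(c.size() == n * d);

    std::vector<double> cov(d * d, 0.0);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t a = 0; a < d; ++a)
            for (std::size_t b = 0; b < d; ++b)
                cov[a * d + b] += c[i * d + a] * c[i * d + b];
    for (double& v : cov)
        v *= scale;
    return cov;
}
```

Passing `scale = 1.0` gives the unscaled result, and `1.0 / n` matches what `calcCovarMatrix` does with `COVAR_SCALE` in its normal mode. The centering step costs O(n·d) while the product costs O(n·d²), which is why fusing the first step is unlikely to matter much.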