I wanted to make a suggestion. I’m trying to apply template matching with matchTemplate in OpenCV with Python, mostly using the TM_SQDIFF method.
However, I can’t get sharp enough matches with this method. It would be amazing if we could control the exponent of the calculation. Using, for example, 4 or 8 as the exponent would enable much sharper template matching. As follows:

In TM_SQDIFF: $R(x,y) = \sum_{x',y'} \big( (T(x',y') - I(x+x',y+y')) \cdot M(x',y') \big)^2$

In my case: $R(x,y) = \sum_{x',y'} \big( (T(x',y') - I(x+x',y+y')) \cdot M(x',y') \big)^{p}$, where $p$ is the custom exponent.
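For reference, here is a minimal NumPy sketch of the custom-exponent response evaluated directly (no FFT). The function name `match_power_diff` and its parameters are made up for illustration and are not part of OpenCV. It takes the absolute value of the difference so that odd exponents also behave; for even exponents this is identical to the formula above. It materialises every window difference at once, so it is only practical for small images.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def match_power_diff(image, templ, mask=None, exponent=4):
    """Direct evaluation of sum(|(T - I) * M| ** exponent) at every offset.

    Hypothetical helper, not an OpenCV function.
    """
    image = np.asarray(image, dtype=np.float64)
    templ = np.asarray(templ, dtype=np.float64)
    if mask is None:
        mask = np.ones_like(templ)
    mask = np.asarray(mask, dtype=np.float64)

    # windows[y, x] is the template-sized patch whose top-left corner is (x, y).
    windows = sliding_window_view(image, templ.shape)
    diff = (templ - windows) * mask
    # Sum over the two patch axes, leaving one value per offset.
    return (np.abs(diff) ** exponent).sum(axis=(-2, -1))
```

With `exponent=2` this reduces to the masked TM_SQDIFF formula; the best match is the minimum of the returned array.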

That would change the norm from L2 towards L-infinity. It shouldn’t change the shape of the response peaks (so it’s not what you want), but it would change the math fundamentally.

If you just wanted to apply an extra squaring to the response array, that wouldn’t help either: the TM_SQDIFF response is non-negative, so squaring it is monotonic and the extrema stay in the same places.

What you need is to find local maxima (for TM_SQDIFF, where the best match is the smallest value, negate the response first). Non-maximum suppression can be constructed from a grayscale dilation and an equality comparison.
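A minimal sketch of that dilation-plus-equality idea, written with NumPy only so it runs without extra dependencies. The sliding-window maximum here is a grayscale dilation with a flat square kernel; with OpenCV, `cv2.dilate` on the response would play the same role. The name `local_maxima` and the `size` and `threshold` parameters are assumptions for illustration.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def local_maxima(response, size=3, threshold=-np.inf):
    """Return (row, col) of points that equal the max of their neighbourhood.

    `size` must be odd. Hypothetical helper, not an OpenCV function.
    """
    response = np.asarray(response, dtype=np.float64)
    pad = size // 2
    # Pad with -inf so border pixels are only compared against real neighbours.
    padded = np.pad(response, pad, mode="constant", constant_values=-np.inf)
    # Grayscale dilation with a flat size x size kernel = sliding-window maximum.
    dilated = sliding_window_view(padded, (size, size)).max(axis=(-2, -1))
    # A pixel is a local maximum if dilation left it unchanged.
    peaks = (response == dilated) & (response > threshold)
    return np.argwhere(peaks)
```

For a TM_SQDIFF response, call it as `local_maxima(-response, ...)`. Flat plateaus report every pixel on the plateau, so a threshold is usually needed to discard the background.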

Thanks for the response.
I have implemented what I was referring to in Cython, but with a naive implementation, which is too slow. I guess OpenCV implements SQDIFF in Fourier space, which is much faster (though less accurate). My question was whether there is a way to implement a custom exponent in Fourier space.

Weighting individual error distances with a higher exponent gives much better results.
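One possible route to a custom exponent in Fourier space, at least for even integer exponents p: expand (T − I)^p with the binomial theorem. Each of the p + 1 terms is then a plain cross-correlation of I^k with the kernel M^p · T^(p−k), and each cross-correlation can be done with an FFT. The sketch below uses NumPy's FFT rather than anything inside OpenCV; the function names are made up, and this is an illustration of the idea rather than a tuned implementation.

```python
import numpy as np
from math import comb


def _xcorr_valid(image, kernel):
    """Cross-correlation via FFT, cropped to offsets where the kernel fits."""
    shape = image.shape
    spec = np.fft.rfft2(image, s=shape) * np.conj(np.fft.rfft2(kernel, s=shape))
    full = np.fft.irfft2(spec, s=shape)  # circular correlation
    h, w = kernel.shape
    # Offsets beyond this range wrap around the image, so drop them.
    return full[: shape[0] - h + 1, : shape[1] - w + 1]


def match_even_power_fft(image, templ, mask=None, exponent=4):
    """sum(((T - I) * M) ** exponent) for even exponents, via p + 1 FFT correlations.

    Hypothetical helper, not an OpenCV function.
    """
    if exponent < 2 or exponent % 2:
        raise ValueError("exponent must be a positive even integer")
    image = np.asarray(image, dtype=np.float64)
    templ = np.asarray(templ, dtype=np.float64)
    if mask is None:
        mask = np.ones_like(templ)
    weight = np.asarray(mask, dtype=np.float64) ** exponent

    out_shape = (image.shape[0] - templ.shape[0] + 1,
                 image.shape[1] - templ.shape[1] + 1)
    result = np.zeros(out_shape)
    for k in range(exponent + 1):
        # Binomial term: C(p, k) * (-1)^k * sum(M^p * T^(p-k) * I^k)
        kernel = weight * templ ** (exponent - k)
        result += comb(exponent, k) * (-1) ** k * _xcorr_valid(image ** k, kernel)
    return result
```

The catch is numerical: the terms are large and nearly cancel, so precision degrades as the exponent grows. Scaling the image and template to [0, 1] first helps; raw 8-bit values raised to the 8th power already exceed what a double can represent exactly.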