Maximum template matching speed

What hardware upgrades would be ideal for maximizing speed when running template matching on a large number of images? I'm talking hundreds of thousands of images. Can GPUs be used?

does it have to be template matching? would convolution/correlation work too?

the general matchTemplate has OpenCL kernels, so it might run on any OpenCL-capable device (CPU or GPU).

I know exactly what I'm looking for (part of a document)… other than noise caused by the quality of the document.

My accuracy is great, but I just wanted to see how fast I could get it going. I assumed it was limited by the CPU. Do you think convolution/correlation might perform faster?

correlation and convolution are nearly the same operation; one is the other with the kernel mirrored.
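a tiny numpy check of that, using a made-up 1-D signal and kernel (for images the same holds with the kernel flipped along both axes):

```python
import numpy as np

# made-up example data
signal = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
kernel = np.array([1.0, 0.0, -1.0])

# correlation slides the kernel as-is over the signal
corr = np.correlate(signal, kernel, mode="valid")

# convolution mirrors the kernel first, so mirroring it beforehand undoes that
conv_flipped = np.convolve(signal, kernel[::-1], mode="valid")

print(np.allclose(corr, conv_flipped))
```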

convolution can be done in O(n log n) because it can be implemented as a Fourier transform of both inputs, elementwise multiplication of the spectra, and an inverse transform.
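a rough numpy sketch of that idea, comparing a naive sliding-window correlation with the FFT route. the function names and test data are made up for illustration, and this is not how OpenCV implements it internally:

```python
import numpy as np

def correlate_direct(image, template):
    """Naive 'valid' cross-correlation: one window product per output pixel."""
    ih, iw = image.shape
    th, tw = template.shape
    out = np.empty((ih - th + 1, iw - tw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            out[y, x] = np.sum(image[y:y + th, x:x + tw] * template)
    return out

def correlate_fft(image, template):
    """Same result via the spectra: transform, multiply elementwise, transform back."""
    ih, iw = image.shape
    th, tw = template.shape
    # correlation is convolution with the template flipped along both axes
    flipped = template[::-1, ::-1]
    # zero-pad to the full linear-convolution size so nothing wraps around
    shape = (ih + th - 1, iw + tw - 1)
    spectrum = np.fft.rfft2(image, s=shape) * np.fft.rfft2(flipped, s=shape)
    full = np.fft.irfft2(spectrum, s=shape)
    # keep only positions where the template lies fully inside the image
    return full[th - 1:ih, tw - 1:iw]

rng = np.random.default_rng(0)
image = rng.random((40, 50))
template = rng.random((7, 9))
print(np.allclose(correlate_direct(image, template), correlate_fft(image, template)))
```

the FFT version's cost depends mostly on the padded image size, so the advantage grows with template size. for very small templates the direct loop (vectorized or on a GPU) can still win.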

opencv’s matchTemplate calculates variants of correlation. the equations are detailed in the docs. it does a little more than regular correlation because some of the equations involve subtraction (and the normalized modes divide by window energies). because it can take a mask argument, my understanding is that it may have to fall back to the naive sliding-window algorithm, which is roughly O(n²) (image pixels times template pixels), if not worse.
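to illustrate why the subtraction-based modes are still mostly correlation, here is a small numpy sketch (not OpenCV's actual code) for the squared-difference score, i.e. the sum of (T - I)² over each window. the arrays are random stand-ins, and sliding_window_view needs numpy 1.20 or newer:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# random stand-ins for an image and a template
rng = np.random.default_rng(1)
image = rng.random((20, 24))
template = rng.random((5, 6))

# every template-sized window of the image, shape (16, 19, 5, 6)
windows = sliding_window_view(image, template.shape)

# squared difference computed directly from its definition
sqdiff = np.sum((windows - template) ** 2, axis=(2, 3))

# expanded form: sum(T^2) - 2 * correlation + sum(window^2)
corr = np.sum(windows * template, axis=(2, 3))
window_energy = np.sum(windows ** 2, axis=(2, 3))
expanded = np.sum(template ** 2) - 2.0 * corr + window_energy

print(np.allclose(sqdiff, expanded))
```

so the expensive part is the plain correlation term. the other two terms are a constant and a per-window sum of squares.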

again, matchTemplate can execute on GPUs (and so can many other OpenCV functions, including convolution, through the OpenCL-backed transparent API, i.e. cv::UMat).

in any case, more compute power (also memory bandwidth) would help of course.