What you need is a fast and efficient template matching algorithm, and I just implemented one.
Here is the visual effect:
Porting the C++ code to Python may take some work, but it should be enough to find every `cv::` function call and replace it with its `cv2` equivalent. All the details are on this GitHub