Hey,

So I was implementing a 3x3 median filter from scratch in C++. I am using a naive approach: for each pixel, I collect the neighbouring pixels into a vector and sort it. I know there are much faster algorithmic approaches, but even so, my version takes 3000 ms for a 2000x2000 px image, while cv::medianBlur takes only 12 ms. Even granting that it doesn't have to read and sort all 9 values for each pixel, I don't understand how a speedup by a factor of nearly 300 is possible.

Are there some other optimizations going on internally, like utilizing the GPU or multithreading, that explain this speedup?
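For illustration, here is a minimal sketch of the naive approach described above (assuming an 8-bit grayscale image stored row-major; border pixels are skipped for brevity, and all names are illustrative):

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Naive 3x3 median filter: gather the 9 neighbours into a small
// buffer, sort it, and take the middle element.
std::vector<uint8_t> medianFilter3x3(const std::vector<uint8_t>& src,
                                     int width, int height) {
    std::vector<uint8_t> dst(src); // borders are left unfiltered
    uint8_t window[9];
    for (int y = 1; y < height - 1; ++y) {
        for (int x = 1; x < width - 1; ++x) {
            int k = 0;
            for (int dy = -1; dy <= 1; ++dy)
                for (int dx = -1; dx <= 1; ++dx)
                    window[k++] = src[(y + dy) * width + (x + dx)];
            std::sort(window, window + 9); // full sort of 9 values, per pixel
            dst[y * width + x] = window[4]; // the median
        }
    }
    return dst;
}
```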

- OpenCL (that means GPUs)
- threading
- SIMD (vector instructions)
- use of "intrinsics"
- special cases: simple cases like 3x3 get dedicated kernel code that the compiler can turn into better instructions

all of those "only" contribute a *factor* of speedup to the execution time.

*algorithmic aspects* can beat all of the above. "time complexity" may be taught in some high school AP programming classes, and it is first-year CS material. you may have heard of "bubble sort" and how that's O(n²) while other sorting algorithms are O(n log n).

your description sounds like you implemented a "naive" (obvious) algorithm. That would be O(r² log r)* per output pixel, and whatever sorting algorithm you apply might have some fixed (setup) cost per call.

I just found this: https://nomis80.org/ctmf.pdf. Right in the first figure it presents the Huang '79 algorithm, which is O(r) per pixel, and then it presents an algorithm that's O(1), albeit using a good amount of memory (CPU cache).
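the core trick in Huang's algorithm is to keep a histogram of the window and update it incrementally instead of re-sorting: when the window slides one pixel, remove the samples that left and add the ones that entered, then find the median with a partial scan of the bins. a minimal single-row sketch of that idea (assuming 8-bit pixels; function and variable names are mine, not from the paper):

```cpp
#include <cstdint>
#include <vector>

// 1-D sliding-window median via a 256-bin histogram (Huang-style).
// Updating the histogram costs O(1) amortized per slide in 1-D
// (O(r) per slide for a 2-D window), instead of sorting the window.
std::vector<uint8_t> slidingMedianRow(const std::vector<uint8_t>& row, int radius) {
    int n = static_cast<int>(row.size());
    int window = 2 * radius + 1;
    std::vector<uint8_t> out(row); // borders left as-is
    if (n < window) return out;

    int hist[256] = {0};
    for (int i = 0; i < window; ++i) ++hist[row[i]]; // initial window

    for (int x = radius; x < n - radius; ++x) {
        // Partial scan: walk the bins until half the samples are seen.
        int need = window / 2 + 1, seen = 0, v = 0;
        for (; v < 256; ++v) {
            seen += hist[v];
            if (seen >= need) break;
        }
        out[x] = static_cast<uint8_t>(v);
        // Slide right: drop the leftmost sample, add the next one.
        if (x + radius + 1 < n) {
            --hist[row[x - radius]];
            ++hist[row[x + radius + 1]];
        }
    }
    return out;
}
```

the O(1) constant-time algorithm in the paper builds on the same histograms, but merges per-column histograms so the cost per pixel no longer grows with r.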

*) the window area is r², and sorting n items is O(n log n); with n = r² you get r² log r², which is r² · 2 log r, and since constant factors don't matter in time complexity, that's O(r² log r)