So I was implementing a 3×3 median filter from scratch in C++. I am using a naive approach: for every pixel I rebuild a vector of the neighbouring pixel values and then sort it. I know there are much faster algorithmic approaches. Still, my approach takes 3000 ms for an image of size 2000×2000 px, while cv::medianBlur takes only 12 ms. Even if it doesn't have to read and sort the 9 values for each pixel, I don't understand how a speedup by a factor of nearly 300 is possible.
Are there other optimizations going on internally, such as GPU usage or multithreading, that explain this speedup?
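For reference, a minimal sketch of the naive approach described above might look like this (my own code, not the asker's: a flat row-major 8-bit buffer, with border pixels passed through unfiltered):

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Naive 3x3 median filter: for every interior pixel, gather the 9
// neighbours into a small array and fully sort it. The image is a flat
// row-major buffer of 8-bit pixels; borders keep their source values.
std::vector<std::uint8_t> median3x3_naive(const std::vector<std::uint8_t>& src,
                                          int width, int height) {
    std::vector<std::uint8_t> dst = src;  // copy so borders stay unchanged
    for (int y = 1; y < height - 1; ++y) {
        for (int x = 1; x < width - 1; ++x) {
            std::uint8_t win[9];
            int k = 0;
            for (int dy = -1; dy <= 1; ++dy)
                for (int dx = -1; dx <= 1; ++dx)
                    win[k++] = src[(y + dy) * width + (x + dx)];
            std::sort(win, win + 9);      // full sort of the 9 values
            dst[y * width + x] = win[4];  // middle element = median
        }
    }
    return dst;
}
```

Note that this version at least uses a stack array per window; if the real code allocates a fresh std::vector per pixel, that adds a further heap-allocation cost on every one of the 4 million pixels.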
OpenCV applies several implementation-level optimizations:

- OpenCL (which means GPUs)
- SIMD (vector instructions)
- use of “intrinsics”
- special cases: simple kernels like 3×3 get dedicated code paths the compiler can turn into better instructions

All of those “only” contribute a constant factor of speedup to the execution time.
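To make the “special cases” point concrete, here is one classic trick for the 3×3 case (not necessarily what OpenCV does verbatim): a fixed sorting network for the median of 9 values. Because the sequence of compare/exchange steps never branches on the data, a compiler can map each exchange to min/max instructions and vectorize across many pixels at once:

```cpp
#include <algorithm>
#include <cstdint>

// Compare/exchange: after the call, a holds the smaller value, b the larger.
static inline void s(std::uint8_t& a, std::uint8_t& b) {
    std::uint8_t lo = std::min(a, b), hi = std::max(a, b);
    a = lo;
    b = hi;
}

// Median of 9 via the classic 19-exchange sorting network. No full sort,
// no loops, no data-dependent branches -- ideal material for SIMD.
std::uint8_t median9(std::uint8_t p0, std::uint8_t p1, std::uint8_t p2,
                     std::uint8_t p3, std::uint8_t p4, std::uint8_t p5,
                     std::uint8_t p6, std::uint8_t p7, std::uint8_t p8) {
    s(p1, p2); s(p4, p5); s(p7, p8); s(p0, p1); s(p3, p4); s(p6, p7);
    s(p1, p2); s(p4, p5); s(p7, p8); s(p0, p3); s(p5, p8); s(p4, p7);
    s(p3, p6); s(p1, p4); s(p2, p5); s(p4, p7); s(p4, p2); s(p6, p4);
    s(p4, p2);
    return p4;  // after the network, p4 holds the median
}
```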
Algorithmic aspects can beat all of the above. “Time complexity” is taught in some high-school AP programming classes, and it is first-year CS material. You may have heard of “bubble sort” and how it is O(n²) while other sorting algorithms are O(n log n).
Your description sounds like you implemented the “naive” (obvious) algorithm. That is O(r² log r)* per output pixel, and whatever sorting routine you call may also carry a fixed setup cost per call — plus, in your case, the cost of building a fresh vector for every pixel.
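Even within the naive approach, you don't need a full sort: std::nth_element does a partial selection that only guarantees the element at the given position ends up where a full sort would put it. A small sketch (for an array this tiny the constant factors may dominate, so measure before assuming it's faster):

```cpp
#include <algorithm>
#include <cstdint>

// Median of one 3x3 window via selection instead of a full sort.
// std::nth_element partially reorders the array so that win[4] holds
// the value it would have after a complete sort -- i.e. the median.
std::uint8_t window_median(std::uint8_t (&win)[9]) {
    std::nth_element(win, win + 4, win + 9);
    return win[4];
}
```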
I just found this: https://nomis80.org/ctmf.pdf — right in the first figure it presents the Huang ’79 algorithm, which is O(r) per pixel, and then it presents an algorithm that is O(1) per pixel, albeit using a good amount of memory (sized to fit in CPU cache).
*) The window area is r², and sorting n values costs n log n; with n = r² you get r² log r², which is 2 r² log r, and since constant factors don’t matter in time complexity, that is O(r² log r).
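The Huang ’79 sliding-histogram idea from that paper can be sketched as follows (my own code, assuming an 8-bit, flat, row-major image with borders passed through unfiltered). Moving the window one pixel to the right removes one 3-pixel column from a 256-bin histogram and adds another, so the per-pixel work is O(r) column updates plus a histogram scan — no sorting at all:

```cpp
#include <cstdint>
#include <vector>

// Huang'79-style 3x3 median: maintain a 256-bin histogram of the current
// window. Sliding the window right updates 2*(2r+1) = 6 histogram entries
// instead of re-reading and sorting all 9 pixels.
std::vector<std::uint8_t> median3x3_huang(const std::vector<std::uint8_t>& src,
                                          int width, int height) {
    std::vector<std::uint8_t> dst = src;  // borders stay unchanged
    for (int y = 1; y < height - 1; ++y) {
        int hist[256] = {0};
        // Build the histogram of the first full window in this row
        // (columns 0..2, rows y-1..y+1).
        for (int dy = -1; dy <= 1; ++dy)
            for (int dx = 0; dx <= 2; ++dx)
                ++hist[src[(y + dy) * width + dx]];
        for (int x = 1; x < width - 1; ++x) {
            if (x > 1) {  // slide: drop the left column, add the right one
                for (int dy = -1; dy <= 1; ++dy) {
                    --hist[src[(y + dy) * width + (x - 2)]];
                    ++hist[src[(y + dy) * width + (x + 1)]];
                }
            }
            int count = 0;  // scan for the 5th smallest of the 9 values
            for (int v = 0; v < 256; ++v) {
                count += hist[v];
                if (count >= 5) {
                    dst[y * width + x] = static_cast<std::uint8_t>(v);
                    break;
                }
            }
        }
    }
    return dst;
}
```

The CTMF paper's O(1) refinement goes further by keeping one histogram per image column and merging them, which is where the extra memory comes in.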