Per-pixel operations in Python

I’m working on an image processing algorithm in Python where I need to do some per-pixel operations (i.e. I can’t solve it with matrix operations)*.

The algorithm was extremely fast in C++, but it takes an eternity in Python.

I also ran a quick test comparing a simple operation ( image = image+2 ) done as a matrix operation against iterating through the image with 2 for loops: the loops are over 1000 times slower!
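For readers who want to reproduce that comparison, here is a minimal sketch. The function names and the synthetic test image are assumptions made for illustration; the original test code was not posted, and the measured ratio will vary with machine and image size.

```python
import time
import numpy as np


def add_two_vectorized(image):
    # One call into compiled NumPy code handles every pixel
    return image + 2


def add_two_loops(image):
    # Same result, but each pixel goes through the Python interpreter
    out = np.empty_like(image)
    for y in range(image.shape[0]):
        for x in range(image.shape[1]):
            out[y, x] = image[y, x] + 2
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Values stay below 254 so that uint8 arithmetic cannot wrap around
    image = rng.integers(0, 200, size=(300, 400), dtype=np.uint8)

    t0 = time.perf_counter()
    fast = add_two_vectorized(image)
    t1 = time.perf_counter()
    slow = add_two_loops(image)
    t2 = time.perf_counter()

    assert np.array_equal(fast, slow)
    print("vectorized: %.6f s, loops: %.6f s" % (t1 - t0, t2 - t1))
```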

What are the possibilities to speed it up?

*It’s a kind of Hough transform, so I need to manipulate the pixel value, its neighborhood and the coordinates… so I don’t really see any other solution than to iterate through the image with 2 for loops along the X and Y axes and to access the pixels using image[x,y] .

look into cython or pyopencl.

cython will allow you to compile python. it works best if you apply cython’s static typing. they have good documentation.

pyopencl obviously gives you access to OpenCL.


for loops in python are in a lot of cases a bad idea because of performance (like in Matlab), and usually they’re not the pythonic way. Python is, like Matlab, a vector-based scripting language. If you would like to do it in a pythonic way, have a look at map, filter, reduce, and lambda.

Btw., nice to see the new OpenCV forum.



[I’m also grateful to the new forum as it allows extended discussions…]

I’ll look into @volkmar.wieser’s suggestion, and I find @crackwitz’s idea of Python/C++ interoperation interesting, especially as I already have several algorithms implemented in C++ that I’d like to reuse in Python.

There seems to be a more direct way to create Python modules from C++ code using OpenCV’s bindings. I found this tutorial in the docs about expanding Python OpenCV with my own modules: (second part). Unfortunately it’s a very cryptic and incomplete description (it’s really far from being a tutorial), I didn’t understand it even with 6+ years of OpenCV experience.

I found this article on the same subject, but I couldn’t make it work, it seems outdated - but it’s more or less the idea I would like to achieve.

Can someone explain how to do this?


you can take a look at a sample to add a custom function in a new module to OpenCV. Or another way is you can add your functions in existing modules such as imgproc etc.

It should be pretty simple:

  • First, you create an OpenCV module with the WRAP python parameter and a compatible layout. Add it to the build using the OPENCV_EXTRA_MODULES_PATH cmake parameter, which accepts a list of folders.
  • Second, you mark functions and classes with CV_EXPORTS_W and other macros and use InputArray, OutputArray and other types known by the wrappers as parameters. The tutorial describes these macros pretty well although supported types are not clearly documented.

Thanks @sturkmen and @mshabunin! I think we’re getting closer! (I didn’t know that you can add a list of folders as OPENCV_EXTRA_MODULES_PATH)

It’s clear that building my module as a part of the OpenCV build process is the most straightforward solution, but I still wonder if it’s possible to build my module separately (for simpler modifications and redistribution)?

map/reduce/filter will be worthless for image manipulation. not just because they’re the wrong APIs but because they still base their actions on python code.

the pythonic way is to use the library functions given by numpy and OpenCV, which do the job in compiled, optimized code and also parallelized when sensible.
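As an illustration of how far library functions can go, pixel coordinates are also reachable from vectorized code. The sketch below assumes a classic line Hough transform, which may differ from the algorithm in the question; the function name `hough_accumulate` and its parameters are hypothetical. It reads the coordinates of all edge pixels with `np.nonzero` and casts every vote in a single `np.add.at` call.

```python
import numpy as np


def hough_accumulate(edges, n_theta=180):
    """Vote in (rho, theta) space for every non-zero pixel of a 2-D array."""
    ys, xs = np.nonzero(edges)  # coordinates of all edge pixels
    thetas = np.linspace(0.0, np.pi, n_theta, endpoint=False)
    # Largest possible |rho|, used as an offset so indices are never negative
    diag = int(np.ceil(np.hypot(*edges.shape)))

    # rho for every (pixel, theta) pair via broadcasting: shape (n_pixels, n_theta)
    rhos = xs[:, None] * np.cos(thetas) + ys[:, None] * np.sin(thetas)
    rho_idx = np.rint(rhos).astype(np.intp) + diag
    theta_idx = np.broadcast_to(np.arange(n_theta), rho_idx.shape)

    acc = np.zeros((2 * diag + 1, n_theta), dtype=np.int64)
    # np.add.at accumulates correctly even when the same cell is hit many times
    np.add.at(acc, (rho_idx, theta_idx), 1)
    return acc, thetas, diag
```

The intermediate arrays grow with the number of edge pixels times `n_theta`, so for large images this trades memory for speed. When the per-pixel logic does not fit this pattern, a compiled loop remains the fallback.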

I think it’s a bad idea to consider making an “OpenCV module” for your application-specific code. application code simply doesn’t belong there.

you can write a python module in C/C++ and use that from python, beside OpenCV. you can also use OpenCV’s C++ API in your module. since writing python extension modules is a little difficult to approach, cython was made.

Yes, it is possible, but your library will not be integrated into OpenCV package and will not be able to use same python wrapping mechanisms.

Thanks for all the ideas, guys! @crackwitz’s suggestions were particularly helpful.

I tested most of the methods; here is a wrap-up:

  • Map/reduce/filter: not really applicable. Most of the time the images aren’t reduced, and often we need to manipulate array indexes, which is impossible with these methods (or with other matrix operators)
  • Python wrapper for C code: very interesting solution, but unfortunately you need to create an OpenCV module - and putting application-specific code in a library is a bad idea. However, it would be great if there was a simple wrapper to create a python header and a .so file from C++ code
  • Cython - this is the best solution. The time-critical Python code gets translated to C (and binary code if necessary), and imported.

As I didn’t find any simple example on Cython/OpenCV, I’m attaching my simple testing code below. It is mostly based on this tutorial. Note that this is my first experiment, so it can probably still be optimized/simplified, but I still hope this can help!

My results on a 10MP photo: OpenCV: 0.003s; Numpy: 0.023s; Python loops: 12.9s[!!!]; Cython loops: 0.01s

import cv2
import numpy as np
import time

# import and compile cython code (pyximport builds fastthreshold.pyx on first import)
import pyximport
pyximport.install()
import fastthreshold

def pythonthresh(gray):
    res = np.zeros(gray.shape, np.uint8)
    for y in range(gray.shape[0]):
        for x in range(gray.shape[1]):
            res[y, x] = 255 if gray[y, x] > th else 0
    return res

# Open file and convert to grayscale
filename = "IMG_02506.jpg" # change this
img = cv2.imread(filename)
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
th = 128

# OpenCV thresholding (cv2.threshold returns a (retval, image) tuple)
t1 = time.time()
_, res1 = cv2.threshold(gray, th, 255, cv2.THRESH_BINARY)
t2 = time.time()
print(" ------ CV2 thresholding: %s seconds -------" % (t2-t1))

# Numpy thresholding
# probably can be optimized, the multiplication takes time
res3 = (gray > th) * 255
t3 = time.time()
print(" ----- Numpy thresholding: %s seconds ------" % (t3-t2))

# iterating through the array using for loops; function above
res2 = pythonthresh(gray)
t4 = time.time()
print(" ---  Per pixel thresholding: %s seconds ---" % (t4-t3))

# fast iteration using cython
out = np.zeros(gray.shape, np.uint8)
fastthreshold.fastthreshold(th, gray, out)
t5 = time.time()
print(" ---- Cython thresholding: %s seconds ------" % (t5-t4))


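# cython: language_level=3
# File: fastthreshold.pyx (imported by the script above)

cimport cython


cpdef fastthreshold(int th, unsigned char[:, :] gray, unsigned char[:, :] output):
    # Typed loop indices let Cython turn the loops into plain C loops
    cdef int x, y
    for y in range(gray.shape[0]):
        for x in range(gray.shape[1]):
            output[y, x] = 255 if gray[y, x] > th else 0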
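On the NumPy step flagged above as "probably can be optimized": the expression (gray > th) * 255 returns a wider integer array rather than uint8. One possible alternative, shown here as a sketch with a hypothetical helper name, is `np.where` with uint8 scalars, which avoids the multiplication and keeps the dtype. Whether it is actually faster on a given machine is worth timing rather than assuming.

```python
import numpy as np


def numpy_threshold(gray, th):
    # np.where picks 255 or 0 per pixel; uint8 scalars keep the result uint8
    return np.where(gray > th, np.uint8(255), np.uint8(0))
```

With this helper, the output can be compared directly against the OpenCV and Cython results using np.array_equal, since all three are then uint8 images of the same shape.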