Looking for advice on improving the performance of some Python code

  1. To speed up a Python program with loops and image processing, I am using OpenCV. To best utilize OpenCV, should I keep the source code with for loops, or convert them into single-line NumPy operations using numpy.vectorize, map, and lambda?

  2. Will it work if I mix CuPy, OpenCV, and Numba for GPU operations in a single function?

  3. How can the following three lines of file processing be done in parallel using OpenCV:

for f in os.listdir(in_dir):
    filename = f.split(".")[0]
    generate_stereo(in_dir, depth_dir, depth_prefix, out_dir, filename)

listdir() can’t be made faster and it can’t be usefully parallelized. Besides, it doesn’t take much time anyway.

Python doesn’t have built-in parallelization for for loops.

I don’t think NumPy vectorization can help when you have to execute a function that is not built from NumPy’s own operations — numpy.vectorize is essentially a for loop in disguise, not a speedup.
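To illustrate the point about numpy.vectorize: it is a convenience wrapper that lets an arbitrary Python function accept arrays, but per the NumPy documentation it is implemented essentially as a for loop, so it does not make the function faster. Real speedups come from NumPy’s compiled operations:

```python
import numpy as np

def clamp(x):                    # arbitrary Python function, not a NumPy ufunc
    return min(x, 128)

vclamp = np.vectorize(clamp)     # works on arrays, but internally still a Python-level loop
arr = np.array([100, 200, 50])
print(vclamp(arr))               # [100 128  50]

# a genuinely vectorized (compiled) equivalent:
print(np.minimum(arr, 128))      # [100 128  50]
```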

The only practical approach is to use the standard modules threading or multiprocessing (or an external module like ray, joblib, or pyspark) to run generate_stereo() in separate threads or processes. But it needs some more code, which I skip here.
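A minimal sketch of that extra code, using concurrent.futures from the standard library (the real generate_stereo call is commented out so the sketch is self-contained; swap ThreadPoolExecutor for ProcessPoolExecutor if the per-file work is CPU-bound pure Python rather than OpenCV calls, which release the GIL):

```python
import os
from concurrent.futures import ThreadPoolExecutor

def process_file(in_dir, depth_dir, depth_prefix, out_dir, f):
    filename = f.split(".")[0]  # same extension strip as in the question
    # generate_stereo(in_dir, depth_dir, depth_prefix, out_dir, filename)  # real call goes here
    return filename  # placeholder result so this sketch runs on its own

def process_all(in_dir, depth_dir, depth_prefix, out_dir, max_workers=4):
    # submit one task per file; the pool runs up to max_workers at once
    files = os.listdir(in_dir)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(process_file, in_dir, depth_dir, depth_prefix, out_dir, f)
                   for f in files]
        return [fut.result() for fut in futures]
```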

Other idea: if you have images in a few folders, you can run this script several times with different folders, and then all the scripts will work at the same time.

You may start all the scripts manually, or you can use a script or tool to start them at the same time.

On Linux you can create a simple bash script with & at the end of every command to start it in the background, so the script can immediately start the next process.

Or you can use the program parallel (GNU parallel) for this:

parallel script.py folder1 folder2 folder3

and it should run them like

script.py folder1 &
script.py folder2 &
script.py folder3 &

Thank you so much… I am trying with joblib
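For the record, a joblib version of the same loop can be quite short (process_one here is a hypothetical stand-in for generate_stereo, since joblib only needs a function and its arguments):

```python
from joblib import Parallel, delayed

def process_one(f):
    # stand-in for generate_stereo(...); just strips the extension
    return f.split(".")[0]

# n_jobs=-1 would use all CPU cores; here 2 workers process the files concurrently
results = Parallel(n_jobs=2)(delayed(process_one)(f) for f in ["a.png", "b.jpg"])
print(results)  # ['a', 'b']
```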