OpenCV on TBB creating one more thread than expected

Hi, folks!

I’m grappling with a tricky issue with OpenCV threading. I’m Fedora’s QA team lead, and I maintain Fedora’s deployment of openQA, a screenshot-match-based automated testing tool. openQA’s test runner backend, os-autoinst, uses OpenCV for low-level image work. It’s written in Perl, but uses a small internal C++ library called “tinycv” to wrap OpenCV.

There have been problems in the past with os-autoinst processes blowing up because of signal handling: the parent Perl process can wind up sending SIGTERM or SIGCHLD to the OpenCV threads, which they don’t know how to handle.

So in os-autoinst we try to block these signals with a signal mask before spawning OpenCV threads, then unblock them again afterwards (so the parent process can still receive these signals, which we need later). The current implementation sets a sigprocmask, calls cv::setNumThreads to cap the number of threads, and then pre-creates all of those threads with a parallel_for_ loop so that they all inherit the mask; then it unsets the sigprocmask. The code for pre-creating the threads is at os-autoinst/ppmclibs/tinycv_impl.cc at 6c67d2feff2af0c4d85e325eb2bee6dbdffa6562 · os-autoinst/os-autoinst · GitHub; it’s called right after the sigprocmask is set, and the mask is unset right after it returns.
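To make the sequence concrete, it’s roughly this (a condensed sketch, not the actual tinycv code; in reality the mask handling and the pre-creation live in different places, and the function name and the placeholder sleep are mine):

```cpp
#include <opencv2/core.hpp>
#include <signal.h>
#include <unistd.h>

// Condensed sketch of the block-then-precreate idea (not the actual tinycv
// code). Any thread TBB spawns while the mask is in place inherits it, so it
// should never see SIGTERM or SIGCHLD.
static void precreate_opencv_threads(int nthreads)
{
    sigset_t blocked, old;
    sigemptyset(&blocked);
    sigaddset(&blocked, SIGTERM);
    sigaddset(&blocked, SIGCHLD);
    pthread_sigmask(SIG_BLOCK, &blocked, &old);   // block in the calling thread

    cv::setNumThreads(nthreads);                  // cap OpenCV's worker count

    // Dummy parallel loop: keep every stripe busy long enough that the
    // backend has to spin up all of its workers while the mask is still set.
    cv::parallel_for_(cv::Range(0, nthreads), [](const cv::Range& r) {
        for (int i = r.start; i < r.end; ++i)
            usleep(50 * 1000);                    // placeholder "work"
    });

    pthread_sigmask(SIG_SETMASK, &old, nullptr);  // restore the previous mask
}
```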

In Fedora’s deployment, we use the system OpenCV, which is a build of 4.9.0. It is configured to use TBB and built against the system TBB, which is 2021.11.0.

On that deployment, we are frequently hitting a case where this doesn’t quite work right. The thread pre-creation loop works and the threads it creates have the mask set, but then later (about 30 seconds later, in the case I examined), after we unset the sigprocmask, one more thread gets created by TBB. We assume this is an OpenCV thread, as we don’t think anything else is using TBB, but I can’t prove that yet: I have not managed to reproduce the issue even once when running under strace to trace the thread creation (which is another mystery, as it otherwise occurs fairly often). The extra thread doesn’t have the sigprocmask set, so it causes the crash if it receives a SIGTERM or SIGCHLD.

We cannot figure out what’s going on here. I’ve been poking at it in isotovideo frequently crashing in the signal handler stuff (especially on aarch64) · Issue #2549 · os-autoinst/os-autoinst · GitHub, but not getting very far. When I started digging into exactly what happens when we call cv::setNumThreads, things got pretty fuzzy: it seems like OpenCV and TBB each have their own caps, they may have separate hard and soft caps, and it’s not clear to me whether the count includes the parent (main) thread or not.

os-autoinst sets the cap to whichever of cv::getNumThreads() or cv::getNumberOfCPUs() - 1 is lower. (This is explained as being “To avoid running into TBB’s soft limit which seems to be one thread less than the number of physical CPU threads (see TBB function calc_workers_soft_limit)”.)
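In other words, roughly this (a paraphrase of the os-autoinst logic, not the literal code):

```cpp
#include <algorithm>
#include <opencv2/core.hpp>

// Paraphrase of the os-autoinst cap calculation (not the literal code):
// take whichever is lower, OpenCV's default thread count or CPUs - 1.
int cap = std::min(cv::getNumThreads(), cv::getNumberOfCPUs() - 1);
cv::setNumThreads(cap);
```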

On the box I’ve been testing on, that number is 63, i.e. cv::getNumberOfCPUs() - 1. One thing I found is that the pre-creation loop seems to create only 62 new threads, because the first iteration of the parallel_for_ loop appears to run on the parent thread. So I thought that was the issue. But if I have the loop attempt to create one more thread, it seems to deadlock (the os-autoinst test suite times out, presumably because the process never completes startup). Given that, I’m baffled about how an extra thread seems to get created later.
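If anyone wants to check that behaviour on their own machine, a quick way is to record which thread ids actually run the stripes of a parallel_for_ (just an illustration I put together, not os-autoinst code):

```cpp
#include <opencv2/core.hpp>
#include <chrono>
#include <iostream>
#include <mutex>
#include <set>
#include <thread>

// Illustration only: record which threads run the stripes of a parallel_for_.
// If the set contains the main thread's id, the calling thread took part in
// the work, which would explain why pre-creating N threads yields only N-1
// new ones.
int main()
{
    cv::setNumThreads(4);
    std::mutex m;
    std::set<std::thread::id> ids;
    cv::parallel_for_(cv::Range(0, 4), [&](const cv::Range&) {
        {
            std::lock_guard<std::mutex> lock(m);
            ids.insert(std::this_thread::get_id());
        }
        // keep each stripe busy so the work actually spreads across workers
        std::this_thread::sleep_for(std::chrono::milliseconds(100));
    });
    std::cout << "stripes ran on " << ids.size() << " distinct threads, "
              << (ids.count(std::this_thread::get_id()) ? "including" : "not including")
              << " the main thread\n";
}
```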

Does anyone have any idea about what’s going on here and how to fix it? Or, failing that, any alternative ideas for dealing with the signal handling issue which might sidestep this problem? There’s a lot more detail in the issue linked above. Thanks a lot!

Small update on this today: I tried two things. I patched os-autoinst to arbitrarily cap the value it passes to cv::setNumThreads at 4, then watched what happened. New processes started out with 4 threads - the main thread and three subthreads created by the parallel_for_ loop, with a sigprocmask of 14 - but rapidly acquired more subthreads with a sigprocmask of 00. The highest count I’ve seen is 10. So it really seems like OpenCV/TBB are creating more threads than we are telling them to.
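For anyone who wants to see the same thing, the per-thread blocked-signal mask shows up as the SigBlk field in /proc/<pid>/task/<tid>/status. Here’s a rough sketch of a checker (just an illustration; the helper name is mine and it’s not part of os-autoinst):

```cpp
#include <filesystem>
#include <fstream>
#include <iostream>
#include <string>

// Print the SigBlk (blocked-signal) mask of every thread in a process, read
// from /proc/<pid>/task/<tid>/status. Threads created while SIGTERM/SIGCHLD
// were blocked show a non-zero mask; ones created afterwards show zero.
void dump_thread_sigmasks(int pid)
{
    namespace fs = std::filesystem;
    for (const auto& task : fs::directory_iterator("/proc/" + std::to_string(pid) + "/task")) {
        std::ifstream status(task.path() / "status");
        std::string line;
        while (std::getline(status, line)) {
            if (line.rfind("SigBlk:", 0) == 0) {
                std::cout << task.path().filename().string() << "  " << line << "\n";
                break;
            }
        }
    }
}
```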

I also tried patching OpenCV to set TBB’s global parallelism limit:

diff --git a/modules/core/src/parallel.cpp b/modules/core/src/parallel.cpp
index 5799d73599..841be3b3ac 100644
--- a/modules/core/src/parallel.cpp
+++ b/modules/core/src/parallel.cpp
@@ -108,6 +108,9 @@
     #endif
     #include "tbb/tbb.h"
     #include "tbb/task.h"
+    #if TBB_INTERFACE_VERSION >= 8005
+        #include "tbb/global_control.h"
+    #endif
     #if TBB_INTERFACE_VERSION >= 8000
         #include "tbb/task_arena.h"
     #endif
@@ -727,7 +730,9 @@ void setNumThreads( int threads_ )
     }
 
 #ifdef HAVE_TBB
-
+#if TBB_INTERFACE_VERSION >= 8005
+    tbb::global_control tbb_cont(tbb::global_control::max_allowed_parallelism, threads);
+#endif
 #if TBB_INTERFACE_VERSION >= 8000
     if(tbbArena.is_active()) tbbArena.terminate();
     if(threads > 0) tbbArena.initialize(threads);

But it doesn’t seem like that helps at all either :frowning: We still get more threads than the limit we set.
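One thing I’m not sure about with that patch: as far as I understand tbb::global_control, it only enforces its limit for as long as the object is alive, so a local instance inside setNumThreads would be destroyed as soon as the function returns. If that’s the problem, keeping the control object alive would look roughly like this (untested sketch, the names are mine):

```cpp
#include <memory>
#include "tbb/global_control.h"

// Untested sketch: hold the global_control for the lifetime of the process
// (or until the cap changes) rather than letting it die at end of scope.
static std::unique_ptr<tbb::global_control> tbb_limit;

static void set_tbb_thread_cap(int threads)
{
    tbb_limit.reset();  // drop any previous limit first
    if (threads > 0)
        tbb_limit = std::make_unique<tbb::global_control>(
            tbb::global_control::max_allowed_parallelism, threads);
}
```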