Deadlock occurs when unloading opencv_world481.dll (win-x64) after dnn usage; OpenCL target on Intel UHD 750

Hi All,

Our C++ program (win-x64) runs correctly, but at the very end, while the OpenCV DLL is being unloaded, it deadlocks. Call stack:

ntdll.dll!NtWaitForSingleObject() Unknown
KernelBase.dll!WaitForSingleObjectEx() Unknown
igdrcl64.dll!00007ffe152ecb2f() Unknown
igdrcl64.dll!00007ffe1525921e() Unknown
igdrcl64.dll!00007ffe15164b03() Unknown
igdrcl64.dll!00007ffe151648d3() Unknown
igdrcl64.dll!00007ffe1504227a() Unknown
igdrcl64.dll!00007ffe15042a24() Unknown
igdrcl64.dll!00007ffe1505e37d() Unknown
igdrcl64.dll!00007ffe1505e794() Unknown
igdrcl64.dll!00007ffe150149de() Unknown
> opencv_world481d.dll!cv::ocl::Program::Impl::~Impl() Line 4756 C++
opencv_world481d.dll!cv::ocl::Program::Impl::`scalar deleting destructor'(unsigned int) C++
opencv_world481d.dll!cv::ocl::Program::Impl::release() Line 4342 C++
opencv_world481d.dll!cv::ocl::Program::~Program() Line 4818 C++
opencv_world481d.dll!std::pair<std::string const ,cv::ocl::Program>::~pair<std::string const ,cv::ocl::Program>() C++
opencv_world481d.dll!std::pair<std::string const ,cv::ocl::Program>::`scalar deleting destructor'(unsigned int) C++
opencv_world481d.dll!std::_Default_allocator_traits<std::allocator<std::_Tree_node<std::pair<std::string const ,cv::ocl::Program>,void *>>>::destroy<std::pair<std::string const ,cv::ocl::Program>>(std::allocator<std::_Tree_node<std::pair<std::string const ,cv::ocl::Program>,void *>> & __formal={…}, std::pair<std::string const ,cv::ocl::Program> * const _Ptr=0x000001d164449110) Line 730 C++
opencv_world481d.dll!std::_Tree_node<std::pair<std::string const ,cv::ocl::Program>,void *>::_Freenode<std::allocator<std::_Tree_node<std::pair<std::string const ,cv::ocl::Program>,void *>>>(std::allocator<std::_Tree_node<std::pair<std::string const ,cv::ocl::Program>,void *>> & _Al={…}, std::_Tree_node<std::pair<std::string const ,cv::ocl::Program>,void *> * _Ptr=0x000001d1644490f0) Line 383 C++
opencv_world481d.dll!std::_Tree_val<std::_Tree_simple_types<std::pair<std::string const ,cv::ocl::Program>>>::_Erase_tree<std::allocator<std::_Tree_node<std::pair<std::string const ,cv::ocl::Program>,void *>>>(std::allocator<std::_Tree_node<std::pair<std::string const ,cv::ocl::Program>,void *>> & _Al={…}, std::_Tree_node<std::pair<std::string const ,cv::ocl::Program>,void *> * _Rootnode=0x000001d16443e680) Line 748 C++
opencv_world481d.dll!std::_Tree_val<std::_Tree_simple_types<std::pair<std::string const ,cv::ocl::Program>>>::_Erase_head<std::allocator<std::_Tree_node<std::pair<std::string const ,cv::ocl::Program>,void *>>>(std::allocator<std::_Tree_node<std::pair<std::string const ,cv::ocl::Program>,void *>> & _Al={…}) Line 755 C++
opencv_world481d.dll!std::_Tree<std::_Tmap_traits<std::string,cv::ocl::Program,std::less< std::string>,std::allocator<std::pair<std::string const ,cv::ocl::Program>>,0>>::~_Tree<std::_Tmap_traits<std::string,cv::ocl::Program,std::less< std::string>,std::allocator<std::pair<std::string const ,cv::ocl::Program>>,0>>() Line 1084 C++
opencv_world481d.dll!std::map<std::string,cv::ocl::Program,std::less< std::string>,std::allocator<std::pair<std::string const ,cv::ocl::Program>>>::~map< std::string,cv::ocl::Program,std::less< std::string>,std::allocator<std::pair<std::string const ,cv::ocl::Program>>>() C++
opencv_world481d.dll!cv::ocl::Context::Impl::~Impl() Line 2417 C++
opencv_world481d.dll!cv::ocl::Context::Impl::`scalar deleting destructor'(unsigned int) C++
opencv_world481d.dll!cv::ocl::Context::Impl::release() Line 2684 C++
opencv_world481d.dll!cv::ocl::Context::release() Line 2903 C++
opencv_world481d.dll!cv::ocl::Context::~Context() Line 2888 C++
opencv_world481d.dll!cv::ocl::OpenCLExecutionContext::Impl::~Impl() C++
opencv_world481d.dll!cv::ocl::OpenCLExecutionContext::Impl::`scalar deleting destructor'(unsigned int) C++
opencv_world481d.dll!std::_Destroy_in_place< cv::ocl::OpenCLExecutionContext::Impl>(cv::ocl::OpenCLExecutionContext::Impl & _Obj={…}) Line 310 C++
opencv_world481d.dll!std::_Ref_count_obj2< cv::ocl::OpenCLExecutionContext::Impl>::_Destroy() Line 2123 C++
opencv_world481d.dll!std::_Ref_count_base::_Decref() Line 1179 C++
opencv_world481d.dll!std::_Ptr_base< cv::ocl::OpenCLExecutionContext::Impl>::_Decref() Line 1405 C++
opencv_world481d.dll!std::shared_ptr< cv::ocl::OpenCLExecutionContext::Impl>::~shared_ptr< cv::ocl::OpenCLExecutionContext::Impl>() Line 1688 C++
opencv_world481d.dll!cv::ocl::OpenCLExecutionContext::Impl::getInitializedExecutionContext'::2'::`dynamic atexit destructor for 'g_primaryExecutionContext''() C++
ucrtbased.dll!_execute_onexit_table::__l2::() Line 206 C++
ucrtbased.dll!__crt_seh_guarded_call::operator()<void (void),int (void) &,void (void)>(__acrt_lock_and_call::__l2::void (void) && setup=void (void){…}, _execute_onexit_table::__l2::int (void) & action=int (void){…}, __acrt_lock_and_call::__l2::void (void) && cleanup=void (void){…}) Line 204 C++
ucrtbased.dll!__acrt_lock_and_call<int (void)>(const __acrt_lock_id lock_id=__acrt_exit_lock, _execute_onexit_table::__l2::int (void) && action=int (void){…}) Line 974 C++
ucrtbased.dll!_execute_onexit_table(_onexit_table_t * table=0x00007ffd87c042c0) Line 231 C++
opencv_world481d.dll!__scrt_dllmain_uninitialize_c() Line 399 C++
opencv_world481d.dll!dllmain_crt_process_detach(const bool is_terminating=false) Line 182 C++
opencv_world481d.dll!dllmain_crt_dispatch(HINSTANCE__ * const instance=0x00007ffd809e0000, const unsigned long reason=0, void * const reserved=0x0000000000000000) Line 220 C++
opencv_world481d.dll!dllmain_dispatch(HINSTANCE__ * const instance=0x00007ffd809e0000, const unsigned long reason=0, void * const reserved=0x0000000000000000) Line 293 C++
opencv_world481d.dll!_DllMainCRTStartup(HINSTANCE__ * const instance=0x00007ffd809e0000, const unsigned long reason=0, void * const reserved=0x0000000000000000) Line 335 C++
ntdll.dll!LdrpCallInitRoutine() Unknown
ntdll.dll!LdrpProcessDetachNode() Unknown
ntdll.dll!LdrpUnloadNode() Unknown
ntdll.dll!LdrpUnloadNode() Unknown
ntdll.dll!LdrpDecrementModuleLoadCountEx() Unknown
ntdll.dll!LdrUnloadDll() Unknown
KernelBase.dll!FreeLibrary() Unknown
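
As far as I can read the stack, the hang happens while the static g_primaryExecutionContext is destroyed from the DLL's process-detach handler (reached via FreeLibrary), and the Intel OpenCL runtime (igdrcl64.dll) then blocks in WaitForSingleObjectEx while the cached cv::ocl::Program objects are released. Below is only a toy model of that teardown ordering, written without OpenCV. All names (FakeProgram, FakeContext, primary_context, shutdown_before_unload) are hypothetical, and I do not know whether OpenCV offers a supported way to drop its primary context early. The sketch just shows the idea of releasing such a resource explicitly before the unload instead of leaving it to a static destructor.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Records teardown events in order so the sequence can be inspected.
static std::vector<std::string>& event_log() {
    static std::vector<std::string> log;
    return log;
}

// Hypothetical stand-in for a driver-side object (e.g. a compiled program).
struct FakeProgram {
    ~FakeProgram() { event_log().push_back("program released"); }
};

// Hypothetical stand-in for a context that owns cached programs.
struct FakeContext {
    std::vector<std::unique_ptr<FakeProgram>> programs;
    // Touching the log here ensures the log outlives the context at exit.
    FakeContext() { event_log().push_back("context created"); }
    ~FakeContext() { event_log().push_back("context released"); }
};

// Process-wide holder, similar in spirit to a function-local static.
static std::shared_ptr<FakeContext>& primary_context() {
    static std::shared_ptr<FakeContext> ctx = std::make_shared<FakeContext>();
    return ctx;
}

// Explicit teardown, meant to run before the library is unloaded,
// so that nothing is left for the static destructor to release.
void shutdown_before_unload() {
    primary_context().reset();
    assert(!primary_context());
    event_log().push_back("explicit shutdown done");
}
```

In this model a caller would invoke shutdown_before_unload() right before the FreeLibrary call. The static shared_ptr is then already empty when its destructor runs, so no release work is left for unload time.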

The exact same application with the CPU target is OK.
The exact same application with the OpenCL target on an RTX 4070 Ti is OK.
The exact same application with the CUDA target on an RTX 4070 Ti is OK.

Has anyone seen a similar issue?
The latest Intel iGPU software is installed (the OpenCL Compatibility Pack is also installed).
(Same issue with v4.3.0.)

Unfortunately, providing a minimal reproducible example (MRE) would be very difficult.

The workflow is roughly as follows (createMat is our own helper, not an OpenCV function):
Net Handle = readNetFromTensorflow(bufferModel);
Handle.setPreferableTarget(DNN_TARGET_OPENCL);
Mat imgi = createMat(Bitmap, size.cy, bytes);
Mat img2;
imgi.convertTo(img2, CV_32F);

Mat imgo = createMat(matrixout, ll/2, kk/2);
Mat blob = blobFromImage(img2, 1 / 255.F);  // scale factor 1/255

Handle.setInput(blob);

Mat outs = Handle.forward();

imagesFromBlob(outs, predict);
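
For readers less familiar with the preprocessing step: blobFromImage(img2, 1 / 255.F) multiplies every value by 1/255 and repacks the image into a 4-D NCHW blob. The following is a minimal sketch of that repacking without OpenCV, assuming a float image stored as interleaved H x W x C. to_nchw_blob is a hypothetical helper, not an OpenCV function, and it ignores the resize, mean and swapRB options.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: repack an interleaved H x W x C float image into a
// planar 1 x C x H x W buffer and multiply every value by `scale`.
std::vector<float> to_nchw_blob(const std::vector<float>& hwc,
                                std::size_t height, std::size_t width,
                                std::size_t channels, float scale) {
    assert(hwc.size() == height * width * channels);
    std::vector<float> blob(hwc.size());
    for (std::size_t y = 0; y < height; ++y) {
        for (std::size_t x = 0; x < width; ++x) {
            for (std::size_t c = 0; c < channels; ++c) {
                // Source: channels of one pixel are adjacent.
                const std::size_t src = (y * width + x) * channels + c;
                // Destination: all pixels of one channel are adjacent.
                const std::size_t dst = (c * height + y) * width + x;
                blob[dst] = hwc[src] * scale;
            }
        }
    }
    return blob;
}
```

For a 1 x 2 image with three channels, the interleaved values {0, 51, 102, 153, 204, 255} come out as approximately {0, 0.6, 0.2, 0.8, 0.4, 1.0} with a scale of 1/255.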