OpenCL version of cv::dft fails on NVIDIA GeForce RTX 3070 with large images

I am running the OpenCL version of cv::dft on an NVIDIA GeForce RTX 3070 with the code below. I get an error when the image is large.

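#include <opencv2/opencv.hpp>
#include <vector>
using namespace cv;
using namespace std;

int main()
{
    // Load as grayscale and enlarge to 8000x4000 (width x height).
    Mat img0 = imread("LenaGRAY.bmp", IMREAD_GRAYSCALE);
    resize(img0, img0, Size(8000, 4000));
    img0.convertTo(img0, CV_32F);

    // Build a 2-channel complex image: real part = img0, imaginary part = 0.
    Mat img1 = Mat::zeros(img0.size(), img0.type());
    vector<Mat> imgVec;
    imgVec.push_back(img0);
    imgVec.push_back(img1);
    Mat img;
    merge(imgVec, img);

    // Copy to a UMat so that dft() takes the OpenCL path.
    UMat imgU;
    img.copyTo(imgU);
    UMat dstU;
    dft(imgU, dstU);
    return 0;
}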

The error message is as follows:

OpenCL program build log: core/fft
Status -9999: Unknown OpenCL error
-D LOCAL_SIZE=8000 -D kercn=8 -D FT=float -D CT=float2 -D RADIX_PROCESS=fft_radix8(smem, twiddles+0, ind, 1, 1000); fft_radix8(smem, twiddles+7, ind, 8, 1000); fft_radix5_B2(smem, twiddles+63, ind, 64, 1600); fft_radix5_B2(smem, twiddles+319, ind, 320, 1600); fft_radix5_B2(smem, twiddles+1599, ind, 1600, 1600); -D COMPLEX_INPUT -D COMPLEX_OUTPUT
ptxas error : Entry function 'ifft_multi_radix_cols' uses too much shared data (0xfa08 bytes, 0xc00 max)
ptxas error : Entry function 'ifft_multi_radix_rows' uses too much shared data (0xfa08 bytes, 0xc00 max)
ptxas error : Entry function 'fft_multi_radix_cols' uses too much shared data (0xfa08 bytes, 0xc00 max)
ptxas error : Entry function 'fft_multi_radix_rows' uses too much shared data (0xfa08 bytes, 0xc00 max)
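
If I read the log correctly, the numbers line up with the image width. The build options show LOCAL_SIZE=8000 and CT=float2, and one row of 8000 float2 values takes 64000 bytes, which is 8 bytes short of the 0xfa08 (64008) bytes that ptxas reports. Below is a minimal sketch of that arithmetic in plain C++. The helper names complexRowBytes and fitsInLocalMemory are my own and are not part of OpenCV, and the assumption that the kernel keeps one whole row in local memory is only my reading of the log.

```cpp
#include <cstddef>

// Hypothetical helper (not an OpenCV function): bytes needed to hold one row
// of `localSize` complex values, each stored as float2 (two 32-bit floats).
constexpr std::size_t complexRowBytes(std::size_t localSize) {
    return localSize * 2 * sizeof(float);
}

// Hypothetical helper: true when such a row fits in `limitBytes` of
// local (shared) memory.
constexpr bool fitsInLocalMemory(std::size_t localSize, std::size_t limitBytes) {
    return complexRowBytes(localSize) <= limitBytes;
}

// Values copied from the build log above.
constexpr std::size_t kLocalSize   = 8000;    // -D LOCAL_SIZE=8000
constexpr std::size_t kReportedUse = 0xfa08;  // "uses too much shared data (0xfa08 bytes, ...)"
constexpr std::size_t kReportedMax = 0xc00;   // limit exactly as printed in the log

// The row buffer accounts for all but 8 bytes of the reported usage.
static_assert(kReportedUse - complexRowBytes(kLocalSize) == 8,
              "reported usage should be the row buffer plus 8 bytes");

// The row does not fit in the limit printed in the log.
static_assert(!fitsInLocalMemory(kLocalSize, kReportedMax),
              "an 8000-wide complex row exceeds the reported limit");
```

The same check also fails against a 48 KiB limit (an assumed value, not one taken from the log), since 64000 > 49152. So the failure seems tied to the row length of 8000 rather than to the total number of pixels.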