Dnn Action Recognition

Hi,
I know that the DNN module can load a 3D ResNet to perform action recognition or classification in videos.
I want to use other models such as I3D or R(2+1)D for the same task, but this does not seem to work: the ONNX file loads correctly, but the program crashes during inference.
Is this possible in any way with the OpenCV DNN module, or is it planned for a future version?
Thanks

can you be more specific ?
what are the errors ?
which framework is used to train it ?
link to the code ?
maybe a netron graph ?
how do you feed your data into it ?

such things are done on a “case by case” basis.
if we cannot help from here, raise a github issue.

Thank you Berak and sorry for my late reply

I am not able to modify or share the ONNX files, but here are the netron graphs for Resnet, R3D (instead of I3D) and R2plus1d

The code is

void test_simple()
{
	std::string f_classes("../models/action_recognition_fr.txt");
	vector<string>v_classes;
	std::ifstream ifs(f_classes);
	std::string line;
	while (std::getline(ifs, line))
	{
		v_classes.push_back(line);
	}
	
	//std::string f_onnx("../models/resnet-34_kinetics.onnx"); // OK
	std::string f_onnx("../models/r3d18.onnx"); // NOT OK
	//std::string f_onnx("../models/r2p1d18.onnx"); // NOT OK

	Net net = readNetFromONNX(f_onnx);
	net.setPreferableTarget(DNN_TARGET_CPU);
	net.setPreferableBackend(DNN_BACKEND_OPENCV);

	std::string f_video("e:/divx/boxe.mkv");
	VideoCapture cap(f_video);

	int sample_duration = 16;
	int sample_size = 112;

	vector<Mat>v_frames;
	v_frames.clear();

	Mat frame_bidon;
	cap >> frame_bidon;

	for (int i = 0; i < sample_duration; i++)
	{
		Mat frame, frame_resized, frame_f;
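		cap >> frame;
		if (frame.empty()) return; // video ended (or failed to open) before sample_duration frames were read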
		resize(frame, frame_resized, Size(sample_size, sample_size));
		frame_resized.convertTo(frame_f, CV_32FC3);
		v_frames.push_back(frame_f);
	}

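	// The mean must be a cv::Scalar: a bare parenthesized list is a comma expression that evaluates to 99.4750 only.
	Mat blob = blobFromImages(v_frames, 1.0, Size(112, 112), Scalar(114.7748, 107.7354, 99.4750), true, true, CV_32F);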
	int sz[] = { 1,blob.size[1], blob.size[0], blob.size[2], blob.size[3] };
	Mat newblob = Mat(5, sz, CV_32F, blob.ptr<float>(0));

	Sleep(300);

	net.setInput(newblob);

	Sleep(300);
	
	cout << "Fwd..." << endl;

	Mat score = net.forward();

	cout << "Done." << endl;

	Point pmin, pmax;
	double vmin, vmax;
	minMaxLoc(score, &vmin, &vmax, &pmin, &pmax);
	cout << "Action : " << v_classes[pmax.x] << endl;
}

and the outputs for Resnet, R3D, R2plus1D

PS E:\Work\Z-Presta\Code-THL\VideoAnalysis-opencv\bin> .\ActionReco-v452.exe (Resnet)
Fwd...
Done.
Action : jouer du trombone

PS E:\Work\Z-Presta\Code-THL\VideoAnalysis-opencv\bin> .\ActionReco-v452.exe (R3D)
Fwd...
OpenCV: terminate handler is called! The last OpenCV error is:
OpenCV(4.5.2) Error: Assertion failed (srcMat.dims == 2 && srcMat.cols == weights.cols && dstMat.rows == srcMat.rows && dstMat.cols == weights.rows && srcMat.type() == weights.type() && weights.type() == dstMat.type() && srcMat.type() == CV_32F && (biasMat.empty() || (biasMat.type() == srcMat.type() && biasMat.isContinuous() && (int)biasMat.total() == dstMat.cols))) in cv::dnn::FullyConnectedLayerImpl::FullyConnected::run, file E:\Develop\opencv-4.5.2\modules\dnn\src\layers\fully_connected_layer.cpp, line 180

PS E:\Work\Z-Presta\Code-THL\VideoAnalysis-opencv\bin> .\ActionReco-v452.exe (R2Plus1D)
Fwd...
OpenCV: terminate handler is called! The last OpenCV error is:
OpenCV(4.5.2) Error: Assertion failed (total(os[i]) > 0) in cv::dnn::dnn4_v20210301::Net::Impl::getLayerShapesRecursively, file E:\Develop\opencv-4.5.2\modules\dnn\src\dnn.cpp, line 3534

I do not know which framework was used for training, but even without a training phase and with random weights, the network should not cause an OpenCV exception.

I do not think that the problem comes from the data, because everything works fine with the first dnn (Resnet).

Thank you for your help !

Edit : OpenCV 4.5.2 / VisualStudio 2017 / Windows10

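I'm not 100% sure, but the way the blob is reshaped looks bad.

Even if it does not crash on ResNet, IMO you cannot "swizzle" channels and time/batch from

[B,C,H,W] to [1,C,B,H,W] that easily: wrapping the same buffer in a new Mat header only relabels the dimensions, it does not move any data.

I remember needing a permute layer for this.

And without seeing the code, it is impossible to say whether those dimensions are correct for the other networks…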
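
To make the permute point concrete, here is a minimal sketch, assuming a contiguous float buffer in [B,C,H,W] order (the layout blobFromImages produces). The helper name permuteBCHWtoCBHW is hypothetical and not part of OpenCV; it uses only the standard library and copies each H*W plane to the position it has in [C,B,H,W] order.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not an OpenCV function): copy a contiguous [B,C,H,W]
// buffer into a new contiguous [C,B,H,W] buffer. The result can then be
// viewed as [1,C,B,H,W] without moving any more data.
std::vector<float> permuteBCHWtoCBHW(const std::vector<float>& src,
                                     std::size_t B, std::size_t C,
                                     std::size_t H, std::size_t W)
{
    assert(src.size() == B * C * H * W);
    const std::size_t plane = H * W;            // elements in one H x W plane
    std::vector<float> dst(src.size());
    for (std::size_t b = 0; b < B; ++b)
        for (std::size_t c = 0; c < C; ++c)
            for (std::size_t i = 0; i < plane; ++i)
                // source plane index is (b, c); destination plane index is (c, b)
                dst[(c * B + b) * plane + i] = src[(b * C + c) * plane + i];
    return dst;
}
```

The permuted buffer could then be wrapped in a 5-D Mat with sizes {1, C, B, H, W}. This only addresses the input layout; whether it also resolves the assertions reported for the R3D and R(2+1)D models is not established here.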

[edit]
ok, could reproduce problem with r2p1d18 network on colab:

import numpy as np, cv2

net = cv2.dnn.readNet("r2p1d18.onnx")
dat = np.ones((1,3,16,112,112),np.float32)
net.setInput(dat)
res = net.forward()


----> 6 res = net.forward()
      7 print(res)

error: OpenCV(4.5.3-dev) /content/opencv/modules/dnn/src/dnn.cpp:3564: error: (-215:Assertion failed) total(os[i]) > 0 in function 'getLayerShapesRecursively'