Action recognition with NN (2plus1d_34)

I have retrained the 2plus1d_34 network using this notebook (computervision-recipes/01_training_introduction.ipynb at staging · microsoft/computervision-recipes · GitHub) and saved the model in ONNX format:

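    # Needs: import torch; from pathlib import Path; from typing import Union
    def save(self, model_path: Union[Path, str]) -> None:
        """ Save the model to a path on disk in ONNX format. """
        # Dummy clip with shape (batch, channels, frames, height, width).
        dummy_input = torch.randn(1, 3, 8, 112, 112, device="cuda")
        torch.onnx.export(self.model, dummy_input, model_path)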

I run this model with OpenCV in C++ and with onnxruntime in Python, and I get two different results. Could you tell me what I'm doing wrong? Also, I have never worked with arrays that have more dimensions than an ordinary image, and here I had to create a 5D array. I don't really understand why it is sizes[5] but dims = 4.


    auto net = cv::dnn::readNetFromONNX("test_model.onnx");
    int sizes[5] = {1, 3, 8, 112, 112};  // batch, channels, frames, height, width
    cv::Mat w_i = cv::Mat(4, sizes, CV_32FC1, cv::Scalar(0));

    // Fill all three channels of every frame with zeros.
    for (int k = 0; k < 8; k++) {
        for (int i = 0; i < 112; i++) {
            for (int j = 0; j < 112; j++) {
                w_i.at<float>(cv::Vec<int, 5>(0, 0, k, i, j)) = 0.0f;
                w_i.at<float>(cv::Vec<int, 5>(0, 1, k, i, j)) = 0.0f;
                w_i.at<float>(cv::Vec<int, 5>(0, 2, k, i, j)) = 0.0f;
            }
        }
    }

    net.setInput(w_i);
    cv::Mat out = net.forward();
    std::cout << out << std::endl;


    import numpy as np
    import onnxruntime as ort

    # All-zero clip with shape (batch, channels, frames, height, width).
    frame = np.zeros((1, 3, 8, 112, 112))
    input_data = np.array(frame, dtype=np.float32)

    session = ort.InferenceSession('test_model.onnx')
    output_data = session.run(None, {session.get_inputs()[0].name: input_data})


If I create a matrix of the wrong size in C++ (for example 4D), I get this error:
[ERROR:0] OPENCV/DNN: [Convolution]:(640): getMemoryShapes() throws exception. inputs=1 outputs=0/1 blobs=2
[ERROR:0] input[0] = [ 3 8 112 112 ]
[ERROR:0] blobs[0] = CV_32FC1 [ 45 3 1 7 7 ]
[ERROR:0] blobs[1] = CV_32FC1 [ 45 1 ]
[ERROR:0] Exception message: OpenCV(3.4.10) c:\tarhan\compiling\opencv_3410\opencv-3.4.10\modules\dnn\src\layers\convolution_layer.cpp:306: error: (-2:Unspecified error) Number of input channels should be multiple of 3 but got 8 in function ‘cv::dnn::ConvolutionLayerImpl::getMemoryShapes’

I don’t understand why there are two blobs here and why they don’t match what I saved.

a link to the exported onnx, and an example image sequence would be quite helpful here.

due to the onnx export, which ‘freezes’ the dimensions. you can't change the image / time_series size after that
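a minimal sketch of what the ‘frozen’ dimensions mean in practice, assuming the model was exported with the (1, 3, 8, 112, 112) dummy input from the question. `EXPORTED_SHAPE` and `check_clip_shape` are hypothetical names made up for this illustration, not part of OpenCV or onnxruntime:

```python
# Shape of the dummy input used at export time (taken from the question).
EXPORTED_SHAPE = (1, 3, 8, 112, 112)  # batch, channels, frames, height, width


def check_clip_shape(shape, expected=EXPORTED_SHAPE):
    """Raise ValueError unless `shape` equals the shape fixed at export time."""
    shape = tuple(shape)
    if len(shape) != len(expected):
        raise ValueError(
            f"expected {len(expected)} dimensions, got {len(shape)}: {shape}"
        )
    # Collect every axis whose length differs from the exported one.
    mismatched = [
        axis for axis, (got, want) in enumerate(zip(shape, expected)) if got != want
    ]
    if mismatched:
        raise ValueError(f"axes {mismatched} differ: got {shape}, expected {expected}")
    return shape


if __name__ == "__main__":
    check_clip_shape((1, 3, 8, 112, 112))    # accepted
    try:
        check_clip_shape((3, 8, 112, 112))   # a 4D input, as in the error log
    except ValueError as err:
        print("rejected:", err)
```

running a check like this before inference gives a readable message instead of a failure deep inside a convolution layer.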

that simply looks wrong, it must be
cv::Mat w_i = cv::Mat(5, sizes,

you’re writing out-of-bounds in the following for - loops
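to make the sizes[5] vs. dims question concrete, here is a small sketch in plain python (no OpenCV involved). it assumes a dense row-major layout, and `row_major_offset` is a hypothetical helper written only for this illustration. the dims argument says how many entries of sizes are used, so dims = 4 builds an array from the first four sizes only:

```python
from math import prod


def row_major_offset(index, shape):
    """Flat element offset of `index` in a dense row-major array of `shape`."""
    offset = 0
    for i, n in zip(index, shape):
        offset = offset * n + i
    return offset


sizes = [1, 3, 8, 112, 112]   # batch, channels, frames, height, width

shape_4 = sizes[:4]           # what dims = 4 takes from sizes
shape_5 = sizes[:5]           # what dims = 5 takes from sizes

elements_4 = prod(shape_4)    # elements in the 4D array
elements_5 = prod(shape_5)    # elements in the 5D array

# Last position written by the loops in the question.
last_index = (0, 2, 7, 111, 111)
last_offset = row_major_offset(last_index, shape_5)

if __name__ == "__main__":
    print("4D elements:", elements_4)
    print("5D elements:", elements_5)
    print("offset of last 5D index:", last_offset)
```

the 4D array holds far fewer elements than the loops address, and a five-component index has no fifth axis to refer to there. with dims = 5 the last index lands exactly on the final element.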

oh, noes, quite outdated. please use recent 4.7 instead
(also check / update the python cv2 version !)


here is a link to my model: test_model.onnx - Google Drive

I use an array filled with zeros as a sequence of images, because I just want to make sure that the python and c++ code produces the same result
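for the comparison itself, a minimal sketch of how the two outputs could be checked against each other once both are flattened into plain lists of floats. the helper names, the tolerances and the example numbers are all assumptions chosen for illustration, they are not real outputs of this model:

```python
import math


def max_abs_diff(a, b):
    """Largest absolute element-wise difference between two equal-length lists."""
    if len(a) != len(b):
        raise ValueError(f"length mismatch: {len(a)} vs {len(b)}")
    return max((abs(x - y) for x, y in zip(a, b)), default=0.0)


def outputs_match(a, b, rel_tol=1e-4, abs_tol=1e-5):
    """True if both lists have the same length and all elements are close."""
    return len(a) == len(b) and all(
        math.isclose(x, y, rel_tol=rel_tol, abs_tol=abs_tol) for x, y in zip(a, b)
    )


if __name__ == "__main__":
    # Placeholder values only, standing in for the flattened network outputs.
    python_scores = [0.12, -1.50, 3.25]
    cpp_scores = [0.12, -1.50, 3.25]
    print("match:", outputs_match(python_scores, cpp_scores))
    print("max abs diff:", max_abs_diff(python_scores, cpp_scores))
```

a tolerance is used instead of exact equality because two different float32 backends will rarely agree to the last bit, even when both are fed the same input.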

when I set dims = 5 I get this error: std::exception: OpenCV(3.4.10) c:\tarhan\compiling\opencv_3410\opencv-3.4.10\modules\dnn\src\dnn.cpp:2997: error: (-215:Assertion failed) total(os[i]) > 0 in function ‘cv::dnn::experimental_dnn_34_v17::Net::Impl::getLayerShapesRecursively’

unfortunately, I can’t just raise the opencv version as this will lead to big changes in my project.