Error while loading an ONNX model exported from a PyTorch model

OS: centos-release-8.2-2.2004.0.1.el8.x86_64
opencv: 4.5.3-dev

I exported the model with the following call, which ran without any error:
torch.onnx.export(net, img_tensor, output, verbose=False, opset_version=11)

When I try to load the ONNX model in OpenCV, I get the error below.

[ERROR:0] global /data_2t/sam/CentOS/opencv_build/opencv/modules/dnn/src/onnx/onnx_importer.cpp (694) handleNode DNN/ONNX: ERROR during processing node with 2 inputs and 1 outputs: [MatMul]:(2060)
terminate called after throwing an instance of 'cv::Exception'
  what():  OpenCV(4.5.3-dev) /data_2t/sam/CentOS/opencv_build/opencv/modules/dnn/src/onnx/onnx_importer.cpp:713: error: (-2:Unspecified error) in function 'handleNode'
> Node [MatMul]:(2060) parse error: OpenCV(4.5.3-dev) /data_2t/sam/CentOS/opencv_build/opencv/modules/dnn/src/onnx/onnx_importer.cpp:1475: error: (-215:Assertion failed) constBlobs.find(node_proto.input(0)) == constBlobs.end() in function 'parseMatMul'
>
Aborted (core dumped)

The error happens at node [2060], which I think maps to line 23 of
MHSA.py in the network.

Here are a version of the model without pre-trained weights and the code I use to load it.

Thanks!

  • did you put a net.eval() before the export line ? or with torch.no_grad(): ...
  • setting verbose=True in the export() will show you the exact node
    (and its connections)
  • this looks like a 1 layer transformer to me, right ?

  1. Yes, I call net.eval() before the export.

  2. I set verbose to True; this is the information for that node:

%2060 : Float(1, 4, 49, 49, strides=[9604, 2401, 49, 1], requires_grad=1, device=cuda:0) = onnx::MatMul(%2059, %2016) # model.py:42:0
%2061 : Float(1, 4, 49, 49, strides=[9604, 2401, 49, 1], requires_grad=1, device=cuda:0) = onnx::Add(%2044, %2060) # model.py:44:0

This maps to the following line in MHSA:
content_position = torch.matmul(content_position, q)

  3. Yes, it’s one kind of transformer implementation.
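
The verbose dump lines quoted in point 2 can be picked apart with a few lines of standard-library Python, which may help when tracing which tensors feed the failing node. This is only a sketch: the helper names (parse_node, find_nodes) are made up for illustration, and the regular expression assumes graph lines shaped like the two shown above, with an optional [attr=...] block after the op name.

```python
import re

# Sample taken verbatim from the verbose export output quoted in this thread.
DUMP = """\
%2060 : Float(1, 4, 49, 49, strides=[9604, 2401, 49, 1], requires_grad=1, device=cuda:0) = onnx::MatMul(%2059, %2016) # model.py:42:0
%2061 : Float(1, 4, 49, 49, strides=[9604, 2401, 49, 1], requires_grad=1, device=cuda:0) = onnx::Add(%2044, %2060) # model.py:44:0
"""

# %<output> : <type> = onnx::<Op>[optional attributes](<inputs>) # <source location>
NODE_RE = re.compile(
    r"%(?P<out>\w+)\s*:\s*.*?=\s*onnx::(?P<op>\w+)(?:\[.*?\])?\((?P<args>[^)]*)\)"
    r"(?:\s*#\s*(?P<src>.+))?"
)


def parse_node(line):
    """Return a dict describing one graph line, or None if the line does not match."""
    m = NODE_RE.search(line)
    if m is None:
        return None
    inputs = [a.strip().lstrip("%") for a in m.group("args").split(",") if a.strip()]
    return {
        "output": m.group("out"),
        "op": m.group("op"),
        "inputs": inputs,
        "source": (m.group("src") or "").strip(),
    }


def find_nodes(dump, op):
    """Collect every parsed node in the dump whose operator name equals `op`."""
    parsed = (parse_node(line) for line in dump.splitlines())
    return [node for node in parsed if node and node["op"] == op]


for node in find_nodes(DUMP, "MatMul"):
    print(node["output"], node["inputs"], node["source"])
```

Running find_nodes over the full dump and then looking up the producers of each input (here %2059 and %2016) shows which operand comes first in the failing MatMul.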

i could reproduce the problem, but have not found a solution ;(

note, that the first 2 matmul() calls pass, and that the 3rd, failing one has the tensor args in different order.

looking at the opencv error again, the failing assertion in parseMatMul requires that the 1st input is not a const blob, so it looks like the current impl only accepts const blobs as the 2nd, but not as 1st arg (order, again)

no idea how to solve it, raise an issue here ?
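
One untested idea that follows from the operand-order observation: since A @ B equals (B^T @ A^T)^T, the failing matmul could be rewritten in the model so that the constant tensor becomes the second argument. The sketch below only checks that the two forms give the same result in PyTorch. The output shape (1, 4, 49, 49) comes from the verbose dump earlier in the thread; the head dimension of 16 and the random tensors are assumptions for illustration. Whether OpenCV's importer then accepts the re-exported node has not been verified here.

```python
import torch

torch.manual_seed(0)

# Output shape (1, 4, 49, 49) matches the verbose dump; d = 16 is an assumed head size.
heads, hw, d = 4, 49, 16
content_position = torch.randn(1, heads, hw, d)  # stands in for the constant operand
q = torch.randn(1, heads, d, hw)                 # stands in for the runtime operand

# Original order from MHSA: the constant tensor is the first argument.
original = torch.matmul(content_position, q)

# Swapped order using (A @ B) == (B^T @ A^T)^T on the last two dimensions,
# so the constant tensor is now the second argument of matmul.
swapped = torch.matmul(
    q.transpose(-2, -1), content_position.transpose(-2, -1)
).transpose(-2, -1)

print(original.shape, torch.allclose(original, swapped, atol=1e-4))
```

If the numbers agree, the swapped expression could replace the original line before re-exporting; the extra transposes add a little work but leave the result unchanged up to floating-point rounding.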