The OUTPUT of the ONNX model shows a strange image

The graphic is nice!
It also shows that “our model” here ends with the 7x7x512 maxpool output (no FC layers).
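
As a quick sanity check on that 7x7x512 shape, here is a minimal sketch. It assumes a 224x224 input and the standard VGG16 layout (five conv blocks, each followed by a 2x2, stride-2 maxpool, with 3x3 convolutions padded so they keep the spatial size). The function name `vgg_feature_shape` is made up for illustration, and the graph in this thread may differ:

```python
def vgg_feature_shape(height=224, width=224):
    """Shape (H, W, C) after the last maxpool of a VGG16-style feature extractor.

    Assumption: padded 3x3 convs keep H and W unchanged, so only the five
    2x2 / stride-2 maxpools shrink the spatial size.
    """
    block_channels = [64, 128, 256, 512, 512]  # output channels per VGG16 block
    channels = 3  # RGB input
    for channels in block_channels:
        height //= 2  # each maxpool halves the height
        width //= 2   # ... and the width
    return height, width, channels


print(vgg_feature_shape())  # (7, 7, 512)
```

So a model whose last node is 7x7x512 was most likely cut right after the final pooling layer, before any classifier head.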

I also took a closer look at the training notebook.
Normally I'd expect to see an inference call like:

result = VGG(input)

But the result is not used anywhere here. Instead, the notebook seems to use (and manipulate) the input image “by reference”, which is very weird …