After post-processing the DeepLabv3 semantic segmentation output with an argmax, the results are incorrect. Please help me take a look.

The inference statement used is this:

Mat score = net.forward("outputs");

The model's output is a float blob of shape [1x2x512x648].
The input image looks like this:

I can get the correct result with these few lines:

cv::Mat output1(512, 648, CV_32F, (float*)score.data);
cv::Mat output2(512, 648, CV_32F, (float*)score.data+648*512);
cv::Mat sum = output2 - output1;
cv::threshold(sum, sum, 0, 255, cv::THRESH_BINARY);

But with the method I wrote myself, the parsed result is incorrect. How should I modify my code?

const int OUTPUT_H = 512;
const int OUTPUT_W = 648;
const int NUM_CLASSES = 2;

void postprocess(const uchar* output, Mat& result)
{
	result.create(OUTPUT_H, OUTPUT_W, CV_8U);
	//result.create(OUTPUT_H, OUTPUT_W, CV_32SC1);
	for (int i = 0; i < OUTPUT_H; ++i) {
		for (int j = 0; j < OUTPUT_W; ++j) {
			int idx = i * OUTPUT_W + j;
			int max_idx = -1;
			float max_val = -FLT_MAX;
			for (int k = 0; k < NUM_CLASSES; ++k) {
				float val = output[idx * NUM_CLASSES + k];
				if (val > max_val) {
					max_val = val;
					max_idx = k;
				}
			}
			result.at<uint8_t>(i, j) = max_idx * 100;
			//result.at<int>(i, j) = max_idx * 100;
		}
	}
}

// Postprocess output
Mat result;
postprocess(score.data, result);

Please help me.
The src, result, output1, output2, and sum images look like this:

network output is float32, but you treat it as if it were uchar:

so wrong values (wrong type) here:

also, this looks wrong. what did you want here ?

do you have a link to the generator / export code, please ?

I have modified the data type, but the result is still incorrect.
void postprocess(const float* output, Mat& result)

// Postprocess output
Mat result;
const float* data = reinterpret_cast<const float*>(score.ptr());
postprocess(data, result);

did you also change the postprocess function? please show the current code.



void postprocess(const float* output, Mat& result)
{
	result.create(OUTPUT_H, OUTPUT_W, CV_8U);
	//result.create(OUTPUT_H, OUTPUT_W, CV_32SC1);
	for (int i = 0; i < OUTPUT_H; ++i) {
		for (int j = 0; j < OUTPUT_W; ++j) {
			int idx = i * OUTPUT_W + j;
			int max_idx = -1;
			float max_val = -FLT_MAX;
			for (int k = 0; k < NUM_CLASSES; ++k) {
				float val = output[idx * NUM_CLASSES + k];
				if (val > max_val) {
					max_val = val;
					max_idx = k;
				}
			}
			result.at<uint8_t>(i, j) = max_idx * 100;
			//result.at<int>(i, j) = max_idx * 100;
		}
	}
}


int main() {

……
Mat score = net.forward("outputs");

// Postprocess output
Mat result;
const float* data = reinterpret_cast<const float*>(score.ptr());
postprocess(data, result);

……
cv::Mat output1(512, 648, CV_32F, (float*)score.data);
cv::Mat output2(512, 648, CV_32F, (float*)score.data+648*512);
cv::Mat sum = output2 - output1;
cv::threshold(sum, sum, 0, 255, cv::THRESH_BINARY);

……
}

output1 and output2 hold different probability maps: one is the probability of the background, the other the probability of the desired result. DeepLab's semantic segmentation takes the argmax over these maps, i.e. at each pixel the class with the highest probability wins; understanding that may need some AI background. Its output is a 4-dimensional matrix in NCHW order. Because I don't know how to parse it, the core code was copied from someone else's semantic segmentation post-processing, and I don't fully understand it either.

I’d recommend dumping that float array to a tiff (tiff can handle floats and multiple channels/layers/pages) and looking at the data in some tool… python at least, because that’s a lot less of a headache when it comes to dealing with arrays and numbers. You could also upload the data.

this still reads interleaved probs, like p1,p2,p1,p2,p1,p2,…
while there is a whole WxH plane lying between pixels at the same position in output1 & output2.
isn’t it rather:

idx + (k * W * H)

?