After post-processing the DeepLabv3 semantic segmentation output with an argmax, the results are incorrect. Please help me take a look.

The inference statement used is this:

Mat score = net.forward("outputs");

The model's output is a float blob of shape [1x2x512x648].
The input image looks like this:

I can get the correct result with these few lines:

cv::Mat output1(512, 648, CV_32F, (float*)score.data);
cv::Mat output2(512, 648, CV_32F, (float*)score.data+648*512);
cv::Mat sum = output2 - output1;
cv::threshold(sum, sum, 0, 255, cv::THRESH_BINARY);

But with the method I wrote myself, the parsed result is incorrect. How should I modify my code?

const int OUTPUT_H = 512;
const int OUTPUT_W = 648;
const int NUM_CLASSES = 2;

void postprocess(const uchar* output, Mat& result)
{
	result.create(OUTPUT_H, OUTPUT_W, CV_8U);
	//result.create(OUTPUT_H, OUTPUT_W, CV_32SC1);
	for (int i = 0; i < OUTPUT_H; ++i) {
		for (int j = 0; j < OUTPUT_W; ++j) {
			int idx = i * OUTPUT_W + j;
			int max_idx = -1;
			float max_val = -FLT_MAX;
			for (int k = 0; k < NUM_CLASSES; ++k) {
				float val = output[idx * NUM_CLASSES + k];
				if (val > max_val) {
					max_val = val;
					max_idx = k;
				}
			}
			result.at<uint8_t>(i, j) = max_idx * 100;
			//result.at<int>(i, j) = max_idx * 100;
		}
	}
}

// Postprocess output
Mat result;
postprocess(score.data, result);

Please help me.
The src, result, output1, output2, and sum images look like this:

network output is float32, but you treat it as if it were uchar:

so wrong values (wrong type) here:

also, this looks wrong. what did you want here ?

do you have a link to the generator / export code, please ?

I have modified the data type, but the result is still incorrect.
void postprocess(const float* output, Mat& result)

// Postprocess output
Mat result;
const float* data = reinterpret_cast<const float*>(score.ptr());
postprocess(data, result);

did you also change the postprocess function? please show the current code.



void postprocess(const float* output, Mat& result)
{
	result.create(OUTPUT_H, OUTPUT_W, CV_8U);
	//result.create(OUTPUT_H, OUTPUT_W, CV_32SC1);
	for (int i = 0; i < OUTPUT_H; ++i) {
		for (int j = 0; j < OUTPUT_W; ++j) {
			int idx = i * OUTPUT_W + j;
			int max_idx = -1;
			float max_val = -FLT_MAX;
			for (int k = 0; k < NUM_CLASSES; ++k) {
				float val = output[idx * NUM_CLASSES + k];
				if (val > max_val) {
					max_val = val;
					max_idx = k;
				}
			}
			result.at<uint8_t>(i, j) = max_idx * 100;
			//result.at<int>(i, j) = max_idx * 100;
		}
	}
}


int main() {

……
Mat score = net.forward("outputs");

// Postprocess output
Mat result;
const float* data = reinterpret_cast<const float*>(score.ptr());
postprocess(data, result);

……
cv::Mat output1(512, 648, CV_32F, (float*)score.data);
cv::Mat output2(512, 648, CV_32F, (float*)score.data+648*512);
cv::Mat sum = output2 - output1;
cv::threshold(sum, sum, 0, 255, cv::THRESH_BINARY);

……
}

output1 and output2 hold different probability maps: one is the probability of the background, the other the probability of the desired result. DeepLab's semantic segmentation takes the argmax over these maps, i.e. at each pixel the class with the highest probability wins; understanding that may need some AI background. Its output is a 4-dimensional matrix in NCHW order. Because I don't know how to parse it, the core code was copied from someone else's semantic segmentation post-processing, and I don't fully understand it either.

I’d recommend dumping that float array to a tiff (tiff can handle floats and multiple channels/layers/pages) and looking at the data in some tool… python at least, because that’s a lot less of a headache when it comes to dealing with arrays and numbers. You could also upload the data.

this still reads interleaved probs, like p1,p2,p1,p2,p1,p2,…
while there is a whole WxH plane lying between pixels at the same position in output1 & output2.
isn’t it rather:

idx + (k * W * H)

?