Prediction result shares no similarity with training data

I am trying to use an MLP for classification (yes, I’m aware of the dnn module, but right now I’m interested in ml), so I put together a simple example to test things with a simple rule: an array filled with the value classId / classesCount => a zero-filled vector with a 1 at the classId index:

#include <opencv2/opencv.hpp>
#include <opencv2/ml.hpp>
#include <iostream>
#include <vector>


//---------------------------------------------------------------------
//---------------------------------------------------------------------
int main(int argc, char* argv[])
{
	int samplesCount = 1000;
	int imageSquareSide = 20;
	int sampleLength = imageSquareSide * imageSquareSide;
	int classesCount = 10;
	int epochsCount = 200;

	cv::Mat x(samplesCount, sampleLength, CV_32FC1);      // inputs: one 400-element sample per row
	cv::Mat y(samplesCount, classesCount, CV_32FC1, 0.0); // targets: one-hot rows
	std::vector<int> expectedClassIds;

	std::cout << "Creating dataset ..." << std::endl;

	float step = 1.0f / classesCount;
	for (int i = 0; i < x.rows; i++)
	{
		int classId = rand() % classesCount;
		expectedClassIds.push_back(classId);
		for (int j = 0; j < x.cols; j++)
		{
			// note: every element of this sample gets the same value, classId / classesCount
			x.at<float>(i, j) = classId * step;
		}
		y.at<float>(i, classId) = 1; // one-hot target row
	}

	std::cout << "Dataset created" << std::endl;

	std::cout << "Training MLP ..." << std::endl;

	auto mlp = cv::ml::ANN_MLP::create();

	std::vector<int> layerSizes = { sampleLength, 5 * sampleLength, 2 * sampleLength, classesCount }; // 400 -> 2000 -> 800 -> 10
	mlp->setLayerSizes(layerSizes);
	mlp->setActivationFunction(cv::ml::ANN_MLP::ActivationFunctions::SIGMOID_SYM);
	mlp->setTrainMethod(cv::ml::ANN_MLP::TrainingMethods::BACKPROP);
	mlp->setTermCriteria(cv::TermCriteria(cv::TermCriteria::Type::MAX_ITER + cv::TermCriteria::Type::EPS, epochsCount, 0.04));

	bool result = mlp->train(x, cv::ml::SampleTypes::ROW_SAMPLE, y);

	std::cout << "MLP trained" << std::endl;

	std::cout << "Predicting ..." << std::endl;

	cv::Mat predictions;
	mlp->predict(x, predictions);

	int correctPredictionsCount = 0;

	for (int i = 0; i < predictions.rows; i++)
	{
		// arg-max over the output row gives the predicted class
		float maxPredictedProbability = predictions.at<float>(i, 0);
		int predictedClassId = 0;
		int actualClassId = expectedClassIds[i];

		std::cout << "Prediction " << i << " | expecting [" << actualClassId << "] = 1:\n";
		for (int j = 0; j < predictions.cols; j++)
		{
			std::cout << "\t[" << j << "] = " << predictions.at<float>(i, j) << "\n";
			if (predictions.at<float>(i, j) > maxPredictedProbability)
			{
				maxPredictedProbability = predictions.at<float>(i, j);
				predictedClassId = j;
			}
		}

		if (predictedClassId == actualClassId)
		{
			correctPredictionsCount++;
		}
	}

	std::cout << "Predicted" << std::endl;
	std::cout << "Accuracy: " << (float(correctPredictionsCount) / float(y.rows)) << std::endl;

	return 0;
}

However, the prediction results are nowhere near the training targets, and what’s more, all the output values are just a combination of 1.40311 and -0.403105 at different indexes. Example output:

Prediction 999 | expecting [9] = 1:
        [0] = 1.40311
        [1] = -0.403105
        [2] = 1.40311
        [3] = 1.40311
        [4] = 1.40311
        [5] = 1.40311
        [6] = 1.40311
        [7] = 1.40311
        [8] = 1.40311
        [9] = 1.40311

I tried other activation functions as well, but always got strange results:

  • GAUSSIAN - gives -0.0526316 on all outputs regardless of input data.
  • IDENTITY, RELU, LEAKYRELU - take an eternity with 200 epochs; with 2 epochs instead they give -nan(ind) on all outputs.

What could be causing this? Or am I simply missing something?

that’s assigning the same value to all elements of the input vector. it’s flat. do you expect this thing to distinguish different levels at any input?
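
For illustration, a hypothetical tweak to the dataset loop (not the original code; it reuses x, y, step, classesCount and expectedClassIds from the snippet above) that gives each element of a sample its own value instead of one flat level:

for (int i = 0; i < x.rows; i++)
{
	int classId = rand() % classesCount;
	expectedClassIds.push_back(classId);
	for (int j = 0; j < x.cols; j++)
	{
		// small per-element noise, well below the 0.1 step between classes,
		// so the 400 inputs of one sample are no longer all identical
		float noise = (rand() % 100) / 1000.0f;
		x.at<float>(i, j) = classId * step + noise;
	}
	y.at<float>(i, classId) = 1;
}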

I haven’t done anything with OpenCV’s ANN_MLP. perhaps the thing needs to be told that it’s doing classification, vs. regression?

in the docs I found an alarming sentence:

All the weights are set to zeros. Then, the network is trained using a set of input and output vectors.

that should be randomized, or else all neurons will be adjusted the same.

UPDATE_WEIGHTS
Update the network weights, rather than compute them from scratch. In the latter case the weights are initialized using the Nguyen-Widrow algorithm.

TrainFlags has UPDATE_WEIGHTS. worth a try?
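
A minimal sketch of how that flag could be passed, assuming the mlp, x and y from the snippet above; note that UPDATE_WEIGHTS only makes sense once weights already exist, i.e. after an initial train() call:

// first pass: weights are computed from scratch (Nguyen-Widrow init, per the docs quoted above)
auto trainData = cv::ml::TrainData::create(x, cv::ml::SampleTypes::ROW_SAMPLE, y);
mlp->train(trainData);

// further passes: keep the existing weights and only update them
mlp->train(trainData, cv::ml::ANN_MLP::TrainFlags::UPDATE_WEIGHTS);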

In this specific example - yes, just flat input data for now.

Not that I’m aware of, at least. In the case of CNNs it would depend on the actual network configuration, but ml gives no such tools, only an activation function and training conditions.

Randomized as in not all the inputs being the same? I tried it with actual 20x20 monochrome images (hence 400 input neurons) but was getting equally strange results, so I decided to try it on this simpler data instead to confirm it works as it should.

Thanks. Tried it now, but got even more confused: all the scales and weights are now -0, which results in all-zero outputs.
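
For reference, a minimal sketch of how the weights can be dumped for inspection, assuming the trained mlp and the layerSizes vector from the snippet above (exactly which index holds the input scale versus a layer’s weight matrix is an implementation detail worth checking against the docs):

for (int layerIdx = 0; layerIdx < (int)layerSizes.size(); layerIdx++)
{
	cv::Mat w = mlp->getWeights(layerIdx);
	std::cout << "weights[" << layerIdx << "]: " << w.size() << "\n" << w << std::endl;
}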

that would mean - you’d want a regression, not a classification
(like: ‘learn’ the input function, and produce similar output)
(and sadly, this isn’t possible with ANN_MLP, since it would need a linear activation for the last layer, and you can’t set activation funcs per layer independently)

So regression as in “take 10 smartphone parameters - give a single value rating how good it is”? Or no?

UPD: Does it mean that a simple neural network with roughly the same structure, but different activation functions for different layers, might handle this task?

UPD2: Wait, I’ve just remembered that I built an even simpler MLP (also for image classification, if I understand the term correctly: it had to recognize handwriting) back in university, and it had 2 layers with linear activation. Might it be that a hidden layer is messing up the result in this specific case?

UPD3: Apparently not. With just 2 layers it’s back to the same result regardless of activation type.

there is an example here

I was referring to the network’s parameters (weights, biases), which should be initialized to something with more “texture” than all zeros.

I’m intrigued now. I think I’ll mess with ANN_MLP from python this evening. the keyboard on this new laptop is driving me nuts though.

Oh, thanks, I will try it.

So with zeros there’s nothing to grab onto at the start?

well, IDK how ANN_MLP training is actually implemented.

in a dense layer, if the weights of one “neuron” are equal to the weights of another, they behave the same and any updates from training will affect them the same, so they’ll end up being twins forever… and if the whole layer’s weights are initialized the same, the whole layer would be near worthless because it’s equivalent to a single neuron.

there has to be some randomness that affects neurons individually. it’s either in the initialization or somewhere in the training.

stochastically picking training data for a batch may be random but it wouldn’t cause neurons to differentiate.

  • maybe the docs lie and the weights aren’t initialized to all 0
  • maybe they inject some randomness during training, in the right places

just speculating.
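
A toy sketch of that “twins forever” point, not OpenCV-specific and with a made-up error signal, just to show that identically initialized neurons receive identical updates under plain gradient descent:

#include <iostream>
#include <vector>

int main()
{
	std::vector<double> w1 = { 0.0, 0.0 }; // neuron 1, zero-initialized
	std::vector<double> w2 = { 0.0, 0.0 }; // neuron 2, identical initialization
	std::vector<double> input = { 0.3, 0.7 };
	double lr = 0.1;

	for (int step = 0; step < 5; step++)
	{
		// some shared upstream error; its exact form doesn't matter here
		double error = 1.0 - (w1[0] * input[0] + w1[1] * input[1]);

		for (int k = 0; k < 2; k++)
		{
			// both neurons see the same input and the same error,
			// so their updates are identical and they never diverge
			w1[k] += lr * error * input[k];
			w2[k] += lr * error * input[k];
		}
		std::cout << "step " << step << ": w1 = {" << w1[0] << ", " << w1[1]
		          << "}, w2 = {" << w2[0] << ", " << w2[1] << "}\n";
	}
	return 0;
}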

since you showed one output that looks kinda one-hot, maybe the network did train decently, but there’s some issue with indexing? you said inputs didn’t match the outputs you expected… but did you always see outputs looking somewhat one-hot, or did you see any other patterns?
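
One way to rule out an arg-max/indexing bug would be to let OpenCV pick the maximum of each row; a sketch, assuming the predictions and expectedClassIds from the original snippet:

for (int i = 0; i < predictions.rows; i++)
{
	cv::Point maxLoc;
	cv::minMaxLoc(predictions.row(i), nullptr, nullptr, nullptr, &maxLoc);
	int predictedClassId = maxLoc.x; // column index of the largest output
	std::cout << i << ": predicted " << predictedClassId
	          << ", expected " << expectedClassIds[i] << "\n";
}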

It turned out not to be very helpful, unfortunately. It doesn’t use actual images but rather a dataset of 17 per-image parameters, such as the total number of pixels in an object (a letter) or various means and correlations that you have to compute from an image yourself.

I’ve tried predicting with a single [400x1] vector as well (the last one, to be precise), so that the output is also 1-dimensional ([10x1]), and I’m getting the same data (a minimal version of that check is sketched at the end of this post).

The other variation is the same pattern (indexes 0 and 2…9 are the same, 1 is different), but with those two values swapped.
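
For completeness, the single-sample check mentioned above looks roughly like this (assuming the trained mlp and x from the original snippet):

// predict on just the last training row (1x400) and print the 1x10 output
cv::Mat singleOutput;
mlp->predict(x.row(x.rows - 1), singleOutput);
std::cout << "single-sample output: " << singleOutput << std::endl;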