Prediction result shares no similarity with training data

I am trying to use an MLP for classification (yes, I’m aware of the dnn module, but right now I’m interested in ml), so I put together a simple example to test things with a simple rule: an array filled with the value classId / classesCount => a zero-filled vector with a 1 at the classId index:

#include <opencv2/opencv.hpp>
#include <opencv2/ml.hpp>
#include <iostream>
#include <vector>


//---------------------------------------------------------------------
//---------------------------------------------------------------------
int main(int argc, char* argv[])
{
	int samplesCount = 1000;
	int imageSquareSide = 20;
	int sampleLength = imageSquareSide * imageSquareSide;
	int classesCount = 10;
	int epochsCount = 200;

	cv::Mat x(samplesCount, sampleLength, CV_32FC1);      // inputs: one 400-element sample per row
	cv::Mat y(samplesCount, classesCount, CV_32FC1, 0.0); // targets: one-hot rows
	std::vector<int> expectedClassIds;

	std::cout << "Creating dataset ..." << std::endl;

	float step = 1.0f / classesCount;
	for (int i = 0; i < x.rows; i++)
	{
		int classId = rand() % classesCount;
		expectedClassIds.push_back(classId);
		for (int j = 0; j < x.cols; j++)
		{
			// note: every element of this sample gets the same value, classId / classesCount
			x.at<float>(i, j) = classId * step;
		}
		y.at<float>(i, classId) = 1; // one-hot target row
	}

	std::cout << "Dataset created" << std::endl;

	std::cout << "Training MLP ..." << std::endl;

	auto mlp = cv::ml::ANN_MLP::create();

	std::vector<int> layerSizes = { sampleLength, 5 * sampleLength, 2 * sampleLength, classesCount }; // 400 -> 2000 -> 800 -> 10
	mlp->setLayerSizes(layerSizes);
	mlp->setActivationFunction(cv::ml::ANN_MLP::ActivationFunctions::SIGMOID_SYM);
	mlp->setTrainMethod(cv::ml::ANN_MLP::TrainingMethods::BACKPROP);
	mlp->setTermCriteria(cv::TermCriteria(cv::TermCriteria::Type::MAX_ITER + cv::TermCriteria::Type::EPS, epochsCount, 0.04));

	bool result = mlp->train(x, cv::ml::SampleTypes::ROW_SAMPLE, y);

	std::cout << "MLP trained" << std::endl;

	std::cout << "Predicting ..." << std::endl;

	cv::Mat predictions;
	mlp->predict(x, predictions);

	int correctPredictionsCount = 0;

	for (int i = 0; i < predictions.rows; i++)
	{
		// arg-max over the output row gives the predicted class
		float maxPredictedProbability = predictions.at<float>(i, 0);
		int predictedClassId = 0;
		int actualClassId = expectedClassIds[i];

		std::cout << "Prediction " << i << " | expecting [" << actualClassId << "] = 1:\n";
		for (int j = 0; j < predictions.cols; j++)
		{
			std::cout << "\t[" << j << "] = " << predictions.at<float>(i, j) << "\n";
			if (predictions.at<float>(i, j) > maxPredictedProbability)
			{
				maxPredictedProbability = predictions.at<float>(i, j);
				predictedClassId = j;
			}
		}

		if (predictedClassId == actualClassId)
		{
			correctPredictionsCount++;
		}
	}

	std::cout << "Predicted" << std::endl;
	std::cout << "Accuracy: " << (float(correctPredictionsCount) / float(y.rows)) << std::endl;

	return 0;
}

However, the prediction results are nowhere near the training targets, and what’s more, all the output values are just a combination of 1.40311 and -0.403105 at different indexes. Example output:

Prediction 999 | expecting [9] = 1:
        [0] = 1.40311
        [1] = -0.403105
        [2] = 1.40311
        [3] = 1.40311
        [4] = 1.40311
        [5] = 1.40311
        [6] = 1.40311
        [7] = 1.40311
        [8] = 1.40311
        [9] = 1.40311

I tried other activation functions as well, but always got strange results:

  • GAUSSIAN - gives -0.0526316 on all outputs regardless of input data.
  • IDENTITY, RELU, LEAKYRELU - take an eternity with 200 epochs; with 2 epochs instead they give -nan(ind) on all outputs.

What could be causing this? Or am I simply missing something?

that’s assigning the same value to all elements of the input vector. it’s flat. do you expect this thing to distinguish different levels at any input?
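
For illustration, a hypothetical tweak to the dataset loop (not the original code; it reuses x, y, step, classesCount and expectedClassIds from the snippet above) that gives each element of a sample its own value instead of one flat level:

for (int i = 0; i < x.rows; i++)
{
	int classId = rand() % classesCount;
	expectedClassIds.push_back(classId);
	for (int j = 0; j < x.cols; j++)
	{
		// small per-element noise, well below the 0.1 step between classes,
		// so the 400 inputs of one sample are no longer all identical
		float noise = (rand() % 100) / 1000.0f;
		x.at<float>(i, j) = classId * step + noise;
	}
	y.at<float>(i, classId) = 1;
}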

I haven’t done anything with OpenCV’s ANN_MLP. perhaps the thing needs to be told that it’s doing classification, vs. regression?

in the docs I found an alarming sentence:

All the weights are set to zeros. Then, the network is trained using a set of input and output vectors.

that should be randomized, or else all neurons will be adjusted the same.

UPDATE_WEIGHTS
Update the network weights, rather than compute them from scratch. In the latter case the weights are initialized using the Nguyen-Widrow algorithm.

TrainFlags has UPDATE_WEIGHTS. worth a try?
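
A minimal sketch of how that flag could be passed, assuming the mlp, x and y from the snippet above; note that UPDATE_WEIGHTS only makes sense once weights already exist, i.e. after an initial train() call:

// first pass: weights are computed from scratch (Nguyen-Widrow init, per the docs quoted above)
auto trainData = cv::ml::TrainData::create(x, cv::ml::SampleTypes::ROW_SAMPLE, y);
mlp->train(trainData);

// further passes: keep the existing weights and only update them
mlp->train(trainData, cv::ml::ANN_MLP::TrainFlags::UPDATE_WEIGHTS);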

In this specific example - yes, just flat input data for now.

Not that I’m aware of, at least. In the case of CNNs it would depend on the actual network configuration, but ml gives no such tools, only an activation function and training conditions.

Randomized as in not all the inputs being the same? I tried it with actual 20x20 monochrome images (hence 400 input neurons) but was getting equally strange results, so I decided to try it on this simpler data instead to confirm it works as it should.

Thanks. Tried it now, but got even more confused: all the scales and weights are now -0, which results in all-zero outputs.
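
For reference, a minimal sketch of how the weights can be dumped for inspection, assuming the trained mlp and the layerSizes vector from the snippet above (exactly which index holds the input scale versus a layer’s weight matrix is an implementation detail worth checking against the docs):

for (int layerIdx = 0; layerIdx < (int)layerSizes.size(); layerIdx++)
{
	cv::Mat w = mlp->getWeights(layerIdx);
	std::cout << "weights[" << layerIdx << "]: " << w.size() << "\n" << w << std::endl;
}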

that would mean - you’d want a regression, not a classification
(like: ‘learn’ the input function, and produce similar output)
(and sadly, this isn’t possible with ANN_MLP, since it would need a linear activation for the last layer, and you can’t set activation funcs per layer independently)

So regression as in “take 10 smartphone parameters - give a single value rating how good it is”? Or no?

UPD: Does it mean that a simple neural network with roughly the same structure, but different activation functions for different layers, might handle this task?

UPD2: Wait, I’ve just remembered that I built an even simpler MLP (also for image classification, if I understand the term correctly: it had to recognize handwriting) back in university, and it had 2 layers with linear activation. Might it be that a hidden layer is messing up the result in this specific case?

UPD3: Apparently not. With just 2 layers it’s back to the same result regardless of activation type.

there is an example here

I was referring to the network’s parameters (weights, biases), which should be initialized to something with more “texture” than all zeros.

I’m intrigued now. I think I’ll mess with ANN_MLP from python this evening. the keyboard on this new laptop is driving me nuts though.

Oh, thanks, I will try it.

So with zeros there’s nothing to grab onto at the start?

well, IDK how ANN_MLP training is actually implemented.

in a dense layer, if the weights of one “neuron” are equal to the weights of another, they behave the same and any updates from training will affect them the same, so they’ll end up being twins forever… and if the whole layer’s weights are initialized the same, the whole layer would be near worthless because it’s equivalent to a single neuron.

there has to be some randomness that affects neurons individually. it’s either in the initialization or somewhere in the training.

stochastically picking training data for a batch may be random but it wouldn’t cause neurons to differentiate.

  • maybe the docs lie and the weights aren’t initialized to all 0
  • maybe they inject some randomness during training, in the right places

just speculating.
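
A toy sketch of that “twins forever” point, not OpenCV-specific and with a made-up error signal, just to show that identically initialized neurons receive identical updates under plain gradient descent:

#include <iostream>
#include <vector>

int main()
{
	std::vector<double> w1 = { 0.0, 0.0 }; // neuron 1, zero-initialized
	std::vector<double> w2 = { 0.0, 0.0 }; // neuron 2, identical initialization
	std::vector<double> input = { 0.3, 0.7 };
	double lr = 0.1;

	for (int step = 0; step < 5; step++)
	{
		// some shared upstream error; its exact form doesn't matter here
		double error = 1.0 - (w1[0] * input[0] + w1[1] * input[1]);

		for (int k = 0; k < 2; k++)
		{
			// both neurons see the same input and the same error,
			// so their updates are identical and they never diverge
			w1[k] += lr * error * input[k];
			w2[k] += lr * error * input[k];
		}
		std::cout << "step " << step << ": w1 = {" << w1[0] << ", " << w1[1]
		          << "}, w2 = {" << w2[0] << ", " << w2[1] << "}\n";
	}
	return 0;
}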

since you showed one output that looks kinda one-hot, maybe the network did train decently, but there’s some issue with indexing? you said inputs didn’t match the outputs you expected… but did you always see outputs looking somewhat one-hot, or did you see any other patterns?
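
One way to rule out an arg-max/indexing bug would be to let OpenCV pick the maximum of each row; a sketch, assuming the predictions and expectedClassIds from the original snippet:

for (int i = 0; i < predictions.rows; i++)
{
	cv::Point maxLoc;
	cv::minMaxLoc(predictions.row(i), nullptr, nullptr, nullptr, &maxLoc);
	int predictedClassId = maxLoc.x; // column index of the largest output
	std::cout << i << ": predicted " << predictedClassId
	          << ", expected " << expectedClassIds[i] << "\n";
}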

It turned out not to be very helpful, unfortunately. It doesn’t use actual images but rather a dataset of 17 per-image parameters, such as the total number of pixels in an object (a letter) or various means and correlations that you have to compute from an image yourself.

I’ve tried predicting with a single [400x1] vector as well (the last one, to be precise), so that the output is also 1-dimensional ([10x1]), and I’m getting the same data (a minimal version of that check is sketched at the end of this post).

The other variation is the same pattern (indexes 0 and 2…9 are the same, 1 is different), but with those two values swapped.
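
For completeness, the single-sample check mentioned above looks roughly like this (assuming the trained mlp and x from the original snippet):

// predict on just the last training row (1x400) and print the 1x10 output
cv::Mat singleOutput;
mlp->predict(x.row(x.rows - 1), singleOutput);
std::cout << "single-sample output: " << singleOutput << std::endl;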