I am trying to use an MLP for classification (yes, I’m aware of dnn, but for now I’m interested in ml), so I put together a simple example to test things with a simple formula: an array of classId / classesCount values maps to a zero-filled vector with a 1 at the classId index:
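In code, that formula looks roughly like this (a minimal Python sketch; `make_sample` is a hypothetical name, and using classesCount as the input length is my assumption):

```python
def make_sample(class_id, classes_count):
    """Build one training pair for the toy formula:
    input  = vector filled with class_id / classes_count,
    output = zero vector with 1 at the class_id index."""
    x = [class_id / classes_count] * classes_count  # flat input: every element identical
    y = [0.0] * classes_count
    y[class_id] = 1.0                               # one-hot target
    return x, y

x, y = make_sample(2, 5)
# x == [0.4, 0.4, 0.4, 0.4, 0.4], y == [0.0, 0.0, 1.0, 0.0, 0.0]
```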
However, the prediction result is nowhere near the training data, and what’s more, all the output values are just combinations of 1.40311 and -0.403105 at different indexes. Example of an output:
that’s assigning the same value to all elements of the input vector. it’s flat. do you expect this thing to distinguish different levels from such an input?
I haven’t done anything with OpenCV’s ANN_MLP. perhaps the thing needs to be told that it’s doing classification rather than regression?
in the docs I found an alarming sentence:
All the weights are set to zeros. Then, the network is trained using a set of input and output vectors.
that should be randomized, or else all neurons will be adjusted the same.
UPDATE_WEIGHTS
Update the network weights, rather than compute them from scratch. In the latter case the weights are initialized using the Nguyen-Widrow algorithm.
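For reference, the Nguyen-Widrow scheme the docs mention can be sketched like this (a numpy sketch of the published algorithm, not OpenCV’s actual implementation; function and variable names are made up):

```python
import numpy as np

def nguyen_widrow(n_in, n_hidden, rng=None):
    """Nguyen-Widrow initialization for one dense layer:
    random weights, rescaled so each neuron's weight vector
    has magnitude beta = 0.7 * n_hidden**(1/n_in)."""
    rng = rng or np.random.default_rng(0)
    beta = 0.7 * n_hidden ** (1.0 / n_in)
    w = rng.uniform(-0.5, 0.5, size=(n_hidden, n_in))
    w *= beta / np.linalg.norm(w, axis=1, keepdims=True)  # rescale each row to norm beta
    b = rng.uniform(-beta, beta, size=n_hidden)
    return w, b

w, b = nguyen_widrow(400, 10)  # e.g. 400 inputs -> 10 hidden neurons
```

The key property is that every neuron starts with a different weight vector, which is exactly what all-zero initialization lacks.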
In this specific example - yes, just flat input data for now.
Not that I’m aware of, at least. In the case of CNNs it would depend on the actual network configuration, but ml gives no such tools, only an activation function and learning conditions.
Randomized as in not all the inputs being the same? I tried it with actual 20x20 monochrome images (hence 400 input neurons) but was getting equally strange results. So I decided to try it on this simpler data instead to confirm it’s working as it should.
Thanks. Tried it now, but got even more confused: all scales and weights are now -0., which results in all-zero outputs.
that would mean - you’d want a regression, not a classification
(like: ‘learn’ the input function, and produce similar output)
(and sadly, this isn’t possible with ANN_MLP, since it would need a linear activation for the last layer, and you can’t set activation funcs per layer independently)
So regression as in “take 10 smartphone parameters and produce a single value rating how good it is”? Or no?
UPD: Does that mean a simple neural network with roughly the same structure, but different activation functions for different layers, might handle this task?
UPD2: Wait, I’ve just remembered that back in university I made an even simpler MLP (also for image classification, if I understand the term correctly; it had to recognize handwriting), and it had 2 layers with linear activation. Might it be that a hidden layer is messing up the result in this specific case?
UPD3: Apparently not. With just 2 layers it’s back to the same result regardless of the activation type.
well, IDK how ANN_MLP training is actually implemented.
in a dense layer, if the weights of one “neuron” equal the weights of another, the two behave identically, and any updates from training will affect them identically, so they’ll stay twins forever… and if the whole layer’s weights are initialized the same, the whole layer is nearly worthless because it’s equivalent to a single neuron.
there has to be some randomness that affects neurons individually. it’s either in the initialization or somewhere in the training.
stochastically picking training data for a batch may be random, but it wouldn’t cause identical neurons to differentiate.
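the twin-neuron argument is easy to demonstrate with a toy backprop loop (plain SGD on a tiny sigmoid net, not OpenCV’s actual training method; everything here is made up for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def step(W1, W2, x, y, lr=0.5):
    """One gradient step on a 2-3-1 sigmoid net with squared error."""
    h = sigmoid(W1 @ x)
    o = sigmoid(W2 @ h)
    d_o = (o - y) * o * (1 - o)          # output delta
    d_h = (W2.T @ d_o) * h * (1 - h)     # hidden delta: identical for twin neurons
    W2 -= lr * np.outer(d_o, h)
    W1 -= lr * np.outer(d_h, x)
    return W1, W2

W1 = np.zeros((3, 2))   # all hidden neurons start identical (all-zero init)
W2 = np.zeros((1, 3))
x = np.array([0.2, 0.9])
y = np.array([1.0])
for _ in range(100):
    W1, W2 = step(W1, W2, x, y)
# the rows of W1 are still identical: the hidden neurons never differentiate
```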
maybe the docs lie and the weights aren’t actually initialized to all 0
maybe they inject some randomness during training, in the right places
just speculating.
since you showed one output that looks kinda one-hot, maybe the network did train decently, but there’s some issue with indexing? you said the inputs didn’t match the outputs you expected… but did the outputs always look somewhat one-hot, or did you see other patterns?
It turned out not to be very helpful, unfortunately. It uses a dataset not of actual images, but of 17 parameters per image, such as the total number of pixels of an object (a letter) or various means and correlations you have to compute from an image yourself.
I’ve tried predicting with a single [400x1] vector as well (the last one, to be precise) to get a 1-dimensional ([10x1]) output. And I’m getting the same data.
The other variation shows the same pattern (indexes 0, 2…9 are the same, index 1 is different), but with those two values switching places.