1-dimensional k-means clustering in C++

I have a 56x1 vector of doubles, avg_intensities_double (range: 0-255), and I want to use k-means clustering to group the values. I use the kmeans function from OpenCV. Here is my code:

Mat labels, new_centers;

vector<double> avg_intensities_double(avg_intensities.begin(), avg_intensities.end());

Mat points(avg_intensities_double.size(), 1, CV_32F);

memcpy(points.data, avg_intensities_double.data(), avg_intensities_double.size()*sizeof(uchar));

auto compactness = kmeans(points, 10, labels, TermCriteria(TermCriteria::EPS+TermCriteria::COUNT, 10, 1.0), 10, KMEANS_RANDOM_CENTERS, new_centers);

cout << "labels: " << labels.rows << " x " << labels.cols << '\n' << endl;

for (int i=0; i<labels.rows; i++){
    cout << labels.at<int>(i,0) << endl;
}

cout << "new_centers: " << new_centers.rows << " x " << new_centers.cols << '\n' << endl;

for (int i=0; i<new_centers.rows; i++){
    cout << new_centers.at<double>(i,0) << endl;
}

I create the input matrix points and then I copy the values from the vector into the matrix. The result is a 56x1 matrix labels (with the labels of the cluster that each value belongs to) and a 10x1 matrix new_centers (with the final center values of the clusters).

When I print the resulting new_centers matrix I get these values:

new_centers: 10 x 1

7.47421e-238
3.40282e+38
-2.68156e+154
109
103
102
101
5.33389e-315
82.5
1.17119e+166

These are not correct. I expect values in the range 0-255 and not too close to each other, since they are centers of the clusters.

What am I doing wrong here? Is this the right way to do 1-dimensional clustering with k-means? I found one example for points (x,y) clustering but the same approach doesn’t work in my case. Any feedback is appreciated.

I use Ubuntu 18.04 (dual boot), C++11, and OpenCV 3.2.0, and my code runs as a ROS Melodic node.

You are mixing 32-bit and 64-bit floats in a memcpy. Do you see why that could cause trouble?

Use a debugger and look at all the values after every step.

The centers are probably float, not double, too.

@crackwitz you were right. I changed my initial vector from a vector of doubles to a vector of floats. Then I removed the memcpy call and passed the data directly to the input matrix, like this:
Mat points(avg_intensities_float.size(), 1, CV_32FC1, avg_intensities_float.data());
I also changed the new centers to floats, as @berak suggested. Now the values of my centers make sense. The compactness value is now very high (14900.9), but I guess that is unrelated to this topic. If I understand correctly, the cause of the problem was mixing floats (32-bit) and doubles (64-bit) in the memcpy call. Since they occupy different amounts of memory, the copied values were garbled. Is that correct?

yep :wink:
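
To make the mismatch concrete, here is a minimal sketch in plain C++ (no OpenCV) that contrasts a byte-wise copy with an element-wise conversion. The function names copy_bytes and convert_values are hypothetical and used only for illustration. Note also that the original memcpy call used sizeof(uchar) as the element size, so only 56 bytes were copied and the rest of the matrix was left uninitialized.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical helper showing the buggy pattern: the raw bytes of the
// doubles are copied into a float buffer. Each 8-byte double ends up
// reinterpreted as two unrelated 4-byte floats, so the values are garbage.
std::vector<float> copy_bytes(const std::vector<double>& src) {
    std::vector<float> dst(src.size(), 0.0f);
    // Copies dst.size() * 4 bytes, i.e. only the first half of src's bytes.
    std::memcpy(dst.data(), src.data(), dst.size() * sizeof(float));
    return dst;
}

// Hypothetical helper showing the safe pattern: each double is converted
// to a float by value, so 100.0 becomes 100.0f.
std::vector<float> convert_values(const std::vector<double>& src) {
    return std::vector<float>(src.begin(), src.end());
}
```

Calling convert_values on {100.0, 200.0} yields the same numbers as floats, while copy_bytes yields values that no longer match the input. The same reasoning applies when reading the centers: they are stored as 32-bit floats, so they should be read as float rather than double.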

It’s also a good idea to avoid the at() operation wherever you can, e.g.

cout << centers.type() << " " << centers.size() << " " << centers << endl;

(You can print the Mat as a whole!)

Good to know. Thanks!
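
A note on the compactness value mentioned earlier: OpenCV documents the return value of kmeans as the sum of squared distances from each sample to its assigned center, so it grows with both the number of samples and the spread within each cluster. A value of 14900.9 over 56 samples corresponds to a root-mean-square distance of roughly 16 intensity levels, which may or may not be acceptable depending on the data. The sketch below recomputes that quantity in plain C++ for 1-D data. The function name compactness_1d is hypothetical, and it assumes that each label is a valid index into the centers.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: sum of squared distances from each 1-D sample to
// the center of the cluster it was assigned to.
// Assumes labels.size() == samples.size() and every label indexes centers.
double compactness_1d(const std::vector<float>& samples,
                      const std::vector<int>& labels,
                      const std::vector<float>& centers) {
    double sum = 0.0;
    for (std::size_t i = 0; i < samples.size(); ++i) {
        const double d = static_cast<double>(samples[i]) -
                         static_cast<double>(centers[labels[i]]);
        sum += d * d;
    }
    return sum;
}
```

For example, samples {10, 20, 100} with labels {0, 0, 1} and centers {15, 100} give 25 + 25 + 0 = 50. Comparing this number across different values of K is usually more informative than judging a single value in isolation.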