I do not know their x,y values and do not know how to get them.
I was thinking of getting them and then calculating the distance using √[(x₂ - x₁)² + (y₂ - y₁)²].
you probably have to fit your query points
(no matter whether they come from contours or harris corners)
to a model, e.g. the hand-drawn points in your edge image above.
once you know which point is (2) and which is (3), you can measure the distance between them.
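the measuring itself is just the euclidean distance formula from your question. a minimal sketch, assuming you already have the pixel coordinates of (2) and (3); the numbers and the helper name `pixel_distance` are made up for illustration:

```python
import math

def pixel_distance(p, q):
    """Euclidean distance between two (x, y) points, in pixels."""
    return math.hypot(q[0] - p[0], q[1] - p[1])

# hypothetical coordinates for points (2) and (3)
point_2 = (1.0, 2.0)
point_3 = (4.0, 6.0)

d = pixel_distance(point_2, point_3)
print(d)
```

note that this is a distance in pixels of whatever image the points were taken from, not a real world length.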
since your query image (let’s pretend it’s the one with the harris points) shows perspective / pose distortion, you will first need to transform (warp) it into “model space”.
last, to get “real world unit” measurements, you need to calibrate the camera.
since you cannot do that if your input comes from arbitrary cameras / persons, you might need a reference measure in the image (some ruler) instead.
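with a ruler in the picture, the conversion is a simple scale factor. a rough sketch, assuming the ruler lies in the same plane as the object and both are measured after the warp; the pixel values, the 100 mm ruler length and the helper name `pixels_to_units` are all assumptions for illustration:

```python
import math

def pixels_to_units(pixel_length, ruler_pixels, ruler_units):
    """Convert a pixel length to real world units using a reference of known length."""
    if ruler_pixels <= 0:
        raise ValueError("ruler length in pixels must be positive")
    return pixel_length * (ruler_units / ruler_pixels)

# hypothetical: the two end marks of a 100 mm ruler, in pixel coordinates
ruler_start, ruler_end = (40.0, 300.0), (340.0, 300.0)
ruler_px = math.hypot(ruler_end[0] - ruler_start[0], ruler_end[1] - ruler_start[1])

# hypothetical: distance between points (2) and (3), in pixels
object_px = 150.0

object_mm = pixels_to_units(object_px, ruler_px, 100.0)
print(object_mm)
```

the result is only as good as the ruler measurement, so a longer reference in the image gives a more stable scale.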