if those two pictures you posted match with 0.95 or better, but you don’t want them to match, that pair’s actual matching score should be a lower bound on the threshold you should use.
find some pictures where your eyes say it’s barely a match, check how well that scores. that’s the threshold I would pick. I suspect a proper match would score quite a bit closer to 1.0
the situation might even require you to express the score in terms of difference/dissimilarity instead of similarity, meaning by how much they differ… because you might need to say 1% or 0.1% or 0.01%, and expressing that as 99%, 99.9%, 99.99% is just silly.