Face recognition - SFace match result evaluation

I’m developing a face recognition test case based on OpenCV. The code is derived from the OpenCV tutorial “DNN-based Face Detection And Recognition”.
The match similarity thresholds are set:

cosine_similarity_threshold = 0.363
l2_similarity_threshold = 1.128

This means that two faces have the same identity (accuracy 99.80%) if the cosine similarity (called “cosine distance” in the tutorial) is greater than or equal to 0.363 (max = 1), or the normL2 distance is less than or equal to 1.128 (min = 0).
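As a rough illustration of how the two thresholds relate, here is a minimal NumPy sketch. It assumes that both scores are computed on L2-normalised feature vectors (my understanding of what the OpenCV matcher does, but treat it as an assumption); the helper names and the random 128-element vectors are made up for the example and are not part of the OpenCV API.

```python
import numpy as np

COSINE_THRESHOLD = 0.363  # same identity if score >= threshold
L2_THRESHOLD = 1.128      # same identity if distance <= threshold


def unit(v):
    """Scale a feature vector to unit length."""
    v = np.asarray(v, dtype=np.float64)
    return v / np.linalg.norm(v)


def cosine_score(f1, f2):
    """Cosine similarity of two feature vectors (1 = identical direction)."""
    return float(np.dot(unit(f1), unit(f2)))


def l2_distance(f1, f2):
    """Euclidean distance between the unit-length feature vectors."""
    return float(np.linalg.norm(unit(f1) - unit(f2)))


def same_identity(f1, f2):
    """Threshold decision on the cosine score."""
    return cosine_score(f1, f2) >= COSINE_THRESHOLD


# Hypothetical features: b is a noisy copy of a.
rng = np.random.default_rng(0)
a = rng.normal(size=128)
b = a + 0.5 * rng.normal(size=128)

c = cosine_score(a, b)
d = l2_distance(a, b)
print(c, d, same_identity(a, b))
```

For unit vectors, d² = 2 − 2c, so the two thresholds describe roughly the same decision boundary: sqrt(2 − 2 × 0.363) is about 1.129.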


Is there a way to evaluate, as a % of accuracy, a match score whose cosine score is below the threshold, i.e. a different identity?
Let’s say the match returns a score of 0.300. What is the accuracy: 95, 90, 80… %?

PS: I did contact the team who trained and published the recognition model, but I have had no concrete answer so far. The ROC curve from the training could help…

Any help much appreciated.


for the sake of argument – ‘accuracy’ does not apply to a single measurement.
you can calculate an ‘accuracy’ for a model by making N measurements (on a labelled test set) and taking the ratio of correct predictions to N.
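A minimal sketch of that calculation, assuming you already have a cosine score and a ground-truth label for each pair in your test set. The function name and the six scores below are invented for illustration only.

```python
import numpy as np


def accuracy_at_threshold(scores, same_labels, threshold=0.363):
    """Fraction of pairs whose thresholded prediction matches the label."""
    scores = np.asarray(scores, dtype=float)
    same_labels = np.asarray(same_labels, dtype=bool)
    predicted_same = scores >= threshold
    return float(np.mean(predicted_same == same_labels))


# Made-up scores for N = 6 labelled pairs (True = same person).
scores = [0.71, 0.40, 0.30, 0.10, 0.36, 0.55]
labels = [True, True, True, False, False, True]

acc = accuracy_at_threshold(scores, labels)
print(f"accuracy over {len(scores)} pairs: {acc:.3f}")
```

Here five of the six pairs are classified correctly; the same-person pair with score 0.30 falls below the threshold and counts as an error.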

Thank you berak.
Indeed, my wording regarding accuracy was not correct.
Let me rephrase: “Let’s say the match method returns a cosine score of 0.300. What is the face similarity: 95, 90, 80… %?”
I guess that without the ROC (TPR / FPR) it will not be possible to extrapolate.
Any clue about the next step?
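For reference, TPR and FPR at a given threshold can be measured on any labelled set of pairs, so a ROC curve does not have to come from the model authors. A minimal sketch, assuming such a labelled set is available; the function name and the toy data are hypothetical.

```python
import numpy as np


def roc_points(scores, same_labels, thresholds):
    """Return (threshold, TPR, FPR) for each candidate threshold."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(same_labels, dtype=bool)
    points = []
    for t in thresholds:
        predicted_same = scores >= t
        # TPR: share of same-person pairs accepted at this threshold.
        tpr = np.sum(predicted_same & labels) / np.sum(labels)
        # FPR: share of different-person pairs wrongly accepted.
        fpr = np.sum(predicted_same & ~labels) / np.sum(~labels)
        points.append((float(t), float(tpr), float(fpr)))
    return points


# Made-up scores and labels (True = same person).
scores = [0.71, 0.40, 0.30, 0.10, 0.36, 0.55]
labels = [True, True, True, False, False, True]

roc = roc_points(scores, labels, thresholds=[0.300, 0.363])
for t, tpr, fpr in roc:
    print(f"threshold={t:.3f}  TPR={tpr:.2f}  FPR={fpr:.2f}")
```

On this toy data, lowering the threshold from 0.363 to 0.300 accepts one more genuine pair but also lets in one impostor pair. With a real test set the same loop over many thresholds traces the full curve.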

imo, you’re chasing a chimera there. there is no direct (or even linear!) connection between a distance value and a ‘face similarity’.

at least distances have a meaning defined by the feature space. it’s still not intuitive, however…

“percentage similarity” has no usable definition. the notion only exists because people think everything can be expressed in “percentages”.

if you could break faces down into features that match or don’t match, you could calculate some percentage. for that you’d need to come up with such features first.

if you’d approach this as “what are the chances that it truly is the same person”, you’d have a sensible definition, but then you’d have to calibrate that against your data and the set of people you want to tell apart. that may include “the whole world”.
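one crude way to do such a calibration is to bin the scores of labelled pairs and take the fraction of same-person pairs in each bin. a minimal sketch, assuming you have labelled pairs drawn from your own population; the function names, bin edges and data are all made up, and the resulting numbers depend heavily on how many same vs. different pairs you put into the calibration set.

```python
import numpy as np


def calibrate_by_bins(scores, same_labels, bin_edges):
    """Empirical P(same person) per score bin; NaN where a bin is empty."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(same_labels, dtype=bool)
    # np.digitize maps each score to a bin index in 0..len(bin_edges).
    bin_index = np.digitize(scores, bin_edges)
    probs = np.full(len(bin_edges) + 1, np.nan)
    for b in range(len(bin_edges) + 1):
        in_bin = bin_index == b
        if in_bin.any():
            probs[b] = labels[in_bin].mean()
    return probs


def probability_same(score, bin_edges, probs):
    """Look up the calibrated estimate for a single new score."""
    return float(probs[np.digitize(score, bin_edges)])


# Made-up calibration data (True = same person).
bin_edges = [0.2, 0.4, 0.6]
scores = [0.05, 0.10, 0.25, 0.30, 0.35, 0.45, 0.50, 0.70, 0.80]
labels = [False, False, False, True, False, True, True, True, True]

probs = calibrate_by_bins(scores, labels, bin_edges)
p = probability_same(0.300, bin_edges, probs)
print(probs, p)
```

with this toy data a score of 0.300 lands in the [0.2, 0.4) bin, where one of three pairs is a true match. that figure only means something for the data it was calibrated on.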