estimateAffinePartial2D vs. skimage similarity transform

Might be a newb question but would appreciate any inputs.

I am trying to replace the scikit-image function I use to estimate a similarity transform and came across estimateAffinePartial2D. It runs the estimate twice as fast as skimage, but the results don't match. I dug into the code and found that it only uses the first two points of the input/destination matrices, which explains why changing the coordinates of points 3-5 doesn't do anything and why the transform is almost always wrong. The skimage function seems to calculate the similarity matrix from all points.

The transform I am trying to get is a 5-point facial landmark alignment. Are there better functions available in OpenCV?

what exactly are you doing? please provide code and data, a “minimal reproducible example”.

The transform I am trying to get is a 5-point facial landmark alignment.
Meaning I am trying to generate a transform using all 5 points of the input and destination.

Code:

import cv2
import numpy as np
from skimage import transform as trans

REFERENCE_FACIAL_POINTS = [
    [30.29459953, 51.69630051],
    [65.53179932, 51.50139999],
    [48.02519989, 71.73660278],
    [33.54930115, 87],
    [62.72990036, 87],
]

Test_FACIAL_POINTS = [
    [191.29459953, 330.69630051],
    [362.53179932, 341.50139999],
    [290.02519989, 414.73660278],
    [206.54930115, 500],
    [360.72990036, 507],
]
DEFAULT_CROP_SIZE = (96, 112)
face_size = (112, 112)
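pic = cv2.imread("face.jpg")  # any test image to warp; the filename here is just a placeholder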
def trans1():
    tfm, _ = cv2.estimateAffinePartial2D(np.array(Test_FACIAL_POINTS), REFERENCE_FACIAL_POINTS)
    face_img = cv2.warpAffine(pic, tfm, face_size)
    return face_img
def trans2():
    M = trans.estimate_transform("similarity", np.array(Test_FACIAL_POINTS), REFERENCE_FACIAL_POINTS)
    face_img = cv2.warpAffine(pic, M.params[0:2, :], face_size)
    return face_img

box1 = trans1()
box2 = trans2()

Loading any random pic and applying both transforms, then comparing the results, you will see that there is a problem. Playing with the Test points, skimage is consistently correct and the cv2 version is consistently incorrect to varying degrees depending on the input. Both scale and rotation are wrong and don't correspond to the destination, except for the first 2 points.

The comment I am seeing in the function's code on GitHub and the code itself are consistent: "only two points are needed to estimate the transform". Incorrect if the input has more than two points.
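For reference, here is a quick sketch of how to reproduce what I am describing (reusing the arrays above; the distortion values are arbitrary):

pts = np.float32(Test_FACIAL_POINTS)
pts[2] += [25, -20]  # arbitrary distortion of the nose point
ref = np.float32(REFERENCE_FACIAL_POINTS)

tfm, _ = cv2.estimateAffinePartial2D(pts, ref)
M = trans.estimate_transform("similarity", pts, ref)
print(tfm)           # cv2 matrix
print(M.params[:2])  # skimage matrix, same 2x3 layout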

I’d say you should convert the second one to np.array as well

since you have 5 points, the equation is overdetermined, so it’s a minimization problem… just thought I should mention that.
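to make that concrete, here's a minimal sketch of that least-squares fit (my own illustration, not skimage's actual internals; the helper name fit_similarity_lstsq is made up):

import numpy as np

def fit_similarity_lstsq(src, dst):
    # a similarity transform [[a, -b, tx], [b, a, ty]] has 4 unknowns;
    # each point pair contributes 2 equations, so 5 pairs give a 10x4
    # system that is solved in the least-squares sense over ALL points
    src = np.asarray(src, dtype=np.float64)
    dst = np.asarray(dst, dtype=np.float64)
    n = len(src)
    A = np.zeros((2 * n, 4))
    A[0::2] = np.column_stack([src[:, 0], -src[:, 1], np.ones(n), np.zeros(n)])
    A[1::2] = np.column_stack([src[:, 1],  src[:, 0], np.zeros(n), np.ones(n)])
    (a, b, tx, ty), *_ = np.linalg.lstsq(A, dst.reshape(-1), rcond=None)
    return np.float32([[a, -b, tx], [b, a, ty]])

on the points in this thread it should land very close to skimage's estimate, since both minimize the same sum of squared residuals over every point.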

#!/usr/bin/env python3

import cv2
import numpy as np
from skimage import transform as trans

np.set_printoptions(suppress=True)

REFERENCE_FACIAL_POINTS = np.float32([
    [30.29459953, 51.69630051],
    [65.53179932, 51.50139999],
    [48.02519989, 71.73660278],
    [33.54930115, 87],
    [62.72990036, 87],
])

Test_FACIAL_POINTS = np.float32([
    [191.29459953, 330.69630051],
    [362.53179932, 341.50139999],
    [290.02519989, 414.73660278],
    [206.54930115, 500],
    [360.72990036, 507],
])
def trans1():
    tfm, _ = cv2.estimateAffinePartial2D(Test_FACIAL_POINTS, REFERENCE_FACIAL_POINTS)
    return tfm

def trans2():
    M = trans.estimate_transform("similarity", Test_FACIAL_POINTS, REFERENCE_FACIAL_POINTS)
    return M

box1 = trans1()
box2 = trans2()

print(box1)
print(box2.params)

this gives me roughly comparable matrices. a few pixels of difference in location (third column), but rotation/scale look about the same.

[[  0.19772419  -0.00041311  -7.62065623]
 [  0.00041311   0.19772419 -13.03733067]]
[[  0.20409418   0.00164185 -10.26214161]
 [ -0.00164185   0.20409418 -15.22172787]
 [  0.           0.           1.        ]]

Thanks, I have tested this both ways and actually removed the numpy conversion because it made no difference to the result while adding overhead. Try distorting the facial points a little, especially the first or second point, and you will see major differences. Applying it to actual face pictures gives completely unaligned faces with opencv, while the skimage ones look very much aligned. The root cause is really that the function only considers the first two points.

Edit: I double-checked my code and indeed I had the second term (dest) converted to np.array. It was a typo from my copy-paste as I was trying to reproduce a simple example. Sorry about this. It makes no difference to the outcome, though. The opencv result is not accurate because the alignment only uses 2 points.

is that so? I think what you’re seeing is a consequence of estimateAffinePartial2D using RANSAC (default). pick the LMEDS method or set ransacReprojThreshold higher to force more/all points to go into the calculation.

I would like to try but I don’t know how to. Somehow the method parameter is coded to take an int. When I input “method=LMEDS” the function just fails.

the function doesn’t fail. python complains that it doesn’t know that identifier.

you pass cv2.LMEDS. it’s a constant defined in OpenCV.
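
something like this (a sketch reusing the arrays from the script above; keyword names as in the OpenCV Python bindings):

tfm, inliers = cv2.estimateAffinePartial2D(
    Test_FACIAL_POINTS, REFERENCE_FACIAL_POINTS, method=cv2.LMEDS)

# or keep RANSAC but raise the inlier threshold so no point is rejected
tfm, inliers = cv2.estimateAffinePartial2D(
    Test_FACIAL_POINTS, REFERENCE_FACIAL_POINTS,
    method=cv2.RANSAC, ransacReprojThreshold=50)

print(inliers.ravel())  # 1 = point used as an inlier, 0 = rejected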

Thanks. That's what I meant: the call fails because it doesn't recognize the entry. This is not very intuitive. And the problem remains the same: the function only considers the first two points, and the faces are grossly misaligned once any of the points beyond the first two are even slightly distorted. I have resolved to go back to the skimage library.