Ambiguity problem in the detection of the ArUco marker when it is parallel to the camera plane

System information (version)
  • OpenCV => 4.5.5
  • Operating System / Platform => Windows 10, 64 Bit
  • Compiler/IDE => PyCharm 2021.3.2
  • Language => Python 3.9
Detailed description

I am developing a vision system based on ArUco visual markers. The ArUco dictionary used is cv2.aruco.DICT_4X4_50, and the marker measures 2 cm per side.

I performed the calibration using a chessboard whose squares measure 3.6 cm per side, obtaining results that I consider satisfactory.
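For context, the calibration follows the standard OpenCV chessboard procedure; a rough sketch (not my exact script; the pattern size and image path are placeholders, and the 3.6 cm square size is the one mentioned above):

import glob

import cv2
import numpy as np

pattern = (9, 6)  # inner-corner grid of the chessboard (assumed)
objp = np.zeros((pattern[0] * pattern[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2) * 0.036

obj_points, img_points = [], []
for fname in glob.glob("calib/*.png"):  # placeholder path
    gray = cv2.cvtColor(cv2.imread(fname), cv2.COLOR_BGR2GRAY)
    found, corners = cv2.findChessboardCorners(gray, pattern)
    if found:
        obj_points.append(objp)
        img_points.append(corners)

ret, mtx, dist, _, _ = cv2.calibrateCamera(
    obj_points, img_points, gray.shape[::-1], None, None)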

The problem appeared when I started to validate the results provided by the vision system. For the validation I used a UR10e collaborative robot, with the ArUco marker strategically placed on the tool attached to it. The robot positions the tool at a known spatial pose and, using a Logitech C525 camera, the marker is located and its position and orientation with respect to the camera are computed (strictly speaking, the pose of the tool is computed: the reference frame is transferred from the centre of the ArUco marker to the centre of the tool using homogeneous transformation matrices).
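For reference, this transfer amounts to composing two homogeneous transformation matrices. A minimal sketch (the matrix names are illustrative; the 33 mm offset along -Y is the one used in the code below):

import numpy as np

# Pose of the marker in the camera frame, as estimated from the ArUco detection
T_cam_marker = np.eye(4)  # placeholder
# Fixed offset from the marker centre to the tool centre (33 mm along -Y,
# as in MTH_SIROM_respect_Aruco in the code below)
T_marker_tool = np.eye(4)
T_marker_tool[1, 3] = -0.033
# Pose of the tool in the camera frame
T_cam_tool = T_cam_marker @ T_marker_tool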

The problem is that when the tool and its marker are placed directly in front of the camera, i.e. with no rotation about X or Y, I seem to hit an ambiguity problem: the vision system returns a spurious rotation about Y, as you can see in the following example (Example 1):

[X,Y,Z,Rx,Ry,Rz] Real = [0mm, 0mm, 250mm, 0deg, 0deg, 20deg]

[X,Y,Z,Rx,Ry,Rz] of the vision system = [-0.26mm, 0.58mm, 254.2mm, 0.64deg, -7.9deg, 19.23deg]

At first I assumed it was an offset error introduced when defining the camera reference frame, but when I rotate about Y the result is correct, as in the following example (Example 2):

[X,Y,Z,Rx,Ry,Rz] Real = [0mm, 0mm, 250mm, 0deg, 30deg, 20deg]

[X,Y,Z,Rx,Ry,Rz] of the vision system = [-0.088mm, 0.31mm, 255.64mm, 2.05deg, 28.3deg, 19.45deg]

Images for Examples 1 and 2 here.

In short, how can I solve the ambiguity problem for the cases where the ArUco marker faces the camera head-on, without any rotation about X or Y? Is it really an ambiguity problem caused by using a single small marker, or is something else going on?

Steps to reproduce

The following is the code of the class used to detect the tool that carries the ArUco marker. The program runs and prints the results of the vision system to the console until the user stops it with CTRL+C:

import math as mt
import time

import cv2
import numpy as np
import yaml


class SIROMDetector:
    # Constructor: stores the configuration and loads the calibration parameters
    def __init__(self, calibrationFile, CameraID, ArucoDict, ArucoLen):
        # Class attributes
        self.calibrationFile = calibrationFile
        self.CameraID = CameraID
        self.ArucoDict = ArucoDict
        self.ArucoLen = ArucoLen
        self.CamMtx, self.CamDst = self.CalibrationParams()  # Load the calibration parameters
        self.FPS = 0

    # Methods of the ArUco marker detection class
    def CalibrationParams(self):
        """
        Reads the intrinsic camera parameters obtained from a previous
        chessboard calibration.
        :return: camera matrix and distortion coefficients
        """
        try:
            with open(str(self.calibrationFile) + "/data/ParametrosCalibracion.yml", "r", encoding="utf8") as infile:
                InformacionCamara = yaml.full_load(infile)
            # Camera matrix
            mtx = np.zeros(shape=(3, 3))
            for i in range(3):
                mtx[i] = InformacionCamara[0]["MatrizCamara"][i]
            # Distortion coefficients of the camera
            dist = np.array([InformacionCamara[1]["Distorsion"][0]])
            return mtx, dist
        except Exception:
            print("ERROR. Could not read the camera parameters")
            mtx = np.zeros(shape=(3, 3))
            dist = np.zeros((1, 5))
            return mtx, dist

    def Vector2MTH(self, rvec, tvec):
        """
        Converts a rotation vector (Rodrigues) and a translation vector into a
        4x4 homogeneous transformation matrix.
        :return: homogeneous transformation matrix
        """
        R, _ = cv2.Rodrigues(rvec)
        MTH = np.array([[R[0][0], R[0][1], R[0][2], tvec[0][0][0]],
                        [R[1][0], R[1][1], R[1][2], tvec[0][0][1]],
                        [R[2][0], R[2][1], R[2][2], tvec[0][0][2]],
                        [      0,       0,       0,             1]], dtype=float)
        return MTH

    def MTH2Vector(self, MTH):
        """
        Converts a 4x4 homogeneous transformation matrix into its rotation and
        translation vectors.
        :return: rotation and translation vectors associated with the input matrix
        """
        R = np.array([MTH[0, 0:3], MTH[1, 0:3], MTH[2, 0:3]])
        rvec, _ = cv2.Rodrigues(R)
        tvec = np.array([MTH[0, 3], MTH[1, 3], MTH[2, 3]])
        return rvec, tvec

    def DetectMarker(self):
        """
        Detects ArUco markers using the camera parameters obtained above.
        :return: misalignment vector [X, Y, Z, Rx, Ry, Rz] giving the pose of
        the SIROM with respect to the camera
        """
        cam = cv2.VideoCapture(self.CameraID, cv2.CAP_DSHOW)
        # Configure the camera for the capture
        cam.set(cv2.CAP_PROP_FRAME_WIDTH, 1280)
        cam.set(cv2.CAP_PROP_FRAME_HEIGHT, 720)
        cam.set(cv2.CAP_PROP_AUTOFOCUS, 0)  # Autofocus OFF
        start = time.time()
        try:
            while True:
                ret, frame = cam.read()
                if not ret:
                    print("COULD NOT READ A FRAME FROM THE CAMERA")
                    break
                # height, width = frame.shape[:2]
                Frame_procesado = self.Preproces(frame)
                ArucoDict = cv2.aruco.Dictionary_get(self.ArucoDict)
                ArucoParams = cv2.aruco.DetectorParameters_create()
                ArucoParams.cornerRefinementMethod = cv2.aruco.CORNER_REFINE_SUBPIX
                esquinas, ids, rechazados = cv2.aruco.detectMarkers(Frame_procesado, ArucoDict, parameters=ArucoParams, cameraMatrix=self.CamMtx, distCoeff=self.CamDst)
                if len(esquinas) > 0:  # At least one marker detected
                    for i in range(0, len(ids)):
                        # Estimate the marker pose and convert the result to the desired form
                        rvec_Aruco_respect_Camara, tvec_Aruco_respect_Camara, ptosMarcador = cv2.aruco.estimatePoseSingleMarkers(esquinas[i], self.ArucoLen, self.CamMtx, self.CamDst)
                        cv2.aruco.drawDetectedMarkers(frame, esquinas)
                        MTH_Aruco_respect_Camara = self.Vector2MTH(rvec_Aruco_respect_Camara, tvec_Aruco_respect_Camara)
                        # Fixed offset from the marker centre to the SIROM centre (33 mm along -Y)
                        MTH_SIROM_respect_Aruco = np.array([[1, 0, 0,      0],
                                                            [0, 1, 0, -0.033],
                                                            [0, 0, 1,      0],
                                                            [0, 0, 0,      1]], dtype=float)
                        MTH_SIROM_respect_Camara = np.dot(MTH_Aruco_respect_Camara, MTH_SIROM_respect_Aruco)
                        rvec_SIROM_respect_Camara, tvec_SIROM_respect_Camara = self.MTH2Vector(MTH_SIROM_respect_Camara)

                        # Compute the camera FPS
                        stop = time.time()
                        self.FPS = 1 / (stop - start)
                        start = stop
                        fps = "FPS: " + str(int(self.FPS))

                        # Draw the SIROM coordinate axes, the FPS counter...
                        cv2.aruco.drawAxis(frame, self.CamMtx, self.CamDst, rvec_SIROM_respect_Camara, tvec_SIROM_respect_Camara, self.ArucoLen)
                        cv2.putText(frame, fps, (3, 25), cv2.FONT_HERSHEY_SIMPLEX, 1, (100, 255, 0), 3, cv2.LINE_AA)
                        # Draw lines marking the centre of the frame
                        # cv2.line(frame, (int(width / 2), 0), (int(width / 2), height), (255, 0, 0), 5)
                        # cv2.line(frame, (0, int(height / 2)), (width, int(height / 2)), (255, 0, 0), 5)
                        cv2.imshow("SIROM detectado", frame)
                        cv2.waitKey(1)

                        # Rotate 180 deg about X so that the passive and active axes coincide
                        Rotacion_X_180 = np.array([[1,  0,  0, 0],
                                                   [0, -1,  0, 0],
                                                   [0,  0, -1, 0],
                                                   [0,  0,  0, 1]], dtype=float)
                        MTH_SIROM_respect_Camara_mod = np.dot(MTH_SIROM_respect_Camara, Rotacion_X_180)
                        rvec_SIROM_respect_Camara_mod, tvec_SIROM_respect_Camara_mod = self.MTH2Vector(MTH_SIROM_respect_Camara_mod)

                        # Convert to mm and degrees to display the result on the console
                        rvec_SIROM_respect_Camara_mod = [item for l in rvec_SIROM_respect_Camara_mod for item in l]
                        rvec_SIROM_respect_Camara_mod = [rvec_SIROM_respect_Camara_mod[0] * 180 / mt.pi, rvec_SIROM_respect_Camara_mod[1] * 180 / mt.pi, rvec_SIROM_respect_Camara_mod[2] * 180 / mt.pi]

                        # Print the results to the console
                        print("\nThe pose of the SIROM with ID:" + str(ids[i]) + " with respect to the camera is:")
                        print("Position [Tx, Ty, Tz] (mm): " + str([tvec_SIROM_respect_Camara_mod[0] * 1000, tvec_SIROM_respect_Camara_mod[1] * 1000, tvec_SIROM_respect_Camara_mod[2] * 1000]))
                        print("Orientation [Rx, Ry, Rz] (deg): " + str(rvec_SIROM_respect_Camara_mod))
                else:
                    # Compute the camera FPS, draw the FPS counter...
                    stop = time.time()
                    self.FPS = 1 / (stop - start)
                    start = stop
                    fps = "FPS: " + str(int(self.FPS))
                    cv2.putText(frame, fps, (3, 25), cv2.FONT_HERSHEY_SIMPLEX, 1, (100, 255, 0), 3, cv2.LINE_AA)
                    cv2.imshow("SIROM detectado", frame)
                    # cv2.imshow("SIROM detectado - CLAHE", Frame_procesado)
                    cv2.waitKey(1)
                    print("NO SIROM DETECTED")

        except KeyboardInterrupt:
            # Return the [X, Y, Z, Rx, Ry, Rz] vector (tvec in metres, rvec in
            # degrees; rvec was flattened to plain floats above)
            return [tvec_SIROM_respect_Camara_mod[0], tvec_SIROM_respect_Camara_mod[1],
                    tvec_SIROM_respect_Camara_mod[2], rvec_SIROM_respect_Camara_mod[0],
                    rvec_SIROM_respect_Camara_mod[1], rvec_SIROM_respect_Camara_mod[2]]

    def Preproces(self, frame, clip_hist_percent=1):
        """
        Pre-processes the camera frames to improve the detection of the SIROM
        and of the visual markers it carries.

        alpha adjusts the contrast of the frame and beta its brightness.
        :return: pre-processed grayscale frame
        """
        # Improve the contrast and brightness of each frame at the same time
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        # Calculate grayscale histogram
        hist = cv2.calcHist([gray], [0], None, [256], [0, 256])
        hist_size = len(hist)
        # Calculate cumulative distribution from the histogram
        accumulator = [float(hist[0])]
        for index in range(1, hist_size):
            accumulator.append(accumulator[index - 1] + float(hist[index]))
        # Locate points to clip
        maximum = accumulator[-1]
        clip_hist_percent *= (maximum / 100.0)
        clip_hist_percent /= 2.0
        # Locate left cut
        minimum_gray = 0
        while accumulator[minimum_gray] < clip_hist_percent:
            minimum_gray += 1
        # Locate right cut
        maximum_gray = hist_size - 1
        while accumulator[maximum_gray] >= (maximum - clip_hist_percent):
            maximum_gray -= 1
        # Calculate alpha and beta values
        try:
            alpha = 255 / (maximum_gray - minimum_gray)
        except ZeroDivisionError:
            alpha = 255
        beta = -minimum_gray * alpha
        img_resultado = cv2.convertScaleAbs(frame, alpha=alpha, beta=beta)

        # Noise filtering (edge-preserving bilateral filter)
        img_resultado_filtrada = cv2.bilateralFilter(img_resultado, 5, 200, 200)
        # cv2.imshow("FRAME PREPROCESADO", img_resultado_filtrada)
        # cv2.waitKey(10000)
        # Convert the processed image to grayscale
        img_gray = cv2.cvtColor(img_resultado_filtrada, cv2.COLOR_BGR2GRAY)
        return img_gray

Using this code, you can instantiate the class and test its performance detecting an ArUco marker.
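For example, a minimal driver script might look like this (the calibration path and camera index are placeholders):

import cv2

detector = SIROMDetector(
    calibrationFile=".",                # folder containing data/ParametrosCalibracion.yml
    CameraID=0,
    ArucoDict=cv2.aruco.DICT_4X4_50,
    ArucoLen=0.02)                      # marker side length in metres
pose = detector.DetectMarker()          # runs until CTRL+C, then returns [X, Y, Z, Rx, Ry, Rz]
print(pose)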

Notes
  • Resolution used → 1280x720
  • Autofocus and auto white balance → disabled
  • cornerRefinementMethod → 1 (CORNER_REFINE_SUBPIX)
  • I have read the OpenCV documentation, but I did not find a detailed solution to this issue.

If you need more detailed information, please do not hesitate to ask for it.

I wouldn’t call it ambiguity.

it’s a numerical issue. a nearly orthographic view simply doesn’t contain the information needed to estimate the rotation properly. the sensor’s pixels contain some noise, which will make the rotation flip-flop between various plausible solutions. without sensor noise it’d just be stuck on one solution, which might not be close to correct either.

you’ll get “orthographic” views for two situations:

  • frontal view: the square’s sides look parallel, no matter how close you are to the square
  • the square covers a small angle of the view (i.e. the ratio of square size to distance is small, regardless of the camera’s focal length): the square’s sides again look parallel, because no foreshortening is observable

there’s no way to fix that. there might be ways to detect it and treat the situation, but you’d have to dig into the math behind it all. not trivial.
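one way to at least detect the situation (a sketch, not a guaranteed fix): cv2.solvePnPGeneric with the SOLVEPNP_IPPE_SQUARE flag returns both analytic pose solutions for a planar square, sorted by reprojection error. if the two errors come out nearly equal, you’re in the ambiguous regime and can e.g. keep the previous orientation or reject the frame. the function and flag are real OpenCV API; the threshold below is a placeholder you’d have to tune:

import cv2
import numpy as np

def marker_pose_candidates(corners, marker_len, cam_mtx, cam_dst):
    # object points for SOLVEPNP_IPPE_SQUARE, in the order ArUco reports
    # corners: top-left, top-right, bottom-right, bottom-left
    s = marker_len / 2.0
    obj_pts = np.array([[-s,  s, 0],
                        [ s,  s, 0],
                        [ s, -s, 0],
                        [-s, -s, 0]], dtype=np.float32)
    img_pts = np.asarray(corners, dtype=np.float32).reshape(4, 2)
    # returns all candidate poses, sorted by reprojection error (lowest first)
    n, rvecs, tvecs, errs = cv2.solvePnPGeneric(
        obj_pts, img_pts, cam_mtx, cam_dst,
        flags=cv2.SOLVEPNP_IPPE_SQUARE)
    return rvecs, tvecs, np.asarray(errs).flatten()

# inside the detection loop (threshold 1.5 is a guess, tune it):
# rvecs, tvecs, errs = marker_pose_candidates(esquinas[i], ArucoLen, CamMtx, CamDst)
# ambiguous = errs[1] / max(errs[0], 1e-9) < 1.5

as far as i know, estimatePoseSingleMarkers only ever gives you one of the two solutions, which is why you see it jump.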

First of all, thank you for your quick response.

Could you please provide me with some documentation so that I can find a solution to this problem?

In case it is a numerical issue, I thought of compensating for it empirically (applying a fixed correction obtained by trial and error), but when I move the tool to a different position after applying that correction, the error mentioned above is still there.

Thanks in advance!