Homography having only 2 aruco markers as a reference

Hello everyone,

I’m just a newbie in computer vision but I try to improve myself :slight_smile:
I’m currently facing a problem and I don’t know how to deal with it.
I have a target picture that I want to project (homography) on a paper board containing aruco markers. (just something like what can be seen on this site https://learnopencv.com/augmented-reality-using-aruco-markers-in-opencv-c-python)
However I have a strong constraint that only allows me to have 2 markers instead of 4.
One of the marker is at the top left corner and the other one is somewhere between the 2 right corners of the ABCD frame I want to project to (see picture below)

What kind of operation should I use to get my 4 frame points in the camera preview from the markers ref points knowing that I of course know the distance between the markers and the points A, B, C, D on my sheet of paper?
Hope my question is clear … :crazy_face:
Thanks in advance

The homography function needs the correspondence of 4 points.
The linked tutorial uses the first corner for every marker to get the 4 points. However if you take all the 4 corners, you can obtain a total of 8 corners from the 2 markers. That should be enough to compute the homography. You just have to know the exact position of the marker edges w.r.t. the ABCD points.
The downside of this approach is that the corners (of a marker) are quite close, so the corner detection errors will have more effect on the result.

Thanks kbarni for your answer!
You’re right, I did not consider using all the whole 8 points of my 2 markers to get a better result.
However, my problem is more how to get A, B ,C, D (the destination points for my homography) knowing my 8 points when I move the sheet of paper in front of the camera.
I guess there should be a function that allows to compute A, B, C, D by taking into account the axis of the detected markers.
Of course I know the distance between the markers and the A, B, C, D points on my sheet of paper but I need the A, B, C, D values w.r.t. the markers in the camera picture.
Hope you understand what I mean …

The function you are looking for is findHomography().

Detecting ArUco markers and extracting positions of 8 corners in image you get imagePoints. Knowing the actual position of corners on plane (world coordinates) you get objectPointsPlanar.

Mat H = findHomography(objectPointsPlanar, imagePoints);

Using output matrix H, witch contains homography transform, in combination with known positions of ABCD corners (world coordinates) objectPointsABCD, you get positions of ABCD points in image.

perspectiveTransform(objectPointsABCD, imagePointsABCD, H);

Also remember that this solution only applies on planar objects (Z == 0). If you want to place ArUco markers or ABCD corners in different Z, you would need to use solvePnP() function in combination with projectPoints(). In that case you will also need to calibrate the camera.

Thanks FilipBaas!!
This indeed seems to be what I’m looking for.
I suppose I only have to provide 4 points (out of the 8 I have from markers) to the findHomography function for objectPointsPlanar right?
something like:
np.array([[x1,y1], [x2,y2], [x3,y3], [x4,y4]])
with 1-2 linked to marker1 and 3-4 linked to marker2 …
By the way, imagePoints are defined in pixel but what should it be for objectPointsPlanar?
Provided the markers are perfectly aligned with [AB] segment, would it be too risky and inaccurate to only use a single marker?

findHomography() is able to calculate transformation with any number of points. The more points you use, the better will the result be. So do not worry about using both ArUco markers, they will make detection more robust and accurate.

objectPointsPlanar can be defined in any units you prefer (mm, km, inch, miles) and the result of projectiveTransform() will be in same units.

OK thanks!
I’m bit ashamed to say that but I was not able to correctly give input points to the findHomography function.
I tried different things …

to define my planar points corresponding to the markers on the sheet of paper I tried this:

realWorldMarkerCorners = np.array([[0, 0], [0, 30], [30, 30], [30, 0]])

to get my image points I tried this:

markerCorners, markerIds, rejectedCandidates = cv2.aruco.detectMarkers(frame, dictionary, parameters=parameters)
index = np.squeeze(np.where(markerIds==85))
markerCorners = np.squeeze(markerCorners[index[0]])

but I got errors returned by the function:
" Bad argument (The input arrays should be 2D or 3D point sets) in findHomography"

I should definitely go deeper in numpy arrays learning :thinking:

By the way, as my ABCD can be on the left side of the marker, can I define negative points for the planar points of my rectangle?


realWorldMarkerCorners.shape = (-1, 1, 2)

C++ API cv::Mat commonly has to be a column vector of multi-channel elements. this will give it a column vector shape and a 2-channel element appearance, when translated from numpy into the cv::Mat that’s expected by the OpenCV function.

-1 rows for “fill this in”, 1 for one column, 2 “channels”.

OpenCV is a bit peculiar there.

Hi crackwitz,

you were right, this solved my problem!! :clap:
I did not understand the whole thing but it works.
I finally decided to use 2 markers instead of 1 and it is much stable now.

So to sum up, in my case I use:

  1. findHomography to get matrix1 for “real world markers” to “image markers”
  2. perspectiveTransform to get the projection of “my real world ABCD” to “image world ABCD”
  3. findHomography again to get matrix2 for “my picture to project” to “image world ABCD”
  4. warpPerspective to apply my picture to the current frame based on matrix2

I did not think I would need 2 x findHomography but I don’t see how to do it differently …

However, I encounter 2 new problems now: :pensive:

  1. the frame rate is just horrible (probably due to a huge processing time for each frame)
  2. the picture that is being projected in warpPerspective (which is a PNG) is completely chopped, just as if resolution was very low)

Just a precision for point 1
I use a raspi 3B in console mode (to lower CPU usage) and python with “picamera” package for preview and overlays and opencv for image processing of course

Hi there,

Any updates on that please?
I’m still stuck on my problems described in my previous post :confused:

Thanks :wink:

what updates do you expect? you have to give (e.g. runnable code) before you can receive. mere descriptions of code are insufficient.

Yes of course I understand, it will be easier with the code :grinning:
Here it is:

import cv2
import argparse
import numpy as np
import picamera
import picamera.array
import time

#------------------ GLOBALS---------------------------
# define image to be wrapped in real time
im_src = cv2.imread("image.png")

# load the aruco dictionary
dictionary = cv2.aruco.Dictionary_get(cv2.aruco.DICT_4X4_250)
# initialize detector params
parameters =  cv2.aruco.DetectorParameters_create()

# marker corners coordinates on paper (real world)
realWorldMarkerCorners = np.float32([   [0, 0], [30, 0], [30, 30], [0, 30],  \
                                        [125, 40], [155, 40], [155, 70], [125, 70]  ])

# frame corners coordinates on paper (real world) 
realWorldFrameBoardCorners = np.float32([[-2, 34], [158, 42], [158, 84], [-2, 92]])

#------------------ FUNCTIONS---------------------------

def process_frame(frame, img_to_process):
    """Detect markers and wrap"""
    global realWorldMarkerCorners
    global realWorldFrameBoardCorners


        # detect the markers in the image
        markerCorners, markerIds, rejectedCandidates = cv2.aruco.detectMarkers(frame, dictionary, parameters=parameters)

        imageMarkerCorners = np.asarray(markerCorners)

        # tips from crackwitz (ocv forum)
        realWorldMarkerCorners.shape = (-1, 1, 2)
        realWorldFrameBoardCorners.shape = (-1, 1, 2)

        # marker with Id 85
        index1 = np.squeeze(np.where(markerIds==85))

        # marker with Id 10
        index2 = np.squeeze(np.where(markerIds==10))

        Marker85_UpperLeft_Point    =   np.squeeze(markerCorners[index1[0]])[0]
        Marker85_UpperRight_Point   =   np.squeeze(markerCorners[index1[0]])[1]
        Marker85_LowerRight_Point   =   np.squeeze(markerCorners[index1[0]])[2]
        Marker85_LowerLeft_Point    =   np.squeeze(markerCorners[index1[0]])[3]

        Marker10_UpperLeft_Point    =   np.squeeze(markerCorners[index2[0]])[0]
        Marker10_UpperRight_Point   =   np.squeeze(markerCorners[index2[0]])[1]
        Marker10_LowerRight_Point   =   np.squeeze(markerCorners[index2[0]])[2]
        Marker10_LowerLeft_Point    =   np.squeeze(markerCorners[index2[0]])[3]

        # dest points
        pts_dst =   [Marker85_UpperLeft_Point] + [Marker85_UpperRight_Point] + [Marker85_LowerRight_Point] + [Marker85_LowerLeft_Point] \
                +   [Marker10_UpperLeft_Point] + [Marker10_UpperRight_Point] + [Marker10_LowerRight_Point] + [Marker10_LowerLeft_Point]
        pts_dst_m = np.asarray(pts_dst)
        # 1st homography : get matrix1 for “real world markers” to “image markers”
        matrix1, status = cv2.findHomography(realWorldMarkerCorners, pts_dst_m)

        # get the projection of “my real world frame" to “image world frame"
        imagePointsABCD = cv2.perspectiveTransform(realWorldFrameBoardCorners, matrix1)

        # source points
        pts_src = [[0,0], [img_to_process.shape[1], 0], [img_to_process.shape[1], img_to_process.shape[0]], [0, img_to_process.shape[0]]]         
        pts_src_m = np.asarray(pts_src)

        # 2nd homography : get matrix2 for “my picture to project” to “image world frame"
        matrix2, status = cv2.findHomography(pts_src_m, np.asarray(imagePointsABCD))

        # warp source image to destination based on homography
        warped_image = cv2.warpPerspective(im_src, matrix2, (frame.shape[1],frame.shape[0]))
        # return the processed image
        return warped_image
    except Exception as inst:
        return None

def take_picture(cam):
    Returns an openCV compatible colour image 
    with picamera.array.PiRGBArray(cam) as stream:
        cam.capture(stream, format="rgb", use_video_port=True)
        return stream.array

def execute(flip=1):

    camera = picamera.PiCamera()
    camera.framerate = 30    

    if (flip == 1):
        camera.hflip = True

    # gives camera warm-up time
    frame = take_picture(camera)
    ovr = camera.add_overlay(frame, format="rgb", layer=3, alpha=64)
    if (flip == 1):
        ovr.hflip = True

        while True:
            frame = take_picture(camera)
            frame_out = process_frame(frame, im_src)

            if (frame_out is not None):

    except Exception as e:

#------------------ MAIN---------------------------

def main():
    # ---- ARGS parsing -----------
    ap = argparse.ArgumentParser(description='AR frame projection test')
    ap.add_argument("--flip", "-f", default=1,
        help="Flip both preview and overlay")
    args = vars(ap.parse_args())

    if (
        isinstance(args["flip"], str)
        and args["flip"].isdigit()
        args["flip"] = int(args["flip"])


if __name__ == "__main__":

warpPerspective() function, from my experience, is very performance demanding so RPi 3B may have some hard time with processing. The downscale of warped image might be solved by changing the interpolation used in warp function from default INTER_LINEAR to INTER_NEAREST.

Hi FilipBass,
I tried what you suggested but unfortunately it does not change anything in my case.
I’ll post a picture to show exactly what happens.
Concerning the processing issue, I can maybe try to get my “process_frame” function execute in a separate thread but I’m not sure it will change much. The best thing should probably be to port my code into a more “low level” language …

OK I found the problem … it was simply linked to the camera resolution which was set to 640 x 480 by default :wink:
However, now that my resolution is higher, processing time is increased and the frame rate is just not acceptable :disappointed_relieved:

a complete perspective warp costs some calculations. the GPU part of a computer would be the best processor to handle that.

if you do need this in realtime and are willing to invest some effort into the source code, that’s something to consider.

by the way, your code may have some but not much potential for optimization. you use a single perspective warp (potential for GPU… but effort) and two findHomography calls which involve RANSAC (so they can take a bit of time, depending on amount of input data).

if one of your findHomography calls (the second one?) consists of exactly four point pairs, you can use getPerspectiveTransform, which is a direct solution.

you could also do that to the other call, if you pick a good set of four pairs from the eight pairs of corners you have.

check how much this call really costs (also the other calls). use time.perf_counter() to take timestamps, calculate their difference. you need to see what’s actually taking how much time in your code.

I don’t know the state of OpenCL on a raspberry pi… OpenCV can use OpenCL for many things, if you use the UMat type. it’s a wrapper around Mat (or numpy array) that represents the data on a GPU. many OpenCV calls, specifically the perspective warp call, can take a UMat and return a UMat, which you can then unpack as a Mat/nparray (those live CPU-side). IF you have OpenCL available and OpenCV sees it and can use it (there are functions to check that), you could try this, but moving data to and from the GPU can cost you extra time. on a raspi it’s all the same memory so potentially it could cost you nothing at all.

Hi crackwitz,

I added timestamps as you suggested and here is the result:

Processing times for operations in process_frame
. detectMarkers => 0.11570285399966451
. findHomography#1 => 0.00202957300007256
. perspectiveTransform => 3.906299980371841e-05
. findHomography#2 => 0.00039276799998333445
. warpPerspective => 0.06075245000010909

Processing times for the infinite loop
. take_picture => 0.1219838689999051
. process_frame => 0.18347789700010253

It seems that the more time-consuming tasks are not homography but “detectmarkers” and “takepicture”.
Maybe it could be interesting to run “takepicture” in another thread …
What do you think? does it make sense?

that depends on whether the pi camera produces frames at a fixed frame rate or not.

if it does, the time you see is merely idle time until the next frame is actually produced.

to assess this, you’d determine the time for a complete loop. take the time in one place in the code, say between cam.capture and returning from take_picture, and check the difference. if that time is somewhat stable and consistent with a fixed frame rate, say 1/30 seconds… you have your answer.

cam.capture itself may take time, which you can only assess if there’s buffering, and if you delay before the call, so there’d be a full frame buffered already, so you don’t have to wait for the camera itself to finish exposing the frame first.

By taking a closer look at the Picamera API, I finally found a way to reduce the take_picture part!!
I fact, there’s a method called “capture_continuous” that produces continuous frames.
(as you said, I guess it continuously feeds a frame buffer)
So instead of calling “cam.capture”, I just have something like:

for frame in camera.capture_continuous(rawCapture, format=“bgr”, use_video_port=True):

The overall time of my infinite loop has now dropped from 0.30 to 0.17 :smiley:
If I could now reduce the “detectMarkers” part it would be perfect (I could may be check the DetectorParameters attributes of the detectMarkers method for that, but it does not seem to be that easy)
Anyway perfs are much better now.