findHomography inaccurate as it moves to left side of image

While I have you on the phone…is there any reason I should ever move to 5X5 instead of 4X4?



I have another observation that causes wide variations in calibrateCameraCharuco.

I had seen this previously when using a very old projector with a 4:3 A/R; apparently, the resolutions I chose to try to match the computer caused the error returned from calibrateCameraCharuco to be over 3.0. I did not have time to dig into it at the time, so I just coded around having a high error…

Now I accidentally ran into the problem again.
On my test 2nd monitor the “recommended” resolution is 1600 X 900, which is a 16:9 A/R (1600/900 = 1.7778).

I normally have it set at 1280 X 720, which is also 16:9 (1280/720 = 1.7778), and with that the calibrateCameraCharuco return error is approx. 0.3.

However, during my recent testing with resolutions and dictionaries I was resetting the resolution many times. When I finished, I accidentally reset the resolution of both primary and secondary monitors to 1280 X 768, which is a 5:3 A/R (1280/768 = 1.6667), and with that resolution the calibrateCameraCharuco return error was 1.7 instead of the expected 0.3.

So it seems that, if the resolution is not set to the recommended A/R, calibrateCameraCharuco will have a hard time calibrating.

If I made the mistake I am sure many of my customers can as well. I don’t know how to find the default or “recommended” A/R but, even if I did, all I could really do is let them know that their selected resolution is not a desired one. Have you ever run into this?
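I don’t know a portable way to query the native resolution either, but if you can get it from the OS, flagging a mismatched A/R is just a ratio comparison. Here is a rough sketch (the function names are mine, not from any API):

```python
from math import isclose

def aspect_ratio(width, height):
    """Return width/height as a float, e.g. 1600x900 -> 1.777..."""
    return width / height

def matches_native_ar(width, height, native_width, native_height, tol=0.01):
    """Flag a selected resolution whose A/R differs from the native one."""
    return isclose(aspect_ratio(width, height),
                   aspect_ratio(native_width, native_height),
                   rel_tol=tol)

# 1280x720 keeps the 16:9 shape of a 1600x900 panel...
print(matches_native_ar(1280, 720, 1600, 900))   # True
# ...but 1280x768 (5:3) does not.
print(matches_native_ar(1280, 768, 1600, 900))   # False
```

Even just warning the customer when this check fails would catch the mistake I made above.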


The two reasons I know of for using a larger marker size:

  1. Larger dictionary size. For example, if you needed many thousands of unique markers, you would be forced to use 5x5 markers because there just aren’t enough unique IDs with a 4x4 marker.
  2. Improved false detection robustness. The larger / more complex the marker is, the less likely it is to be accidentally identified in the image. For example, I have used 3x3 markers and in some cases a region of the image has some structure that looks close enough to one of the markers in your dictionary, so you get a false positive. I think this is a lot less likely to happen with larger markers.

The main reason to not use a larger marker size:

  1. Each “bit” in the marker has to get smaller in order to accommodate a larger marker (assuming the total size of the marker stays the same). This makes it harder to detect the markers in the image, particularly if they are already small (low number of pixels in the camera image), under significant distortion, poorly lit, or blurry.

For me the sweet spot is 4x4. I will use 3x3 in cases where size is constrained, but if I can “afford” the larger marker I will go with 4x4 to have a larger dictionary and better false positive performance.
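The “each bit gets smaller” tradeoff is easy to put numbers on: assuming the rendered marker size stays fixed, the cell size is the total size divided by the interior cells plus the black border (one cell on each side). A quick sketch:

```python
def pixels_per_cell(marker_px, interior_cells, border_cells=1):
    """Side length of one cell when an NxN marker (plus a black border
    of border_cells on each side) is rendered at marker_px pixels."""
    return marker_px / (interior_cells + 2 * border_cells)

# The same 120-pixel marker footprint at different marker sizes:
for n in (3, 4, 5, 6):
    print(f"{n}x{n}: {pixels_per_cell(120, n):.1f} px per cell")
```

So going from 4x4 to 5x5 at the same footprint drops each cell from 20 px to about 17 px, which is where the detection robustness starts to suffer.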

A few things come to mind. When you run the monitor at a non-native aspect ratio it has to decide how to map the incoming image to the monitor. There are at least 3 ways to handle this and it depends on the specific monitor/projector (and might be user-selectable in some cases).

  1. The incoming image gets rendered at a uniform scale to fit your monitor. This means either some pixels will be discarded, or black bars will be rendered on the top/bottom (or left/right) of the image. In this case I would expect there to be some error due to resampling (making the location of the corners harder to accurately localize in the camera image) but I would not expect a 0.3 → 1.7 error change. Maybe, but that seems too big of a jump.

  2. Non-uniform scaling of the input image. In this case the image is scaled differently horizontally and vertically so that the monitor is filled / there are no black-bar “dead zones”. Your image will be compressed or stretched in one dimension, but if the difference in aspect ratios isn’t too severe you might not notice it. I’m not totally sure how this non-uniform scale would affect the calibration process. My instinct is that it should be able to handle it, but I’m not sure I trust my instinct in this case. If we were just talking about a 2D->2D mapping using a homography, I feel pretty confident that you could include a non-uniform scale (on top of whatever the mapping was) and still model it with a homography. If that’s true, I would expect a camera matrix + extrinsics to be able to model it too, but you also aren’t using the camera calibration process in a normal way (just a single image of the Charuco pattern, as I recall). This probably changes things in ways I don’t really understand, so it’s hard to know what to expect. If you are still doing the “multiple views” by showing rotated versions of the calibration target (on your monitor, which is now scaling things in a non-uniform way), I would say “all bets are off” - or at a minimum I would have to get my head into the problem a lot more to know what to expect.
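To back up the 2D->2D part of point 2: composing a homography with a non-uniform scale is just a 3x3 matrix product, and the result is still an invertible 3x3 matrix, i.e. still a homography. A sketch with made-up numbers (this says nothing about the full camera model, only the planar mapping):

```python
import numpy as np

# An arbitrary, well-conditioned homography H mapping board -> image.
H = np.array([[1.2,  0.1,  30.0],
              [0.05, 0.9,  12.0],
              [1e-4, 2e-4,  1.0]])

# Non-uniform scale: e.g. a 16:9 signal stretched to fill a 5:3 panel,
# so y is stretched by 768/720 = 16/15 and x is untouched.
S = np.diag([1.0, 16 / 15, 1.0])

H2 = S @ H  # still 3x3 with nonzero determinant -> still a homography
print(abs(np.linalg.det(H2)) > 0)  # True
```

Case 3 (the non-constant “cinema” scaling) breaks this argument entirely, because that mapping is not linear in homogeneous coordinates.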

  3. In some cases, and I have seen this on projectors for sure - I’m not sure if it is common in monitors, the aspect ratio will be corrected not just with a non-uniform scale, but with a non-constant scaling. The center of the image gets mapped so it’s a uniform scale (X and Y are scaled the same), but as you get towards the edges of the image (either the left/right, or the top/bottom) you transition to a different scale in one dimension so you can fill up the whole display (no black bars). This is done so the central part of the display is uniformly scaled (so circles are still circles, and people don’t look tall & skinny / short & fat). This is probably most commonly applied in “cinema” modes on the projector - if you are watching a movie or similar, you can tolerate the strange non-uniform scaling pretty well for the benefit of things looking correct in the main part of the image. It’s probably obvious, but this kind of mapping will wreck your calibration scores.

If you want to figure out what you are dealing with, you could draw your detected corners on the image (as small-radius circles, preferably with subpixel rendering enabled) as well as the predicted corner locations. Usually I will draw both (with different colors) and then also an arrow that starts at the detected corner position and points in the direction of the predicted corner location. Usually I scale the length of the arrow by about 10x or so, so a 1 pixel reprojection error shows up as a 10 pixel long arrow (sometimes I go as high as 50x; it depends on the context, etc.)

I find it really helpful to do this because it helps you “see” whether your error is structured somehow. If you notice that the arrows tend to point either up or down (say they point downward at the top of the image, get shorter as you approach the vertical center, and then point upward at the bottom of the image) and there isn’t much left/right component, that would probably indicate a non-uniform scaling.
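The arrow exaggeration is just: tip = detected + scale * (predicted - detected). A minimal sketch of that arithmetic:

```python
def error_arrow(detected, predicted, scale=10.0):
    """Endpoint for an exaggerated reprojection-error arrow that starts
    at the detected corner and points toward the predicted corner."""
    dx = predicted[0] - detected[0]
    dy = predicted[1] - detected[1]
    return (detected[0] + scale * dx, detected[1] + scale * dy)

# A 1-pixel vertical error drawn as a 10-pixel arrow:
print(error_arrow((100.0, 200.0), (100.0, 201.0)))  # (100.0, 210.0)
```

With OpenCV you could then pass the start and tip points to cv2.arrowedLine, and draw the detected/predicted points with cv2.circle (which takes a shift parameter for subpixel positions, if I remember right).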

I’ll see if I have some pictures I can share that illustrate what I mean.


Thanks. The docs never really explain the nomenclature. I assume that the “marker” is the squiggly square set into the chessboard’s white squares. Is that correct? If so, what are the “bits”? I can see that there are different patterns but nothing that would add up to 50 or 100 or whatever. At least that I can see.

I guess I haven’t created enough ChAruco boards in different configurations to see what is changing. Like 4X4. 4X4 what? And are the bits pixels? I get the idea of things getting bigger or smaller, but I guess I need to make some different boards to understand what exactly is changing. Thanks


That gives a pretty good overview. When I refer to “bits” in the aruco markers (I think they refer to them as cells, at least during the detection phase) I’m talking about the individual squares in the interior of the aruco marker. Typically there is a 1 “cell” border around the whole marker (which is black) and the interior is a 4x4 (or 5x5, 3x3…etc) grid, with each element either being black or white.

For a 4x4 marker you have 16 different cells, each of which is either black or white. Nominally this would provide 16 bits of information, so 2^16 different unique combinations of black/white cells. In practice there are fewer than 2^16 available because some are just rotations of each other. I think that would cut down the number by a factor of 4 itself, and there are other ambiguities that arise too, I think.
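The “factor of 4” intuition can be checked by brute force: enumerate every black/white grid and keep one canonical representative per rotation class. Here it is for 3x3 (small enough to enumerate quickly; the same loop works for 4x4):

```python
from itertools import product

def rotations(grid):
    """All four 90-degree rotations of a square grid of 0/1 cells."""
    out = [grid]
    for _ in range(3):
        g = out[-1]
        out.append(tuple(zip(*g[::-1])))  # rotate 90 degrees clockwise
    return out

n = 3  # 3x3 keeps the brute force tiny
seen = set()
for bits in product((0, 1), repeat=n * n):
    grid = tuple(tuple(bits[r * n:(r + 1) * n]) for r in range(n))
    seen.add(min(rotations(grid)))  # one representative per rotation class

print(len(seen))  # 140 rotation-distinct 3x3 patterns, vs 2**9 = 512 raw
```

Running the same loop with n = 4 gives 16456 (vs 2**16 = 65536), so rotations cut the count by slightly less than a factor of 4 (symmetric patterns are only counted once). Real dictionaries are smaller still because they also enforce spacing between markers.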

So what is changing is the number of these cells internal to the marker. At the beginning of the article in the link above there is a picture that shows some 4x4 and 6x6 markers (maybe a 5x5 in there as well).


I’ll have to read the reference you mention. I just created and saved 4 different boards. 3X3_50, 3X3_500, 9X9_50, and 9X9_500.

I can definitely see what happens when I go from 3X3 to 9X9 but I could not see a bit of difference between _50 and _500. I even zoomed in on a specific marker in PhotoShop and counted the pixels but they appear to be the same count.

I’ll have to read the reference and study the saved boards a bit more.


In the reference it gives an example

Concretely, this dictionary is composed of 250 markers and a marker size of 6x6 bits ( DICT_6X6_250 ).

I get the 6X6 and the bits are obviously the white or black boxes that make up the marker. In this case 36 of them. But I still don’t see the 250 markers. Like I said, in my examples the 9X9_50 and the 9X9_500 both had the same 81 markers. The pattern and the pixel count were exactly the same. Plus the number of squares are the same. The whole board is the same in each case.

I think I understand enough to move on at this point. In my case there is no need for me to become an expert.

Thanks again.


The _50 and _500 just specify the size of the dictionary - that is, how many markers are included in that dictionary.

For 5x5 imagine you could have 1000 unique markers (it’s a lot more than that, but for simplicity’s sake…)

So 5x5_50 would choose 50 markers from the 1000 available, and 5x5_500 would use 500 of the 1000 markers.

The benefit of using fewer markers is improved false-positive rates. Some of the markers in a 5x5 dictionary will be very similar - they may only differ by one single bit. In certain imaging conditions it would be easy to mistake one of these markers for another, but if you don’t need 500 markers, you can make a smaller dictionary that (at least in theory) only contains markers that are very different from all the other markers. The detection process will only detect markers that are included in the dictionary, so the chance of seeing one marker as another is greatly reduced.

I haven’t played around with this much, but there are parameters you can use when generating a dictionary (maxCorrectionBits, I think) that control how robust the detection process is. The details of this get into information theory/coding theory, but the gist is that you can gain error detection/correction by trading off the size of your dictionary. So if you only use 50 markers out of the total 1000, you might be able to still correctly detect the marker ID even if 3 of the cells are detected incorrectly, but the downside is that you only get 50 markers to choose from. With 500 markers in your dictionary you might only be able to tolerate a single cell being incorrectly detected, but you get to use 500 unique markers.
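The tradeoff above is essentially a minimum-Hamming-distance argument: if the closest pair of markers in your dictionary differs in d bits, you can correct up to floor((d-1)/2) misread cells. A toy illustration with made-up 8-bit “markers” (not real ArUco codes):

```python
def hamming(a, b):
    """Number of bit positions in which two equal-length codes differ."""
    return sum(x != y for x, y in zip(a, b))

def min_distance(dictionary):
    """Smallest pairwise Hamming distance in the dictionary."""
    return min(hamming(a, b)
               for i, a in enumerate(dictionary)
               for b in dictionary[i + 1:])

# A crowded toy dictionary: some codes differ by only 1 bit.
crowded = ["00000000", "00000001", "00001111", "11110000"]
# A sparse one: every pair differs in at least 4 bits.
sparse = ["00000000", "00001111", "11110000", "11111111"]

for d in (crowded, sparse):
    dmin = min_distance(d)
    print(dmin, "bits -> corrects", (dmin - 1) // 2, "bit flips")
```

This ignores the rotation handling a real ArUco dictionary has to do, but it’s the same idea: fewer markers lets the generator keep them farther apart, which is (I believe) what maxCorrectionBits is trading against.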

Ahhh, I think I understand now. The 50 or 500 represent a database of markers generated that are in the dictionary created. Not something that can be seen in the board pattern.

I appreciate this. I hate the feeling walking away from something still saying to myself “WTF does that even mean”.

I’m curious to see if any of this has a bearing on my previous observation about the problem when the resolution has an A/R different from the default A/R that a particular screen was designed for - i.e., would more markers in the dictionary help in this case? I will have to test to see.

Thanks again for your patience and help.