Calculate Position of Ball on Target

Hi All,

I set up a simple rig in my room where I have a 2ft x 2ft board, and a ball is thrown at it while I record video from two cameras at the same time. You can think of this as a POC we're doing to learn and understand how this works before we scale it up to faster speeds and hardware-synced cameras.

There are about 100 frames in each video, and the sync might be SLIGHTLY off, but I'd like to understand a bit more before I go and spend money on hardware-synced cameras. Ideally we can get something working with 'not perfect' sync and then invest in hardware sync later.

Basically, my son and I are looking to figure out where the ball hit the target [x,y] & ideally also speed. For speed, we can add physical markers on the floor if that helps, or measure the exact dimensions of the ball, or anything else that might be needed.

The goal is that we could track the hit position (and ideally speed) in a database, look at it over time, and count the number of pitches taken per day.

We're not CV experts, but we're looking to learn, and we're even willing to hire some help to explain to us how this all works.

Note: the camera lenses were NOT parallel to each other, but if you were to draw a straight line outward from each lens, those lines WOULD intersect. Is that an issue? The mounts we have are difficult to get perfectly parallel.

A few questions:

  1. Using this video, do we have enough info to calculate the position in x/y? We know the dimensions of the target and the ball, and I can get the distance between the two lenses.

  2. What additional info would we need to calculate the speed?

  3. Does this require complicated math (we're not experts), or is it relatively straightforward?

  4. Can we do this without a trained model for detection? Maybe using the color, or the fact nothing else is moving in the frames?

Thanks all!

Looks like I can’t upload videos here, but here is a link: Index of /stereo

unless your cameras are electrically synchronized, i.e. there’s a wire between them that directly causes exposures to start at the same microsecond, they are not synchronized. if they’re random webcams, there is no way to get them synced to even just a millisecond. anything working on USB protocol level is highly unlikely to actually synchronize anything.

given unsynced videos, you’ll have to work with “planes” in space and intersect those. actually the plane equivalent of piecewise linear functions.
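
as a rough sketch, the basic operation is intersecting a ray from one camera with a plane built from two consecutive rays of the other camera (all numbers below are made up; the real origins and directions would come from your calibration):

```python
import numpy as np

def ray_plane_intersection(origin, direction, plane_point, plane_normal):
    """intersect the ray origin + t * direction with a plane."""
    denom = np.dot(plane_normal, direction)
    if abs(denom) < 1e-9:
        return None                      # ray is parallel to the plane
    t = np.dot(plane_normal, plane_point - origin) / denom
    return origin + t * direction

# plane spanned by camera A's center and two consecutive rays toward the ball
# (one piece of the "piecewise planes" for the path seen from A)
cam_a = np.array([0.0, 0.0, 0.0])
ray_a0 = np.array([0.10, 0.00, 1.0])     # ray in frame k
ray_a1 = np.array([0.15, 0.01, 1.0])     # ray in frame k+1
normal = np.cross(ray_a0, ray_a1)

# a ray from camera B, taken at some unknown time between A's two frames
cam_b = np.array([1.0, 0.0, 0.0])
ray_b = np.array([-0.4, 0.0, 1.0])

print(ray_plane_intersection(cam_b, ray_b, cam_a, normal))
```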

given synced videos, you could try to intersect rays. that is somewhat simpler. the rays never meet, so you’d find the midpoint of their closest approach.
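
sketch of the closest-approach midpoint (p1/d1 and p2/d2 are placeholders for your two ray origins and directions):

```python
import numpy as np

def ray_midpoint(p1, d1, p2, d2):
    """midpoint of the closest approach between rays p1 + t*d1 and p2 + s*d2."""
    d1, d2 = d1 / np.linalg.norm(d1), d2 / np.linalg.norm(d2)
    b = np.dot(d1, d2)
    w = p1 - p2
    denom = 1.0 - b * b
    if denom < 1e-9:
        return 0.5 * (p1 + p2)           # rays are (nearly) parallel
    t = (b * np.dot(d2, w) - np.dot(d1, w)) / denom
    s = (np.dot(d2, w) - b * np.dot(d1, w)) / denom
    return 0.5 * ((p1 + t * d1) + (p2 + s * d2))
```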

your video may be 120 fps but the source isn’t. you have empty/non-moving frames in there. the true rate appears to be something between 30 and 60 fps. be aware of that. if you made a screen recording, don’t do that. same goes for OBS Studio. that doesn’t actually record the frames as they happen. it runs them through a “canvas” of some frame rate.
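
a quick way to check for that is to count how many frames actually change (filename is a placeholder):

```python
import cv2

cap = cv2.VideoCapture("left.mp4")       # placeholder filename
prev, changed, total = None, 0, 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    total += 1
    if prev is None or cv2.absdiff(frame, prev).mean() > 1.0:
        changed += 1                     # frame differs from the previous one
    prev = frame
print(f"{changed} changed frames out of {total}")
```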

you’ll need camera matrices. you can cheat with a yard stick to calculate the focal length, assume zero lens distortion.
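
rough sketch of that cheat, with placeholder numbers you'd measure yourself:

```python
import numpy as np

# pinhole model: f [px] = (object length in pixels * distance) / real length
stick_len_m = 0.9144        # yard stick
distance_m = 3.0            # camera-to-stick distance, from a tape measure
stick_len_px = 600.0        # stick length measured in the image

f_px = stick_len_px * distance_m / stick_len_m

w, h = 1280, 720            # image resolution
K = np.array([[f_px, 0.0, w / 2],
              [0.0, f_px, h / 2],
              [0.0, 0.0, 1.0]])          # camera matrix, zero distortion assumed
```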

you’ll need AR markers to get the camera poses relative to a world frame. opencv comes with aruco. you’d print a big AR marker, stick it to a flat board (MDF, not plywood or cardboard, those may be warped) with double-sided tape under the outer edges, measure its true size in both axes (printers…). look up “quiet zone” for QR codes. AR markers require that too.
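
minimal sketch of the pose step (the aruco API names shift a bit between opencv versions; K comes from the camera matrix above, dist is assumed zero, marker_len is the measured side length of your printed marker):

```python
import cv2
import numpy as np

def marker_pose(img, K, dist, marker_len):
    """camera pose relative to a single printed ArUco marker (DICT_4X4_50 assumed)."""
    dictionary = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_4X4_50)
    detector = cv2.aruco.ArucoDetector(dictionary, cv2.aruco.DetectorParameters())
    corners, ids, _ = detector.detectMarkers(cv2.cvtColor(img, cv2.COLOR_BGR2GRAY))
    if ids is None:
        return None                      # no marker found

    # marker corners in its own (world) frame, same order as detectMarkers returns
    obj = np.array([[-1, 1, 0], [1, 1, 0], [1, -1, 0], [-1, -1, 0]],
                   dtype=np.float32) * marker_len / 2
    _, rvec, tvec = cv2.solvePnP(obj, corners[0].reshape(4, 2), K, dist)
    R, _ = cv2.Rodrigues(rvec)           # world -> camera rotation
    return R, tvec, -R.T @ tvec          # pose plus camera center in world coords
```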

use a ball that contrasts against the background. don’t mess around with color. you need proper contrast, dark/light, one for ball, the other for background. if you want to be fancy, get retroreflective spray-on coating and a (ring)light around the cameras.

you’ll need to detect the ball in each frame. do not use Hough. just don’t. anyone who says otherwise, distrust them on everything.

given that contrast, you can just threshold, then find contours or find connected components. you’d want to condense the ball contour to a centroid/center point.
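
sketch of that, assuming a dark ball on a light background:

```python
import cv2

def ball_centroid(frame):
    """pixel centroid of a dark ball against a light background, or None."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # invert so the dark ball becomes the bright blob; Otsu picks the threshold
    _, mask = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    ball = max(contours, key=cv2.contourArea)    # largest blob, assumed to be the ball
    m = cv2.moments(ball)
    return m["m10"] / m["m00"], m["m01"] / m["m00"]
```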

given each point in each frame, you can use the camera matrix to turn it into a ray. then you can apply the pose matrix to move those rays from camera frame to world frame. that’s then your “piecewise” planes you would want to intersect. no sync required. the result is a piecewise line in 3D space.
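
sketch of the back-projection (K is the camera matrix, R and t are the world-to-camera rotation and translation from the marker pose):

```python
import numpy as np

def pixel_to_world_ray(u, v, K, R, t):
    """back-project pixel (u, v) to a ray (origin, unit direction) in world coords."""
    d_cam = np.linalg.inv(K) @ np.array([u, v, 1.0])   # ray direction in camera frame
    d_world = R.T @ d_cam                              # rotate into the world frame
    origin = (-R.T @ t).ravel()                        # camera center in world frame
    return origin, d_world / np.linalg.norm(d_world)
```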

it’ll probably be messy wherever the ball changes direction (the wall). you might have to discard it there.

you can fit a (3D) parabola through all of that. that you can intersect with the wall plane… the wall plane you should offset by the ball radius, to get the point where they actually touched. if you don’t, you get the point where the ball center would touch the wall.
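
sketch of the fit and the intersection, assuming the AR marker sits flat on the wall so the wall is the plane z = 0 in the world frame (points and times come out of the earlier steps; times must be in seconds if you want speed in real units):

```python
import numpy as np

def fit_and_hit(points, times, ball_radius):
    """points: (N, 3) world-frame ball positions; times: matching timestamps."""
    # independent quadratic in t for x, y, z: a ballistic arc
    coeffs = [np.polyfit(times, points[:, i], 2) for i in range(3)]

    # the ball center "touches" the wall (z = 0 plane) when z(t) == ball_radius
    a, b, c = coeffs[2]
    roots = np.roots([a, b, c - ball_radius])
    t_hit = min(r.real for r in roots if abs(r.imag) < 1e-9 and r.real > times[0])

    hit = np.array([np.polyval(coeffs[i], t_hit) for i in range(3)])
    vel = np.array([np.polyval(np.polyder(coeffs[i]), t_hit) for i in range(3)])
    return hit, np.linalg.norm(vel)      # impact point and speed
```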


Thanks for the reply, I had a few thoughts! Also - let me know if you are open to a paid 30min discussion as well!

anything working on USB protocol level is highly unlikely to actually synchronize anything.

Agreed, I'm looking at some USB3 BlackFly cameras from FLIR with a cable to sync. The goal with this sample video is to understand if my son and I can make it work (I know it's possible technically) before dropping $2,000 on cameras, lenses, and mounting hardware.

given synced videos, you could try to intersect rays. that is somewhat simpler. the rays never meet, so you’d find the midpoint of their closest approach.

Is it possible to start building the code and trying it on this unsynced video, even knowing the values will be wrong? The hope is that we can learn and repeat results (even if wrong) before we upgrade the HW.

your video may be 120 fps but the source isn’t. you have empty/non-moving frames in there. the true rate appears to be something between 30 and 60 fps. be aware of that. if you made a screen recording, don’t do that. same goes for OBS Studio. that doesn’t actually record the frames as they happen. it runs them through a “canvas” of some frame rate.

We use OpenCV to write frames to memory and then to disk after recording. They are ELP cameras with a no-distortion lens that gets 250 fps using AMCap.
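
Something along these lines, simplified (the camera index, frame count, and filenames are placeholders):

```python
import os
import cv2

cap = cv2.VideoCapture(0)                        # placeholder camera index
cap.set(cv2.CAP_PROP_FOURCC, cv2.VideoWriter_fourcc(*"MJPG"))

frames = []
while cap.isOpened() and len(frames) < 500:      # keep frames in memory while recording
    ok, frame = cap.read()
    if not ok:
        break
    frames.append(frame)
cap.release()

os.makedirs("frames", exist_ok=True)             # flush to disk only after recording
for i, frame in enumerate(frames):
    cv2.imwrite(f"frames/left_{i:04d}.png", frame)
```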

you’ll need camera matrices. you can cheat with a yard stick to calculate the focal length, assume zero lens distortion.

Yes, we can put a yard stick in there, but the 'target' is 2ft x 2ft. Couldn't we use that?

use a ball that contrasts against the background. don’t mess around with color. you need proper contrast, dark/light, one for ball, the other for background. if you want to be fancy, get retroreflective spray-on coating and a (ring)light around the cameras.

Okay, maybe we can wrap the area in a white tarp and use a black ball, or vice versa with a black tarp and a white ball.

you’ll need to detect the ball in each frame. do not use Hough. just don’t. anyone who says otherwise, distrust them on everything.
given that contrast, you can just threshold, then find contours or find connected components. you’d want to condense the ball contour to a centroid/center point.

We actually did this about 6 months ago with some different video, we should be able to dig it up and apply it to these two videos.

given each point in each frame, you can use the camera matrix to turn it into a ray. then you can apply the pose matrix to move those rays from camera frame to world frame. that’s then your “piecewise” planes you would want to intersect. no sync required. the result is a piecewise line in 3D space.

If we can get to a place where we track each frame & detect the ball up to impact, can you help out with this part? Again, happy to pay for some consulting hours depending on how many hours it is :slight_smile:

it’ll probably be messy wherever the ball changes direction (the wall). you might have to discard it there.

No problem, we just want the track up to impact.

Hi There

I have figured out how to take sync’d video
I have also calibrated both the intrinsic and extrinsic parameters of the cameras.
I can also filter out the ball using a simple color mask and can find the centroid in each frame on both cameras.

I’m unclear how to track the object like in this video: https://www.youtube.com/watch?v=VZqcDhtAfyg

I'm a bit lost on how to take the x,y coordinates of the ball in both synced frames and get x,y,z.

Everything I look at online seems to assume the cameras are on the same baseline, but mine are not, and in the video above they are not either.

Would love some help solving this, and maybe even a 1-2 hr paid personal lesson to get me going!

Thanks

Stereo camera setups are often configured to be in the same plane (and using the same cameras and optics). There are practical advantages to this, but I think you will find the math behind it applies to your configuration as well.

Look into cv::stereoCalibrate and the examples that use it - that should work for what you are trying to do, but you will need to be able to “see” the calibration target with both cameras at the same time.
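
A rough Python sketch of that path, ending with the triangulation that turns your two centroids into x, y, z (the calibration inputs here are placeholders for whatever your chessboard captures produce):

```python
import cv2
import numpy as np

def ball_to_xyz(objpoints, imgpoints1, imgpoints2, K1, d1, K2, d2, image_size,
                uv1, uv2):
    """objpoints/imgpoints*: calibration target corners seen by BOTH cameras at
    the same time; K*, d*: per-camera intrinsics from calibrateCamera;
    uv1/uv2: the ball centroid in the same synced frame of camera 1 and 2."""
    # relative pose of camera 2 with respect to camera 1
    _, K1, d1, K2, d2, R, T, _, _ = cv2.stereoCalibrate(
        objpoints, imgpoints1, imgpoints2, K1, d1, K2, d2, image_size,
        flags=cv2.CALIB_FIX_INTRINSIC)

    # projection matrices: camera 1 at the origin, camera 2 at [R|T]
    P1 = K1 @ np.hstack([np.eye(3), np.zeros((3, 1))])
    P2 = K2 @ np.hstack([R, T.reshape(3, 1)])

    pt4d = cv2.triangulatePoints(P1, P2,
                                 np.asarray(uv1, float).reshape(2, 1),
                                 np.asarray(uv2, float).reshape(2, 1))
    return (pt4d[:3] / pt4d[3]).ravel()  # 3D point in camera-1 coordinates
```

No rectification or common baseline is required for this; the cameras just both need to see the calibration target and the ball.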

If you want to understand the math behind it all, read up on epipolar geometry. The “Multiple View Geometry” book by Hartley / Zisserman might be good if you want a book, but if you are just interested in getting it to work (as opposed to understanding it more fundamentally) then opencv docs and examples should be enough.

Have you tried anything yet, and if so, what specific problem have you run into?