I’ve been experimenting with syncing two PTZ cameras and I’d love some guidance from people who’ve done this for real.
The setup: two PTZ cameras pointed at a sports field. One I move around manually (call it the “master”); I want the second one (the “slave”) to automatically point at the same spot on the field. My current experiment fits a homography between the two cameras’ pan/tilt spaces and uses it to map the master’s pan/tilt onto the slave. It works sometimes, and I’m trying to understand why it breaks.
What I’ve run into:
- I’m fitting the homography (8 DOF) from only 5 manually pointed landmarks, so 10 equations and just 2 redundant constraints. In-sample reprojection looks fine, but leave-one-out cross-validation shows it doesn’t generalize. Looks like classic overfitting.
- I’m treating the homography as static, but zoom changes the intrinsics and pan/tilt changes the rotation part of the extrinsics, so I doubt one fixed matrix holds across the whole field.
- Mechanical PTZ latency means the slave always lags the master.
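To make the overfitting point concrete, here is a minimal sketch of the leave-one-out check using NumPy only. It is not my actual script: the pan/tilt values, the "true" matrix and the 0.2 degree noise level are all made up for illustration, and fit_homography is a plain unnormalized DLT written for this sketch (cv2.findHomography would be the OpenCV equivalent). Note that with 5 landmarks every fold fits on exactly 4 points, the minimum for a homography, so the fit has no redundancy and any pointing noise goes straight into the held-out prediction.

```python
import numpy as np

def fit_homography(src, dst):
    """Plain DLT: least-squares 3x3 H (up to scale) such that dst ~ H @ src."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # The solution is the right-singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=float))
    return vt[-1].reshape(3, 3)

def apply_homography(H, pts):
    """Map Nx2 points through H, including the projective division."""
    pts = np.asarray(pts, dtype=float)
    out = np.column_stack([pts, np.ones(len(pts))]) @ H.T
    return out[:, :2] / out[:, 2:3]

def leave_one_out_errors(src, dst):
    """Fit on all points but one, then measure the error on the held-out point."""
    src, dst = np.asarray(src, dtype=float), np.asarray(dst, dtype=float)
    errors = []
    for i in range(len(src)):
        keep = np.arange(len(src)) != i
        H = fit_homography(src[keep], dst[keep])
        pred = apply_homography(H, src[i:i + 1])[0]
        errors.append(float(np.linalg.norm(pred - dst[i])))
    return errors

# Synthetic example: 5 master (pan, tilt) landmarks in degrees, invented values.
rng = np.random.default_rng(0)
H_true = np.array([[1.02, 0.03, 4.0],
                   [-0.02, 0.97, -1.5],
                   [1e-4, -2e-4, 1.0]])
master = np.array([[-20, -5], [15, -8], [25, 10], [-10, 12], [3, -1]], dtype=float)
slave = apply_homography(H_true, master) + rng.normal(scale=0.2, size=master.shape)

H_all = fit_homography(master, slave)
in_sample = np.linalg.norm(apply_homography(H_all, master) - slave, axis=1)
print("in-sample error (deg):    ", np.round(in_sample, 3))
print("leave-one-out error (deg):", np.round(leave_one_out_errors(master, slave), 3))
```

The sketch skips Hartley normalization and minimizes algebraic rather than geometric error, so treat it as a way to compare in-sample and held-out error, not as a calibration routine.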
Where I’m stuck:
- Is a pan/tilt-space homography even the right abstraction for master→slave PTZ? Or is it cleaner to map both cameras through a common ground-plane / world coordinate frame instead of camera-to-camera directly?
- Calibration: how many correspondences would you realistically use? Is “more points + RANSAC” enough, or do people re-estimate the homography dynamically as the cameras move?
- For dynamic re-estimation, I was thinking about auto-detecting field keypoints (yard lines, markers) with a keypoint/pose model to generate correspondences on the fly instead of pointing at landmarks by hand. Reasonable, or overkill?
- Latency: do you predict/lead the target (velocity extrapolation, Kalman filter), or just react?
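For the latency question, this is the simplest version of "predict/lead" I can think of: an alpha-beta filter per axis (a fixed-gain cousin of a constant-velocity Kalman filter) that smooths the master's pan or tilt and extrapolates it ahead by the slave's lag. The class name, the gains and the 0.3 s lag are placeholders I picked for the sketch, not measured values.

```python
class AlphaBetaAxis:
    """Alpha-beta tracker for one axis (pan or tilt), angles in degrees."""

    def __init__(self, alpha=0.5, beta=0.1):
        self.alpha = alpha    # position correction gain (assumed value)
        self.beta = beta      # velocity correction gain (assumed value)
        self.x = None         # filtered angle
        self.v = 0.0          # filtered angular velocity, deg/s

    def update(self, measured, dt):
        """Feed one new master reading taken dt seconds after the previous one."""
        if self.x is None:
            self.x = measured
            return
        predicted = self.x + self.v * dt
        residual = measured - predicted
        self.x = predicted + self.alpha * residual
        self.v += (self.beta / dt) * residual

    def predict(self, horizon):
        """Constant-velocity extrapolation horizon seconds into the future."""
        return self.x + self.v * horizon


# Example: master pans at a steady 10 deg/s, sampled at 50 Hz.
pan_filter = AlphaBetaAxis()
dt = 0.02
for k in range(200):
    pan_filter.update(10.0 * k * dt, dt)

slave_lag_s = 0.3  # hypothetical end-to-end slave latency
print("lead pan target (deg):", pan_filter.predict(slave_lag_s))
```

It ignores pan wraparound at ±180° and assumes the lag is roughly constant, both of which may not hold on real hardware.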
I’m doing this mostly in Python/OpenCV for the CV parts. Even a “you’re overthinking X” is welcome — trying to build the right intuition here. Thanks!
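P.S. To make the "common ground-plane frame" option concrete, here is a rough sketch of what I picture, under strong assumptions: the field is a flat plane at z = 0, each camera's position and height are known in field coordinates, the pan axis is vertical, and tilt is negative when looking below the horizon. The Camera class, the function names and the example positions are all hypothetical.

```python
import math
from dataclasses import dataclass

@dataclass
class Camera:
    x: float                     # camera position in field coordinates (m)
    y: float
    height: float                # height above the ground plane (m)
    pan_offset_deg: float = 0.0  # assumed known offset between pan=0 and the field +x axis

def pan_tilt_to_ground(cam, pan_deg, tilt_deg):
    """Intersect the camera's viewing ray with the ground plane z = 0."""
    if tilt_deg >= 0:
        raise ValueError("ray at or above the horizon never hits the ground")
    azimuth = math.radians(pan_deg + cam.pan_offset_deg)
    ground_range = cam.height / math.tan(math.radians(-tilt_deg))
    return (cam.x + ground_range * math.cos(azimuth),
            cam.y + ground_range * math.sin(azimuth))

def ground_to_pan_tilt(cam, gx, gy):
    """Pan/tilt (degrees) that points the camera at ground point (gx, gy)."""
    dx, dy = gx - cam.x, gy - cam.y
    pan = math.degrees(math.atan2(dy, dx)) - cam.pan_offset_deg
    tilt = -math.degrees(math.atan2(cam.height, math.hypot(dx, dy)))
    return pan, tilt

def master_to_slave(master, slave, pan_deg, tilt_deg):
    """Map the master's pan/tilt to the slave's via the shared ground point."""
    gx, gy = pan_tilt_to_ground(master, pan_deg, tilt_deg)
    return ground_to_pan_tilt(slave, gx, gy)

# Example with invented positions: master at the origin, slave 50 m along x.
master_cam = Camera(x=0.0, y=0.0, height=10.0)
slave_cam = Camera(x=50.0, y=0.0, height=8.0)
pan, tilt = ground_to_pan_tilt(master_cam, 20.0, 30.0)
print("slave pan/tilt (deg):", master_to_slave(master_cam, slave_cam, pan, tilt))
```

The appeal is that calibration becomes a handful of per-camera parameters instead of one camera-to-camera matrix, but it only works as well as the flat-field and levelled-mount assumptions do.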