First of all, thanks for an amazing community and awesome library!
I have some software running for detecting specific objects in a live video feed. Once I have a bounding box I would like to issue move commands to my camera which is attached to some motors (tilt/pan) so that it can follow the object as long as it is detected in the frame.
Which strategies can one apply to make this work? I’ve read a bit about Kalman filters, but is there an easier approach? I would like to avoid having the camera “lag” behind the object (as long as it’s not moving too fast for the motors to keep up).
Your solution will depend a lot on the details and what you are actually trying to achieve. This problem is in the domain of control systems, which is a discipline/area of research with lots of well-developed theory. I don’t know much about controls, but here are a few questions / ideas that come to mind:
A gimbal mount P/T will simplify things. If you are using something like the traditional Directed Perception (now FLIR?) P/T controllers, the offset axes (and probably offset camera position) mean that you won’t be dealing with a pure rotation even though your commands are just rotations. How much this matters depends on your setup, for example on how close the object is compared to the size of those offsets.
Is the object you are tracking moving freely in 3D space? Is it constrained to a plane or a line of motion?
Is the object moving at a relatively constant speed, or is it undergoing acceleration? Is the acceleration constant or variable? If it is variable, is it high jerk (rate of change of acceleration), or relatively smooth?
How well does your P/T performance (speed, acceleration) match your object speed/acceleration?
What is the latency in your system? How quickly do you acquire an image, process it to localize the feature and command / actuate the pan/tilt?
Hopefully your problem can be solved with a low level of sophistication - if you are trying to track a housefly (high accelerations, unpredictable path) it’s going to be hard…if you are trying to track people walking on a footpath, it’s likely to be a lot easier.
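To make the latency question concrete: if the object moves at roughly constant velocity between frames, one cheap way to reduce lag (much simpler than a Kalman filter) is to aim at where the object will be once your command takes effect. This is only a sketch; `predict_position` and its parameters are names I made up, and it assumes you can measure your end-to-end latency and that constant velocity is a fair approximation.

```python
def predict_position(prev, curr, dt, latency):
    """Extrapolate a pixel position `latency` seconds ahead.

    prev, curr: (x, y) detections from the previous and current frame.
    dt:         time between those two detections, in seconds.
    latency:    assumed end-to-end delay (capture + processing + actuation).
    Assumes constant velocity; noisy detections will make this jumpy.
    """
    if dt <= 0:
        return curr  # no usable velocity estimate, fall back to the detection
    vx = (curr[0] - prev[0]) / dt
    vy = (curr[1] - prev[1]) / dt
    return (curr[0] + vx * latency, curr[1] + vy * latency)
```

If the detections are noisy, you would want to smooth the velocity estimate first, and that is roughly the point where a Kalman filter starts to earn its keep.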
I would probably start by figuring out how to compute the correct pan/tilt inputs to “move” an object in the image. For example, if you are trying to keep the object centered in the image and you detect it at some position X,Y, what is the correct P/T change to make the object (assume it is stationary for now) appear at the center of the image in subsequent frames? This transfer function (I’m not sure if I’m using that term quite right) will be the foundation of your control system. Again, depending on your situation, you might have to account for the full geometry of the P/T configuration.
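As a rough starting point for that pixel-to-angle mapping, here is a sketch based on a simple pinhole camera model. It assumes no lens distortion, a known field of view, and rotation axes that pass through the camera's optical center (which, as noted above, may not hold for your mount). The function name, the sign conventions, and the choice to treat pan and tilt independently are all my assumptions, so the result is only approximate away from the image center.

```python
import math

def pixel_to_pan_tilt(x, y, width, height, hfov_deg, vfov_deg):
    """Approximate pan/tilt change (degrees) that would center pixel (x, y).

    Assumed conventions: positive pan turns right, positive tilt turns up,
    image origin is the top-left corner.
    """
    # Focal lengths in pixels, derived from the assumed fields of view.
    fx = (width / 2) / math.tan(math.radians(hfov_deg) / 2)
    fy = (height / 2) / math.tan(math.radians(vfov_deg) / 2)
    dx = x - width / 2    # pixels to the right of center
    dy = y - height / 2   # pixels below center
    pan = math.degrees(math.atan2(dx, fx))
    tilt = -math.degrees(math.atan2(dy, fy))  # image y grows downward
    return pan, tilt
```

For example, with a 640x480 image and a 60 degree horizontal field of view, an object at the right edge of the image comes out as a pan of about 30 degrees. A quick way to check your real system is to point at a stationary target, apply the computed change, and see how far off-center it ends up.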
Once you have that working just try to do something simple such as:
detect object in image
compute P/T commands to center object in image
apply commands to P/T
This will lag and be jerky if you are just moving the P/T to new positions and stopping each time. So you will probably want to command the P/T with a velocity (continuous smooth movement rather than point-to-point moves), which will take a little more work to estimate.
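One simple way to get a velocity command is to make the speed proportional to the angular error, with a small deadband so the camera does not jitter around the center and a clamp so you never ask for more than the motors can do. This is just a sketch; the function name and the default gain, speed limit, and deadband are made-up values that you would have to tune for your hardware.

```python
def velocity_command(error_deg, gain=2.0, max_speed=60.0, deadband=0.5):
    """Map an angular error (degrees) to an axis speed (degrees/second).

    gain:      assumed proportional gain, in (deg/s) per degree of error.
    max_speed: assumed speed limit of the pan/tilt axis.
    deadband:  errors smaller than this are ignored to avoid jitter.
    """
    if abs(error_deg) < deadband:
        return 0.0
    speed = gain * error_deg
    # Clamp to what the motors can actually deliver.
    return max(-max_speed, min(max_speed, speed))
```

You would call this once per frame for each axis, using the pan and tilt errors computed from the detection, and send the result to the controller as a speed rather than a target position.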
The reality of a mechanical system with a feedback loop like this (with latency etc.) is that eliminating lag fully will be difficult if the object is moving fast. You might consider trying a digital image stabilization type of approach. By this I mean you detect the object in your image and you crop the image so the object is centered in the cropped area, then you display the cropped image.
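The cropping idea could look something like the sketch below, which computes a crop window centered on the detected object and shifts it back inside the frame when the object is near an edge. The function name and parameters are hypothetical, and it assumes the crop is no larger than the frame.

```python
def crop_window(cx, cy, crop_w, crop_h, frame_w, frame_h):
    """Return (x0, y0, x1, y1) of a crop centered on (cx, cy).

    The window is shifted to stay inside the frame, so near the borders
    the object will no longer be exactly centered in the crop.
    Assumes crop_w <= frame_w and crop_h <= frame_h.
    """
    x0 = int(round(cx - crop_w / 2))
    y0 = int(round(cy - crop_h / 2))
    x0 = max(0, min(x0, frame_w - crop_w))
    y0 = max(0, min(y0, frame_h - crop_h))
    return x0, y0, x0 + crop_w, y0 + crop_h
```

With a NumPy image such as the frames OpenCV returns, the crop itself is then just `frame[y0:y1, x0:x1]`. The mechanical P/T only has to keep the object somewhere inside the full frame, and the crop hides the remaining lag.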
Just some ideas. This could be as complicated as you want it to be.
I’d suggest looking into control systems / PID controllers etc. to get some ideas / background.
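For reference, a bare-bones PID controller is only a few lines. The sketch below is a generic textbook-style version, not tuned for any particular P/T unit; the class name, the gains, and the simple anti-windup rule (skip the integration step while the output is saturated) are my own choices for illustration.

```python
class PID:
    """Minimal PID controller with output clamping (illustrative only)."""

    def __init__(self, kp, ki, kd, out_limit):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.out_limit = out_limit   # maximum absolute output, e.g. deg/s
        self.integral = 0.0
        self.prev_error = None

    def update(self, error, dt):
        """Return the control output for this error; dt is seconds since the last call."""
        # No derivative on the first call, since there is no previous error yet.
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        candidate = self.integral + error * dt
        out = self.kp * error + self.ki * candidate + self.kd * derivative
        if -self.out_limit <= out <= self.out_limit:
            self.integral = candidate  # only integrate while not saturated
            return out
        return max(-self.out_limit, min(self.out_limit, out))
```

You would typically run one instance per axis, feed it the pan or tilt error each frame, and use the output as the velocity command. Starting with only `kp` and adding `ki` and `kd` later, if needed, tends to make tuning easier.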
Sounds like a fun project.