I am new to image processing and not sure which way would be best for my problem; maybe someone can point me in the right direction.
What problem am I trying to solve?
I want to track the position and rotation of a known but complex object of known size (no scaling) in a Full HD real-time stream.
An example frame might look something like this:
Note that the orientation is only defined by the single small hole at the top right.
After identifying the position and rotation, I want to overlay an image matching the contours and marking a place on the object, for this object an example overlay would be this:
The resulting image should be this:
Sorry for the imgur links, but I have only permission to embed one image as a new user.
If the object is moved (and rotated) the overlay should track the object and adjust the position to overlay the object with correct orientation again.
The overlay itself is no problem, with some masking I can place it fast onto a frame, but the tracking, especially rotation, is tricky.
The application will run on Windows 10 systems without a dedicated GPU, only an Intel G4400T (2 cores @ 2.9 GHz). Better systems are possible if needed, but they need to be fanless, so no big dGPU. Different systems like a fanless Jetson Nano are possible, too.
The tracking can have some delay (1s max would be great).
Since everything is in a defined environment with a known background and a known object, my first guess would be that a DNN like YOLO is not needed.
But I think normal OpenCV template matching is not up to the task.
I cannot reduce the frame size below half, because the small features important for detection are lost with any more downscaling.
I’ve tried “brute force” template matching by defining 360 different templates, matching all of them, and selecting the best match to get the orientation. This works, but it is very slow: it takes several seconds to match one half-sized frame 360 times, and the resolution of only 1° is not optimal.
Is this a task that could be better, especially faster, achieved with YOLO? Would I be able to train a model on a powerful system with dGPU and run it on a slow CPU-only system?
Or would I be better off with a different approach like feature detection? I’ve tried a bit with ORB, but didn’t get correct matches and results, probably because I didn’t configure it right.
A different approach could be to detect the location and rotation once and then try to track the movement and therefore position and rotation in frequency domain, but I have no clue if this could work, just an idea.
Or should I try with optimizing the matching?
It would be faster to detect the object, extract the ROI containing it, and then match only against the ROI instead of the whole frame.
Maybe even faster: threshold the ROI and then compare this binary image with NumPy against 360 binary templates: the rotated binary template with the least difference to the binary frame ROI is the detected rotation. But the difference between a correct and a false match could be quite small, and artefacts may undermine this solution.
Instead of comparing images I could compare contours, maybe this would be even faster?
On a system with more cores I could use multiprocessing to share the workload for faster results.
But I don’t know how complex and error prone an implementation like this would be.
The example shown is only one possible object; it should be possible to add new objects by adding a new template, not by adding hard-coded manual detection per object.
I am sure my problem is solvable; maybe someone with more experience can give me some guidance on what a good solution could look like.
I don’t need or want someone to solve my problem and write a solution, but some hints for which way to look best would be greatly appreciated!