Object Detection - Haar Cascade / LBP - Training Data - Creation

I have read the tutorial about Haar cascade training.

But it focused only on faces. Since all faces in the data would usually be facing the camera, they are all at the same angle.

I wanted to ask: when creating/photographing a particular specific object to be detected (no variations), how and what should be photographed? Only the one angle, thousands of times? (I guess not.) Or a range of angles, and if so, what range? And on which axes (horizontal only?)? What about lighting variations? Can I have a complete answer (with ranges in figures)?

I see the tutorial mentions createsamples.exe to create more variations, one being different sizes. Why is size an issue, when every sample in the .VEC file is resized to the full training window size?
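
For reference, a sketch of a typical opencv_createsamples run (wrapped in Python here; the file names and parameter values are hypothetical). Note that -w and -h define the training window every generated sample is resized to, while the -max?angle parameters (in radians) control the synthetic pose distortion applied before that resize:

```python
import subprocess

# Hypothetical file names; opencv_createsamples ships with OpenCV.
# -w/-h set the training window that every generated sample is
# resized to; the -max?angle values (radians) control the synthetic
# pose distortion applied before the resize.
subprocess.run([
    "opencv_createsamples",
    "-img", "badge.png",       # single source photo of the badge
    "-bg", "negatives.txt",    # list of background image paths
    "-vec", "badge.vec",       # output sample file
    "-num", "1000",
    "-w", "24", "-h", "24",
    "-maxxangle", "0.3",
    "-maxyangle", "0.3",
    "-maxzangle", "0.2",
], check=True)
```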

can you be a bit more specific ? what are you trying to detect ?

Medals. (Solid objects; some have parts that are hollow (no material present).)

why do cascades fail ?

  • too much pose variation (± 10° might work, with enough data)

  • too much “intra class variation” (things of the same class are not similar enough) (medals ? different ones ? so all they share is the round outline ?)

    “some have parts that are hollow” – terrible; it means that that region is unusable for all of them

Say, badges, like in the photo attached.

But, I mean, 1 cascade per badge. I want to know WHICH badge it is.

Note the center one has hollow parts…

They are not all round.

How do I make training data (photograph the item) for a particular badge, for a dedicated cascade for it (not 1 cascade for badges in general)?

Also, any comments on training to recognise shiny silver metallic objects (they appear different in different lighting, and reflections), e.g. chrome?

And 10 degrees, is that on every axis?

i’d say, forget about cascades at all, having several of them is not really feasible, as there is no real way to get a match confidence, and computation will take ages (as you’ll probably have more than 3)

yea, that’s the real interesting part here :wink:
take a lot of images, vary lighting a lot, and pose a little, i guess

things to try now:

  • maybe template matching already works. just make sure NOT to use any _NORMED method there, so you can compare the raw scores across templates (see the sketch just after this list)
  • split the detection from the classification, e.g. find contours, if they coarsely match your expectations, crop the region and throw it at a multiclass classifier (e.g. an SVM)
  • what everyone else does: deep learning. (re-)train a yolov5 (pytorch) dnn on colab, using roboflow for annotations / preprocessing
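
A minimal sketch of that first suggestion (the paths and template names are hypothetical); TM_SQDIFF is one of the non-normalised methods, and for it a lower score means a better match:

```python
import cv2

scene = cv2.imread("scene.jpg", cv2.IMREAD_GRAYSCALE)   # hypothetical paths
templates = {"badge_a": "badge_a.png", "badge_b": "badge_b.png"}

results = {}
for name, path in templates.items():
    tmpl = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    # TM_SQDIFF is non-normalised: a raw sum of squared differences,
    # where lower means a better match.
    res = cv2.matchTemplate(scene, tmpl, cv2.TM_SQDIFF)
    min_val, _, min_loc, _ = cv2.minMaxLoc(res)
    results[name] = (min_val, min_loc)

# Pick the template with the lowest raw score. Note that raw SQDIFF
# scores grow with template area, so this comparison assumes the
# templates are of similar size.
best = min(results, key=lambda n: results[n][0])
print(best, results[best])
```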

yea, that’s the real interesting part here :wink:
take a lot of images, vary lighting a lot, and pose a little, i guess

Vary lighting as in ambient light, or spot light position?

For ambient light, how much brighter and darker from neutral, where +100% is pure white and -100% is pure black? E.g. ±10%?

Pose: rotation along all axes (x, y, z)? How many degrees per axis (minimum to maximum)? E.g. 10-20 degrees?

Does scale matter? (As in the .VEC files, the samples take the dimensions of the .VEC, their bounding box information being used to resize them to the .VEC dimensions.) I ask because the opencv_createsamples.exe program, and also the tutorial, show how to create samples with different sizes as a varying factor.
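
For concreteness, a minimal sketch of the kind of lighting/pose jitter under discussion (the ±10% brightness and ±15° in-plane rotation figures are purely illustrative, and badge.png is a hypothetical path):

```python
import cv2
import numpy as np

def jitter(img, brightness_pct=10, max_angle=15):
    # Brightness jitter: scale pixel values by up to ±brightness_pct.
    alpha = 1.0 + np.random.uniform(-brightness_pct, brightness_pct) / 100.0
    out = cv2.convertScaleAbs(img, alpha=alpha, beta=0)

    # In-plane (z-axis) rotation jitter of up to ±max_angle degrees;
    # out-of-plane (x/y axis) pose needs real photos or a 3D warp.
    h, w = out.shape[:2]
    angle = np.random.uniform(-max_angle, max_angle)
    M = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
    return cv2.warpAffine(out, M, (w, h))

img = cv2.imread("badge.png")           # hypothetical path
augmented = [jitter(img) for _ in range(20)]
```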

  • maybe template matching already works. just make sure NOT to use any _NORMED method there, so you can compare the raw scores across templates

I have experimented with template matching, using the tutorial. I do not think it will work for me, due to the following:

  • Object will not have a fixed size
  • Object will not have a fixed angle in any axis; it will have up to approx. 20 degrees of rotation in each axis
  • Object image will not have the same contrast or brightness; it will be taken spanning different times of the year and day (e.g. winter, summer, and day and evening)

Am I correct that, due to the above, template matching is not suitable?

  • split the detection from the classification, e.g. find contours, if they coarsely match your expectations, crop the region and throw it at a multiclass classifier (e.g. an SVM)
    You mean do not use a HOG multiscale detector, and instead use image processing techniques to target contours that may be the object, and then provide the section of the image that corresponds to the contour region to an SVM that has been trained on an example?
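
For reference, a minimal sketch of that detection stage (the path and the area/fill-ratio thresholds are placeholder values, not tuned recommendations):

```python
import cv2

img = cv2.imread("scene.jpg")                   # hypothetical path
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
edges = cv2.Canny(gray, 50, 150)

contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL,
                               cv2.CHAIN_APPROX_SIMPLE)

crops = []
for c in contours:
    area = cv2.contourArea(c)
    if area < 500:                              # placeholder size filter
        continue
    # Coarse "badge-shaped blob" check: compare the contour area
    # against the area of its bounding box.
    x, y, w, h = cv2.boundingRect(c)
    if not 0.5 < area / float(w * h) <= 1.0:
        continue
    crops.append(img[y:y + h, x:x + w])         # region for the classifier
```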

I do not think it would work in my case, as the items can be situated where they are surrounded by complex patterns/edges, so it will not work a good percentage of the time. Also, the shapes of the objects make it difficult to filter out contours.

I have read the HOGDescriptor object detection tutorial (that used an SVM). This tutorial is all the knowledge I have about SVMs. I found the source code, the way it was structured, difficult to follow, so some parts are difficult to remember/understand. So I ask: in the tutorial, 2 object types were given to the SVM to be trained, each with an index/description/label; this was an example of positive vs negative. How would I use an SVM as a multiclass classifier? Does it allow more than 2 objects and indices/descriptions/labels?
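
For what it's worth, OpenCV's cv2.ml.SVM does accept more than two classes when the type is C_SVC; you just pass integer labels 0..N-1. A minimal sketch, with placeholder random data standing in for per-crop HOG feature vectors:

```python
import cv2
import numpy as np

# Placeholder training data: 30 samples of a fixed-length HOG-style
# feature vector, with 3 badge classes labelled 0, 1 and 2.
features = np.random.rand(30, 1764).astype(np.float32)
labels = np.repeat(np.arange(3), 10).reshape(-1, 1).astype(np.int32)

svm = cv2.ml.SVM_create()
svm.setType(cv2.ml.SVM_C_SVC)      # C_SVC supports n-class classification
svm.setKernel(cv2.ml.SVM_RBF)
svm.train(features, cv2.ml.ROW_SAMPLE, labels)

# Predict the class of one new feature vector.
sample = np.random.rand(1, 1764).astype(np.float32)
_, pred = svm.predict(sample)
print(int(pred[0, 0]))             # 0, 1 or 2
```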

  • what everyone else does: deep learning. (re-)train a yolov5 (pytorch) dnn on colab, using roboflow for annotations / preprocessing

Yes, I considered that. If OpenCV supported creating/training the models, I would have tried. I am currently within OpenCV and not deep learning. I want to try/experience/exhaust what OpenCV can do, then year(s) later I will move into deep learning.

Am I correct that deep learning, on a CPU, with an HD 1280x720 (720p) image, will be slower than a Haar cascade? What percent slower?

With deep learning multi-object detection/classification (1 model detects several object types), how does the speed of detection change from a model with just 1 object to recognise, to more, say 5, 10, 15, etc.?

maybe, but you have to multiply your cascade timings by the number of object classes.
also, image size does not matter, since it will get resized to the training window size (like 386x386 for yolov3)

it doesn’t. 99% of the processing time in a cnn is spent in the bottom layers (the input convolution filters). it does not really matter if there are 10 or 1000 output pins in the “head”

maybe, but you have to multiply your cascade timings by the number of object classes.
also, image size does not matter, since it will get resized to the training window size (like 386x386 for yolov3)

I calculate I will be alright for a while (when that while ends, I will enter deep learning), as:

1 cascade takes approx. 100ms (on a particular CPU). Therefore I can process 10 per second at a frame rate of 1 FPS. If I adjust the detector’s scaleFactor to a higher value (fewer scaled versions of the image), then I can achieve 50ms, i.e. 20 per second, again at a frame rate of 1 FPS. (Only process 1 frame every second, whichever the current frame is according to the time period, or skip 24 of the 25 frames every second/cycle.)
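
A minimal sketch of that budgeting idea (the cascade files, video path and scaleFactor value are placeholders): run each per-badge cascade on one frame per second and skip the rest:

```python
import cv2

# Hypothetical cascade files, one per badge.
cascades = {name: cv2.CascadeClassifier(f"{name}.xml")
            for name in ("badge_a", "badge_b", "badge_c")}

cap = cv2.VideoCapture("input.mp4")   # placeholder path, e.g. a 25 FPS source
fps = int(cap.get(cv2.CAP_PROP_FPS)) or 25

frame_idx = 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    frame_idx += 1
    if frame_idx % fps:               # process 1 frame per second,
        continue                      # skip the other (fps - 1)

    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    for name, cascade in cascades.items():
        # A larger scaleFactor means fewer pyramid levels, so each
        # cascade runs faster (at some cost in detection accuracy).
        hits = cascade.detectMultiScale(gray, scaleFactor=1.3,
                                        minNeighbors=5)
        for (x, y, w, h) in hits:
            print(name, x, y, w, h)
```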