Haar-like feature based object detection

Hi
I have a program for automated optical inspection (component presence and orientation verification on PCBs) that kind of works. It uses DLIB object detection (both FHOG and DNN), but it’s not very fast, especially on a Raspberry Pi, and especially with the DNN detector.
Years ago I had a similar program using Haar cascades. It was my own implementation, because I wanted color: I applied the features on each color plane and also between the planes. It was based on the OpenCV implementation from before v1.0. As far as I remember, it was about as fast as the DLIB FHOG object detector, but on much older hardware. Of course, training the detectors was much slower (hours instead of minutes), and I needed lots of classifiers.
So I have been wondering whether there is a better way to build a fast object detector, in addition to the two I already have. My impression is that classifier stages built by summing weighted, thresholded Haar feature responses are not the most efficient use of the information the Haar features provide.
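To make the comparison concrete, here is a minimal Python sketch of the two kinds of stage decision. It is a simplified form, not the exact OpenCV stage format: the stump layout `(feature_index, threshold, polarity, alpha)` and both function names are my own assumptions for illustration.

```python
def boosted_stage_passes(responses, stumps, stage_threshold):
    """Classical boosted stage (simplified): each stump thresholds one
    Haar response to a +1/-1 vote, and the weighted votes are summed.

    stumps: list of (feature_index, threshold, polarity, alpha) tuples
            (hypothetical layout, chosen for this sketch).
    """
    score = 0.0
    for idx, thr, polarity, alpha in stumps:
        vote = 1 if polarity * responses[idx] < polarity * thr else -1
        score += alpha * vote
    return score >= stage_threshold


def linear_stage_passes(responses, selected, weights, bias):
    """Alternative stage: a linear function of the raw responses of the
    same selected features, so the magnitude of each response is used
    instead of being collapsed to one bit per feature."""
    score = bias + sum(w * responses[i] for i, w in zip(selected, weights))
    return score >= 0.0
```

Both versions touch the same features and do one multiply-add per feature, so the per-window cost should be about the same. The difference is only in how much of each response survives into the stage score.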
So I’m thinking about gathering positive and negative samples the same way traincascade did, and using AdaBoost for feature selection just as traincascade does. But instead of using the boosted stage as it is, I would take the selected features and train a linear SVM, or even an MLP with one hidden layer and only a few hidden nodes, to compute the stage response.
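A rough sketch of that two-step stage training, in plain Python so it is self-contained. The assumptions are all mine: the feature responses are already computed into rows of `X`, labels are +1/-1, the boosting is plain discrete AdaBoost with exhaustive stump search (traincascade's own boosting variants differ), and a simple batch subgradient descent on the hinge loss stands in for a real SVM solver. All function names and the toy data are hypothetical.

```python
import math


def stump_vote(value, thr, pol):
    # +1 if the response falls on the "positive" side of the threshold.
    return 1 if pol * value <= pol * thr else -1


def best_stump(X, y, w):
    """Exhaustive search for the stump with the lowest weighted error."""
    best = (float("inf"), 0, 0.0, 1)
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for pol in (1, -1):
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if stump_vote(row[j], thr, pol) != yi)
                if err < best[0]:
                    best = (err, j, thr, pol)
    return best


def adaboost_select(X, y, rounds):
    """Run discrete AdaBoost and return the feature indices it picked."""
    n = len(X)
    w = [1.0 / n] * n
    selected = []
    for _ in range(rounds):
        err, j, thr, pol = best_stump(X, y, w)
        err = min(max(err, 1e-10), 1.0 - 1e-10)
        alpha = 0.5 * math.log((1.0 - err) / err)
        # Increase the weight of misclassified samples, decrease the rest.
        w = [wi * math.exp(-alpha * yi * stump_vote(row[j], thr, pol))
             for row, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
        if j not in selected:
            selected.append(j)
    return selected


def train_linear_svm(X, y, features, steps=200, lr=0.1, lam=0.01):
    """Batch subgradient descent on the L2-regularized hinge loss,
    using only the selected feature columns."""
    n = len(X)
    Z = [[row[f] for f in features] for row in X]
    w = [0.0] * len(features)
    b = 0.0
    for _ in range(steps):
        grad_w = [lam * wk for wk in w]
        grad_b = 0.0
        for z, yi in zip(Z, y):
            if yi * (sum(wk * zk for wk, zk in zip(w, z)) + b) < 1.0:
                grad_w = [g - yi * zk / n for g, zk in zip(grad_w, z)]
                grad_b -= yi / n
        w = [wk - lr * g for wk, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


# Toy data: three "feature responses" per sample; column 2 carries no signal.
X_toy = [[3, 0, 0], [2, 0, 0], [-1, 2, 0],
         [-3, 0, 0], [-2, 0, 0], [1, -2, 0]]
y_toy = [1, 1, 1, -1, -1, -1]

selected_toy = adaboost_select(X_toy, y_toy, rounds=3)
weights_toy, bias_toy = train_linear_svm(X_toy, y_toy, selected_toy)
```

The boosted stumps are thrown away after selection; only the list of feature indices is kept, and the stage response becomes `bias + sum(weight * response)` over those features. A small MLP could be swapped in for `train_linear_svm` at the cost of a few more multiply-adds per window.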
By doing this, I expect to get significantly more powerful stages than AdaBoost alone produces, while keeping the computational cost per stage about the same. That should give a faster and more efficient classifier than the one produced by the traincascade program, which is obsolete anyway.
In my experience, most of the training time for classical Haar training was spent searching for negative samples that were not already rejected by the previous stages. If the stages get more powerful, I expect the entire training to take less time until all available samples are learnt.
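The negative search I mean is roughly the loop below, as a minimal Python sketch. The function name, the representation of a stage as a callable returning True/False, and the `scanned` counter are assumptions for illustration, not traincascade's actual code.

```python
def mine_hard_negatives(candidates, stages, wanted):
    """Collect negative windows that survive every existing stage.

    candidates: iterable of negative windows (any objects).
    stages:     list of callables, each returning True if the window passes.
    wanted:     number of hard negatives needed to train the next stage.
    Returns (hard_negatives, scanned), where scanned is how many
    candidates had to be examined to find them.
    """
    hard = []
    scanned = 0
    for window in candidates:
        scanned += 1
        # all() stops at the first rejecting stage, like a cascade does.
        if all(stage(window) for stage in stages):
            hard.append(window)
            if len(hard) == wanted:
                break
    return hard, scanned
```

The `scanned` counter is where the time goes: the fewer negatives the existing stages let through, the more candidates have to be examined to fill the next training set. Whether stronger stages reduce the total depends on how many fewer stages are needed overall, which is the part I would like to measure.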
Do you know if anyone has attempted something like this?
Any suggestions about this being a bad idea?