However, the processing gets stuck or crashes (out of memory), since I’m running 4 inference passes in series (4 YOLOv5 models, one for each of 4 classes).
Does anyone know the minimum device needed for this kind of processing? Right now we are using an NVIDIA Jetson Xavier NX. I contacted NVIDIA and they told me to use TensorRT, but unfortunately that would require a lot of changes that can’t be made right now.
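One workaround I could imagine for the memory pressure (this is a sketch, not the original pipeline; `load_model` and `run_inference` below are placeholder stubs standing in for the real YOLOv5 loading and forward pass) is to keep only one model resident at a time and free it before loading the next, trading speed for peak memory:

```python
import gc

def load_model(weights_path):
    # Stub: in the real pipeline this would be something like
    # torch.hub.load('ultralytics/yolov5', 'custom', path=weights_path)
    return {"weights": weights_path}

def run_inference(model, frame):
    # Stub standing in for model(frame); returns a fake detection list
    return [f"detection from {model['weights']}"]

def detect_all_classes(frame, weight_files):
    """Run the four single-class models one at a time,
    releasing each before loading the next to cap peak memory."""
    results = []
    for weights in weight_files:
        model = load_model(weights)
        results.extend(run_inference(model, frame))
        del model     # drop the only reference to the model...
        gc.collect()  # ...and reclaim it before the next load
        # On CUDA you would also call torch.cuda.empty_cache() here
    return results

weights = ["class_a.pt", "class_b.pt", "class_c.pt", "class_d.pt"]
print(detect_all_classes(None, weights))
```

The obvious cost is reloading weights for every frame, which is slow; the usual alternative is merging the four classes into a single multi-class model so only one network ever lives in memory.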
The 4K resolution is because we think more detail of the object is better for the model’s accuracy. Besides, if the object is far away or very small, a 4K image contains more pixels of the object than a 640 image does.
Also, the reason for 4 different models for the 4 classes is that some models overfit after 50 epochs while others are more accurate at 100 epochs. I mean, if every class were trained for 100 epochs, I would get worse results than with 4 separate models.
Does it even make sense to perform inference at 4K resolution in terms of getting more detail from the object? When we resize an image we lose information, and the idea is to detect far-away objects. How much accuracy (or confidence) would we gain with this approach?
I don’t have pictures right now. But let’s say we have an object that, in 4K, has an apparent size of 20x20 pixels. Imagine we run two trainings: one with the 4K image and another with a resized version of it (to 640x640, for example). If at inference time we find an object of a size similar to the training one, is it expected to have more chance of detecting it with 4K training/4K inference or with 640x640 training/640x640 inference? Are the differences that considerable? What about resizing to 1280x1280?
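Some back-of-the-envelope arithmetic for that 20x20 px example may help frame the question (assuming a 3840-wide frame scaled uniformly so its long side matches the inference size, as YOLOv5's letterboxing does):

```python
def apparent_size(obj_px, src_long_side, dst_long_side):
    """Apparent object size after uniformly scaling an image
    so that its long side equals dst_long_side."""
    scale = dst_long_side / src_long_side
    return obj_px * scale

# A 20x20 px object in a 3840-wide frame at various inference sizes
for target in (640, 1280, 3840):
    side = apparent_size(20, 3840, target)
    print(f"imgsz={target}: object ~{side:.1f}x{side:.1f} px")
```

At 640 the object shrinks to roughly 3x3 px, which is at or below what a detector can reliably pick up, while 1280 keeps it around 7x7 px; that is presumably why 1280 is often suggested as a middle ground between 640 and full-resolution (or tiled) inference.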