However, the processing gets stuck or crashes (out of memory), since I’m running 4 inference passes in series (4 YOLOv5 models, one for each of 4 classes).
Does anyone know the minimum device needed for this kind of processing? Right now we are using an NVIDIA Jetson Xavier NX. I contacted NVIDIA and they told me to use TensorRT, but unfortunately that would require a lot of changes that can’t be made right now.
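One workaround I could imagine for the memory pressure (this is a sketch, not the original pipeline; `load_model` and `run_inference` below are placeholder stubs standing in for the real YOLOv5 loading and forward pass) is to keep only one model resident at a time and free it before loading the next, trading speed for peak memory:

```python
import gc

def load_model(weights_path):
    # Stub: in the real pipeline this would be something like
    # torch.hub.load('ultralytics/yolov5', 'custom', path=weights_path)
    return {"weights": weights_path}

def run_inference(model, frame):
    # Stub standing in for model(frame); returns a fake detection list
    return [f"detection from {model['weights']}"]

def detect_all_classes(frame, weight_files):
    """Run the four single-class models one at a time,
    releasing each before loading the next to cap peak memory."""
    results = []
    for weights in weight_files:
        model = load_model(weights)
        results.extend(run_inference(model, frame))
        del model     # drop the only reference to the model...
        gc.collect()  # ...and reclaim it before the next load
        # On CUDA you would also call torch.cuda.empty_cache() here
    return results

weights = ["class_a.pt", "class_b.pt", "class_c.pt", "class_d.pt"]
print(detect_all_classes(None, weights))
```

The obvious cost is reloading weights for every frame, which is slow; the usual alternative is merging the four classes into a single multi-class model so only one network ever lives in memory.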
The 4K resolution is because we think more detail of the object is better for the model’s accuracy. Besides, if the object is far away or very small, a 4K image contains more pixels of the object than a 640 image does.
Also, the reason for 4 different models for the 4 classes is that some models overfit after 50 epochs while others are more accurate at 100 epochs. I mean, if every class were trained for 100 epochs, I would get worse results than with 4 separate models.
Does it even make sense to perform inference at 4K resolution in terms of getting more detail from the object? When we resize an image we lose information, and the idea is to detect far-away objects. How much accuracy (or confidence) would we gain with this approach?
I don’t have pictures right now. But let’s say we have an object that, in 4K, has an apparent size of 20x20 pixels. Imagine we run two trainings: one with the 4K image and another with a resized version of it (to 640x640, for example). If at inference time we find an object of a size similar to the training one, is it expected to have more chance of detecting it with 4K training/4K inference or with 640x640 training/640x640 inference? Are the differences that considerable? What about resizing to 1280x1280?
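Some back-of-the-envelope arithmetic for that 20x20 px example may help frame the question (assuming a 3840-wide frame scaled uniformly so its long side matches the inference size, as YOLOv5's letterboxing does):

```python
def apparent_size(obj_px, src_long_side, dst_long_side):
    """Apparent object size after uniformly scaling an image
    so that its long side equals dst_long_side."""
    scale = dst_long_side / src_long_side
    return obj_px * scale

# A 20x20 px object in a 3840-wide frame at various inference sizes
for target in (640, 1280, 3840):
    side = apparent_size(20, 3840, target)
    print(f"imgsz={target}: object ~{side:.1f}x{side:.1f} px")
```

At 640 the object shrinks to roughly 3x3 px, which is at or below what a detector can reliably pick up, while 1280 keeps it around 7x7 px; that is presumably why 1280 is often suggested as a middle ground between 640 and full-resolution (or tiled) inference.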