Simple motion detection with a complex background for fish detection

I’m trying to build a simple motion detector using background subtraction, Canny edges, or other techniques. I’m using this to detect fish from an underwater camera towed in trolling mode. I will have very limited compute and memory resources on the camera for motion detection.

Example video: [attached GIF: marlin_gif]

What seems to be happening is that the prop wash creates such a complex and varying background that it’s difficult to isolate the fish when I threshold, even with a weighted-average background. I think I’m seeing better results with Canny edge detection but am not sure how to reduce the noise.

I have frame examples of my various processing stages, including Canny, Canny on the background difference, the weighted background, the threshold on the background difference, and other views with varying results. The most contrast seems to be in my frame difference and in Canny, but both have a lot of noise from the prop wash in the upper-left corner. I would post these, but the admin doesn’t allow me to post more than one media image.
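
For reference, my pipeline looks roughly like this. It’s a minimal sketch; the filename, blur kernel, accumulation weight, and thresholds are placeholders for values I’m still tuning:

```python
import cv2

cap = cv2.VideoCapture("marlin.mp4")  # placeholder filename for my clip
background = None

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    gray = cv2.GaussianBlur(gray, (15, 15), 0)      # large-kernel blur to tame prop wash

    if background is None:
        background = gray.astype("float")           # running-average accumulator

    cv2.accumulateWeighted(gray, background, 0.05)  # weighted-average background
    diff = cv2.absdiff(gray, cv2.convertScaleAbs(background))

    _, mask = cv2.threshold(diff, 25, 255, cv2.THRESH_BINARY)  # threshold on background diff
    edges = cv2.Canny(diff, 50, 150)                # "Canny on background difference" view

    cv2.imshow("mask", mask)
    cv2.imshow("edges", edges)
    if cv2.waitKey(30) & 0xFF == 27:                # Esc to quit
        break

cap.release()
cv2.destroyAllWindows()
```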

I’m now thinking that I need to train a model, using a cascade classifier or some other method. I have annotated a video file with CVAT. I know there are better methods today than cascades, but the docs still feature them in the object detection section. Should I train a model? If so, which one?

My contour detection seems to be very unreliable due to the noise.
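
To make that concrete, the cleanup I’ve been attempting before the contour stage looks roughly like this. The morphological opening, kernel size, and area cutoff are all guesses I’m still tuning:

```python
import cv2
import numpy as np

def fish_candidates(mask, min_area=500):
    """Filter a noisy binary mask down to plausible fish blobs.
    `mask` is the thresholded background-difference image from the
    sketch above; kernel size and min_area are guesses to tune."""
    kernel = np.ones((5, 5), np.uint8)
    clean = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)    # drop small speckle
    clean = cv2.morphologyEx(clean, cv2.MORPH_CLOSE, kernel)  # close gaps in blobs
    contours, _ = cv2.findContours(clean, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    return [cv2.boundingRect(c) for c in contours
            if cv2.contourArea(c) > min_area]
```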

Could I get some recommendations on an approach to this problem? I’m not sure which path to pursue.

Now trying to post my post-processing frame examples:

Maybe you can have a look at DaSiamRPN.


I don’t see any promise in tracking because (1) that fish disappeared quickly, and so might most targets, and (2) you need to find what to track before you can track it.

I would agree that you need something deep-learning-based. I see no hope of scraping anything useful out of such videos with simple image processing; forget Canny right away, that never helped anyone. You could try looking into background subtraction/segmentation. OpenCV has a bunch of simple and complex algorithms for that. They’re supposed to (online-)learn the “average” appearance of a camera view and report anything that varies more than usual.
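
A minimal sketch of that, assuming your clip is a local file. MOG2 is shown here; KNN (`createBackgroundSubtractorKNN`) is a drop-in alternative, and the parameters are just the defaults:

```python
import cv2

cap = cv2.VideoCapture("marlin.mp4")  # placeholder for your clip
# MOG2 online-learns a per-pixel model of the "average" view;
# shadow detection is off since it isn't meaningful underwater
subtractor = cv2.createBackgroundSubtractorMOG2(
    history=500, varThreshold=16, detectShadows=False)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    fgmask = subtractor.apply(frame)  # 255 where a pixel deviates from the model
    cv2.imshow("foreground", fgmask)
    if cv2.waitKey(30) & 0xFF == 27:
        break

cap.release()
cv2.destroyAllWindows()
```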

You could use a neural network for object detection; that would tell you where objects are. You could also just go for classification, telling you whether there are targets or not.

OpenCV can do inference but it won’t help with training.
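
For the inference side, the cv2.dnn route looks roughly like this. `fish_detector.onnx`, the input size, and the scaling are placeholders for whatever network you end up training elsewhere:

```python
import cv2

# "fish_detector.onnx" is a placeholder for a model trained outside OpenCV
net = cv2.dnn.readNetFromONNX("fish_detector.onnx")

frame = cv2.imread("frame.jpg")  # or a frame from cv2.VideoCapture
blob = cv2.dnn.blobFromImage(frame, scalefactor=1 / 255.0,
                             size=(320, 320), swapRB=True)
net.setInput(blob)
out = net.forward()  # output layout depends on the detector architecture
```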

You’ll want to investigate various networks relative to the computational budget (device capabilities) you have. This also depends on how much time you’re willing to spend on each inference (detection). Look at the number of parameters and the number of operations.

When you’ve narrowed the choice down to a bunch of popular (easy-to-use) and lightweight networks, you can fine-tune those to deal with your targets. Then you can compare how well they do.

Thank you very much. I don’t need to track, but I do want to know “fish/no fish” per frame, or at least every few frames. Yes, this fish darted in and out, but many times they stay around the lure, which is also in the frame. I’ve got other, longer videos with much more fish exposure, but I had to limit this to a GIF because of the size limit. A longer example: [link]

Doing some quick research, I think I would use TF Lite and one of their very lightweight single-class detectors or classifiers. I haven’t tried to train one yet, but this seems fine for my application in terms of latency and size. Detection is most important; where the fish is on the screen matters less. Tracking is not needed.
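
Something like this is what I have in mind for the TF Lite side. It’s an untested sketch; `fish.tflite` and the classifier-style output are assumptions about a model I haven’t trained yet:

```python
import cv2
import numpy as np
from tflite_runtime.interpreter import Interpreter  # or tf.lite.Interpreter

interpreter = Interpreter(model_path="fish.tflite")  # placeholder model file
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

frame = cv2.imread("frame.jpg")                # one frame from the camera
h, w = inp["shape"][1], inp["shape"][2]        # input size is model-dependent
img = cv2.resize(frame, (w, h))
# assumes a quantized uint8 model; a float model would need normalization
img = np.expand_dims(img, 0).astype(inp["dtype"])

interpreter.set_tensor(inp["index"], img)
interpreter.invoke()
score = interpreter.get_tensor(out["index"])   # "fish / no fish" score
```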

That said, I am making some progress using standard OpenCV tools, mostly large-kernel (15) blurring, weighted-average backgrounds, and thresholding.

Also, I stumbled on the fact that Otsu thresholding completely masks the prop wash, which seems useful. I almost need two algorithms: one for a fish in the prop wash (upper left) and one for a fish against the deep blue.
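
That stage is just this (the frame path is a placeholder):

```python
import cv2

frame = cv2.imread("frame.jpg")  # placeholder for one of my frames
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

# Otsu picks the threshold automatically from the histogram, which is
# what seems to push the prop wash into the masked-out class
_, otsu = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
```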

Also, I wish all of these were not just looking at grayscale, as the fish’s color is very different from both the prop wash (upper left) and the deep blue. That seems like a useful property for isolating or detecting the fish.
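
A color mask would look something like this. The HSV bounds below are pure placeholders, since I’d have to sample the fish’s actual color from annotated frames:

```python
import cv2
import numpy as np

frame = cv2.imread("frame.jpg")  # placeholder for one of my frames
hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)

# Hypothetical bounds for the fish's coloring -- to be sampled from real
# frames so that both the prop wash and the deep blue fall outside them
lower = np.array([90, 40, 40])
upper = np.array([130, 255, 255])

mask = cv2.inRange(hsv, lower, upper)
fish_only = cv2.bitwise_and(frame, frame, mask=mask)
```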

Here are some progress views showing the various stages in my process:

Well, I certainly agree this is very relevant.


But I’m not certain I can follow how to train it. I wonder if I can do transfer learning with my labeled fish data?

I’m truly stumped by this problem.