Based on this description, I think we can view this as a classification problem, and I believe deep learning is the way to the solution. See, training a cnn based classificator is not that hard, building the training dataset is.
You can start taking many hundred of photos with and without infestation, covering all the expected range (different places, different plagues, etc.). Keep in mind that with 100 photos you can fall short. @berak’s 4000 photos is a more realistic guess.
Once you got the photos you can train different networks, like resnet and many other, and see if the results fit you.
Also, you can see this as a detection problem, where not only want to tell if there is a plague, but to localize it on the image. You can use the same dataset, but need to annotate it, draw the boxes where they belong so the model can learn from they.
I believe that making the dataset is the hard part.