How to Compare two tablet-drawings from different users?

I am creating an application that will compare drawings from a mobile app and see if someone else has uploaded a drawing that is somewhat similar, or has similar features, or looks similar in some way.
All the drawings (images) will be the same width and height on a white/empty background. There is no noise, since the images are drawn on a phone/tablet.

I am wondering how to best compare the drawings. Which techniques and/or OpenCV functions or functionalities should I use?
The images will most likely contain a lot of whitespace. From my current understanding it seems that a histogram comparison would not be ideal when most of the background is the same?
Also the pictures can contain colors, not just black & white. But the image comparison algorithm can always just grayscale the drawings if that helps.

I am having trouble identifying which algorithms/techniques I should use for the best results, or simply good enough results, for this use case. I don't have any knowledge about this subject, just a standard CompSci degree.

The comparison does not need to be too good. I am trying to find drawings with similar features, or that look somewhat similar that might come from different users. I just want to find drawings that are “close enough”. For example, I want my algorithm to find the two images at the bottom of the post to be similar enough.

Please help me get started and point me to the techniques that would give me a good enough result and/or some OpenCV functions that would get me started.

Here are two example pictures I want to compare:

what are all the drawing tools a user would be given?

how complex can a drawing become?

do you only have raster data of the result, or vector data of the result, or do you have complete information on every operation and stroke the user makes?

what are all the drawing tools a user would be given?

Basically just a finger or a simple tablet pen. Mostly just using a finger to draw.

how complex can a drawing become?

Like the ones shown above. Not much more complex. It is meant to be drawn on a mobile phone with your finger, so the drawings can't be too sophisticated. All the images will be quite simple, and most of each image will be whitespace.

do you only have raster data of the result, or vector data of the result

I have PNG images, so raster data. I don't have the vector data, and I don't plan to get that unless I really have to.

do you have complete information on every operation and stroke the user makes?

I guess I could manage to get all that information, but I want to avoid doing that as it would make my drawing program much more complicated.

I have PNG images. Most images will be simple drawings similar to an elementary school or kindergarten drawing (see below for some examples). The pictures will (most of the time) not be very detailed, just rough drawings made with a finger on a mobile phone in a couple of seconds, or a couple of minutes at most.
Most of the background will be white/transparent as seen in the pictures above.

I am really unsure on how to proceed with this as most examples use real photos and that is not the type of images that I will use.

For example, here are four pictures. I want to match the hearts with each other (top left and top right), but the other two drawings on the bottom should hopefully not be a good match with any of the other drawings.

so… strictly line drawings, users can’t vary line thickness, no solids/floodfill/stickers/brushes.

I asked for complete stroke information because that might be what you’ll have to extract from the raster graphic anyway for some approaches. this has similarities to handwriting recognition. those pictures get a little more complex/freestyle than can be expected of character recognition.

there are AI/DNN approaches that use such simple line drawings as user input to synthesize complete paintings. the input part of that has to put semantics to those line drawings.

you should look for such DNN approaches. you can use part of that to generate a fingerprint/hash of the image, and use that in a similarity measure.

so… strictly line drawings, users can’t vary line thickness, no solids/floodfill/stickers/brushes.

Yes, at least for now. So far my idea was to only allow changing colors, but I might add more later if the original image comparison will continue to work.

you should look for such DNN approaches

Does OpenCV have any such DNNs I can just plug in and use? Or do you know where I can get a pre-trained model for this?
At least for now, I don't want to draw and tag thousands of images so that I can train a basic NN.

Or is there any algorithm or OpenCV function(s) I can try out that could give me a half decent drawing comparison?

What you are saying makes a lot of sense, but it is a little too abstract.
I would love it if you could give me a more practical/direct example of one way I could try and solve my problem, even if it is “bad”.
I mentioned in the OP that histogram comparison is probably not a good solution, but could it work? Any other such simple image comparison I could do with OpenCV or similar tools without too much work, at least to get me started?

And thanks for all the help so far :)

look for SketchyGAN and GauGAN

usually, for all these things, pretrained weights exist. that’s part of any proof of concept.

your challenge would be to modify the network to get at the results of internal layers (and discard following layers), which you’d use as a feature vector that describes the input. this should not require retraining.

similar things are done by person/face re-identification (“Re-ID”) networks. they condense the input down to a feature vector that hopefully encodes important information.

histograms… well maybe a histogram of oriented gradients (“HoG”), but not a color histogram. even HoG will have limits. it has limits for OCR (character recognition), and pedestrian detection (silhouettes), and both are a lot more constrained than freeform sketches.

Hi neivler, I am running an experiment coded in javascript and facing the same issue you described here. Did you find a solution? If yes, could you please share it with me?