Which hashing algorithm for cropped images?

luis · March 29, 2021, 3:43pm

I have a set of a million images of people (between 1 and 15 megapixels each) . I want to detect when two images A and B from this set are (more or less similar) CROPS from the SAME bigger unknown image, that is, NOT in the set. In particular, A and B must be (more or less) IDENTICAL in a large rectangle (>75% of the area of A and B) contained in both, yet maybe they have different aspect ratios.

I am using for this PHASH, but it is not very precise…

Do you know a better algorithm for my purpose? Maybe I should try a further autocrop before hashing? Much better if it is already implemented in some library (OpenCV?)

crackwitz · March 29, 2021, 4:54pm

consider stitching.

consider feature matching.

consider if you can keep the source information (where something is cropped out of) rather than having to recalculate it from the individual views.

luis · March 29, 2021, 5:02pm

For the first and second suggestions, those are probably too intensive? I need to compare hundreds of images with the set with one million, every day. With hashing, I already have a database with all hashes of the original set with one million images.

For the third, definitely not. The sources are long gone, and I cannot keep that amount of data for new additions.

berak · March 29, 2021, 6:08pm

there is an img_hash module in contrib, with phash and a couple of alternative algos- maybe you want to look at the “attack surface”:

crackwitz · March 29, 2021, 7:01pm

you can keep the rectangle coordinates.

perhaps you could illustrate the situation with a few exemplary pictures? I wonder if your problem is only hard to solve because of your chosen solution

luis · March 29, 2021, 7:33pm

Ok, here is an example.

Suppose there is an original unknown image, already lost (the one at the left). Then I have one crop of this image stored among a million of other images where I can compute hashes (the center one). Now, I have another image that is also a crop of the unknown one (the one at the right).

So: I want a fast and reliable algorithm to detect that the right image has a big common rectangle with the second one, among the million images I have. Or, equivalently, that the image at the right is also a crop from a bigger one.

Neither the second or the third are contained in the other, but they have a big common rectangle.

luis · March 29, 2021, 7:36pm

Yes, I saw that, and phash is not that bad. There is a “shift attack” also that seems better suited to my problem…

crackwitz · March 30, 2021, 11:52am

are those EXACT PRECISE crops, or can they be rotated or even be different photos taken at different times?

you demand fast and reliable. that means we have to find out how much generality you also demand.

luis · March 30, 2021, 1:31pm

They are exact crops of the same photo, no rotation. But they may have different aspect ratios and they can also have been shrunk. That is, the crops must take into account translations and homotheties. I should have shrunk one of the two crops, forgot about the homotheties. So this is closer to what I need:

I’m not sure I got what you meant by “generality”. In general, the pictures are very sharp (sharper than the example), and about people, not birds.

crackwitz · March 30, 2021, 3:17pm

if it weren’t for the scaling, the problem could have been somewhat easier.

here’s some more in perceptual hashing (variants of hashes):

I really don’t enjoy problems where one has to recalculate information that once existed but was discarded, when it could have been kept. you don’t have to store the complete original image, but its name and the crop rectangle would have been useful. I won’t ask directly why that isn’t the case.

luis · March 30, 2021, 11:33pm

Yes, scaling is bad. I am looking for a hashing algorithm invariant under a 3 parameter group: translations and homotheties.

Interesting article. For those algorithms, cropping is pretty bad. Yet, rotation is a nightmare.

Yes, with the crop info my work would have been very easy. But my images can have multiple crops and homotheties. Never thought I would end up with so many.

It’s wise that you won’t ask why that info wasn’t kept. Thanks.

Topic		Replies	Views
Image hash length C++	4	548	April 21, 2022
How to crop large .tiff images?	2	629	September 17, 2021
Cropping an Image based on an existing cue in the image Python	5	442	September 26, 2022
How to find similarity of 3-4 images against the one captured with two cameras placed side by side Python dnn , imgproc	0	441	November 9, 2022
Random changes in the processing time of matching two images Python	12	404	November 23, 2021

Which hashing algorithm for cropped images?

Related topics